
Non-Uniform Memory Access in Windows 7 - Windows7之家

Published: 2019-11-22 16:58 | Editor: 澳门新葡4473网站


    Appendix A



    PC hardware



    This appendix describes personal computer (PC) hardware, the platform on which xv6 runs.



    A PC is a computer that adheres to several industry standards, with the goal that a given piece of software can run on PCs sold by multiple vendors. These standards evolve over time, and a PC from the 1990s doesn’t look like a PC now.



    From the outside a PC is a box with a keyboard, a screen, and various devices (e.g., a CD-ROM drive). Inside the box is a circuit board (the "motherboard") with CPU chips, memory chips, graphics chips, I/O controller chips, and busses through which the chips communicate. The busses adhere to standard protocols (e.g., PCI and USB) so that devices will work with PCs from multiple vendors.



    From our point of view, we can abstract the PC into three components: CPU, memory, and input/output (I/O) devices. The CPU performs computation, the memory contains instructions and data for that computation, and devices allow the CPU to interact with hardware for storage, communication, and other functions.



    You can think of main memory as connected to the CPU with a set of wires, or lines, some for address bits, some for data bits, and some for control flags. To read a value from main memory, the CPU sends high or low voltages representing 1 or 0 bits on the address lines and a 1 on the "read" line for a prescribed amount of time and then reads back the value by interpreting the voltages on the data lines. To write a value to main memory, the CPU sends appropriate bits on the address and data lines and a 1 on the "write" line for a prescribed amount of time. Real memory interfaces are more complex than this, but the details are only important if you need to achieve high performance.
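    The read and write transactions described above can be modeled in a few lines. This is only a sketch: the class MemoryBus and its cycle method are hypothetical names, and voltages and timing are reduced to plain function arguments.

```python
class MemoryBus:
    """Toy model of the address, data, and control lines (hypothetical)."""

    def __init__(self, size):
        self.cells = [0] * size  # one byte per address

    def cycle(self, address, read=0, write=0, data=0):
        """Drive the lines for one transaction; return the data lines on a read."""
        if read and write:
            raise ValueError("read and write lines cannot both be 1")
        if write:
            self.cells[address] = data & 0xFF  # memory latches the data lines
            return None
        if read:
            return self.cells[address]  # memory drives the data lines
        return None  # neither control line raised: bus idle


bus = MemoryBus(size=16)
bus.cycle(address=0x3, write=1, data=0xAB)   # write 0xAB to address 3
value = bus.cycle(address=0x3, read=1)       # read it back
print(hex(value))
```

    The point of the model is that the CPU and memory share nothing but the lines: every access is an address, a control flag, and (for writes) a data value.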




    Processor and memory



    A computer’s CPU (central processing unit, or processor) runs a conceptually simple loop: it consults an address in a register called the program counter, reads a machine instruction from that address in memory, advances the program counter past the instruction, and executes the instruction. Repeat. If the execution of the instruction does not modify the program counter, this loop will interpret the memory pointed at by the program counter as a sequence of machine instructions to run one after the other. Instructions that do change the program counter include branches and function calls.
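    The loop above can be sketched as a toy interpreter. The instruction format (tuples of an opcode and an argument) and the single accumulator register are assumptions made for illustration; they are not x86.

```python
def run(memory, max_steps=100):
    """Toy fetch-execute loop over a hypothetical instruction format."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # a single accumulator register
    for _ in range(max_steps):
        op, arg = memory[pc]   # fetch the instruction the pc points at
        pc += 1                # advance the pc past the instruction
        if op == "add":        # execute
            acc += arg
        elif op == "jmp":      # a branch overwrites the pc
            pc = arg
        elif op == "halt":
            return acc
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached")


program = [
    ("add", 5),
    ("jmp", 3),     # skip the next instruction
    ("add", 100),   # never executed
    ("add", 1),
    ("halt", 0),
]
result = run(program)
print(result)  # 6
```

    Without the jmp, the instructions would simply run one after the other; the branch is the only thing that changes the order.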



    The execution engine is useless without the ability to store and modify program data. The fastest storage for data is provided by the processor’s register set. A register is a storage cell inside the processor itself, capable of holding a machine word-sized value (typically 16, 32, or 64 bits). Data stored in registers can typically be read or written quickly, in a single CPU cycle.



    PCs have a processor that implements the x86 instruction set, which was originally defined by Intel and has become a standard. Several manufacturers produce processors that implement the instruction set. Like all other PC standards, this standard is also evolving but newer standards are backwards compatible with past standards. The boot loader has to deal with some of this evolution because every PC processor starts simulating an Intel 8088, the CPU chip in the original IBM PC released in 1981. However, for most of xv6 you will be concerned with the modern x86 instruction set.



    The modern x86 provides eight general purpose 32-bit registers (%eax, %ebx, %ecx, %edx, %edi, %esi, %ebp, and %esp) and a program counter %eip (the instruction pointer). The common e prefix stands for extended, as these are 32-bit extensions of the 16-bit registers %ax, %bx, %cx, %dx, %di, %si, %bp, %sp, and %ip. The two register sets are aliased so that, for example, %ax is the bottom half of %eax: writing to %ax changes the value stored in %eax and vice versa. The first four registers also have names for the bottom two 8-bit bytes: %al and %ah denote the low and high 8 bits of %ax; %bl, %bh, %cl, %ch, %dl, and %dh continue the pattern. In addition to these registers, the x86 has eight 80-bit floating-point registers as well as a handful of special-purpose registers like the control registers %cr0, %cr2, %cr3, and %cr4; the debug registers %dr0, %dr1, %dr2, and %dr3; the segment registers %cs, %ds, %es, %fs, %gs, and %ss; and the global and local descriptor table pseudo-registers %gdtr and %ldtr. The control registers and segment registers are important to any operating system. The floating-point and debug registers are less interesting and not used by xv6.
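    The aliasing between %eax, %ax, %al, and %ah comes down to masking and shifting one 32-bit value. The sketch below models a single register; the class Reg32 and its attribute names (e, x, l, h) are hypothetical and chosen to echo the register names.

```python
class Reg32:
    """One 32-bit register with its 16-bit and 8-bit aliases (a model)."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF   # full register, e.g. %eax

    @property
    def x(self):                      # low 16 bits, e.g. %ax
        return self.e & 0xFFFF

    @x.setter
    def x(self, v):
        self.e = (self.e & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def l(self):                      # bits 0-7, e.g. %al
        return self.e & 0xFF

    @l.setter
    def l(self, v):
        self.e = (self.e & 0xFFFFFF00) | (v & 0xFF)

    @property
    def h(self):                      # bits 8-15, e.g. %ah
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, v):
        self.e = (self.e & 0xFFFF00FF) | ((v & 0xFF) << 8)


eax = Reg32(0x12345678)
eax.x = 0xABCD        # writing %ax changes the bottom half of %eax
eax.h = 0xEF          # writing %ah changes bits 8-15 only
print(hex(eax.e))     # 0x1234efcd
print(hex(eax.l))     # 0xcd
```

    Each write touches only the bits its name covers, so the upper 16 bits of the register survive both stores.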



    Registers are fast but expensive. Most processors provide at most a few tens of general-purpose registers. The next conceptual level of storage is the main random-access memory (RAM). Main memory is 10-100x slower than a register, but it is much cheaper, so there can be more of it. One reason main memory is relatively slow is that it is physically separate from the processor chip. An x86 processor has a few dozen registers, but a typical PC today has gigabytes of main memory. Because of the enormous differences in both access speed and size between registers and main memory, most processors, including the x86, store copies of recently-accessed sections of main memory in on-chip cache memory. The cache memory serves as a middle ground between registers and memory both in access time and in size. Today’s x86 processors typically have two levels of cache, a small first-level cache with access times relatively close to the processor’s clock rate and a larger second-level cache with access times in between the first-level cache and main memory. This table shows numbers for an Intel Core 2 Duo system:



    Intel Core 2 Duo E7200 at 2.53 GHz

    TODO: Plug in non-made-up numbers!

    storage        access time    size
    register       0.6 ns         64 bytes
    L1 cache       0.5 ns         64 kilobytes
    L2 cache       10 ns          4 megabytes
    main memory    100 ns         4 gigabytes
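    To see why caches pay off, one can estimate the average cost of a load from the access times in the table. This is a rough sketch: the hit rates are invented for illustration, the table itself is marked as placeholder data, and the model assumes each level is checked in turn.

```python
# Access times from the table above (which the source marks as placeholders).
L1_NS, L2_NS, MEM_NS = 0.5, 10.0, 100.0


def average_access_ns(l1_hit_rate, l2_hit_rate):
    """Expected time for one load that checks L1, then L2, then main memory."""
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    # Every load pays for L1; L1 misses pay for L2; L2 misses pay for memory.
    return L1_NS + l1_miss * (L2_NS + l2_miss * MEM_NS)


# Hypothetical hit rates: 95% in L1, 80% of the remainder in L2.
estimate = average_access_ns(0.95, 0.80)
print(f"{estimate:.2f} ns")   # about 2 ns, versus 100 ns with no cache
```

    Even with modest hit rates, the average load costs far less than a trip to main memory, which is why the hierarchy can mostly be ignored by software.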



    For the most part, x86 processors hide the cache from the operating system, so we can think of the processor as having just two kinds of storage—registers and memory—and not worry about the distinctions between the different levels of the memory hierarchy.







    Processors must communicate with devices as well as memory. The x86 processor provides special in and out instructions that read and write values from device addresses called I/O ports. The hardware implementation of these instructions is essentially the same as reading and writing memory. Early x86 processors had an extra address line: 0 meant read/write from an I/O port and 1 meant read/write from main memory. Each hardware device monitors these lines for reads and writes to its assigned range of I/O ports. A device’s ports let the software configure the device, examine its status, and cause the device to take actions; for example, software can use I/O port reads and writes to cause the disk interface hardware to read and write sectors on the disk.
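    The extra address line can be modeled as a selector between two separate address spaces. In this sketch the Bus class and the inb and outb helpers are hypothetical stand-ins for the hardware and for the x86 in and out instructions, and port 0x80 is an arbitrary choice.

```python
class Bus:
    """Toy bus with one extra line selecting I/O port space or main memory."""

    IO, MEMORY = 0, 1   # values of the extra address line, as in the text

    def __init__(self):
        self.memory = {}
        self.ports = {}

    def write(self, select, address, value):
        target = self.memory if select == self.MEMORY else self.ports
        target[address] = value

    def read(self, select, address):
        target = self.memory if select == self.MEMORY else self.ports
        return target.get(address, 0)


def outb(bus, port, value):
    """Stand-in for the out instruction: write one byte to an I/O port."""
    bus.write(Bus.IO, port, value & 0xFF)


def inb(bus, port):
    """Stand-in for the in instruction: read one byte from an I/O port."""
    return bus.read(Bus.IO, port)


bus = Bus()
bus.write(Bus.MEMORY, 0x80, 7)   # memory address 0x80
outb(bus, 0x80, 0x1FF)           # I/O port 0x80: a different location
print(inb(bus, 0x80), bus.read(Bus.MEMORY, 0x80))
```

    The same number names two different locations depending on the select line, which is why port accesses need their own instructions.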



    Many computer architectures have no separate device access instructions. Instead the devices have fixed memory addresses and the processor communicates with the device (at the operating system’s behest) by reading and writing values at those addresses. In fact, modern x86 architectures use this technique, called memory-mapped I/O, for most high-speed devices such as network, disk, and graphics controllers. For reasons of backwards compatibility, though, the old in and out instructions linger, as do legacy hardware devices that use them, such as the IDE disk controller, which xv6 uses.
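    Memory-mapped I/O can be sketched as an address decoder that routes some addresses to a device instead of RAM. Everything here is hypothetical: the Device register layout (status at offset 0, command at offset 4), the AddressSpace class, and the base address 0x1000.

```python
class Device:
    """Hypothetical device: status register at offset 0, command at offset 4."""

    def __init__(self):
        self.status = 0
        self.commands = []

    def load(self, offset):
        return self.status if offset == 0 else 0

    def store(self, offset, value):
        if offset == 4:
            self.commands.append(value)   # the device acts on the command
            self.status = 1               # and reports completion


class AddressSpace:
    """Routes ordinary loads and stores to RAM or to a mapped device."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.mappings = []   # list of (base, length, device)

    def map_device(self, base, length, device):
        self.mappings.append((base, length, device))

    def _find(self, address):
        for base, length, device in self.mappings:
            if base <= address < base + length:
                return device, address - base
        return None, address

    def load(self, address):
        device, offset = self._find(address)
        return device.load(offset) if device else self.ram[address]

    def store(self, address, value):
        device, offset = self._find(address)
        if device:
            device.store(offset, value)
        else:
            self.ram[address] = value & 0xFF


space = AddressSpace(ram_size=1024)
dev = Device()
space.map_device(0x1000, 8, dev)
space.store(0x10, 42)      # ordinary RAM write
space.store(0x1004, 9)     # same operation, but it reaches the device
print(space.load(0x10), space.load(0x1000), dev.commands)
```

    The processor needs no special instructions in this scheme; the address alone decides whether a store reaches RAM or a device.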


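    The non-uniform memory access (NUMA) support in Windows 7 is mainly intended to ease the computer's bus limitations. With it, a machine can speed up the operation of its hardware devices.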

    Windows 7 will be able to take advantage not only of faster CPUs, but of multiple processors on a single chip. The 64-bit edition of the operating system in particular will be able to support over 64 Logical Processors per machine.

    In this regard, Microsoft underlined the need for software developers to adapt their applications in accordance with the evolution of processor chips and that of the Windows operating system. Essentially, consistent gains in performance are synonymous with using parallel programming techniques in concordance with many-core processors. This is where non-uniform memory access comes in.

    “The 64-bit versions of Windows 7 and Windows Server 2008 R2 support more than 64 Logical Processors on a single computer,” Phil Pennington, Windows Server Technical Evangelism, revealed. “New processors are now appearing that leverage non-uniform memory access architectures. Within the near future, a system with 4 CPU sockets, 8 processor-cores per socket, and with Simultaneous Multi-Threading enabled per core, will achieve 64 Logical Processors. Many server-class solutions will need to be architected with NUMA awareness in order to achieve linear performance scaling on 64+ LP systems.”
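    The arithmetic behind the 64 logical processors in the quote is a simple product. The quote does not say how many hardware threads Simultaneous Multi-Threading provides per core; the sketch below assumes two, which is the value that makes the total come out to 64.

```python
def logical_processors(sockets, cores_per_socket, threads_per_core):
    """Logical processors the operating system sees on one machine."""
    return sockets * cores_per_socket * threads_per_core


# 4 sockets and 8 cores per socket are from the quote;
# threads_per_core=2 is an assumption about the SMT configuration.
total = logical_processors(sockets=4, cores_per_socket=8, threads_per_core=2)
print(total)  # 64
```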
