The SPARC processors

By April 2004 Sun had shelved its own plans to produce the UltraSPARC V and VI processors in favour of processor designs with many (≥ 8) processor cores, each capable of handling several execution threads. This so-called Rock processor is still some time away, and for the present SPARC development is in the hands of Sun's partner Fujitsu, which will advance with its own SPARC64 implementation. Both Fujitsu/Siemens and Sun market servers based on the latter processor. As Sun no longer actively markets its UltraSPARC IV+ based servers, we refrain from describing that processor and give details only of Fujitsu's SPARC64 processor line.

Fujitsu has been making its own SPARC implementation, called SPARC64, for quite some time. Presently the SPARC64 is in its sixth generation, the SPARC64 VI. Obviously, the processor must be able to execute the SPARC instruction set, but the processor internals are rather different from those of Sun's implementation. Figure 14 shows a block diagram of a core of the dual-core SPARC64 VI.

Figure 14: Block diagram of the Fujitsu SPARC64 VI processor core.

The L1 instruction and data caches are 128 KB each, twice as large as in the late UltraSPARC IV+ core, and both are 2-way set-associative. There is also an Instruction Buffer (IBF) that contains up to 48 4-byte instructions and continues to feed the registers through the Instruction Word Register when an L1 I-cache miss has occurred. A maximum of four instructions can be scheduled each cycle; they find their way via the reservation stations for address generation (RSA), integer execution units (RSE), and floating-point units (RSF) to the registers. The two general register files serve both the two Address Generation units, EAG-A and -B, and the Integer Execution units, EX-A and -B. The latter two are not equivalent: only EX-A can execute multiply and divide instructions. There are also two floating-point register files (FPR) that feed the two Floating-Point units, FL-A and FL-B. These units differ from those of Sun in that they are able to execute fused multiply-add instructions, as is also the case in the POWER and Itanium processors. Consequently, a maximum of 4 floating-point results/cycle can be generated. In addition, FL-A and -B also perform divide and square root operations, in contrast to the UltraSPARC IV+, which has a separate unit for these operations. Because of their iterative nature the divide and square root operations are not pipelined. The feedback from the execution units to the registers is decoupled by update buffers: GUB for the general registers and FUB for the floating-point registers.
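The figures in this paragraph can be checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are invented here, and the "cycles of issue" number assumes a full Instruction Buffer that drains at the maximum rate without being refilled, which is a simplification the text does not state.

```python
# Figures quoted in the text for one SPARC64 VI core.
IBF_ENTRIES = 48          # instructions held by the Instruction Buffer (IBF)
INSTRUCTION_BYTES = 4     # each instruction is 4 bytes
MAX_ISSUE_PER_CYCLE = 4   # at most four instructions scheduled per cycle
FP_UNITS = 2              # FL-A and FL-B
RESULTS_PER_FMA = 2       # a fused multiply-add yields a multiply and an add


def ibf_capacity_bytes(entries=IBF_ENTRIES, instr_bytes=INSTRUCTION_BYTES):
    """Size of the instruction buffer in bytes."""
    return entries * instr_bytes


def ibf_cycles_at_peak_issue(entries=IBF_ENTRIES, width=MAX_ISSUE_PER_CYCLE):
    """Cycles a full buffer lasts at peak issue rate (assumes no refill)."""
    return -(-entries // width)  # ceiling division


def fp_results_per_cycle(units=FP_UNITS, per_fma=RESULTS_PER_FMA):
    """Maximum floating-point results per cycle when every unit issues an FMA."""
    return units * per_fma


print("IBF capacity (bytes):", ibf_capacity_bytes())
print("Cycles covered at peak issue:", ibf_cycles_at_peak_issue())
print("FP results per cycle:", fp_results_per_cycle())
```

Under these assumptions the buffer holds 192 bytes of instructions, enough for 12 cycles at the maximum issue rate, and the two fused multiply-add units account for the 4 floating-point results per cycle mentioned above.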

The dispatch of instructions via the reservation stations, each of which can hold 10 instructions, allows speculative dispatch, i.e., dispatching instructions whose operands are not yet ready at the moment of dispatch but will be by the time the instruction is actually executed. The assumption is that this results in a more even flow of instructions to the execution units.

The SPARC64 VI does not have a third-level cache, but on chip there is a large (6 MB) unified L2 cache. It is a 12-way set-associative write-through cache that is shared by the 2 cores in a processor, as can be seen in Figure 14a. Note that the system bandwidth quoted is the highest available; for the lower-end systems this bandwidth is about 8 GB/s.

The Memory Management Unit (not shown in Figure 14) contains separate sets of Translation Look-aside Buffers (TLBs) for instructions and for data. Each set is composed of a 32-entry µTLB and a 1024-entry main TLB. The µTLBs are accessed through high-speed pipelines by their respective caches.
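To get a feel for what these TLB sizes mean, one can estimate the TLB reach, i.e., the amount of memory that can be addressed without a TLB miss. The text does not give a page size, so the sketch below assumes the 8 KB base page size of the SPARC V9 architecture; larger pages would increase the reach proportionally. The names are chosen for illustration only, and "MB" is interpreted as 2^20 bytes.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: 8 KiB pages (the SPARC V9 base page size); not stated in the text.
ASSUMED_PAGE_BYTES = 8 * KIB

MICRO_TLB_ENTRIES = 32    # per µTLB, from the text
MAIN_TLB_ENTRIES = 1024   # per main TLB, from the text
L2_CACHE_BYTES = 6 * MIB  # unified L2 cache, from the text


def tlb_reach_bytes(entries, page_bytes=ASSUMED_PAGE_BYTES):
    """Memory covered when every TLB entry maps a distinct page."""
    return entries * page_bytes


micro_reach = tlb_reach_bytes(MICRO_TLB_ENTRIES)
main_reach = tlb_reach_bytes(MAIN_TLB_ENTRIES)

print(f"µTLB reach:     {micro_reach // KIB} KiB")
print(f"main TLB reach: {main_reach // MIB} MiB")
print("main TLB covers the L2 cache:", main_reach >= L2_CACHE_BYTES)
```

With the assumed page size, a µTLB covers 256 KiB and a main TLB covers 8 MiB, slightly more than the 6 MB L2 cache.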

Figure 14a: Block diagram of the Fujitsu SPARC64 VI processor chip. Two cores share the L2 cache.

What cannot be shown in the diagrams is that, like the IBM and Intel processors, the SPARC64 VI is dual-threaded per core. The type of multithreading is similar to that found in the Intel processors and is called Vertical Multithreading (VMT), as opposed to the simultaneous multithreading present in the IBM processors. At this moment the highest clock frequency available for the SPARC64 VI is 2.4 GHz. As already remarked, the floating-point units are capable of a fused multiply-add operation, like those of the POWER and Itanium processors, and so the theoretical peak performance is presently 9.6 Gflop/s/core. Fujitsu plans to bring out a dual-core SPARC64 VI+ at 2.7 GHz in early 2008, and a quad-core SPARC64 VII is scheduled for later.
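The peak performance figure follows directly from the clock frequency and the number of floating-point results per cycle: two floating-point units, each delivering two results per fused multiply-add. A minimal sketch of that calculation is given below; the function name is hypothetical, and the 2.7 GHz value assumes that the planned SPARC64 VI+ keeps the same two fused multiply-add units, which the text does not confirm.

```python
CORES_PER_CHIP = 2  # the SPARC64 VI is a dual-core chip


def peak_gflops_per_core(clock_ghz, fp_units=2, results_per_fma=2):
    """Theoretical peak in Gflop/s: every FP unit issues one FMA each cycle."""
    return fp_units * results_per_fma * clock_ghz


current_core = peak_gflops_per_core(2.4)
current_chip = CORES_PER_CHIP * current_core

# Assumption: the SPARC64 VI+ has the same floating-point units as the VI.
planned_core = peak_gflops_per_core(2.7)

print(f"SPARC64 VI, 2.4 GHz, per core: {current_core:.1f} Gflop/s")
print(f"SPARC64 VI, 2.4 GHz, per chip: {current_chip:.1f} Gflop/s")
print(f"At 2.7 GHz, per core (assumed same FP units): {planned_core:.1f} Gflop/s")
```

These are theoretical upper bounds: they are reached only when both units execute a fused multiply-add in every cycle, so unpipelined operations such as divide and square root lower the achievable rate.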