The SGI Altix 4000 series

Machine type: RISC-based ccNUMA system
Models: Altix 4700
Operating system: Linux (SuSE SLES9/10, RedHat EL4) + extensions
Connection structure: Fat tree
Compilers: Fortran 95, C, C++
Vendor's information Web page: www.sgi.com/products/servers/altix/4000/
Year of introduction: 2006

System parameters:

Model: Altix 4700
Clock cycle: 1.66 GHz
Theor. peak performance:
  Per core (64-bits): 6.64 Gflop/s
  Maximum (64-bits): 6.8 Tflop/s
Main memory:
  Memory/maximal: ≤ 512 GB
No. of processors: 4–512
Communication bandwidth:
  Point-to-point: 3.2 GB/s
  Aggregate peak/64 proc. frame: 44.8 GB/s
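
The peak figures in the table can be reproduced with a small calculation. The sketch below assumes that each Itanium 2 core completes 4 floating-point operations per cycle (two fused multiply-adds); that figure is not stated in the table, and the function names are illustrative only.

```python
# Assumption (not stated in the table): each Itanium 2 core completes
# two fused multiply-adds, i.e. 4 floating-point operations, per cycle.
FLOPS_PER_CYCLE = 4


def peak_per_core_gflops(clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak of one core in Gflop/s."""
    return clock_ghz * flops_per_cycle


def peak_system_tflops(clock_ghz, processors, cores_per_processor=2):
    """Theoretical peak of a whole system in Tflop/s."""
    cores = processors * cores_per_processor
    return peak_per_core_gflops(clock_ghz) * cores / 1000.0


if __name__ == "__main__":
    # Largest single-system-image configuration: 512 dual-core processors.
    print(f"per core: {peak_per_core_gflops(1.66):.2f} Gflop/s")
    print(f"maximum:  {peak_system_tflops(1.66, 512):.1f} Tflop/s")
```

With the 1.66 GHz clock and 512 dual-core processors this gives 6.64 Gflop/s per core and about 6.8 Tflop/s for the full system, matching the table.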

Remarks:

The newest Altix version is the 4000 series, which succeeds the Altix 3700. The differences lie mainly in the type of Intel Itanium processors supported and in the communication network. The Altix 4700 supports the dual-core Montecito processor with the new, faster 533 and 667 MHz frontside buses. Furthermore, where the model 3700 used NUMAlink3 to connect the processor boards, the Altix 4700 uses NUMAlink4 with twice the bandwidth: 3.2 GB/s, unidirectional. The structure of the processor boards has also changed: instead of the so-called C-bricks with four Itanium 2 processors, two memory modules, two I/O ports, and two SHUBs (ASICs that connect processors, memory, I/O, and neighbouring processor boards), the Altix 4700 uses processor blades that house 1 or 2 processors. SGI offers these two variants to accommodate different types of usage. The blades with 1 processor support the fastest frontside bus of 667 MHz, giving a bandwidth of 10.7 GB/s to the processor on the blade. This blade is offered for bandwidth-hungry applications with irregular but massive memory access. The 2-processor blade, called the density option, uses the slower 533 MHz frontside bus for the processors and the slightly slower 1.6 GHz Montecito. This latter variant is expected to serve a large part of the HPC users more cost-effectively.
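
The 10.7 GB/s quoted for the 667 MHz frontside bus follows from the bus transfer rate multiplied by the bus width. A minimal sketch, assuming a 128-bit (16-byte) wide data bus, which the text does not state explicitly:

```python
# Assumption (not stated in the text): the Itanium 2 frontside bus moves
# 16 bytes (128 bits) per transfer.
BUS_WIDTH_BYTES = 16


def fsb_bandwidth_gb_per_s(transfer_rate_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Peak frontside-bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfer_rate_mhz * 1e6 * width_bytes / 1e9


if __name__ == "__main__":
    # The two bus speeds supported by the Altix 4700 blades.
    for rate in (533, 667):
        print(f"{rate} MHz bus: {fsb_bandwidth_gb_per_s(rate):.1f} GB/s")
```

Under the same assumption, the 533 MHz bus of the density option would deliver about 8.5 GB/s.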

The Altix is a ccNUMA system, which means that the address space is shared between all processors (although the memory is physically distributed and therefore not uniformly accessible). In contrast to the Altix 3700, the bandwidth on the blades is as high as that of the off-board connections: NUMAlink4 technology is employed both on the blade and off-board.

SGI does not provide its own suite of compilers. Rather, it distributes the Intel compilers for the Itanium processors. The operating system is likewise Linux and not IRIX, SGI's former own Unix flavour, although some additions are made to the standard Linux distributions, primarily to support SGI's MPI implementation and the CXFS file system.

Frames with 32 processor blades can be coupled with NUMAlink4 to form systems with a single-system image of at most 512 processors (1024 cores). So OpenMP programs with up to 1024 threads can be run. On larger configurations, because NUMAlink allows remote addressing, one can, apart from MPI, also employ the Cray-style shmem library for one-sided communication.

Measured Performances:
In the TOP500 list [49] a variety of Altix 4700 entries are present, the highest ranking of which is the system at the Leibniz-Rechenzentrum in Munich. It reports a speed of 56.5 Tflop/s in solving a linear system of size N = 1,583,232, an efficiency of 91%.
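
To put these numbers in perspective, the sketch below estimates the theoretical peak implied by the quoted efficiency and the approximate duration of the benchmark run. It assumes the conventional LINPACK operation count of 2/3 N^3 + 2 N^2 for solving a linear system of size N; the function names are illustrative, and the results are rough estimates derived from the quoted figures, not reported measurements.

```python
def hpl_flop_count(n):
    """Conventional LINPACK operation count (an assumption of this sketch)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def hpl_runtime_hours(n, rmax_tflops):
    """Estimated run time in hours at a sustained speed of rmax_tflops."""
    return hpl_flop_count(n) / (rmax_tflops * 1e12) / 3600.0


def implied_peak_tflops(rmax_tflops, efficiency):
    """Theoretical peak implied by a measured speed and its efficiency."""
    return rmax_tflops / efficiency


if __name__ == "__main__":
    n = 1_583_232   # system size from the TOP500 entry
    rmax = 56.5     # measured speed in Tflop/s
    print(f"implied peak:  {implied_peak_tflops(rmax, 0.91):.1f} Tflop/s")
    print(f"estimated run: {hpl_runtime_hours(n, rmax):.1f} hours")
```

Under these assumptions the implied theoretical peak is roughly 62 Tflop/s and the run takes on the order of 13 hours.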