Systems under development

Although we mainly want to discuss real, marketable systems and not experimental, special-purpose, or even speculative machines, we want to include a section on systems that are at an advanced stage of development and have a fair chance of reaching the market. For inclusion in section 3 we set the rule that the system described there should be on the market within a period of 6 months from announcement. The systems described in this section will in all probability appear within one year from the publication of this report. However, there are vendors who do not want to disclose any specific data on their new machines until they are actually beginning to ship them. We recognise the wishes of such vendors (it is generally wise not to stretch the expectation of potential customers too long) and will not disclose such information.

Below we discuss systems that may lead to commercial systems to be introduced on the market between somewhat more than half a year and a year from now. The commercial systems that result from them will sometimes deviate significantly from the original research models, depending on the way the development is done (the approaches in Japan and the USA differ considerably in this respect) and on the user group that is targeted.

A development that has proven to be of significance is the introduction of Intel's IA-64 Itanium processor family. Six vendors are offering Itanium 2-based systems at the moment, and it is known that HP has ended the marketing of its Alpha and PA-RISC based systems in favour of the Itanium processor family. Likewise, SGI stopped the further development of MIPS processor based machines. The only vendor going against this trend is SiCortex, which re-introduced a MIPS processor based machine. This means that the processor base for HPC systems is rather narrow. However, the shock that was caused in the USA by the advent of the Japanese Earth Simulator system has helped in refueling the funding of alternative processor and computer architecture research. Indeed, some initiatives in that direction are already under way, but these will not bear real new results in the near future.

In the processor section we already noted the considerable interest generated by systems that provide acceleration by means of FPGAs or other special computational accelerators like those from ClearSpeed. For selected algorithms such accelerators can sometimes speed up the applications containing them by an order of magnitude or more. The interest has increased so much because of the much improved software interfaces for programming the accelerators, which make them accessible to the large community of application program developers. Some vendors make systems built entirely around the accelerator. An example is SRC Computer Inc., which makes clusters in which each node contains an FPGA. Cray had its XD1, relevant components of which will re-appear in the heterogeneous systems it will bring out in the near future, while SGI offers its RASC blades (see below). It is to be expected that in the near future an HPC vendor cannot afford not to include such accelerators in its architectures.
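
How much an accelerator helps a whole application depends on how much of the run time the accelerated algorithm accounts for. The sketch below uses Amdahl's law to illustrate this. Amdahl's law is not named in the report; it is brought in here as a standard model. The function name, the fractions, and the kernel speedup of 10 are all illustrative assumptions, not measurements from any of the systems mentioned.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only a fraction
    of the original run time benefits from the accelerator.

    accelerated_fraction: share of run time spent in the accelerated
                          algorithm (0.0 to 1.0), an assumed input.
    kernel_speedup:       speedup of that algorithm on the accelerator.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    accelerated_part = accelerated_fraction / kernel_speedup
    return 1.0 / (serial_part + accelerated_part)


if __name__ == "__main__":
    # Hypothetical fractions; the kernel itself is assumed to run 10x faster.
    for fraction in (0.5, 0.9, 0.99):
        print(f"fraction {fraction:.2f}: "
              f"overall speedup {overall_speedup(fraction, 10.0):.2f}x")
```

Under this model, an order-of-magnitude gain for the whole application is approached only when nearly all of the run time sits in the accelerated algorithm, which is consistent with the restriction to selected algorithms above.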

Cray Inc.

At the end of 2002 the next generation vector processor, the X1, from Cray Inc. was ready to ship. It built on the technology found in the Cray SV-1s. Cray widely publicises a roadmap of future systems reaching as far as around 2010, primarily based on the Cascade project. This project started with the help of DARPA's High Productivity Computing Systems (HPCS) initiative, one of whose goals is that 10 Pflop/s systems (sustained) should be available by 2010. This should entail not only the necessary hardware but also a (possibly new) language to program such systems productively. Cascade was Cray's answer to this initiative. Together with IBM, Cray has continuing support from the HPCS program (HP, SGI, and SUN have fallen out).
Cray seems reasonably on track with its Cascade project: the Black Widow system, which has the same infrastructure as the XT4 but can also house the next generation of Cray's vector processor, will be available early next year. AMD quadcore processors may coexist in the same system. The plan is to extend the range of processor types over the years (XMT processors, FPGAs, etc.) to suit the needs of the client. The follow-on systems bear imaginative names like "Baker" (around the end of 2008 or the beginning of 2009), "Granite" and "Marble", ultimately leading to a system that should be able to deliver 10 Pflop/s sustained by 2010.
Of course Cray is to some extent dependent on AMD with respect to the scalar processors. In fact, Cray was hurt somewhat by the delay of AMD's quadcore Barcelona processor but this has not led to a major slip in the scheduled plans.

IBM

IBM has been working for some years on its BlueGene systems. Over ten of the first models, the BlueGene/L, have been installed in the last few years. The follow-up to the BlueGene/L, the BlueGene/P, has just been announced and a first one will probably be delivered this year to Argonne National Lab. The BlueGene/P will attain a peak speed of 3 Pflop/s, while the BlueGene/Q will have a peak speed of around 10 Pflop/s. The BlueGene systems are hardly meant for the average HPC user but rather for a few special application fields that are able to benefit from the massive parallelism that is required to apply such systems successfully.
Of course the development of the POWERx processors will also make its mark: the POWER6 processor has the usual technology-related advantages over its predecessor, and the first POWER6 processors are now being produced, although they will not appear in large-scale systems before 2008. When this processor becomes generally available the clock frequency is expected to be 3.5–4.5 GHz, yielding a processor with a performance of 14–18 Gflop/s/core. Furthermore, it is a subject of research how to couple 8 cores such that a virtual vector processor with a peak speed of around 120 Gflop/s can be made. This approach is called ViVA (Virtual Vector Architecture). It is reminiscent of the processors in Hitachi's SR8000 or the MSP processors in the Cray X1E. This road will take some years to travel: the approach may appear in the POWER7 processor and will extend to the next generation(s) of the POWERx.
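
The per-core figures above follow from multiplying the clock frequency by the number of floating-point operations completed per cycle. The sketch below reproduces that arithmetic. The value of 4 flops per cycle is an assumption inferred from the quoted ratios (14 Gflop/s at 3.5 GHz); the function name and constants are illustrative only.

```python
# Assumption: 4 floating-point operations per cycle per core, inferred
# from the quoted figures (14 Gflop/s at 3.5 GHz, 18 Gflop/s at 4.5 GHz).
FLOPS_PER_CYCLE = 4
VIVA_CORES = 8  # number of cores coupled into one virtual vector processor


def peak_gflops(clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE, cores=1):
    """Theoretical peak in Gflop/s: clock (GHz) x flops/cycle x cores."""
    return clock_ghz * flops_per_cycle * cores


if __name__ == "__main__":
    for clock in (3.5, 4.5):
        per_core = peak_gflops(clock)
        coupled = peak_gflops(clock, cores=VIVA_CORES)
        print(f"{clock} GHz: {per_core:.0f} Gflop/s per core, "
              f"{coupled:.0f} Gflop/s for {VIVA_CORES} coupled cores")
```

Under these assumptions 8 coupled cores give a peak between 112 and 144 Gflop/s over the expected clock range, which brackets the figure of around 120 Gflop/s mentioned for the virtual vector processor.
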
In addition, the Cell processor, developed with Sony, might become a factor in HPC systems. This processor has 8 computational cores and a control processor. Although it is in the first instance targeted at the gaming industry (hence Sony's interest), numerical experiments with the processor proved it to perform extremely well, albeit in 32-bit precision. Future generations could, however, be adapted to numerically intensive work (a next generation with 64-bit precision is being planned). If it is possible to maintain similar performance characteristics, it could become an important building block for HPC systems. IBM itself is actively looking into integrating Cell processors as computational accelerators in its own systems, and it cooperates in building the "Roadrunner" system at Los Alamos, which combines AMD Opteron-based blades with Cell processors as accelerators.
Like Cray, IBM is one of the two vendors that are still supported by the HPCS program from DARPA. Although this support is less important for IBM than for Cray, parts of the research now being done on porting applications to BlueGene-type systems, the viability of the ViVA concept, and the integration of Cell processors are certainly helped by this support.

Intel-based systems

All systems that are based on the Itanium line of processors, i.e., those from Bull, Fujitsu, Hitachi, HP, NEC, and SGI, are critically dependent on the ability of Intel to deliver the Tukwila processor, which is slated for 2008, on time. Not only will the number of cores in this processor double to four and the modest clock frequency rise into the 2 GHz realm but, most importantly, the processor will finally get rid of the front-side bus, which is a serious bottleneck in accessing data from memory. The Tukwila processor will use the Common System Interface (CSI), which presumably will provide a bandwidth of over 40 GB/s. Of course this is necessary because of the increased number of cores per chip. In addition, the CSI specification will be open, as AMD has done with its HyperTransport bus in the Torrenza initiative. This means that both low-latency networks and attached computational accelerators can be connected directly at high speed. This in turn will allow vendors to diversify their products, possibly to optimise them for specific application areas, similar to Cray's future plans (see above).
Furthermore, the CSI will also be available for the Xeon line of processors, which would even allow for mixing them with Itanium-based system components. In fact, SGI plans to do so in its Altix systems and Hitachi already provides it in its BladeSymphony systems. Other vendors may follow this trend as it is another way of diversifying systems.

SGI

SGI has plans that are more or less similar to Cray's Cascade project: coupling of heterogeneous processor sets through its proprietary network, in this case a successor of the NUMAlink4 network architecture. A first step in that direction is the availability of the so-called RASC blades that can be put into the Altix 4700 infrastructure. Each RASC blade features 2 FPGAs that can be used as computational accelerators for certain algorithms in applications. A next step is mixing Itanium-based components and Xeon-based components in the same system. Once the Common System Interface is available (see above) this should be doable without excessive costs because the CSI chipset will support both the Itanium and Xeon processor variants. The idea is to further diversify the future systems, ultimately into a system with the codename "Ultraviolet". Development of such systems is quite costly and, unlike Cray and IBM, SGI does not have support from DARPA's HPCS program, so it remains to be seen whether such plans will get beyond the stage of intentions in view of the present difficult financial position of SGI.

SUN

Like Cray and IBM, SUN had been awarded a grant from DARPA to develop so-called high-productivity systems in DARPA's HPCS program. This year SUN has fallen out of this program and so it is determined to concentrate even more on developing heavily multi-threaded processors. The second generation of the Niagara chip, the T2, is in production and is coming onto the market right now. It supports 64 threads with 8 processor cores and has a floating-point unit attached to each core. This is a large improvement in comparison with the former T1, which had only one floating-point unit per chip. Still, the T2 is not geared for the HPC area. This will be reserved for SUN's Rock processor, which is to come out sometime in 2008. It has 16 processor cores in a 4×4 grid on the chip, each core supporting 2 threads. Each 4-core part shares 32 KB of L1 data cache and L1 instruction cache together with 512 KB of integrated L2 cache. The L1 and L2 caches are connected by a 4×4 crossbar. As yet, however, no details about the clock frequency are available, so it is hard to estimate what the impact of the Rock processor will be. SUN plans to produce two server variants: "Pebble", a one-socket version, and "Boulder", which may have 2, 4, or 8 sockets.
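
The core and thread counts quoted above can be tallied in a few lines. The sketch below is only bookkeeping on the numbers in the text. The class and variable names are hypothetical, and the chip-wide L2 total assumes that the 512 KB of L2 cache is replicated once per 4-core part, which is one reading of the description rather than a confirmed specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipLayout:
    """Core and thread counts for one processor chip (illustrative only)."""
    name: str
    cores: int
    threads_per_core: int

    @property
    def hardware_threads(self) -> int:
        return self.cores * self.threads_per_core


# T2: 64 threads spread over 8 cores, i.e. 8 threads per core.
t2 = ChipLayout("T2", cores=8, threads_per_core=64 // 8)
# Rock: 16 cores in a 4x4 grid, 2 threads per core.
rock = ChipLayout("Rock", cores=16, threads_per_core=2)

# Assumption: each 4-core part of Rock has its own 512 KB of L2 cache.
CORES_PER_PART = 4
L2_PER_PART_KB = 512
rock_parts = rock.cores // CORES_PER_PART
rock_l2_total_kb = rock_parts * L2_PER_PART_KB

if __name__ == "__main__":
    for chip in (t2, rock):
        print(f"{chip.name}: {chip.cores} cores, "
              f"{chip.hardware_threads} hardware threads")
    print(f"Rock: {rock_parts} four-core parts, "
          f"{rock_l2_total_kb} KB of L2 cache in total (assumed)")
```

On these figures Rock doubles the core count of the T2 but supports half as many hardware threads, which fits the statement that the two chips target different kinds of work.
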
For the mainstream market SUN continues to rely on the SPARC64 developments from Fujitsu-Siemens as present in the Fujitsu-Siemens and SUN M9000 systems.