At the time of writing, two types of BlueGene systems are available: the BlueGene/L and its successor, the BlueGene/P. Both feature processors based on the PowerPC 400 processor family.
The BlueGene/L processor

This processor is in fact a modified PowerPC 440 processor, made especially for the IBM BlueGene family. It presently runs at a clock frequency of 700 MHz. The modification lies in tacking on floating-point units (FPUs) that are not part of the standard processor but can be connected to the 440's APU bus. Each FPU contains two floating-point functional units capable of performing 64-bit multiply-adds, divisions and square roots. Consequently, the theoretical peak performance of a processor core is 2.8 Gflop/s. Figure 9 shows the embedding of two processor cores on a chip.

Figure 9: Block diagram of an IBM BlueGene/L processor chip.
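The peak figure quoted for a BlueGene/L core can be reproduced with a short calculation. The sketch below assumes that each multiply-add counts as two floating-point operations and that each of the two functional units completes one multiply-add per cycle; the function name `peak_gflops` and its parameters are illustrative only.

```python
def peak_gflops(clock_mhz: float, fp_units: int, flops_per_unit_per_cycle: int = 2) -> float:
    """Theoretical peak of one core in Gflop/s.

    Assumption: every functional unit completes one multiply-add
    (counted as 2 flops) per clock cycle.
    """
    return clock_mhz * 1e6 * fp_units * flops_per_unit_per_cycle / 1e9


# BlueGene/L core: 700 MHz, two floating-point functional units.
bgl_core = peak_gflops(clock_mhz=700, fp_units=2)
print(f"BlueGene/L core peak: {bgl_core:.1f} Gflop/s")  # 2.8
```

Divisions and square roots take more than one cycle, so the peak only applies to code dominated by multiply-adds.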
As can be seen from the figure, the L2 cache is very small: only 2 KB, divided into a read and a write part. In fact it is a prefetch and store buffer for the rather large L3 cache. The bandwidth to and from the prefetch buffer is high: 16 B/cycle to the CPU and 8 B/cycle to the L2 buffer. The memory resides off-chip and has a maximum size of 512 MB. Data from other nodes are transported through the L2 buffer, bypassing the L3 cache in the first instance. The packaging of the 2-CPU nodes in the BlueGene/L is discussed in the section describing the BlueGene systems.
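To put the per-cycle bandwidths in more familiar units, they can be converted to GB/s. The sketch below assumes that the quoted B/cycle figures refer to the 700 MHz core clock, which the text does not state explicitly, and that 1 GB = 10^9 bytes; the helper name `bandwidth_gb_per_s` is hypothetical.

```python
def bandwidth_gb_per_s(bytes_per_cycle: float, clock_mhz: float = 700.0) -> float:
    """Convert a bandwidth in B/cycle to GB/s (1 GB = 1e9 bytes).

    Assumption: the transfer is clocked at the core frequency.
    """
    return bytes_per_cycle * clock_mhz * 1e6 / 1e9


to_cpu = bandwidth_gb_per_s(16)  # prefetch buffer -> CPU
to_l2 = bandwidth_gb_per_s(8)    # towards the L2 buffer
print(f"to CPU: {to_cpu:.1f} GB/s, to L2 buffer: {to_l2:.1f} GB/s")
# to CPU: 11.2 GB/s, to L2 buffer: 5.6 GB/s
```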
The BlueGene/P processor

Presently little detail is known about the BlueGene/P processor. It is based on the PowerPC 450 at a clock frequency of 850 MHz, with floating-point enhancements similar to those applied to the PPC 440 in the BlueGene/L. The BlueGene/P node will contain 4 processor cores, which brings the peak speed to 13.6 Gflop/s per node. As yet not many details are available about memory speed and size or about the bandwidths between the components on the chip, but all speeds are about doubled, which is necessary because the number of cores on the chip has doubled. The bandwidth from memory to an individual core is therefore still 2 B/cycle, as in the BlueGene/L. The L3 size and the memory per core are also doubled. The same holds for the network speed in the configuration.
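The node peak and the resulting memory balance follow directly from these numbers. The sketch below assumes 4 flops per cycle per core (two functional units, each doing one multiply-add per cycle, as in the BlueGene/L); the constant names are illustrative.

```python
CLOCK_HZ = 850e6                  # BlueGene/P clock frequency
CORES_PER_NODE = 4
FLOPS_PER_CYCLE_PER_CORE = 4      # assumption: 2 units x 2 flops per multiply-add
MEM_BYTES_PER_CYCLE_PER_CORE = 2  # memory bandwidth per core, as quoted in the text

# Theoretical peak of a full node in Gflop/s.
node_peak_gflops = CLOCK_HZ * CORES_PER_NODE * FLOPS_PER_CYCLE_PER_CORE / 1e9

# Bytes of memory traffic available per peak floating-point operation.
bytes_per_flop = MEM_BYTES_PER_CYCLE_PER_CORE / FLOPS_PER_CYCLE_PER_CORE

print(f"node peak: {node_peak_gflops:.1f} Gflop/s")  # 13.6
print(f"memory balance: {bytes_per_flop} B/flop")    # 0.5
```

Because both the per-core bandwidth in B/cycle and the flops per cycle are the same as in the BlueGene/L, the ratio of memory bandwidth to peak speed per core is the same for both systems under these assumptions.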