The CP-PACS computer, with a peak speed of 300 Gflops, was completed in March 1996 and has started operation. We describe the final specification and hardware implementation of the CP-PACS computer, and its performance for QCD codes. A plan for upgrading the computer, scheduled for fall 1996, is also given.
CP-PACS project
The CP-PACS project [1] is a five-year project which formally started in 1992. The project currently consists of 33 members in physics and computer science, as listed in Ref. [2]. Soon after the start of the project we selected Hitachi Ltd. as the industrial partner through a formal bidding process, and we have been working in close collaboration on the hardware and software development of the CP-PACS computer. The fundamental design of the computer was laid down in 1992, its details were worked out in 1993, and the logical design and the physical packaging design were completed in 1994. Chip fabrication and assembly of parts started in early 1995, and the CP-PACS computer with a peak speed of 300 Gflops was completed in March 1996.
Hardware implementation
A picture of the CP-PACS computer is shown in Fig. 1. The computer measures roughly 2 m × 4 m × 3 m in height, width and depth. The floor plan is depicted in Fig. 2. For the major architectural characteristics of the computer I refer
The final specification of the CP-PACS computer is summarized in Table 1. The size of the second-level cache has been doubled since last year. The latency of data transfer quoted there is the value measured in the remote DMA mode, which is the fastest mode for data transfer; it includes software and hardware overheads and is averaged over transfers in the x, y and z directions.

Presented at Lattice 96, St. Louis, USA.

Figure 3 shows the floor plan of the CPU chip, which is fabricated in 0.3-micron CMOS technology and measures 1.57 cm × 1.57 cm. The PVP-SW feature, which enables very effective vector calculations within the RISC architecture of the CPU, is implemented with 128 floating-point registers, located in the green area at the lower right corner of Fig. 3.
A silicon multichip module is depicted in Fig. 4, where the chip located at the center is the CPU.

Figure 1. Outlook of the CP-PACS computer.

The crossbar switches in the x direction are mounted on each board, connecting 8 nodes as explained above; those in the y direction are placed on backplanes located in four cabinets (symbol P in Fig. 2), and those in the z direction are mounted on a board housed in one cabinet (symbol Z in Fig. 2).
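To make the three-dimensional crossbar wiring concrete, here is a minimal Python sketch of how a message could travel between nodes. It assumes, hypothetically, dimension-order routing (x first, then y, then z) and an illustrative machine shape; only the 8 nodes sharing one x crossbar come from the text. Because each crossbar joins all nodes that share the other two coordinates, one switch traversal fixes one coordinate.

```python
# Illustrative sketch of routing on a 3D hyper-crossbar network.
# ASSUMPTIONS (hypothetical, not taken from the CP-PACS specification):
#   * dimension-order routing: x first, then y, then z;
#   * SHAPE = (8, 16, 16); only the x extent of 8 nodes per board
#     crossbar comes from the text.
SHAPE = (8, 16, 16)


def route(src, dst, shape=SHAPE):
    """Return the hops (dimension name, node reached) from src to dst.

    A crossbar in dimension d connects every node whose other two
    coordinates are equal, so each hop changes exactly one coordinate.
    """
    for node in (src, dst):
        if len(node) != 3 or not all(0 <= c < n for c, n in zip(node, shape)):
            raise ValueError(f"node {node} outside machine shape {shape}")
    hops = []
    current = list(src)
    for dim, name in enumerate("xyz"):
        if current[dim] != dst[dim]:
            current[dim] = dst[dim]  # one traversal of the crossbar in this dimension
            hops.append((name, tuple(current)))
    return hops


if __name__ == "__main__":
    # Worst case: source and destination differ in all three coordinates.
    for hop in route((0, 0, 0), (7, 15, 15)):
        print(hop)
```

Under these assumptions any two nodes are at most three crossbar hops apart, and nodes on the same board (differing only in x) are a single hop apart.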
The two cabinets marked IOU house the adaptors for I/O of data to the distributed disks. RAID-5 disks, connected to the adaptors through SCSI-2 buses, are placed in cabinets installed a few meters away.
Performance
We write lattice QCD codes in Fortran 90, using libraries for data communication. A newly developed Fortran compiler incorporating the PVP-SW feature produces efficient object code, whose performance is typically 90 to 150 Mflops per node, depending on the structure of the do-loop. The throughput of data transfer between nodes using the Fortran libraries is, for a 576-Kbyte message as an example, 250 Mbytes/sec, compared with the peak throughput of 300 Mbytes/sec. After checking this fundamental performance, we tested the computer as a whole with a quenched QCD spectrum calculation using the Wilson quark action on a $64^4$ lattice at $\beta = 6.0$ for three hopping parameters ($m_\pi/m_\rho \simeq 0.7$, 0.5 and 0.4), two of which have already been studied in previous mass spectrum calculations. Results for the effective masses of hadrons at the two smaller hopping parameters are in good agreement with the previous results. This makes us confident that the machine is working properly and that our codes are correct.
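Effective masses of the kind compared above are conventionally obtained from a hadron correlator $C(t)$ as $m_{\rm eff}(t) = \ln[C(t)/C(t+1)]$, which reaches a plateau at the ground-state mass once excited-state contributions have decayed. The minimal Python sketch below illustrates this on synthetic two-state data; the amplitudes and masses are made up and are not CP-PACS results. It also ignores the backward-propagating contribution present on a periodic lattice, so it only applies for $t$ well below half the temporal extent.

```python
import math


def effective_mass(corr):
    """Effective mass m_eff(t) = ln(C(t) / C(t+1)) for t = 0 .. len(corr) - 2."""
    if any(c <= 0 for c in corr):
        raise ValueError("correlator values must be positive")
    return [math.log(c0 / c1) for c0, c1 in zip(corr, corr[1:])]


# Synthetic correlator (hypothetical numbers): a ground state with mass 0.5
# plus an excited state with mass 1.2, both in lattice units.
M_GROUND, M_EXCITED = 0.5, 1.2
corr = [math.exp(-M_GROUND * t) + 0.3 * math.exp(-M_EXCITED * t) for t in range(21)]

m_eff = effective_mass(corr)
for t, m in enumerate(m_eff):
    print(f"t={t:2d}  m_eff={m:.6f}")  # approaches 0.5 from above as t grows
```

In practice the plateau region is chosen by inspection and the mass is extracted by a fit over it, rather than read off a single time slice.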
