We discuss the hardware design choices made in our 16K-node 0.8 Teraflops
supercomputer project, a machine architecture optimized for full QCD
calculations. We address the efficiency of the conjugate gradient algorithm in
terms of the balance between floating-point operations, memory handling and
utilization, and communication overhead. We also discuss the technological
innovations and software tools that facilitate hardware design, and the
opportunities these offer the academic community.

Comment: Contribution to Lattice 94. 3 pages. LaTeX source followed by
compressed, uuencoded PostScript file of the complete paper