System software for the finite element machine
The Finite Element Machine is an experimental parallel computer developed at Langley Research Center to investigate the application of concurrent processing to structural engineering analysis. This report describes system-level software that has been developed to facilitate use of the machine by applications researchers. The overall software design is outlined, and several important parallel processing issues are discussed in detail, including processor management, communication, synchronization, and input/output. Based on experience using the system, the hardware architecture and software design are critiqued, and areas for further work are suggested.
Hardware Barrier Synchronization: Static Barrier MIMD (SBM)
In this paper we give the design and performance analysis of a new, highly efficient synchronization mechanism called “Static Barrier MIMD” or “SBM.” Unlike traditional barrier synchronization, the proposed barriers are designed to facilitate the use of static (compile-time) code scheduling for eliminating some synchronizations. For this reason, our barrier hardware is more general than most hardware barrier mechanisms, allowing any subset of the processors to participate in each barrier. Since code scheduling typically operates on fine-grain parallelism, it is also vital that barriers be able to execute in a small number of clock ticks. The SBM is actually only one of two new classes of barrier machines proposed to facilitate static code scheduling; the other architecture is the “Dynamic Barrier MIMD,” or “DBM,” which is described in a companion paper. The DBM differs from the SBM in that the DBM employs more complex hardware to make the system less dependent on the precision of the static analysis and code scheduling; for example, an SBM cannot efficiently manage simultaneous execution of independent parallel programs, whereas a DBM can.
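The abstract's key generality claim, that any subset of the processors may participate in each barrier, can be illustrated with an ordinary software barrier. The sketch below is a software analogue only (Python's `threading.Barrier`, not the SBM hardware, which completes in a few clock ticks): workers 0 and 1 join a barrier while worker 2 bypasses it.

```python
import threading

# Software analogue of a subset barrier: only workers 0 and 1 participate,
# worker 2 does not. The worker ids and record format are illustrative,
# not taken from the paper.

results = []                     # (phase, worker-id) records
lock = threading.Lock()

def worker(wid, barrier):
    with lock:
        results.append(("before", wid))
    if barrier is not None:      # only the chosen subset synchronizes
        barrier.wait()
    with lock:
        results.append(("after", wid))

subset = threading.Barrier(2)    # exactly workers 0 and 1 meet here
threads = [
    threading.Thread(target=worker, args=(0, subset)),
    threading.Thread(target=worker, args=(1, subset)),
    threading.Thread(target=worker, args=(2, None)),   # bypasses the barrier
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

For the two participants, both "before" records are guaranteed to precede both "after" records; worker 2 runs unconstrained, which is exactly the subset semantics the paper describes at the hardware level.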
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
An alternative architecture for performing basic computer arithmetical operations
The arithmetic portions of almost all modern processor architectures are of very similar design. We use the term "traditional" to describe this design, the primary characteristics of which are native support for integer and floating-point number types and special disjoint instructions and hardware for each supported type. Decades of refinement have endowed this traditional arithmetic architecture with high performance, but also certain inherent limitations.
The highly specific instruction sets and circuitry that provide optimized performance for supported number types also make it difficult to synthesize unsupported number types and manipulate them efficiently. The same limitation applies when using supported number types over ranges greater than those directly implemented by the processor.
In this thesis we present an alternative to the traditional computer arithmetic architecture, designed to address the limitations of the traditional approach while preserving most of its benefits.
Instead of the specific number-representation support provided by the instructions, hardware, and native data types of a traditional ALU/FPU pair, we define a single data type, the XLU digit, which forms a base from which other number types may easily be derived, along with a set of instruction primitives from which basic arithmetic operations may be efficiently realized.
Our data type has a signed-digit representation, which allows algorithms for addition, subtraction and multiplication to achieve a high degree of parallelism at the primitive instruction level. The instruction primitives and algorithms are designed to hide or eliminate as much branching as possible, further increasing instruction-level independence.
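The abstract does not specify the XLU digit's exact format, but the parallelism claim rests on a standard property of signed-digit representations: redundancy lets every carry propagate at most one position, so all digit positions can be processed independently. A minimal sketch, assuming a little-endian radix-4 signed-digit form with digits in -3..3 (an assumption for illustration, not the thesis's actual encoding):

```python
# Carry-free signed-digit addition, radix 4, digits -3..3 (assumed format).
# Each loop body depends only on its own position, so both passes could run
# fully in parallel -- the property the thesis exploits at the hardware level.

def sd_add(x, y):
    """Add two little-endian radix-4 signed-digit numbers without carry chains."""
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    interim, transfer = [], [0]          # transfer[i] = carry into position i
    for i in range(n):                   # pass 1: independent per position
        t = x[i] + y[i]                  # t in [-6, 6]
        if t >= 3:
            c, w = 1, t - 4
        elif t <= -3:
            c, w = -1, t + 4
        else:
            c, w = 0, t
        interim.append(w)                # w in [-2, 2]
        transfer.append(c)
    # pass 2: |w + c| <= 3, so no new carries arise -- also parallel.
    return [interim[i] + transfer[i] for i in range(n)] + [transfer[n]]

def value(digits):
    """Evaluate a little-endian radix-4 signed-digit number."""
    return sum(d * 4**i for i, d in enumerate(digits))
```

Note that both loops are branch-light and position-independent, which mirrors the abstract's point about hiding or eliminating branching to increase instruction-level independence.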
We provide details of the data type, an overview of the set of instruction primitives, and a discussion of how to use those primitives to perform basic arithmetic algorithms for addition, subtraction, and multiplication. We also give examples for three derived number representations: integer, fixed-point, and floating-point numbers.
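The idea that integers, fixed-point, and floating-point values can all be derived from one digit base can be sketched as follows. The radix, the little-endian digit-vector form, and the scale exponent are all assumptions for illustration; the thesis's actual derivations are not given in the abstract.

```python
# Hypothetical illustration: one digit-vector base, three derived types.
# A number is a little-endian radix-4 digit vector times 4**scale.

def as_number(digits, scale=0):
    """Interpret a digit vector scaled by a power of the radix."""
    return sum(d * 4**(i + scale) for i, d in enumerate(digits))

integer_val = as_number([2, 1])             # scale 0: plain integer
fixed_val = as_number([2, 1], scale=-1)     # fixed scale: fixed-point
float_val = as_number([2, 1], scale=3)      # variable scale: floating-point
```

The same digit vector serves all three interpretations; only the scale convention changes, which is the flexibility the unified base is meant to provide.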
We believe that our approach of building from a unified base provides flexibility and scalability beyond that of the traditional arithmetic architecture.
Our data type, the XLU digit, and the primitive operations that manipulate it may be implemented with modest amounts of circuitry; this, together with the highly parallel nature of the entire design, means that many XLU circuit blocks can be realized in the same silicon area as one traditional ALU/FPU pair. An ALU or FPU can only work when it has the correct type to operate on, whereas we believe any and all XLUs available to the processor can be kept busy almost all of the time, achieving greater utilization of the available silicon.