12 research outputs found
Parallel processing for scientific computations
The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing extensive literature search, we introduced the system which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience
Recommended from our members
HyperForest: A high performance multi-processor architecture for real-time intelligent systems
Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expendability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doors for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE`s own production and disassembly activities
A multiple-bus, active backplane architecture for multiprocessor systems
This research investigates several problems associated with current multiprocessor interconnection networks, focusing primarily on general-purpose, shared-memory configurations. The project deals with all aspects of the interconnection, from the architectural level to the physical backplane. A bus-based architecture is presented as an alternative to the limitations of current schemes. This dissertation will focus on the physical layer implementation;For increased reliability, performance and scalability, a multiple-bus architecture is proposed. Each bus uses a word-serial approach to keep the total number of bus signals manageable. A source-synchronous transfer protocol allows data to be streamed at a high rate, thus increasing the pin-efficiency of the bus. The control acquisition scheme combines collision detection and priority arbitration to minimize bus access time without requiring additional signal lines. Cache coherence, message passing, and synchronization primitives are provided within the bus protocol to support multiple-processor systems;To reduce the capacitive loading on the bus, an active backplane is employed. This moves the transceiver and bus interface unit from the plug-in module down to the backplane. In addition to increasing the characteristic impedance of the bus, it also reduces the end-to-end propagation delay. Another advantage of moving the bus transceivers to the backplane is the uniform load presented to the bus, regardless of whether a slot is populated;Due to the reduction in drive current required, a custom CMOS transceiver, suitable for VLSI implementation, is used. It incorporates the collision detection circuitry required for the control acquisition scheme. Initial transceiver prototypes have been designed and fabricated in 2-[mu]m CMOS. These have been successfully tested at transfer rates in excess of 50MHz
A logical layer protocol for ActiveBus architecture
This research investigates several problems associated with current multiprocessor interconnection networks, focusing primarily on general-purpose, shared-memory configurations. The project deals with all aspects of the interconnection, from the architectural level to the physical backplane. A multiple-bus based architecture is presented as an alternative to the limitations of current schemes. This dissertation will focus on the logical layer specification;The ActiveBus--a multiple, active bus--interconnection is proposed. Multiple buses increase the bandwidth as well as reliability of the interconnection while the active backplane shows a reduced and uniform capacitive load;A logical layer protocol was designed for each bus to work independently, to achieve fault tolerance. Each bus uses a word-serial approach to keep the total number of bus signal lines manageable. A dual clocking scheme is proposed. The faster clock is used for data transfer. The other clock, refered to as sync clock, is used for arbitration and handshaking;Absence of discontinuities on the bus coupled with a source-synchronous transfer protocol allows data to be streamed at a high rate, thus increasing the pin-efficiency of the bus. The data transmission rate is limited only by clock skew. In addition, the ActiveBus interface unit and the source synchronous protocol move the synchronization penalty from the shared bus to the private buffer in the unit;The protocol uses a new arbitration scheme, termed Previous Priority First. This hybrid control acquisition scheme combines collision detection and priority arbitration to minimize bus access time without requiring additional signal lines. Collision detection provides a quick access in an unsaturated system while priority arbitration guarantees the deterministic election of the master in a saturated system. The scheme also incorporates a fairness mode to minimize starvation and bus access delay in the system;The cache coherence scheme supports both copy-back and write-through policies to reduce the overhead. MOESI protocol with snoopy caches, being the most general, is followed. Message passing and synchronization primitives are provided within the bus protocol to support multiple processor systems. These primitives attempt to minimize the traffic generated by the spin locks or the memory hot spots
Recommended from our members
Program allocation for hypercube based dataflow systems
The dataflow model of computation differs from the traditional control-flow
model of computation in that it does not utilize a program counter to sequence
instructions in a program. Instead, the execution of instructions is based solely on the
availability of their operands. Thus, an instruction is executed in a dataflow computer
when all of its operands are available. This asynchronous nature of the dataflow model of
computation allows the exploitation of fine-grain parallelism inherent in programs.
Although the dataflow model of computation exploits parallelism, the problem of
optimally allocating a program to processors belongs to the class of NP-complete
problems. Therefore, one of the major issues facing designers of dataflow
multiprocessors is the proper allocation of programs to processors.
The problem of program allocation lies in maximizing parallelism while
minimizing interprocessor communication costs. The culmination of research in the area
of program allocation has produced the proposed method called the Balanced Layered
Allocation Scheme that utilizes heuristic rules to strike a balance between computation
time and communication costs in dataflow multiprocessors. Specifically, the proposed
allocation scheme utilizes Critical Path and Longest Directed Path heuristics when
allocating instructions to processors. Simulation studies indicate that the proposed
scheme is effective in reducing the overall execution time of a program by considering
the effects of communication costs on computation times