Search CORE

7,054 research outputs found

Method of up-front load balancing for local memory parallel processors

Author: Baffes Paul Thomas
Publication venue
Publication date: 24/04/1990
Field of study

In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy five percent

NASA Technical Reports Server

Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

Author: Fischer James R.
Grosch Chester
Mcanulty Michael
Odonnell John
Storey Owen
Publication venue
Publication date
Field of study

NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the way theory relates, and performance measured. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era

NASA Technical Reports Server

OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

Author: D Wentzlaff
J Ross
JE Stone
M Baker
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/08/2016
Field of study

There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with minimal uncore functionality connected by a 2D mesh Network-on-Chip (NoC). The Epiphany architecture offers high computational energy efficiency for integer and floating point calculations as well as parallel scalability. The Epiphany-III is available as a coprocessor in platforms that also utilize an ARM CPU host. OpenCL provides good functionality for supporting a co-design programming model in which the host CPU offloads parallel work to a coprocessor. However, the OpenCL memory model is inconsistent with the Epiphany memory architecture and lacks support for inter-core communication. We propose a hybrid programming model in which OpenSHMEM provides a better solution by replacing the non-standard OpenCL extensions introduced to achieve high performance with the Epiphany architecture. We demonstrate the proposed programming model for matrix-matrix multiplication based on Cannon's algorithm showing that the hybrid model addresses the deficiencies of using OpenCL alone to achieve good benchmark performance.Comment: 12 pages, 5 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologie

arXiv.org e-Print Archive

Crossref

Bit-level pipelined digit-serial array processors

Author: Aggoun A
Ashur A
Ibrahim MK
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/1998
Field of study

A new architecture for high performance digit-serial vector inner product (VIP) which can be pipelined to the bit-level is introduced. The design of the digit-serial vector inner product is based on a new systematic design methodology using radix-2n arithmetic. The proposed architecture allows a high level of bit-level pipelining to increase the throughput rate with minimum initial delay and minimum area. This will give designers greater flexibility in finding the best tradeoff between hardware cost and throughput rate. It is shown that sub-digit pipelined digit-serial structure can achieve a higher throughput rate with much less area consumption than an equivalent bit-parallel structure. A twin-pipe architecture to double the throughput rate of digit-serial multipliers and consequently that of the digit-serial vector inner product is also presented. The effect of the number of pipelining levels and the twin-pipe architecture on the throughput rate and hardware cost are discussed. A two's complement digit-serial architecture which can operate on both negative and positive numbers is also presented

Crossref

Brunel University Research Archive

A candidate architecture for monitoring and control in chemical transfer propulsion systems

Author: Binder Michael P.
Millis Marc G.
Publication venue
Publication date
Field of study

To support the exploration of space, a reusable space-based rocket engine must be developed. This engine must sustain superior operability and man-rated levels of reliability over several missions with limited maintenance or inspection between flights. To meet these requirements, an expander cycle engine incorporating a highly capable control and health monitoring system is planned. Alternatives for the functional organization and the implementation architecture of the engine's monitoring and control system are discussed. On the basis of this discussion, a decentralized architecture is favored. The trade-offs between several implementation options are outlined and future work is proposed

NASA Technical Reports Server

Description of the PMAD DC test bed architecture and integration sequence

Author: Beach R. F.
Bolerjack B.
Fong D.
Trash L.
Publication venue
Publication date
Field of study

NASA-Lewis is responsible for the development, fabrication, and assembly of the electric power system (EPS) for the Space Station Freedom (SSF). The SSF power system is radically different from previous spacecraft power systems in both the size and complexity of the system. Unlike past spacecraft power system the SSF EPS will grow and be maintained on orbit and must be flexible to meet changing user power needs. The SSF power system is also unique in comparison with terrestrial power systems because it is dominated by power electronic converters which regulate and control the power. Although spacecraft historically have used power converters for regulation they typically involved only a single series regulating element. The SSF EPS involves multiple regulating elements, two or more in series, prior to the load. These unique system features required the construction of a testbed which would allow the development of spacecraft power system technology. A description is provided of the Power Management and Distribution (PMAD) DC Testbed which was assembled to support the design and early evaluation of the SSF EPS. A description of the integration process used in the assembly sequence is also given along with a description of the support facility

NASA Technical Reports Server

Speeding up multiprocessor machines with reconfigurable optical interconnects - art. no. 61240K

Author: ARTUNDO I
Dambre Joni
DEBAES C
DESMET L
Heirman Wim
Thienpont Hugo
Van Campenhout Jan
Publication venue: SPIE-INT SOCIETY OPTICAL ENGINEERING
Publication date: 01/01/2006
Field of study

Ghent University Academic Bibliography

A framework for FPGA functional units in high performance computing

Author: Koltes A.
O'Donnell J.T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2010
Field of study

FPGAs make it practical to speed up a program by defining hardware functional units that perform calculations faster than can be achieved in software. Specialised digital circuits avoid the overhead of executing sequences of instructions, and they make available the massive parallelism of the components. The FPGA operates as a coprocessor controlled by a conventional computer. An application that combines software with hardware in this way needs an interface between a communications port to the processor and the signals connected to the functional units. We present a framework that supports the design of such systems. The framework consists of a generic controller circuit defined in VHDL that can be configured by the user according to the needs of the functional units and the I/O channel. The controller contains a register file and a pipelined programmable register transfer machine, and it supports the design of both stateless and stateful functional units. Two examples are described: the implementation of a set of basic stateless arithmetic functional units, and the implementation of a stateful algorithm that exploits circuit parallelism

Enlighten

Recommended from our members

AGM, a dataflow database machine

Author: Bic Lubomir
Hartmann Robert L.
Publication venue: eScholarship, University of California
Publication date: 01/01/1984
Field of study

In recent years, a number of database machines consisting of large numbers of parallel processing elements have been proposed. Unfortunately, one of the main limitations to parallelism in database processing is the I/O bandwidth of the underlying storage devices. One way to solve this problem is to use multiple parallel disk units. The main problem with this approach, however, is the lack of a computational model capable of utilizing the potential of any significant number of such devices.This paper presents a database model which is based on the principles of data-driven computation. According to this model, the database is represented as a network in which each node is conceptually an independent processing element, capable of communicating with other nodes by exchanging messages along the network arcs. To answer a query, one or more such messages, called tokens, are created and injected into the network. These then propagate asynchronously through the network in the search of results satisfying the given query.To investigate the performance of the proposed system, we have implemented the model on a simulated computer architecture. The results of the simulation ex-periments indicate that the model is capable of exploiting the potential I/O band-width of a large number of disk units as well as the computational power of the associated processing elements

eScholarship - University of California