719 research outputs found
Hyperswitch communication network
The Hyperswitch Communication Network (HCN) is a large scale parallel computer prototype being developed at JPL. Commercial versions of the HCN computer are planned. The HCN computer being designed is a message passing multiple instruction multiple data (MIMD) computer, and offers many advantages in price-performance ratio, reliability and availability, and manufacturing over traditional uniprocessors and bus based multiprocessors. The design of the HCN operating system is a uniquely flexible environment that combines both parallel processing and distributed processing. This programming paradigm can achieve a balance among the following competing factors: performance in processing and communications, user friendliness, and fault tolerance. The prototype is being designed to accommodate a maximum of 64 state of the art microprocessors. The HCN is classified as a distributed supercomputer. The HCN system is described, and the performance/cost analysis and other competing factors within the system design are reviewed
Optimal cube-connected cube multiprocessors
Many CFD (computational fluid dynamics) and other scientific applications can be partitioned into subproblems. However, in general the partitioned subproblems are very large. They demand high performance computing power themselves, and the solutions of the subproblems have to be combined at each time step. The cube-connect cube (CCCube) architecture is studied. The CCCube architecture is an extended hypercube structure with each node represented as a cube. It requires fewer physical links between nodes than the hypercube, and provides the same communication support as the hypercube does on many applications. The reduced physical links can be used to enhance the bandwidth of the remaining links and, therefore, enhance the overall performance. The concept and the method to obtain optimal CCCubes, which are the CCCubes with a minimum number of links under a given total number of nodes, are proposed. The superiority of optimal CCCubes over standard hypercubes was also shown in terms of the link usage in the embedding of a binomial tree. A useful computation structure based on a semi-binomial tree for divide-and-conquer type of parallel algorithms was identified. It was shown that this structure can be implemented in optimal CCCubes without performance degradation compared with regular hypercubes. The result presented should provide a useful approach to design of scientific parallel computers
Modeling of Topologies of Interconnection Networks based on Multidimensional Multiplicity
Modern SoCs are becoming more complex with the integration of heterogeneous components (IPs). For this purpose, a high performance interconnection medium is required to handle the complexity. Hence NoCs come into play enabling the integration of more IPs into the SoC with increased performance. These NoCs are based on the concept of Interconnection networks used to connect parallel machines. In response to the MARTE RFP of the OMG, a notation of multidimensional multiplicity has been proposed which permits to model repetitive structures and topologies. This report presents a modeling methodology based on this notation that can be used to model a family of Interconnection Networks called Delta Networks which in turn can be used for the construction of NoCs
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited
SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures
Near-Data-Processing (NDP) architectures present a promising way to alleviate
data movement costs and can provide significant performance and energy benefits
to parallel applications. Typically, NDP architectures support several NDP
units, each including multiple simple cores placed close to memory. To fully
leverage the benefits of NDP and achieve high performance for parallel
workloads, efficient synchronization among the NDP cores of a system is
necessary. However, supporting synchronization in many NDP systems is
challenging because they lack shared caches and hardware cache coherence
support, which are commonly used for synchronization in multicore systems, and
communication across different NDP units can be expensive.
This paper comprehensively examines the synchronization problem in NDP
systems, and proposes SynCron, an end-to-end synchronization solution for NDP
systems. SynCron adds low-cost hardware support near memory for synchronization
acceleration, and avoids the need for hardware cache coherence support. SynCron
has three components: 1) a specialized cache memory structure to avoid memory
accesses for synchronization and minimize latency overheads, 2) a hierarchical
message-passing communication protocol to minimize expensive communication
across NDP units of the system, and 3) a hardware-only overflow management
scheme to avoid performance degradation when hardware resources for
synchronization tracking are exceeded.
We evaluate SynCron using a variety of parallel workloads, covering various
contention scenarios. SynCron improves performance by 1.27 on average
(up to 1.78) under high-contention scenarios, and by 1.35 on
average (up to 2.29) under low-contention real applications, compared
to state-of-the-art approaches. SynCron reduces system energy consumption by
2.08 on average (up to 4.25).Comment: To appear in the 27th IEEE International Symposium on
High-Performance Computer Architecture (HPCA-27
- …