Search CORE

18,788 research outputs found

Principles for problem aggregation and assignment in medium scale multiprocessors

Author: Nicol David M.
Saltz Joel H.
Publication venue
Publication date
Field of study

One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior

NASA Technical Reports Server

Adaptive Transactional Memories: Performance and Energy Consumption Tradeoffs

Author: DI SANZO Pierangelo
Pellegrini Alessandro
Rughetti Diego
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Energy efficiency is becoming a pressing issue, especially in large data centers where it entails, at the same time, a non-negligible management cost, an enhancement of hardware fault probability, and a significant environmental footprint. In this paper, we study how Software Transactional Memories (STM) can provide benefits on both power saving and the overall applications’ execution performance. This is related to the fact that encapsulating shared-data accesses within transactions gives the freedom to the STM middleware to both ensure consistency and reduce the actual data contention, the latter having been shown to affect the overall power needed to complete the application’s execution. We have selected a set of self-adaptive extensions to existing STM middlewares (namely, TinySTM and R-STM) to prove how self-adapting computation can capture the actual degree of parallelism and/or logical contention on shared data in a better way, enhancing even more the intrinsic benefits provided by STM. Of course, this benefit comes at a cost, which is the actual execution time required by the proposed approaches to precisely tune the execution parameters for reducing power consumption and enhancing execution performance. Nevertheless, the results hereby provided show that adaptivity is a strictly necessary requirement to reduce energy consumption in STM systems: Without it, it is not possible to reach any acceptable level of energy efficiency at all

Crossref

Archivio della Ricerca - Università di Roma 3

ART

Archivio della ricerca- Università di Roma La Sapienza

Asynchronous and corrected-asynchronous numerical solutions of parabolic PDES on MIMD multiprocessors

Author: Amitai Dganit
Averbuch Amir
Itzikowitz Samuel
Turkel Eli
Publication venue
Publication date
Field of study

A major problem in achieving significant speed-up on parallel machines is the overhead involved with synchronizing the concurrent process. Removing the synchronization constraint has the potential of speeding up the computation. The authors present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although the discussion concentrates on the Euler scheme for the solution of the heat equation, it has the potential for being extended to other schemes and other parabolic partial differential equations (PDEs). These schemes are analyzed and implemented on the shared memory multi-user Sequent Balance machine. Numerical results for one and two dimensional problems are presented. It is shown experimentally that the synchronization penalty can be about 50 percent of run time: in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time level, and with the problem dimension. The efficiency of the AS may reach 90 percent and over, but it provides accurate results only for steady-state values. The CA, on the other hand, is less efficient, but provides more accurate results for intermediate (non steady-state) values

NASA Technical Reports Server

Advanced flight control system study

Author: Klafin J. F.
Mcgough J.
Moses K.
Publication venue
Publication date
Field of study

The architecture, requirements, and system elements of an ultrareliable, advanced flight control system are described. The basic criteria are functional reliability of 10 to the minus 10 power/hour of flight and only 6 month scheduled maintenance. A distributed system architecture is described, including a multiplexed communication system, reliable bus controller, the use of skewed sensor arrays, and actuator interfaces. Test bed and flight evaluation program are proposed

NASA Technical Reports Server

Exploiting Fine-Grain Concurrency Analytical Insights in Superscalar Processor Design

Author: Adams George B., III
Dubey Pradeep K.
Flynn Michael J.
Publication venue: 'Purdue University (bepress)'
Publication date: 01/08/1991
Field of study

This dissertation develops analytical models to provide insight into various design issues associated with superscalar-type processors, i.e., the processors capable of executing multiple instructions per cycle. A survey of the existing machines and literature has been completed with a proposed classification of various approaches for exploiting fine-grain concurrency. Optimization of a single pipeline is discussed based on an analytical model. The model-predicted performance curves are found to be in close proximity to published results using simulation techniques. A model is also developed for comparing different branch strategies for single-pipeline processors in terms of their effectiveness in reducing branch delay. The additional instruction fetch traffic generated by certain branch strategies is also studied and is shown to be a useful criterion for choosing between equally well performing strategies. Next, processors with multiple pipelines are modelled to study the tradeoffs associated with deeper pipelines versus multiple pipelines. The model developed can reveal the cause of performance bottleneck: insufficient resources to exploit discovered parallelism, insufficient instruction stream parallelism, or insufficient scope of concurrency detection. The cost associated with speculative (i.e., beyond basic block) execution is examined via probability distributions that characterize the inherent parallelism in the instruction stream. The throughput prediction of the analytic model is shown, using a variety of benchmarks, to be close to the measured static throughput of the compiler output, under resource and scope constraints. Further experiments provide misprediction delay estimates for these benchmarks under scope constraints, assuming beyond-basic-block, out-of-order execution and run-time scheduling. These results were derived using traces generated by the Multiflow TRACE SCHEDULING™(*) compacting C and FORTRAN 77 compilers. A simplified extension to the model to include multiprocessors is also proposed. The extended model is used to analyze combined systems, such as superpipelined multiprocessors and superscalar multiprocessors, both with shared memory. It is shown that the number of pipelines (or processors) at which the maximum throughput is obtained is increasingly sensitive to the ratio of memory access time to network access delay, as memory access time increases. Further, as a function of inter-iteration dependency distance, optimum throughput is shown to vary nonlinearly, whereas the corresponding Optimum number of processors varies linearly. The predictions from the analytical model agree with published results based on simulations. (*)TRACE SCHEDULING is a trademark of Multiflow Computer, Inc

Purdue E-Pubs

Hypercube technology

Author: Cwik Tom
Ferraro Robert D.
Liewer Paulett C.
Parker Jay W.
Patterson Jean E.
Publication venue
Publication date
Field of study

The JPL designed MARKIII hypercube supercomputer has been in application service since June 1988 and has had successful application to a broad problem set including electromagnetic scattering, discrete event simulation, plasma transport, matrix algorithms, neural network simulation, image processing, and graphics. Currently, problems that are not homogeneous are being attempted, and, through this involvement with real world applications, the software is evolving to handle the heterogeneous class problems efficiently

NASA Technical Reports Server

A parallel algorithm for the enumeration of benzenoid hydrocarbons

Author: Brinkmann G
Caporossi G
Enting I G
Enting I G
Gutman I
Guttmann A J
Ince E L
Iwan Jensen
Jensen I
Jensen I
Jensen I
Jensen I
Jensen I
Knuth D E
Vöge M
Publication venue: 'IOP Publishing'
Publication date: 01/01/2009
Field of study

We present an improved parallel algorithm for the enumeration of fixed benzenoids B_h containing h hexagonal cells. We can thus extend the enumeration of B_h from the previous best h=35 up to h=50. Analysis of the associated generating function confirms to a very high degree of certainty that

B_h \sim A \kappa^h /h

and we estimate that the growth constant

\kappa = 5.161930154(8)

and the amplitude

A=0.2808499(1)

.Comment: 14 pages, 6 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref