10,136 research outputs found
Instrumentation, performance visualization, and debugging tools for multiprocessors
The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs
PETRI NET BASED MODELING OF PARALLEL PROGRAMS EXECUTING ON DISTRIBUTED MEMORY MULTIPROCESSOR SYSTEMS
The development of parallel programs following the paradigm of communicating sequen-
tial processes to be executed on distributed memory multiprocessor systems is addressed.
The key issue in programming parallel machines today is to provide computerized tools
supporting the development of efficient parallel software, i.e. software effectively har-
nessing the power of parallel processing systems. The critical situations where a parallel
programmer needs help is in expressing a parallel algorithm in a programming language,
in getting a parallel program to work and in tuning it to get optimum performance (for
example speedup). .
We show that the Petri net formalism is higly suitable as a performance modeling
technique for asynchronous parallel systems, by introducing a model taking care of the
parallel program, parallel architecture and mapping influences on overall system perfor-
mance. PRM -net (Program-Resource- Mapping) models comprise a Petri net model of the
multiple flows of control in a parallel program, a Petri net model of the parallel hardware
and the process-to-processor mapping information into a single integrated performance
model. Automated analysis of PRM-net models addresses correctness and performance
of parallel programs mapped to parallel hardware. Questions upon the correctness of
parallel programs can be answered by investigating behavioural properties of Petri net
programs like liveness, reachability, boundedness, mutualy exclusiveness etc. Peformance
of parallel programs is usefully considered only in concern with a dedicated target hard-
ware. For this reason it is essential to integrate multiprocessor hardware characteristics
into the specification of a parallel program. The integration is done by assigning the
concurrent processes to physical processing devices and communication patterns among
parallel processes to communication media connecting processing elements yielding an in-
tegrated, Petri net based performance model. Evaluation of the integrated model applies
simulation and markovian analysis to derive expressions characterising the peformance of
the program being developed.
Synthesis and decomposition rules for hierarchical models naturally give raise to
use PRM-net models for graphical, performance oriented parallel programming, support-
ing top-down (stepwise refinement) as well as bottom-up development approaches. The
graphical representation of Petri net programs visualizes phenomena like parallelism, syn-
chronisation, communication, sequential and alternative execution. Modularity of pro-
gram blocks aids reusability, prototyping is promoted by automated code generation on
the basis of high level program specifications
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
Hardware for a real-time multiprocessor simulator
The hardware for a real time multiprocessor simulator (RTMPS) developed at the NASA Lewis Research Center is described. The RTMPS is a multiple microprocessor system used to investigate the application of parallel processing concepts to real time simulation. It is designed to provide flexible data exchange paths between processors by using off the shelf microcomputer boards and minimal customized interfacing. A dedicated operator interface allows easy setup of the simulator and quick interpreting of simulation data. Simulations for the RTMPS are coded in a NASA designed real time multiprocessor language (RTMPL). This language is high level and geared to the multiprocessor environment. A real time multiprocessor operating system (RTMPOS) has also been developed that provides a user friendly operator interface. The RTMPS and supporting software are currently operational and are being evaluated at Lewis. The results of this evaluation will be used to specify the design of an optimized parallel processing system for real time simulation of dynamic systems
Optimization of Discrete-parameter Multiprocessor Systems using a Novel Ergodic Interpolation Technique
Modern multi-core systems have a large number of design parameters, most of
which are discrete-valued, and this number is likely to keep increasing as chip
complexity rises. Further, the accurate evaluation of a potential design choice
is computationally expensive because it requires detailed cycle-accurate system
simulation. If the discrete parameter space can be embedded into a larger
continuous parameter space, then continuous space techniques can, in principle,
be applied to the system optimization problem. Such continuous space techniques
often scale well with the number of parameters.
We propose a novel technique for embedding the discrete parameter space into
an extended continuous space so that continuous space techniques can be applied
to the embedded problem using cycle accurate simulation for evaluating the
objective function. This embedding is implemented using simulation-based
ergodic interpolation, which, unlike spatial interpolation, produces the
interpolated value within a single simulation run irrespective of the number of
parameters. We have implemented this interpolation scheme in a cycle-based
system simulator. In a characterization study, we observe that the interpolated
performance curves are continuous, piece-wise smooth, and have low statistical
error. We use the ergodic interpolation-based approach to solve a large
multi-core design optimization problem with 31 design parameters. Our results
indicate that continuous space optimization using ergodic interpolation-based
embedding can be a viable approach for large multi-core design optimization
problems.Comment: A short version of this paper will be published in the proceedings of
IEEE MASCOTS 2015 conferenc
Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 4: FTMP executive summary
The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another and hardware fault-detection and fault-masking is provided which is transparent to the software. Operating system design and user software design is thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of fault handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts
Dynamic resource allocation in a hierarchical multiprocessor system: A preliminary study
An integrated system approach to dynamic resource allocation is proposed. Some of the problems in dynamic resource allocation and the relationship of these problems to system structures are examined. A general dynamic resource allocation scheme is presented. A hierarchial system architecture which dynamically maps between processor structure and programs at multiple levels of instantiations is described. Simulation experiments were conducted to study dynamic resource allocation on the proposed system. Preliminary evaluation based on simple dynamic resource allocation algorithms indicates that with the proposed system approach, the complexity of dynamic resource management could be significantly reduced while achieving reasonable effective dynamic resource allocation
- …