Development of a Quantum Chemical Two-Electron Integral Program for a Hierarchical Distributed Shared Memory Multiprocessor System (MEMSY)
A quantum mechanical integral program has been implemented on a multiprocessor system with a hierarchical architecture that has both a global memory and a locally distributed memory. Because of this hardware concept, the possibilities for communication are manifold and therefore more complex than in other multiprocessor systems, e.g. the Intel iPSC/860 or workstation clusters. Nevertheless, the efficiencies obtained using a simulator or the real system are of comparable quality. It is expected that this variety of interprocessor communications can be employed to its full extent in the second part of the program, in which Hermitian eigenvalue problems have to be solved many times.
Algorithm-Based Fault-tolerant Programming in Scientific Computation on Multiprocessors
Efficient parallel algorithms proposed to solve many fundamental problems in scientific computation are sensitive to processor failures. Because of its low cost, algorithm-based fault tolerance is an interesting concept for introducing fault tolerance into existing multiprocessors. To facilitate fault-tolerant programming in scientific computation, we have modified and further developed an existing parallel run-time environment. This paper primarily examines how known error processing techniques can be tuned to the algorithm-based approach. Design issues for implementation and the execution time overhead of a fault-tolerant application in our run-time environment are studied. In contrast to many other environments for parallel fault-tolerant programming, which use the master/slave programming model, our environment enables one to add fault tolerance to existing parallel applications in scientific computation.
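As an illustration of what "algorithm-based fault tolerance" usually means, the sketch below shows the classical checksum scheme for matrix multiplication: the inputs are extended with checksum rows and columns, and a single corrupted element of the product is located and repaired from the checksums. This is a textbook example chosen for illustration, not the run-time environment or the error processing techniques of the paper; all function names are hypothetical, and integer data is assumed so that checksums compare exactly (floating-point data would need a tolerance).

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_column_checksum(a):
    """Append one row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b):
    """Append to every row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def find_errors(c_full):
    """Return the data rows and data columns whose checksums do not match."""
    n = len(c_full) - 1          # number of data rows
    m = len(c_full[0]) - 1       # number of data columns
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    return bad_rows, bad_cols


def correct_single_error(c_full):
    """Repair one corrupted data element in place; return True if a repair was made."""
    bad_rows, bad_cols = find_errors(c_full)
    if not bad_rows and not bad_cols:
        return False
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("error pattern is not a single corrupted data element")
    i, j = bad_rows[0], bad_cols[0]
    m = len(c_full[0]) - 1
    # Rebuild the element from its row checksum and the other entries of the row.
    c_full[i][j] = c_full[i][m] - sum(c_full[i][k] for k in range(m) if k != j)
    return True
```

In a parallel setting each processor would compute a block of the extended product, and the consistency check would run after the blocks are gathered; that distribution is left out here to keep the sketch short.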
State-of-the-art review of computational fluid dynamics modeling for fluid-solids systems
As the result of 15 years of research (50 staff years of effort), Argonne National Laboratory (ANL), through its involvement in fluidized-bed combustion, magnetohydrodynamics, and a variety of environmental programs, has produced extensive computational fluid dynamics (CFD) software and models to predict the multiphase hydrodynamic and reactive behavior of fluid-solids motions and interactions in complex fluidized-bed reactors (FBRs) and slurry systems. This has resulted in the FLUFIX, IRF, and SLUFIX computer programs. These programs are based on fluid-solids hydrodynamic models and can predict information important to the designer of atmospheric or pressurized bubbling and circulating FBR, fluid catalytic cracking (FCC) and slurry units to guarantee optimum efficiency with minimum release of pollutants into the environment. This latter issue will become of paramount importance with the enactment of the Clean Air Act Amendment (CAAA) of 1995. Solids motion is also the key to understanding erosion processes. Erosion rates in FBRs and pneumatic and slurry components are computed by ANL's EROSION code to predict the potential metal wastage of FBR walls, intervals, feed distributors, and cyclones. Because of length limitations, only the FLUFIX and IRF codes are reviewed in the paper, together with highlights of the validations. It is envisioned that one day these codes, with user-friendly pre- and post-processor software and tailored for massively parallel shared-memory multiprocessor platforms, will be used by industry and researchers to assist in reducing and/or eliminating the environmental and economic barriers which limit full consideration of coal, shale and biomass as energy sources, to retain energy security, and to remediate waste and ecological problems.
A shared memory multi-microprocessor system with hardware supported message passing mechanisms.
by Lam Chin Hung.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1990.
Bibliography: leaves 167-174.
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
CHAPTER 1 --- INTRODUCTION
1.1 --- Gaining performance with multiprocessing
1.1.1 --- Software approach
1.1.2 --- Hardware approach
1.2 --- Parallel processing
1.3 --- Gaining performance with multiprocessing
1.3.1 --- Multiprocessor configurations
1.3.2 --- Multiprocessor design issues
1.3.3 --- Using microprocessors
1.3.4 --- Bus based systems
1.4 --- Shared memory and message passing
1.4.1 --- Shared memory
1.4.2 --- Message passing
1.4.3 --- Comparisons of the two paradigms
1.5 --- Summary and comment
CHAPTER 2 --- AN OVERVIEW OF COMMON APPROACHES
2.1 --- SUPRENUM
2.2 --- MEMSY
2.3 --- ELXSI
2.4 --- Sequent
2.5 --- YACKOS
2.6 --- Summary
CHAPTER 3 --- THE MPC APPROACH
3.1 --- A shared memory multiprocessor architecture
3.2 --- Message passer for inter-process communication
3.2.1 --- A review of the message passer approach
3.2.2 --- Pit-falls of the message passer approach
3.3 --- The role of the MPC
3.3.1 --- The quest for the MPC
3.3.2 --- Duties of the MPC
3.3.2.1 --- Software aspects
3.3.2.2 --- Hardware aspects
3.4 --- Advantages and disadvantages
3.4.1 --- Advantages
3.4.2 --- Disadvantages
3.4.3 --- Other discussions
3.5 --- Summary
CHAPTER 4 --- THE DESIGN OF SM3
4.1 --- Introduction to SM3
4.2 --- Software aspects
4.2.1 --- Programming model
4.2.1.1 --- Logical entities
4.2.1.2 --- Communication procedure
4.2.2 --- Message structure
4.2.2.1 --- Broadcast versus point-to-point messages
4.2.2.2 --- Message priority
4.2.2.3 --- Blocking versus non-blocking
4.3 --- Hardware aspects
4.3.1 --- Overall architecture
4.3.2 --- The host machine
4.3.3 --- Slave processor nodes
4.3.4 --- The MPC
4.4 --- Communication protocols
4.4.1 --- Short and long messages
4.4.2 --- Point-to-point messages
4.4.3 --- 1-to-N DMA for broadcast messages
4.4.3.1 --- Introducing 1-to-N DMA
4.4.3.2 --- 1-to-N DMA operation
4.4.3.3 --- Merits and demerits of 1-to-N DMA
4.5 --- Summary
CHAPTER 5 --- IMPLEMENTATION ISSUES OF SM3
5.1 --- The shared bus - VMEbus
5.1.1 --- Why VMEbus
5.1.2 --- Customizing the VMEbus
5.2 --- The host machine
5.3 --- Slave processor nodes
5.3.1 --- Overview of a PN
5.3.2 --- The MC68030 microprocessor
5.3.3 --- The DMAC M68442
5.3.4 --- Registers
5.3.5 --- Shared-bus interface
5.3.6 --- Communication logic
5.4 --- The MPC
5.4.1 --- Overview of the MPC
5.4.2 --- Registers
5.4.3 --- Communication logic
5.5 --- Protocol implementation
5.5.1 --- Point-to-point messages
5.5.2 --- Broadcast messages
5.5.2.1 --- Circular buffer queue
5.5.2.2 --- Participating entities
5.5.2.3 --- Protocol details
5.6 --- System start-up procedure
5.6.1 --- Power up reset of PNs
5.6.2 --- Initialization of the processor pool
5.7 --- Summary
CHAPTER 6 --- APPLICATION EXAMPLES
6.1 --- Introduction
6.2 --- Matrix Multiplication
6.3 --- Parallel Quicksort
6.4 --- Pipeline Problems
CHAPTER 7 --- UNSOLVED PROBLEMS AND FUTURE DEVELOPMENT
7.1 --- Current Status
7.2 --- Possible immediate enhancements
7.2.1 --- Enhancement to the PNs
7.2.2 --- Enhancement of the MPC
7.2.3 --- Communication kernel enhancement
7.3 --- Limitation of a shared bus
7.4 --- Number crunching capability
7.5 --- Parallel programming environment
7.5.1 --- Conform to serial language
7.5.2 --- Moving to parallel programming languages
7.5.2.1 --- Uni-processor Unix
7.5.2.2 --- Porting Unix
7.5.2.3 --- Multiprocessor Unix
7.5.3 --- Object-oriented approach
7.6 --- Summary
CHAPTER 8 --- CONCLUSION
8.1 --- Thesis summary
8.2 --- Author's comment
8.3 --- Looking into the future
APPENDIX A --- BLOCK DIAGRAM
APPENDIX B --- CIRCUIT DIAGRAMS
APPENDIX C --- PCB LAYOUT
APPENDIX D --- VMEBUS ADDRESS MAP
APPENDIX E --- PROCESSOR NODE ADDRESS MAP
APPENDIX F --- REGISTER LAYOUT
F.1 --- Registers on a PN
F.2 --- Registers on the MPC
APPENDIX G --- PAL DESIGN
APPENDIX H --- COMMUNICATION SUB-BUS
H.1 --- Signal definition
H.2 --- Pin assignment
APPENDIX I --- FEASIBILITY OF TASK DISTRIBUTION PLAN
APPENDIX J --- COMMUNICATION PRIMITIVES
APPENDIX K --- PHOTOGRAPHS OF SM3
APPENDIX L --- PROTOCOL STATE DIAGRAMS
L.1 --- Predefined partial state diagrams
L.2 --- Point-to-point messages
L.3 --- Broadcast messages
APPENDIX M --- BOOT-UP PROCEDURE OF SM3
PUBLICATIONS
REFERENCES
Mapping numerical software onto distributed memory parallel systems
The aim of this thesis is to further the use of parallel computers, in particular distributed memory systems, by proving strategies for parallelisation and developing the core component of tools to aid scalar software porting. The ported code must not only efficiently exploit available parallel processing speed and distributed memory, but also enable existing users of the scalar code to use the parallel version with identical inputs and allow maintenance to be performed by the scalar code author in conjunction with the parallel code.
The data partition strategy has been used to parallelise an in-house solidification modelling code where all requirements for the parallel software were successfully met. To confirm the success of this parallelisation strategy, a much sterner test was used, parallelising the HARWELL-FLOW3D fluid flow package. The performance results of the parallel version clearly vindicate the conclusions of the first example. Speedup efficiencies of around 80 percent have been achieved on fifty processors for sizable models. In both these tests, the alterations to the code were fairly minor, maintaining the structure and style of the original scalar code which can easily be recognised by its original author.
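To make the quoted figure concrete: under the usual definition of parallel efficiency as speedup divided by the number of processors (an assumption here; the abstract does not spell out its definition), an efficiency of around 80 percent on fifty processors corresponds to a speedup of roughly 40. The helper names below are hypothetical.

```python
def parallel_efficiency(t_serial, t_parallel, processors):
    """Efficiency = speedup / processors, with speedup = t_serial / t_parallel."""
    return (t_serial / t_parallel) / processors


def speedup_from_efficiency(efficiency, processors):
    """Invert the definition above to recover the speedup."""
    return efficiency * processors


# Assumed reading of the abstract: efficiency 0.8 on 50 processors.
implied_speedup = speedup_from_efficiency(0.8, 50)
```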
The alterations made to these codes indicated the potential for parallelising tools, since the alterations were fairly minor and usually mechanical in nature. The current generation of parallelising compilers relies heavily on heuristic guidance in parallel code generation and other decisions that may be better made by a human. As a result, the code they produce will almost certainly be inferior to manually produced code. Also, in order not to sacrifice parallel code quality when using tools, the scalar code analysis used in parallelising compilers to identify inherent parallelism in an application code has been extended to eliminate conservatively assumed dependencies, since these dependencies can greatly inhibit parallelisation.
Extra information has been extracted both from control flow and from processing symbolic information. The tests devised to utilise this information enable the non-existence of a significant number of previously assumed dependencies to be proved. In some cases, the number of true dependencies has been more than halved.
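For readers unfamiliar with dependence testing, the sketch below shows one classical test of this kind, the GCD test for array subscripts that are linear in the loop index. It is a textbook example only and is not claimed to be one of the tests devised in the thesis; the function name and parameters are hypothetical. A write to `a[c1*i + k1]` and a read of `a[c2*j + k2]` can touch the same element only if `c1*i - c2*j = k2 - k1` has an integer solution, which requires `gcd(c1, c2)` to divide `k2 - k1`.

```python
from math import gcd


def gcd_test_may_depend(c1, k1, c2, k2):
    """Return False when a dependence is proved impossible, True when it may exist.

    Compares accesses a[c1*i + k1] and a[c2*j + k2]; loop bounds are ignored,
    so a True result is conservative and does not prove a dependence.
    """
    g = gcd(c1, c2)
    if g == 0:
        # Both subscripts are constants: they collide only if they are equal.
        return k1 == k2
    return (k2 - k1) % g == 0
```

For example, a loop that writes `a[2*i]` and reads `a[2*i + 1]` gives `gcd_test_may_depend(2, 0, 2, 1) == False`, so the assumed dependence can be discarded, whereas `a[2*i]` against `a[4*i + 2]` returns `True` and would need a stronger test.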
The dependence graph produced is of sufficient quality to greatly aid the parallelisation, with user interaction and interpretation, parallelism detection and code transformation validity being less inhibited by assumed dependencies. The use of tools rather than the black box approach removes the handicaps associated with using heuristic methods, if any relevant heuristic methods exist.
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.