Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the implications for computer architecture and design, on hardware, operating systems and programming languages (including Ada), of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research stage. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to implement software fault-tolerance effectively and efficiently.
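One classic software fault-tolerance technique surveyed in work of this kind is the recovery block, in which a primary routine is backed by independently written alternates and an acceptance test. A minimal Python sketch (the function names and the trivial acceptance test are illustrative, not taken from the paper):

```python
def recovery_block(acceptance_test, primary, *alternates):
    # Recovery-block scheme: run the primary variant first; if it raises
    # or its result fails the acceptance test, fall back to the next
    # alternate. Fail only if every variant is rejected.
    for variant in (primary, *alternates):
        try:
            result = variant()
        except Exception:
            continue  # variant failed outright; try the next one
        if acceptance_test(result):
            return result
    raise RuntimeError("all variants failed the acceptance test")

def faulty_primary():
    raise ZeroDivisionError  # stands in for a buggy primary version

def simple_backup():
    return 4                 # independently written alternate

result = recovery_block(lambda r: r == 4, faulty_primary, simple_backup)
print(result)  # 4
```

The same structure maps naturally onto Ada's exception handlers, which is one reason the paper's language discussion includes Ada.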
Parallel Computers and Complex Systems
We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.
Distributed Quantum Computation Architecture Using Semiconductor Nanophotonics
In a large-scale quantum computer, the cost of communications will dominate
the performance and resource requirements, place many severe demands on the
technology, and constrain the architecture. Unfortunately, fault-tolerant
computers based entirely on photons with probabilistic gates, though equipped
with "built-in" communication, have very large resource overheads; likewise,
computers with reliable probabilistic gates between photons or quantum memories
may lack sufficient communication resources in the presence of realistic
optical losses. Here, we consider a compromise architecture, in which
semiconductor spin qubits are coupled by bright laser pulses through
nanophotonic waveguides and cavities using a combination of frequent
probabilistic and sparse deterministic entanglement mechanisms. The large
photonic resource requirements incurred by the use of probabilistic gates for
quantum communication are mitigated in part by the potential high-speed
operation of the semiconductor nanophotonic hardware. The system employs
topological cluster-state quantum error correction for achieving
fault-tolerance. Our results suggest that such an architecture/technology
combination has the potential to scale to a system capable of attacking
classically intractable computational problems.
Advanced manned space flight simulation and training: An investigation of simulation host computer system concepts
The findings of a preliminary investigation by Southwest Research Institute (SwRI) into simulation host computer concepts are presented. The investigation is designed to aid NASA in evaluating simulation technologies for use in spaceflight training, and focuses on the next generation of space simulation systems that will be used to train personnel for Space Station Freedom operations. SwRI concludes that NASA should pursue a distributed simulation host computer system architecture for the Space Station Training Facility (SSTF) rather than a centralized mainframe-based arrangement. A distributed system offers many advantages and is seen by SwRI as the only architecture that will allow NASA to achieve its established functional goals and operational objectives over the life of the Space Station Freedom program. Several distributed, parallel computing systems available today offer real-time capabilities for time-critical, man-in-the-loop simulation. These systems are flexible in terms of connectivity and configurability, and are easily scaled to meet increasing demands for computing power.
Dynamic Loop Scheduling Using MPI Passive-Target Remote Memory Access
Scientific applications often contain large computationally-intensive
parallel loops. Loop scheduling techniques aim to achieve load balanced
executions of such applications. For distributed-memory systems, existing
dynamic loop scheduling (DLS) libraries are typically MPI-based, and employ a
master-worker execution model to assign variably-sized chunks of loop
iterations. The master-worker execution model may adversely impact performance
due to the master-level contention. This work proposes a distributed
chunk-calculation approach that does not require the master-worker execution
scheme. Moreover, it considers the novel features in the latest MPI standards,
such as passive-target remote memory access, shared-memory window creation, and
atomic read-modify-write operations. To evaluate the proposed approach, five
well-known DLS techniques, two applications, and two heterogeneous hardware
setups have been considered. The DLS techniques implemented using the proposed
approach outperformed their counterparts implemented using the traditional
master-worker execution model.
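The distributed chunk-calculation idea can be sketched without MPI: in the snippet below, Python threads and a lock stand in for MPI passive-target RMA and its atomic read-modify-write operation (`MPI_Fetch_and_op`), and the chunk size follows guided self-scheduling as one representative DLS technique. All names and numbers are illustrative, not from the paper.

```python
import threading

N = 1000  # total loop iterations
P = 4     # number of workers

class ChunkCounter:
    """Stands in for an MPI window holding the shared loop index; the
    lock emulates the atomicity of a passive-target fetch-and-op."""
    def __init__(self, total, workers):
        self.next = 0
        self.total = total
        self.workers = workers
        self.lock = threading.Lock()

    def grab_chunk(self):
        # Distributed chunk calculation: each worker computes its own
        # chunk size (guided self-scheduling: a fraction of the work
        # remaining) from the fetched index -- no master process.
        with self.lock:
            if self.next >= self.total:
                return None
            remaining = self.total - self.next
            size = max(1, remaining // self.workers)
            start = self.next
            self.next += size
            return start, size

results = []
res_lock = threading.Lock()

def worker(counter):
    while True:
        chunk = counter.grab_chunk()
        if chunk is None:
            break
        start, size = chunk
        local = sum(i * i for i in range(start, start + size))
        with res_lock:
            results.append(local)

counter = ChunkCounter(N, P)
threads = [threading.Thread(target=worker, args=(counter,)) for _ in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results)
print(total)  # equals sum(i*i for i in range(N)) regardless of scheduling order
```

In a real MPI implementation, `grab_chunk` would be a one-sided `MPI_Fetch_and_op` on a window exposed by one process, which is what removes the master-level contention the abstract describes.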
Layered architecture for quantum computing
We develop a layered quantum computer architecture, which is a systematic
framework for tackling the individual challenges of developing a quantum
computer while constructing a cohesive device design. We discuss many of the
prominent techniques for implementing circuit-model quantum computing and
introduce several new methods, with an emphasis on employing surface code
quantum error correction. In doing so, we propose a new quantum computer
architecture based on optical control of quantum dots. The timescales of
physical hardware operations and logical, error-corrected quantum gates differ
by several orders of magnitude. By dividing functionality into layers, we can
design and analyze subsystems independently, demonstrating the value of our
layered architectural approach. Using this concrete hardware platform, we
provide resource analysis for executing fault-tolerant quantum algorithms for
integer factoring and quantum simulation, finding that the quantum dot
architecture we study could solve such problems on the timescale of days.
The Multicomputer Toolbox - First-Generation Scalable Libraries
First-generation scalable parallel libraries have been achieved, and are maturing, within the Multicomputer Toolbox. The Toolbox includes sparse, dense, and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms, plus an inter-architecture Makefile mechanism for building applications. We have devised C-based strategies for useful classes of distributed data structures, including distributed matrices and vectors. The underlying Zipcode message-passing system has enabled process-grid abstractions of multicomputers, communication contexts, and process groups, all characteristics needed for building scalable libraries and scalable application software. We describe the data-distribution-independent approach to building scalable libraries, which is needed so that applications do not unnecessarily have to redistribute data at high expense. We discuss the strategy used for implementing data-distribution mappings. We also describe high-level message-passing constructs used to achieve flexibility in transmission of data structures (Zipcode invoices). We expect the Zipcode and MPI message-passing interfaces (the latter of which will incorporate many features from Zipcode, mentioned above) to co-exist in the future. We discuss progress thus far in achieving uniform interfaces for different algorithms for the same operation, which are needed to create poly-algorithms. Poly-algorithms are needed to widen the potential for scalability; uniform interfaces simplify the testing of alternative methods with an application (whether for parallelism or for convergence, or both). We indicate that data-distribution-independent algorithms are sometimes more efficient than fixed-data-distribution counterparts, because redistribution of data can be avoided, and that this question is strongly application dependent.
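A data-distribution mapping of the kind such libraries abstract over can be sketched for the common block-cyclic layout. The helper below is hypothetical, written to illustrate the idea, and is not the Toolbox's actual API:

```python
def owner_local(gidx, block, procs):
    """Block-cyclic mapping: global index -> (owning process, local index).

    Elements are grouped into blocks of `block` consecutive indices, and
    blocks are dealt round-robin to `procs` processes. A library whose
    algorithms are written against this mapping (rather than a fixed
    layout) need not redistribute data to match the application.
    """
    blk = gidx // block        # which global block the index falls in
    proc = blk % procs         # blocks dealt round-robin to processes
    local_blk = blk // procs   # position of that block on its owner
    return proc, local_blk * block + gidx % block

# With block size 2 and 3 processes, global index 7 lies in block 3,
# which is the second block owned by process 0.
print(owner_local(7, 2, 3))  # (0, 3)
```

Making library algorithms take such a mapping as a parameter is one concrete way to realize the data-distribution independence the abstract argues for.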
Compilation techniques for multicomputers
This thesis considers problems in process and data partitioning when compiling
programs for distributed-memory parallel computers (or multicomputers). These
partitions may be specified by the user through the use of language constructs,
or automatically determined by the compiler.
Data and process partitioning techniques are developed for two models of
compilation. The first compilation model focusses on the loop nests present in a
serial program. Executing the iterations of these loop nests in parallel accounts for
a significant amount of the parallelism which can be exploited in these programs.
The parallelism is exploited by applying a set of transformations to the loop
nests. The iterations of the transformed loop nests are in a form which can be
readily distributed amongst the processors of a multicomputer. The manner in
which the arrays, referenced within these loop nests, are partitioned between the
processors is determined by the distribution of the loop iterations. The second
compilation model is based on the data parallel paradigm, in which operations
are applied to many different data items collectively. High Performance Fortran
is used as an example of this paradigm.
Novel collective communication routines are developed as part of this thesis, and
are applied to provide the communication associated with the data partitions for
both compilation models. Furthermore, it is shown that using these routines
greatly simplifies the communication associated with partitioning data on a
multicomputer.
The experimental context for this thesis is the development of a compiler for
the Fujitsu AP1000 multicomputer. A prototype compiler is presented. Experimental
results for a variety of applications are included.
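Distributing the iterations of a transformed loop nest among processors, which in turn determines the array partitioning, can be sketched as a simple block partition. This helper is a hypothetical illustration, not the thesis's actual compiler code:

```python
def iteration_range(n, p, rank):
    """Block-partition n loop iterations across p processors.

    Returns the half-open range [lo, hi) executed by processor `rank`,
    with the first n % p processors taking one extra iteration so the
    load is balanced to within one iteration.
    """
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

# 10 iterations over 4 processors: sizes 3, 3, 2, 2.
ranges = [iteration_range(10, 4, r) for r in range(4)]
print(ranges)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Under an "owner computes" rule, the arrays referenced inside the loop are then partitioned to match these iteration ranges, which is the dependence the abstract describes between loop distribution and array distribution.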
- …