Parallel processing for scientific computations
The scope of this project was the investigation of the requirements for supporting distributed computing of scientific computations over a cluster of cooperating workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience with the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm, has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations generally do not have the computing power to tackle complex scientific applications on their own, making them primarily useful for visualization, data reduction, and filtering as far as such applications are concerned. As a result, a tremendous amount of computing power is left unused in a network of workstations; very often a workstation simply sits idle on a desk. A set of tools can be developed to take advantage of this potential computing power and create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
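The distributed shared memory approach described above can be illustrated with a small, hypothetical sketch: a Jacobi iteration for a system of simultaneous linear equations in which each "workstation" (modelled here as a thread) updates its own partition of the solution vector while reading the rest through a shared array. The names, partitioning, and synchronisation are illustrative only, not the DICE interface.

```python
import threading

# Solve A x = b by Jacobi iteration. The vector x plays the role of the
# shared memory: every worker reads all of it but writes only its own rows.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
b = [5.0, 6.0, 6.0, 5.0]
n = len(b)
x = [0.0] * n                      # "shared memory" solution vector
barrier = threading.Barrier(2)     # two cooperating "workstations"

def worker(rows, iterations=50):
    for _ in range(iterations):
        snapshot = list(x)         # read the shared vector
        barrier.wait()             # everyone has read the same version
        for i in rows:
            s = sum(A[i][j] * snapshot[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]   # write back our partition
        barrier.wait()             # all writes done before the next read

threads = [threading.Thread(target=worker, args=(range(0, 2),)),
           threading.Thread(target=worker, args=(range(2, 4),))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([round(v, 6) for v in x])    # converges to [1.0, 1.0, 1.0, 1.0]
```

Because the matrix is diagonally dominant, the iteration converges; in a real NOW setting the cost of the two barriers per iteration, relative to the row updates, is exactly the communication-to-computation ratio that determines whether the cluster pays off.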
Innovation in the Wireless Ecosystem: A Customer-Centric Framework
The Federal Communications Commission's Notice of Inquiry in GN 09-157, Fostering Innovation and Investment in the Wireless Communications Market, is a significant event at an opportune moment. Wireless communications have already radically changed the way not only Americans but people the world over communicate with each other and access and share information, and there appears to be no end in sight to this fundamental shift in communication markets. Although the wireless communications phenomenon is global, the US has played and will continue to play a major role in shaping this market. At the start of a new US Administration and amid important changes at the FCC, it is most appropriate that this proceeding be launched.
The AURORA Gigabit Testbed
AURORA is one of five U.S. networking testbeds charged with exploring applications of, and technologies necessary for, networks operating at gigabit per second or higher bandwidths. The emphasis of the AURORA testbed, distinct from the other four testbeds, BLANCA, CASA, NECTAR, and VISTANET, is research into the supporting technologies for gigabit networking.
Like the other testbeds, AURORA itself is an experiment in collaboration, where government initiative (in the form of the Corporation for National Research Initiatives, which is funded by DARPA and the National Science Foundation) has spurred interaction among pre-existing centers of excellence in industry, academia, and government.
AURORA has been charged with research into networking technologies that will underpin future high-speed networks. This paper provides an overview of the goals and methodologies employed in AURORA, and points to some preliminary results from our first year of research, ranging from analytic results to experimental prototype hardware. It enunciates our targets, which include new software architectures, network abstractions, and hardware technologies, as well as applications for our work.
A formal actor-based model for streaming the future
Asynchronous actor-based programming has gained increasing attention as a model of concurrency and distribution. The Abstract Behavioral Specification (ABS) language is an actor-based programming language that has been developed for both the modeling and formal analysis of distributed systems. In ABS, actors are modeled as concurrent objects that communicate by asynchronous method calls. Return values are also communicated asynchronously, via return statements and so-called futures. Many modern distributed software
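The combination of asynchronous method calls and futures described above can be rendered in a hypothetical Python sketch (this is illustrative, not ABS syntax): a call on an actor returns a future immediately, and the caller awaits the value only at the point where it is needed.

```python
import asyncio

class CounterActor:
    """An actor as a concurrent object: one message processed at a time."""
    def __init__(self):
        self._mailbox = asyncio.Queue()
        self._value = 0

    async def run(self):
        # The actor's own event loop: serve asynchronous calls in order.
        while True:
            method, args, fut = await self._mailbox.get()
            fut.set_result(method(self, *args))

    def send(self, method, *args):
        # Asynchronous method call: enqueue the request and immediately
        # return a future for the eventual return value.
        fut = asyncio.get_running_loop().create_future()
        self._mailbox.put_nowait((method, args, fut))
        return fut

    def add(self, n):
        self._value += n
        return self._value

async def main():
    actor = CounterActor()
    runner = asyncio.ensure_future(actor.run())
    f1 = actor.send(CounterActor.add, 2)   # returns a future, does not block
    f2 = actor.send(CounterActor.add, 3)   # calls are queued in order
    value = await f2                        # synchronise only when needed
    runner.cancel()
    return value

result = asyncio.run(main())
print(result)  # 5
```

The caller stays decoupled from the callee's scheduling: both calls are dispatched before either result is demanded, which is the essence of futures as a communication mechanism.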
Asynchronous programming in the abstract behavioural specification language
Chip manufacturers are rapidly moving towards so-called manycore chips with thousands of independent processors on the same silicon real estate. Current programming languages can only leverage this potential power by inserting code with low-level concurrency constructs, sacrificing clarity. Alternatively, a programming language can integrate a thread of execution with a stable notion of identity, e.g., in active objects. Abstract Behavioural Specification (ABS) is a language for designing executable models of parallel and distributed object-oriented systems based on active objects; it is defined in terms of a formal operational semantics which enables a variety of static and dynamic analysis techniques for ABS models. The overall goal of this thesis is to extend the asynchronous programming model and the corresponding analysis techniques in ABS.
Algorithms and the Foundations of Software technology
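The active-object idea, an object that owns its thread of execution and a stable identity, can be sketched as follows (a hypothetical Python illustration, not ABS): requests go into the object's mailbox and are executed one at a time, so the object's state is never touched by two threads concurrently.

```python
import queue
import threading

class ActiveAccount:
    """An active object: its own thread serves requests from a mailbox."""
    def __init__(self):
        self.balance = 0
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            job = self._mailbox.get()
            if job is None:       # shutdown sentinel
                break
            job()                 # exactly one request runs at a time

    def deposit(self, amount):
        # Asynchronous request: returns immediately, executed by the
        # object's own thread, so no locks are needed around `balance`.
        self._mailbox.put(lambda: setattr(self, "balance",
                                          self.balance + amount))

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()       # wait for all queued requests to finish

acct = ActiveAccount()
for _ in range(1000):
    acct.deposit(1)
acct.stop()
print(acct.balance)  # 1000
```

With a plain shared object and many caller threads, the unsynchronised increments could race; routing every update through the single owning thread recovers the clarity that low-level concurrency constructs sacrifice.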
Stateful data-parallel processing
Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial, as it brings domain-specific knowledge from broad fields, but data scientists do not have adequate tools to write algorithms and execute them at scale. The processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing facilitates capturing parallelisation opportunities and hides fault tolerance from the programmer. However, data scientists want to write stateful programs, with explicit state that they can update, such as matrices in machine learning algorithms, and are used to imperative-style languages. Such programs struggle to execute with high performance in stateless data-parallel systems.
Representing state explicitly makes data-parallel processing at scale challenging. To achieve scalability, state must be distributed and coordinated across machines; in the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing that addresses these challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) taking an integrated approach to scale-out and fault tolerance that recovers large state spanning the memory of multiple machines. To support imperative-style programs, a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows comparable or better performance than state-of-the-art stateless systems.
Open Access
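What "state as a first-class citizen" buys the runtime can be shown with a minimal hypothetical sketch (SEEP's real API differs, and it is Java-based; this Python version only illustrates the principle): because the operator's state is an explicit, managed object rather than hidden inside the user function, the system can checkpoint it and restore it after a failure.

```python
import copy

class StatefulOperator:
    """A data-parallel operator whose state the runtime can manipulate."""
    def __init__(self):
        # Explicit, first-class state: visible to the system, not buried
        # in closure variables of the user function.
        self.state = {"count": 0, "sum": 0}

    def process(self, value):
        self.state["count"] += 1
        self.state["sum"] += value
        return self.state["sum"] / self.state["count"]   # running mean

    def checkpoint(self):
        return copy.deepcopy(self.state)   # runtime snapshots the state

    def restore(self, snapshot):
        self.state = copy.deepcopy(snapshot)   # recovery after a failure

op = StatefulOperator()
for v in [2, 4, 6]:
    op.process(v)
snap = op.checkpoint()    # snapshot taken at count=3, sum=12

op.process(100)           # further processing after the checkpoint...
op.restore(snap)          # ...then a simulated failure and recovery
print(op.state)           # {'count': 3, 'sum': 12}
```

In a stateless system the same running mean would have to be rebuilt by replaying all inputs; with explicit state, recovery restores the snapshot and replays only the inputs since the last checkpoint.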
Fault-tolerant parallel applications using a network of workstations
PhD thesis
It is becoming common to employ a Network Of Workstations, often referred to as a NOW, for general-purpose computing, since the allocation of an individual workstation offers good interactive response. However, there may still be a need to perform very large-scale computations which exceed the resources of a single workstation. It may be that the amount of processing implies an inconveniently long duration or that the data manipulated exceed available storage. One possibility is to employ a more powerful single machine for such computations. However, there is growing interest in seeking a cheaper alternative by harnessing the significant idle time often observed in a NOW, and also possibly employing a number of workstations in parallel on a single problem. Parallelisation permits use of the combined memories of all participating workstations, but also introduces a need for communication, and success in any hardware environment depends on the amount of communication relative to the amount of computation required. In the context of a NOW, much success is reported with applications which have low communication requirements relative to computation requirements.
Here it is claimed that there is reason for investigation into the use of a NOW for parallel execution of computations which are demanding in storage, potentially even exceeding the sum of memory in all available workstations. Another consideration is that where a computation is of sufficient scale, some provision for tolerating partial failures may be desirable. However, generic support for storage management and fault tolerance in computations of this scale for a NOW is not currently available, and the suitability of a NOW for solving such computations has not been investigated to any large extent. The work described here is concerned with these issues.
The approach employed is to make use of an existing distributed system which supports nested atomic actions (atomic transactions) to structure fault-tolerant computations with persistent objects. This system is used to develop a fault-tolerant "bag of tasks" computation model, where the bag and shared objects are located on secondary storage.
In order to understand the factors that affect the performance of large parallel computations on a NOW, a number of specific applications are developed. The performance of these applications is analysed using a semi-empirical model. The same measurements underlying these performance predictions may be employed in estimation of the performance of alternative application structures. Using services provided by the distributed system referred to above, each application is implemented. The implementation allows verification of predicted performance and also permits identification of issues regarding construction of components required to support the chosen application structuring technique. The work demonstrates that a NOW certainly offers some potential for gain through parallelisation and that, for large-grain computations, the cost of implementing fault tolerance is low.
Engineering and Physical Sciences Research Council
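The fault-tolerant "bag of tasks" model lends itself to a compact sketch. The following hypothetical Python illustration elides the atomic actions and the persistence of the bag on secondary storage; the point is only the structure: a task withdrawn by a worker that fails is returned to the bag so that another attempt can complete it.

```python
import queue

bag = queue.Queue()            # the shared bag (persistent in the real system)
for task in range(10):
    bag.put(task)

results = {}
failed_once = set()

def run_worker(fail_on):
    """Repeatedly withdraw a task; a 'crashed' attempt aborts and the
    task is returned to the bag, as an aborted atomic action would be."""
    while True:
        try:
            task = bag.get_nowait()
        except queue.Empty:
            return
        if task in fail_on and task not in failed_once:
            failed_once.add(task)
            bag.put(task)      # simulate a partial failure: task survives
            continue
        results[task] = task * task   # the actual computation

run_worker(fail_on={3, 7})     # tasks 3 and 7 fail once, then succeed
print(sorted(results.items()))
```

Because completed work is recorded outside the failed worker and unfinished tasks re-enter the bag, the model tolerates partial failures without coordinating the workers themselves, which is what keeps the fault-tolerance cost low for large-grain computations.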
Implementing fault tolerance in a 64-bit distributed operating system
This thesis explores the potential of 64-bit processors for providing a different style of distributed operating system. Rather than providing another reworking of the UNIX model, the use of the large address space for unifying volatile memory (virtual memory), persistent memory (file systems) and distributed network access is examined and a novel operating system, Arius, is proposed.
The concepts behind the design of Arius are briefly reviewed, and then the reliability of such a system is examined in detail. The unified nature of the architecture makes it possible to use a reliable single address space to provide a completely reliable system without the addition of other mechanisms. Protocols are proposed to provide locally scalable distributed shared memory, and these are then augmented to handle machine failures transparently through the use of distributed checkpoints and rollback.
The checkpointing system makes use of the caching mechanism in DSM to provide data duplication for failure recovery. By using distributed memory for checkpoints, recovery from machine faults may be handled seamlessly. To cope with more “complete” failures, persistent storage is also included in the failure mechanism.
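The interplay of checkpointing and rollback can be sketched in miniature (a hypothetical Python illustration, not the Arius protocols): checkpointed pages of the shared memory are duplicated onto another node, and after a machine failure the page set is rolled back to the last checkpoint.

```python
import copy

class DSMNode:
    """A node holding DSM pages, with checkpoint copies held remotely."""
    def __init__(self):
        self.pages = {}    # page id -> contents (this machine's copies)
        self.backup = {}   # duplicated copies, standing in for the
                           # replicas the caching mechanism would hold

    def write(self, page, data):
        self.pages[page] = data

    def checkpoint(self):
        # Duplicate current pages for failure recovery.
        self.backup = copy.deepcopy(self.pages)

    def recover(self):
        # After a failure, roll back to the checkpointed pages.
        self.pages = copy.deepcopy(self.backup)

node = DSMNode()
node.write("p0", [1, 2, 3])
node.checkpoint()
node.write("p0", [9, 9, 9])   # work done after the checkpoint...
node.recover()                # ...is lost when the machine fails
print(node.pages)             # {'p0': [1, 2, 3]}
```

Work performed since the last checkpoint is lost on recovery, which is why checkpoint frequency is one of the costs the modelling in this thesis has to quantify.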
These protocols are modelled to show their operability and to determine the cost they incur in various types of parallel and serial programs. Results are presented to demonstrate these costs.