Asynchronous Queue Machines with Explicit Forwarding
http://www.youtube.com/watch?v=SJaMtBKnN-I
We consider computational models motivated by processors which exhibit architectural asynchrony and allow operands to bypass the register bank using a forwarding mechanism. We analyse the interaction between asynchrony and forwarding, derive constraints on the usage of forwarding for various models of operation, and study consequences for compilers targeting such processors.
Our approach to reasoning about processor behaviour is programming language based. We introduce an assembly language in which forwarding is explicitly visible. Operational models corresponding to processor abstractions are expressed as structural operational semantics for this language. The benefits of this approach for defining program execution and for relating processor models formally are demonstrated. Furthermore, we study the restrictions on the class of admissible programs for each operational model. Under our programming language perspective, these constraints are expressed as static semantics and formalised as type systems. Suitability of forwarding schemes for particular models of operation follows from soundness and completeness results which are established by standard programming language proof techniques. Well-typed programs are structurally correct and cannot experience run-time errors due to ill usage of the forwarding mechanism.
Exposing asynchrony and forwarding to the programmer allows a compiler to optimise forwarding behaviour by scheduling operands. We show how program analysis can decide which values to communicate through registers and which ones to forward. The analysis is expressed as a dataflow problem for an intermediate language and is proven sound with respect to a dynamic semantics. Solutions to the dataflow equations yield translations into the assembly language which are functionally faithful to the operational semantics and also structure-preserving as resulting programs are well-typed. The theoretical development of the translation is complemented by a prototypical implementation. Experimental results are included for a symbolic conversion of Java virtual machine code into the intermediate language, indicating that application programs contain sufficient opportunities for forwarding to make our approach viable.
In conclusion, we demonstrate the benefits of a programming language based view for reasoning about programs targeting architectures with asynchrony and forwarding.
Investigating distributed simulation at the Ford Motor Company
Engine production is a complex process that requires the manufacturing and assembly of a wide variety of components to create a varied product mix. Simulation plays a key role in the planning process of a new production line to determine if it can meet expected demand. However, these simulations can be very time-consuming and can often take up to a day to execute a single run. This paper investigates how distributed simulation based on the IEEE 1516 High Level Architecture and the emerging standard COTS Simulation Package Interoperability Product Development Group (CSPI-PDG) Type I Interoperability Reference Model could be used to reduce the time taken for a single simulation run. CSP interoperability and the problem of integrating CSPs with HLA software (the runtime infrastructure) are presented. New prototype benchmarking software, the COTS Simulation Package Emulator (CSPE), which is being developed to investigate distributed simulation problems, is discussed. The paper then develops a case study of how this was used to investigate the feasibility of using distributed simulation at Ford. The paper discusses results obtained from this case study and suggests that distributed simulation could indeed be beneficial to Ford.
PARALLEX FILE SYSTEM (PXFS): BRIDGING THE GAP BETWEEN EXASCALE PROCESSING CAPABILITIES AND I/O PERFORMANCE
Because processors are reaching the maximum performance allowable by current technology, architectural trends for computer systems continue to increase the number of cores per processing chip to maximize system performance. Most estimates suggest massively parallel systems will be available within the decade, containing millions of cores and capable of exaFlops of performance. New models of execution are necessary to maximize processor utilization and minimize power costs for these exascale systems. ParalleX is one such execution model, which attempts to address inefficiencies of current execution models by exposing fine-grained parallelism, increasing system utilization using asynchronous workflow, and resolving resource contention through the use of adaptive and dynamic resource scheduling. A particularly important aspect of these exascale execution models is the design of the I/O subsystem, which has seen limited performance increases compared to processor and network technologies. Parallel file systems have been designed to help alleviate the poor performance of storage technologies by distributing file data across multiple nodes of a parallel system to maximize the aggregate throughput attainable by file system clients. However, the design of parallel file systems needs to be modified to explicitly address the inherent high latency of remote file system operations without degrading file system performance and scalability. We present modifications to OrangeFS, a high-performance, working model parallel file system geared towards the facilitation of research in the field of parallel I/O, to help address the inefficiencies of current file systems. We call the resulting parallel file system implementation the ParalleX File System (PXFS), as it attempts to support the features required by the I/O subsystem of the ParalleX execution model.
Specifically, PXFS offers mechanisms for masking the latency of file system operations, defining meaningful computation to be overlapped with file system communication, and maintaining the high performance and scalability exhibited by OrangeFS. Our results indicate PXFS successfully improves file system performance and supports the semantics of ParalleX with limited programmer intervention, potentially simplifying the design and increasing the performance of many ParalleX applications.
OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF
The actor model of computation has been designed for a seamless support of
concurrency and distribution. However, it remains unspecific about data
parallel program flows, while available processing power of modern many core
hardware such as graphics processing units (GPUs) or coprocessors increases the
relevance of data parallelism for general-purpose computation.
In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework
(CAF). This offers a high level interface for accessing any OpenCL device
without leaving the actor paradigm. The new type of actor is integrated into
the runtime environment of CAF and gives rise to transparent message passing in
distributed systems on heterogeneous hardware. Following the actor logic in
CAF, OpenCL kernels can be composed while encapsulated in C++ actors, hence
operate in a multi-stage fashion on data resident at the GPU. Developers are
thus enabled to build complex data parallel programs from primitives without
leaving the actor paradigm, nor sacrificing performance. Our evaluations on
commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear
scaling behavior when offloading larger workloads. For sub-second duties, the
efficiency of offloading was found to differ largely between devices. Moreover,
our findings indicate a negligible overhead over programming with the native
OpenCL API.
Flow Java: Declarative Concurrency for Java
This thesis presents the design, implementation, and evaluation of
Flow Java, a programming language for the implementation of concurrent
programs. Flow Java adds powerful programming abstractions for
automatic synchronization of concurrent programs to Java. The
abstractions added are single assignment variables (logic variables)
and futures (read-only views of logic variables).
The added abstractions conservatively extend Java with respect to
types, parameter passing, and concurrency. Futures support secure
concurrent abstractions and are essential for seamless integration of
single assignment variables into Java. These abstractions allow for
simple and concise implementation of high-level concurrent programming
abstractions.
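As a rough illustration of the two abstractions named above, the following sketch emulates a single assignment variable and its read-only future in plain Java. It is not Flow Java syntax and not the thesis's implementation: the names `SingleAssignment`, `ReadOnlyFuture`, and `FlowDemo` are hypothetical, and the sketch assumes that a read suspends until the variable is bound and that a second bind is an error.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical read-only view: it can wait for the value but cannot bind it. */
interface ReadOnlyFuture<T> {
    T get();
}

/** Hypothetical single assignment variable, emulated with CompletableFuture. */
final class SingleAssignment<T> {
    private final CompletableFuture<T> cell = new CompletableFuture<>();

    /** Binds the variable once; a second bind is rejected (an assumption of this sketch). */
    void bind(T value) {
        if (!cell.complete(value)) {
            throw new IllegalStateException("variable is already bound");
        }
    }

    /** Blocks the calling thread until the variable has been bound. */
    T get() {
        return cell.join();
    }

    /** Returns a view that exposes reading only, so holders cannot bind the variable. */
    ReadOnlyFuture<T> future() {
        return cell::join;
    }
}

final class FlowDemo {
    /** A consumer thread waits on the future while the producer binds the variable. */
    static int run() {
        SingleAssignment<Integer> x = new SingleAssignment<>();
        ReadOnlyFuture<Integer> fx = x.future();
        int[] result = new int[1];
        Thread consumer = new Thread(() -> result[0] = fx.get() + 1);
        consumer.start();
        x.bind(41); // the consumer resumes once the value is available
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

Handing out only the `ReadOnlyFuture` lets a producer keep the right to bind while consumers can merely wait, which is the kind of separation the abstract attributes to futures.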
Flow Java is implemented as a moderate extension to the
GNU gcj/libjava Java compiler and runtime environment. The
extension is not specific to a particular implementation; it could
easily be incorporated into other Java implementations.
The thesis presents three implementation strategies for single
assignment variables. One strategy uses forwarding and dereferencing
while the two others are variants of Taylor's scheme. Taylor's scheme
represents logic variables as a circular list. The thesis presents a
new adaptation of Taylor's scheme to a concurrent language using
operating system threads.
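The forwarding-and-dereferencing strategy mentioned above can be sketched generically as follows. This is a single-threaded illustration of the general technique, not the thesis's implementation: the class `LogicVar` and its methods are hypothetical, `null` stands for "unbound", and no synchronization is attempted.

```java
import java.util.Objects;

/** Hypothetical logic variable using forwarding pointers (single-threaded sketch). */
final class LogicVar<T> {
    private LogicVar<T> forward; // non-null once this variable is aliased to another
    private T value;             // non-null once this variable is bound

    /** Follows forwarding pointers to the representative variable of the chain. */
    LogicVar<T> deref() {
        LogicVar<T> v = this;
        while (v.forward != null) {
            v = v.forward;
        }
        return v;
    }

    /** Binds the representative; binding twice is rejected in this sketch. */
    void bind(T val) {
        LogicVar<T> root = deref();
        if (root.value != null) {
            throw new IllegalStateException("variable is already bound");
        }
        root.value = Objects.requireNonNull(val);
    }

    /** Makes this variable and other share one representative by installing a forward. */
    void alias(LogicVar<T> other) {
        LogicVar<T> a = deref();
        LogicVar<T> b = other.deref();
        if (a == b) {
            return; // already the same variable
        }
        if (a.value != null && b.value != null) {
            if (!a.value.equals(b.value)) {
                throw new IllegalStateException("conflicting bindings");
            }
            return; // both bound to equal values, nothing to forward
        }
        if (a.value == null) {
            a.forward = b; // the unbound variable forwards to the other
        } else {
            b.forward = a;
        }
    }

    /** Returns the bound value, or null if the variable is still unbound. */
    T valueOrNull() {
        return deref().value;
    }
}
```

Every access pays for a walk along the forwarding chain, which is the cost that a circular-list representation such as Taylor's scheme trades off differently.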
The Flow Java system is evaluated using standard Java
benchmarks. Evaluation shows that in most cases the overhead incurred
by the extensions is between 10% and 50%. For some pathological
cases the runtime increases by up to 150%. Concurrent programs making
use of Flow Java's automatic synchronization generally perform as
well as corresponding Java programs. In some cases Flow Java programs
outperform Java programs by as much as 33%.
Verificare: a platform for composable verification with application to SDN-Enabled systems
Software-Defined Networking (SDN) has become increasingly prevalent
in both the academic and industrial communities. A new class of systems built on
SDNs, which we refer to as SDN-Enabled, provides programmatic interfaces between
the SDN controller and the larger distributed system. Existing tools for SDN
verification and analysis are insufficiently expressive to capture
this composition of a network and a larger distributed system. Generic
verification systems are an infeasible solution, due to their monolithic
approach to modeling and rapid state-space explosion.
In this thesis we present a new compositional approach to system modeling and
verification that is particularly appropriate for SDN-Enabled systems.
Compositional models may have sub-components (such as switches and
end-hosts) modified, added, or removed with only minimal, isolated changes.
Furthermore, invariants may be defined over the composed system that restrict
its behavior, allowing assumptions to be added or removed and for components to
be abstracted away into the service guarantee that they provide (such as
guaranteed packet arrival). Finally, compositional modeling can minimize the
size of the state space to be verified by taking advantage of known model
structure.
We also present the Verificare platform, a tool chain for building
compositional models in our modeling language and automatically compiling them
to multiple off-the-shelf verification tools. The compiler outputs a minimal,
calculus-oblivious formalism, which is accessed by plugins via a translation
API. This enables a wide variety of requirements to be
verified. As new tools become available, the translator can easily be extended
with plugins to support them.