1,926 research outputs found

Asynchronous Queue Machines with Explicit Forwarding

Author: Beringer Lennart
Publication venue: University of Edinburgh. College of Science and Engineering. School of Informatics.
Publication date: 01/07/2002

http://www.youtube.com/watch?v=SJaMtBKnN-I
We consider computational models motivated by processors which exhibit architectural asynchrony and allow operands to bypass the register bank using a forwarding mechanism. We analyse the interaction between asynchrony and forwarding, derive constraints on the usage of forwarding for various models of operation, and study consequences for compilers targeting such processors. Our approach to reasoning about processor behaviour is programming language based. We introduce an assembly language in which forwarding is explicitly visible. Operational models corresponding to processor abstractions are expressed as structural operational semantics for this language. The benefits of this approach for defining program execution and for relating processor models formally are demonstrated. Furthermore, we study the restrictions on the class of admissible programs for each operational model. Under our programming language perspective, these constraints are expressed as static semantics and formalised as type systems. Suitability of forwarding schemes for particular models of operation follows from soundness and completeness results which are established by standard programming language proof techniques. Well-typed programs are structurally correct and cannot experience run-time errors due to ill usage of the forwarding mechanism. Exposing asynchrony and forwarding to the programmer allows a compiler to optimise forwarding behaviour by scheduling operands. We show how program analysis can decide which values to communicate through registers and which ones to forward. The analysis is expressed as a dataflow problem for an intermediate language and is proven sound with respect to a dynamic semantics. Solutions to the dataflow equations yield translations into the assembly language which are functionally faithful to the operational semantics and also structure-preserving as resulting programs are well-typed.
The theoretical development of the translation is complemented by a prototypical implementation. Experimental results are included for a symbolic conversion of Java virtual machine code into the intermediate language, indicating that application programs contain sufficient opportunities for forwarding to make our approach viable. In conclusion, we demonstrate the benefits of a programming language based view for reasoning about programs targeting architectures with asynchrony and forwarding.

Edinburgh Research Archive

Dynamic instruction scheduling and data forwarding in asynchronous superscalar processors

Author: Mullins Robert D.
Publication venue: The University of Edinburgh
Publication date: 01/01/2001

Edinburgh Research Archive

Investigating distributed simulation at the Ford motor company

Authors: Behli L, Ladbrook J, Taylor S J E, Turner S J, Wang X
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2005

Engine production is a complex process that requires the manufacturing and assembly of a wide variety of components to create a varied product mix. Simulation plays a key role in the planning process of a new production line to determine if it can meet expected demand. However, these simulations can be very time-consuming and can often take up to a day to execute a single run. This paper investigates how distributed simulation based on the IEEE 1516 High Level Architecture and the emerging standard COTS Simulation Package Interoperability Product Development Group (CSPI-PDG) Type I Interoperability Reference Model could be used to reduce the time taken for a single simulation run. CSP interoperability and the problem of integrating CSPs with HLA software (the runtime infrastructure) are presented. New prototype benchmarking software, the COTS Simulation Package Emulator (CSPE), which is being developed to investigate distributed simulation problems, is discussed. The paper then develops a case study of how this was used to investigate the feasibility of using distributed simulation at Ford. The paper discusses results obtained from this case study and suggests that distributed simulation could indeed be beneficial to Ford.

Brunel University Research Archive

PARALLEX FILE SYSTEM (PXFS): BRIDGING THE GAP BETWEEN EXASCALE PROCESSING CAPABILITIES AND I/O PERFORMANCE

Author: Snyder Shane
Publication venue: Clemson University Libraries
Publication date: 01/05/2013

Due to processors reaching the maximum performance allowable by current technology, architectural trends for computer systems continue to increase the number of cores per processing chip to maximize system performance. Most estimates suggest massively parallel systems will be available within the decade, containing millions of cores and capable of exaFlops of performance. New models of execution are necessary to maximize processor utilization and minimize power costs for these exascale systems. ParalleX is one such execution model, which attempts to address inefficiencies of current execution models by exposing fine-grained parallelism, increasing system utilization using asynchronous workflow, and resolving resource contention through the use of adaptive and dynamic resource scheduling. A particularly important aspect of these exascale execution models is the design of the I/O subsystem, which has seen limited performance increases compared to processor and network technologies. Parallel file systems have been designed to help alleviate the poor performance of storage technologies by distributing file data across multiple nodes of a parallel system to maximize the aggregate throughput attainable by file system clients. However, the design of parallel file systems needs to be modified to explicitly address the inherent high latency of remote file system operations without degrading file system performance and scalability. We present modifications to OrangeFS, a high-performance, working model parallel file system geared towards the facilitation of research in the field of parallel I/O, to help address the inefficiencies of current file systems. We call the resulting parallel file system implementation the ParalleX File System (PXFS), as it attempts to support the features required by the I/O subsystem of the ParalleX execution model.
Specifically, PXFS offers mechanisms for masking the latency of file system operations, defining meaningful computation to be overlapped with file system communication, and maintaining the high performance and scalability exhibited by OrangeFS. Our results indicate PXFS successfully improves file system performance and supports the semantics of ParalleX with limited programmer intervention, potentially simplifying the design and increasing the performance of many ParalleX applications.

Clemson University: TigerPrints

OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

Authors: A Klöckner, D Charousset, G Agha, J Nickolls, JD Owens, K Wu, L Dagum, S Srinivasan, S Wienke, T Desell
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

The actor model of computation has been designed for seamless support of concurrency and distribution. However, it remains unspecific about data parallel program flows, while available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to largely differ between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.

arXiv.org e-Print Archive

Crossref

REPOSIT

Flow Java: Declarative Concurrency for Java

Author: Drejhammar Frej
Publication date: 01/01/2003

This thesis presents the design, implementation, and evaluation of Flow Java, a programming language for the implementation of concurrent programs. Flow Java adds powerful programming abstractions for automatic synchronization of concurrent programs to Java. The abstractions added are single assignment variables (logic variables) and futures (read-only views of logic variables). The added abstractions conservatively extend Java with respect to types, parameter passing, and concurrency. Futures support secure concurrent abstractions and are essential for seamless integration of single assignment variables into Java. These abstractions allow for simple and concise implementation of high-level concurrent programming abstractions. Flow Java is implemented as a moderate extension to the GNU gcj/libjava Java compiler and runtime environment. The extension is not specific to a particular implementation; it could easily be incorporated into other Java implementations. The thesis presents three implementation strategies for single assignment variables. One strategy uses forwarding and dereferencing while the two others are variants of Taylor's scheme. Taylor's scheme represents logic variables as a circular list. The thesis presents a new adaptation of Taylor's scheme to a concurrent language using operating system threads. The Flow Java system is evaluated using standard Java benchmarks. Evaluation shows that in most cases the overhead incurred by the extensions is between 10% and 50%. For some pathological cases the runtime increases by up to 150%. Concurrent programs making use of Flow Java's automatic synchronization generally perform as well as corresponding Java programs. In some cases Flow Java programs outperform Java programs by as much as 33%.
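The two abstractions named in the abstract above can be approximated in plain Java. The sketch below is illustrative only: it is not Flow Java syntax and not one of the thesis's implementation strategies. The names `SingleAssignment`, `ReadOnlyView`, and `SingleAssignmentDemo` are hypothetical, and the rule that rebinding to an equal value is allowed is an assumption of this sketch. It relies on the standard library's `CompletableFuture` to make readers wait until a value is bound.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;

/** Hypothetical single-assignment variable; not Flow Java's actual implementation. */
final class SingleAssignment<T> {
    private final CompletableFuture<T> cell = new CompletableFuture<>();

    /** Binds the variable once. Rebinding to a different value fails (assumed rule). */
    void bind(T value) {
        // complete() returns false if the cell was already bound.
        if (!cell.complete(value) && !Objects.equals(cell.join(), value)) {
            throw new IllegalStateException("already bound to a different value");
        }
    }

    /** Blocks the calling thread until some thread has bound the variable. */
    T get() {
        return cell.join();
    }

    /** Read-only view: holders can wait for the value but cannot bind it. */
    ReadOnlyView<T> future() {
        return cell::join;
    }
}

/** Hypothetical read-only interface standing in for a future. */
interface ReadOnlyView<T> {
    T get();
}

public class SingleAssignmentDemo {
    public static void main(String[] args) throws InterruptedException {
        SingleAssignment<Integer> x = new SingleAssignment<>();
        ReadOnlyView<Integer> view = x.future();

        // The producer binds x; the main thread synchronizes simply by reading.
        Thread producer = new Thread(() -> x.bind(42));
        producer.start();

        System.out.println(view.get()); // prints 42
        producer.join();
    }
}
```

Handing out only the `ReadOnlyView` lets a consumer wait on the value without being able to bind it, which is the role the abstract assigns to futures.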

Publikationer från KTH

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Verificare: a platform for composable verification with application to SDN-Enabled systems

Author: Skowyra Richard William
Publication date: 22/01/2016

Software-Defined Networking (SDN) has become increasingly prevalent in both the academic and industrial communities. A new class of systems built on SDNs, which we refer to as SDN-Enabled, provides programmatic interfaces between the SDN controller and the larger distributed system. Existing tools for SDN verification and analysis are insufficiently expressive to capture this composition of a network and a larger distributed system. Generic verification systems are an infeasible solution, due to their monolithic approach to modeling and rapid state-space explosion. In this thesis we present a new compositional approach to system modeling and verification that is particularly appropriate for SDN-Enabled systems. Compositional models may have sub-components (such as switches and end-hosts) modified, added, or removed with only minimal, isolated changes. Furthermore, invariants may be defined over the composed system that restrict its behavior, allowing assumptions to be added or removed and components to be abstracted away into the service guarantee that they provide (such as guaranteed packet arrival). Finally, compositional modeling can minimize the size of the state space to be verified by taking advantage of known model structure. We also present the Verificare platform, a tool chain for building compositional models in our modeling language and automatically compiling them to multiple off-the-shelf verification tools. The compiler outputs a minimal, calculus-oblivious formalism, which is accessed by plugins via a translation API. This enables a wide variety of requirements to be verified. As new tools become available, the translator can easily be extended with plugins to support them.

Boston University Institutional Repository (OpenBU)