68,577 research outputs found

    Supporting task creation inside FPGA devices

    Get PDF
    The most common model to use co-processors/accelerators is the master-slave model where the slaves (coprocessors/ accelerators) are driven by a general purpose cpu. This simplifies the management of the accelerators because they cannot actively interact with the runtime and they are just passive slaves that operate over the memory under demand. However, the master-slave model limits system possibilities and introduces synchronization overheads that could be avoided. To overcome those limitations and increase the possibilities of accelerators, we propose extending task based programming models (like OpenMP [1] or OmpSs) to support some runtime APIs inside the FPGA co-processor. As a proof-of-concept, we implemented our proposal over the OmpSs@FPGA environment [2] adding the needed infrastructure in the FPGA bitstream and modifying the existing tools to support creation of children tasks inside a task offloaded to an FPGA accelerator. In addition, we added support to synchronize the children tasks created by a FPGA task regardless they are executed in a SMP host thread or they also target another FPGA accelerator in the same co-processor

    A Practical Hierarchial Model of Parallel Computation: The Model

    Get PDF
    We introduce a model of parallel computation that retains the ideal properties of the PRAM by using it as a sub-model, while simultaneously being more reflective of realistic parallel architectures by accounting for and providing abstract control over communication and synchronization costs. The Hierarchical PRAM (H-PRAM) model controls conceptual complexity in the face of asynchrony in two ways. First, by providing the simplifying assumption of synchronization to the design of algorithms, but allowing the algorithms to work asynchronously with each other; and organizing this control asynchrony via an implicit hierarchy relation. Second, by allowing the restriction of communication asynchrony in order to obtain determinate algorithms (thus greatly simplifying proofs of correctness). It is shown that the model is reflective of a variety of existing and proposed parallel architectures, particularly ones that can support massive parallelism. Relationships to programming languages are discussed. Since the PRAM is a sub-model, we can use PRAM algorithms as sub-algorithms in algorithms for the H-PRAM; thus results that have been established with respect to the PRAM are potentially transferable to this new model. The H-PRAM can be used as a flexible tool to investigate general degrees of locality (“neighborhoods of activity) in problems, considering communication and synchronization simultaneously. This gives the potential of obtaining algorithms that map more efficiently to architectures, and of increasing the number of processors that can efficiently be used on a problem (in comparison to a PRAM that charges for communication and synchronization). The model presents a framework in which to study the extent that general locality can be exploited in parallel computing. A companion paper demonstrates the usage of the H-PRAM via the design and analysis of various algorithms for computing the complete binary tree and the FFT/butterfly graph

    Transparent multi-core speculative parallelization of DES models with event and cross-state dependencies

    Get PDF
    In this article we tackle transparent parallelization of Discrete Event Simulation (DES) models to be run on top of multi-core machines according to speculative schemes. The innovation in our proposal lies in that we consider a more general programming and execution model, compared to the one targeted by state of the art PDES platforms, where the boundaries of the state portion accessible while processing an event at a specific simulation object do not limit access to the actual object state, or to shared global variables. Rather, the simulation object is allowed to access (and alter) the state of any other object, thus causing what we term cross-state dependency. We note that this model exactly complies with typical (easy to manage) sequential-style DES programming, where a (dynamically-allocated) state portion of object A can be accessed by object B in either read or write mode (or both) by, e.g., passing a pointer to B as the payload of a scheduled simulation event. However, while read/write memory accesses performed in the sequential run are always guaranteed to observe (and to give rise to) a consistent snapshot of the state of the simulation model, consistency is not automatically guaranteed in case of parallelization and concurrent execution of simulation objects with cross-state dependencies. We cope with such a consistency issue, and its application-transparent support, in the context of parallel and optimistic executions. This is achieved by introducing an advanced memory management architecture, able to efficiently detect read/write accesses by concurrent objects to whichever object state in an application transparent manner, together with advanced synchronization mechanisms providing the advantage of exploiting parallelism in the underlying multi-core architecture while transparently handling both cross-state and traditional event-based dependencies. Our proposal targets Linux and has been integrated with the ROOT-Sim open source optimistic simulation platform, although its design principles, and most parts of the developed software, are of general relevance. Copyright 2014 ACM

    An Object-Oriented Model for Extensible Concurrent Systems: the Composition-Filters Approach

    Get PDF
    Applying the object-oriented paradigm for the development of large and complex software systems offers several advantages, of which increased extensibility and reusability are the most prominent ones. The object-oriented model is also quite suitable for modeling concurrent systems. However, it appears that extensibility and reusability of concurrent applications is far from trivial. The problems that arise, the so-called inheritance anomalies are analyzed and presented in this paper. A set of requirements for extensible concurrent languages is formulated. As a solution to the identified problems, an extension to the object-oriented model is presented; composition filters. Composition filters capture messages and can express certain constraints and operations on these messages, for example buffering. In this paper we explain the composition filters approach, demonstrate its expressive power through a number of examples and show that composition filters do not suffer from the inheritance anomalies and fulfill the requirements that were established

    Composing Software from Multiple Concerns: A Model and Composition Anomalies

    Get PDF
    Constructing software from components is considered to be a key requirement for managing the complexity of software. Separation of concerns makes only sense if the realizations of these concerns can be composed together effectively into a working program. Various publications have shown that composability of software is far from trivial and fails when components express complex behavior such as constraints, synchronization and history-sensitiveness. We believe that to address the composability problems, we need to understand and define the situations where composition fails. To this aim, in this paper we (a) introduce a general model of multi-dimensional concern composition, and (b) define so-called composition anomalies

    Deterministic Consistency: A Programming Model for Shared Memory Parallelism

    Full text link
    The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code, and runtime systems can impose synthetic schedules on legacy parallel code. To parallelize existing serial code, however, we would like a programming model that is naturally deterministic without language restrictions or artificial scheduling. We propose "deterministic consistency", a parallel programming model as easy to understand as the "parallel assignment" construct in sequential languages such as Perl and JavaScript, where concurrent threads always read their inputs before writing shared outputs. DC supports common data- and task-parallel synchronization abstractions such as fork/join and barriers, as well as non-hierarchical structures such as producer/consumer pipelines and futures. A preliminary prototype suggests that software-only implementations of DC can run applications written for popular parallel environments such as OpenMP with low (<10%) overhead for some applications.Comment: 7 pages, 3 figure

    Jeeg: Temporal Constraints for the Synchronization of Concurrent Objects

    No full text
    We introduce Jeeg, a dialect of Java based on a declarative replacement of the synchronization mechanisms of Java that results in a complete decoupling of the 'business' and the 'synchronization' code of classes. Synchronization constraints in Jeeg are expressed in a linear temporal logic which allows to effectively limit the occurrence of the inheritance anomaly that commonly affects concurrent object oriented languages. Jeeg is inspired by the current trend in aspect oriented languages. In a Jeeg program the sequential and concurrent aspects of object behaviors are decoupled: specified separately by the programmer these are then weaved together by the Jeeg compiler
    • 

    corecore