
    LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic [Extended Version]

    CORRECTION: The authors for entry [4] in the references should have been "E. S. Chung, J. C. Hoe, and K. Mai".
    Developers accelerating applications on FPGAs or other reconfigurable logic have nothing but raw memory devices in their standard toolkits. Each project typically includes tedious development of single-use memory management. Software developers expect a programming environment to include automatic memory management. Virtual memory provides the illusion of very large arrays, and processor caches reduce access latency without explicit programmer instructions. LEAP scratchpads for reconfigurable logic dynamically allocate and manage multiple independent memory arrays in a large backing store. Scratchpad accesses are cached automatically in multiple levels, ranging from shared on-board, RAM-based, set-associative caches to private caches stored in FPGA RAM blocks. In the LEAP framework, scratchpads share the same interface as on-die RAM blocks and are plug-in replacements for them. Additional libraries support heap management within a storage set. Like software developers, accelerator authors using scratchpads may focus more on core algorithms and less on memory management. Two uses of FPGA scratchpads are analyzed: buffer management in an H.264 decoder and memory management within a processor microarchitecture timing model.
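    The key usability claim above is that a scratchpad is a plug-in replacement for an on-die RAM block while being transparently backed by a cached, much larger store. The Python sketch below is illustrative only; the class names, the direct-mapped write-back cache organisation and the sizes are assumptions of this sketch, not the LEAP implementation. It shows the interface-compatibility idea: client code written against a small RAM block works unchanged against the cached, large-capacity scratchpad.

        # Minimal sketch (not the LEAP API): a "scratchpad" exposing the same
        # read/write interface as a small on-chip RAM block, but backing a much
        # larger address space with a private cache in front of a big store.

        class BlockRAM:
            """Plain on-die RAM block: fixed, small capacity."""
            def __init__(self, n_words):
                self.mem = [0] * n_words
            def read(self, addr):
                return self.mem[addr]
            def write(self, addr, value):
                self.mem[addr] = value

        class Scratchpad:
            """Same interface as BlockRAM, sized by the backing store, with a
            small direct-mapped private write-back cache (illustrative)."""
            def __init__(self, n_words, cache_lines=64):
                self.backing = [0] * n_words         # stands in for host/on-board RAM
                self.lines = cache_lines
                self.tags = [None] * cache_lines     # cached address per line
                self.data = [0] * cache_lines
                self.dirty = [False] * cache_lines

            def _lookup(self, addr):
                line = addr % self.lines
                if self.tags[line] != addr:          # miss: write back, then fill
                    if self.dirty[line]:
                        self.backing[self.tags[line]] = self.data[line]
                        self.dirty[line] = False
                    self.tags[line] = addr
                    self.data[line] = self.backing[addr]
                return line

            def read(self, addr):
                return self.data[self._lookup(addr)]

            def write(self, addr, value):
                line = self._lookup(addr)
                self.data[line] = value
                self.dirty[line] = True

        # Drop-in use: the client code is identical for either storage type.
        for mem in (BlockRAM(1024), Scratchpad(1 << 20)):
            mem.write(7, 42)
            assert mem.read(7) == 42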

    Use of synchronous concurrent algorithms in the development of safety related software.

    This thesis investigates the use of Synchronous Concurrent Algorithms (SCAs) in the development of safety related software, where a stricter adherence to mathematical correctness is required. The original model of SCAs is extended to produce abstract and concrete dynamic SCAs (dSCAs) that allow dynamic, but predictable, SCAs to be produced whose wiring may be different at different values of a program counter. A relaxed implementation of the Generalised Railroad Crossing Problem is used to demonstrate each of the SCA models. SCAs were originally defined by Tucker and Thompson and were restricted to unit delays between modules. Hobley investigated the introduction of non-unit-delay SCAs and how non-unit-delay SCAs may be represented as unit-delay SCAs. Poole, Tucker and Thompson introduced the concept of hierarchies of Spatially Expanded Systems, of which SCAs are a form. All of these tools are used and expanded upon in this thesis to provide a mechanism enabling an SCA representation of an algorithm to be transformed into an SCA representation of a computing device that implements that algorithm, and enabling correctness to be demonstrated. As each SCA model can be represented algebraically, this thesis provides the transformations as meta-algebras, i.e. algebras that can transform one algebra into another.
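    As a rough illustration of the unit-delay SCA model referred to above, the sketch below is a toy example of my own (not code from the thesis): a network of modules evaluated against a global clock, where each module reads the values its wired-in sources produced at the previous tick and emits a new value at the next one.

        # Toy unit-delay SCA: modules over a global clock. Each module is wired
        # to other modules or to external input streams, and its function is
        # applied to the values those sources held at the *previous* tick.
        # Illustrative reading of the SCA model only.

        def run_sca(modules, wiring, init, inputs, steps):
            """modules: name -> function of its wired-in values
               wiring:  name -> list of source names (modules or external inputs)
               init:    name -> value held by each module at time 0
               inputs:  external input name -> function of time t
               steps:   number of clock ticks to simulate"""
            state = dict(init)
            trace = [dict(state)]
            for t in range(steps):
                def value(src):
                    return inputs[src](t) if src in inputs else state[src]
                # Unit delay: all modules update from the previous state at once.
                state = {m: f(*[value(s) for s in wiring[m]])
                         for m, f in modules.items()}
                trace.append(dict(state))
            return trace

        # Two modules: a running sum fed by an external stream, and a comparator.
        modules = {"sum": lambda x, acc: acc + x,
                   "alarm": lambda acc: acc > 5}
        wiring = {"sum": ["in", "sum"], "alarm": ["sum"]}
        trace = run_sca(modules, wiring, {"sum": 0, "alarm": False},
                        {"in": lambda t: 1}, steps=8)
        print(trace[-1])   # {'sum': 8, 'alarm': True}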

    Neural Networks and Dynamic Complex Systems

    We describe the use of neural networks for optimization and inference associated with a variety of complex systems. We show how a string formalism can be used for parallel computer decomposition, message routing and sequential optimizing compilers. We extend these ideas to a general treatment of spatial assessment and distributed artificial intelligence.
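    For readers unfamiliar with the "neural networks for optimization" genre mentioned above, the sketch below shows the standard Hopfield-style recipe of encoding a combinatorial problem (here a tiny two-way graph partition, as in parallel computer decomposition) as an energy function minimised by asynchronous neuron updates. It is a generic illustration, not the specific string formalism of the paper; the graph, weights and update schedule are assumptions of the sketch.

        # Generic illustration: graph bisection via Hopfield-style neuron updates.
        # Spin s[i] = +1 or -1 assigns node i to one of two processors; the energy
        # penalises cut edges (communication) and imbalance (uneven load).
        import random

        edges = [(0, 1), (1, 2), (2, 3), (3, 0),
                 (4, 5), (5, 6), (6, 7), (7, 4), (3, 4)]
        n = 8
        balance_weight = 0.5

        def energy(s):
            cut = sum(1 for i, j in edges if s[i] != s[j])
            imbalance = sum(s) ** 2
            return cut + balance_weight * imbalance

        random.seed(0)
        s = [random.choice([-1, 1]) for _ in range(n)]
        for sweep in range(20):                        # asynchronous updates
            for i in random.sample(range(n), n):
                s[i] = 1
                e_plus = energy(s)
                s[i] = -1
                e_minus = energy(s)
                s[i] = 1 if e_plus <= e_minus else -1  # keep the lower-energy state

        print(s, energy(s))   # typically two groups of four with one cut edge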

    Symmetric and Asymmetric Asynchronous Interaction

    We investigate classes of systems based on different interaction patterns with the aim of achieving distributability. As our system model we use Petri nets. In Petri nets, an inherent concept of simultaneity is built in: when a transition has more than one preplace, it can be crucial that tokens are removed instantaneously. When modelling a system which is intended to be implemented in a distributed way by a Petri net, this built-in concept of synchronous interaction may be problematic. To investigate this we consider asynchronous implementations of nets, in which removing tokens from places can no longer be considered instantaneous. We model this by inserting silent (unobservable) transitions between transitions and some of their preplaces. We investigate three such implementations, differing in the selection of preplaces of a transition from which the removal of a token is considered time consuming, and in the possibility of collecting the tokens in a given order. We investigate the effect of these different transformations of instantaneous interaction into asynchronous interaction patterns by comparing the behaviours of nets before and after insertion of the silent transitions. We exhibit for which classes of Petri nets we obtain equivalent behaviour with respect to failures equivalence. It turns out that the resulting hierarchy of Petri net classes can be described by semi-structural properties. For two of the classes we obtain precise characterisations; for the remaining class we obtain lower and upper bounds. We briefly comment on possible applications of our results to Message Sequence Charts.
    Comment: 27 pages. An extended abstract of this paper was presented at the first Interaction and Concurrency Experience (ICE'08) on Synchronous and Asynchronous Interactions in Concurrent Distributed Systems, and will appear in Electronic Notes in Theoretical Computer Science, Elsevier.
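    The transformation described above can be pictured concretely: for a selected preplace p of a transition t, token removal is split into a silent step that moves the token from p into a fresh buffer place, from which t then fires. The sketch below is a simplified, illustrative encoding (the net representation and the place/transition naming are mine, and it buffers every preplace, which is only one of the three variants discussed); it is not the paper's formal construction.

        # Illustrative "asynchronous implementation" of a Petri net: for every arc
        # from a preplace p into a transition t, insert a silent transition that
        # first moves the token from p into a private buffer place for (p, t).
        # A net is given by places, transitions, and pre/post maps from each
        # transition to its sets of input/output places.

        def asynchronous_implementation(places, transitions, pre, post):
            new_places = set(places)
            new_transitions = {}
            new_pre, new_post = {}, {}
            for t in transitions:
                buffered_inputs = set()
                for p in pre[t]:
                    buf = f"buf_{p}_{t}"            # fresh buffer place
                    tau = f"tau_{p}_{t}"            # silent (unobservable) transition
                    new_places.add(buf)
                    new_transitions[tau] = "silent"
                    new_pre[tau] = {p}
                    new_post[tau] = {buf}
                    buffered_inputs.add(buf)
                new_transitions[t] = "visible"
                new_pre[t] = buffered_inputs        # t now collects from buffers only
                new_post[t] = set(post[t])
            return new_places, new_transitions, new_pre, new_post

        # Tiny example: transition t with two preplaces p1, p2 and one postplace q.
        places = {"p1", "p2", "q"}
        pre = {"t": {"p1", "p2"}}
        post = {"t": {"q"}}
        P, T, PRE, POST = asynchronous_implementation(places, {"t"}, pre, post)
        print(sorted(T))         # ['t', 'tau_p1_t', 'tau_p2_t']
        print(sorted(PRE["t"]))  # ['buf_p1_t', 'buf_p2_t']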

    Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors

    In this work, we consider the reformulation of hierarchical ($\mathcal{H}$) matrix algorithms for many-core processors with a model implementation on graphics processing units (GPUs). $\mathcal{H}$-matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of $\mathcal{H}$-matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing $\mathcal{H}$-matrix CPU implementations by many-core processors, we here aim at relying entirely on that processor type. As our main contribution, we introduce the parallel algorithmic patterns necessary to map the full $\mathcal{H}$-matrix construction and the fast matrix-vector product to many-core hardware. Crucial ingredients are space-filling curves, parallel tree traversal and batching of linear algebra operations. The resulting model GPU implementation hmglib is, to the best of the authors' knowledge, the first entirely GPU-based open-source $\mathcal{H}$-matrix library of this kind. We conclude this work with an in-depth performance analysis and a comparative performance study against a standard $\mathcal{H}$-matrix library, highlighting profound speedups of our many-core parallel approach.
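    To make the $\mathcal{H}$-matrix idea above concrete: admissible (off-diagonal) blocks are stored in factored low-rank form U V^T, so a block's contribution to a matrix-vector product costs two thin products instead of one dense one, and these many small, uniform per-block products are exactly the kind of work that can be batched on a GPU. The numpy sketch below is a generic one-level illustration with random factors; the sizes and layout are assumptions of the sketch, not the hmglib data structures.

        # One-level illustration of the H-matrix idea: diagonal blocks kept dense,
        # off-diagonal blocks stored as low-rank factors U @ V.T. A matrix-vector
        # product then consists of many small block products, which map naturally
        # to batched GPU kernels.
        import numpy as np

        rng = np.random.default_rng(0)
        n, b, rank = 512, 128, 8            # size, block size, low-rank truncation
        blocks = n // b

        # Toy block representation: off-diagonal blocks given directly in
        # factored low-rank form, diagonal blocks kept dense.
        dense_diag = [rng.standard_normal((b, b)) for _ in range(blocks)]
        lowrank = {}                        # (i, j) -> (U, V) with block ~= U @ V.T
        for i in range(blocks):
            for j in range(blocks):
                if i != j:
                    U = rng.standard_normal((b, rank))
                    V = rng.standard_normal((b, rank))
                    lowrank[(i, j)] = (U, V)

        def h_matvec(x):
            y = np.zeros(n)
            for i in range(blocks):
                xi = slice(i * b, (i + 1) * b)
                y[xi] += dense_diag[i] @ x[xi]             # dense diagonal blocks
            for (i, j), (U, V) in lowrank.items():
                xi = slice(i * b, (i + 1) * b)
                xj = slice(j * b, (j + 1) * b)
                y[xi] += U @ (V.T @ x[xj])                 # O(b * rank) per block
            return y

        x = rng.standard_normal(n)
        print(h_matvec(x).shape)            # (512,)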

    On Synchronous and Asynchronous Interaction in Distributed Systems

    When considering distributed systems, it is a central issue how to deal with interactions between components. In this paper, we investigate the paradigms of synchronous and asynchronous interaction in the context of distributed systems. We investigate to what extent, or under which conditions, synchronous interaction is a valid concept for the specification and implementation of such systems. We choose Petri nets as our system model and consider different notions of distribution by associating locations to elements of nets. First, we investigate the concept of simultaneity which is inherent in the semantics of Petri nets when transitions have multiple input places. We assume that tokens may only be taken instantaneously by transitions on the same location. We exhibit a hierarchy of 'asynchronous' Petri net classes by different assumptions on possible distributions. Alternatively, we assume that the synchronisations specified in a Petri net are crucial system properties, so that transitions and their preplaces may no longer be placed on separate locations. We then answer the question of which systems may be implemented in a distributed way without restricting concurrency, assuming that locations are inherently sequential. It turns out that in both settings we find semi-structural properties of Petri nets describing exactly the problematic situations for interactions in distributed systems.
    Comment: 26 pages. An extended abstract of this paper appeared in Proceedings 33rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2008), Torun, Poland, August 2008 (E. Ochmanski & J. Tyszkiewicz, eds.), LNCS 5162, Springer, 2008, pp. 16-3
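    A small way to see the distribution constraint above at work: given an assignment of locations to places and transitions, the first setting only lets a transition remove tokens instantaneously from preplaces on its own location, so any transition with a preplace on a foreign location marks a spot where synchronous cross-location interaction would be needed. The checker below is a toy illustration of that condition only, not the semi-structural characterisation established in the paper; the data representation is an assumption of the sketch.

        # Toy check of the co-location condition: a transition may take tokens
        # instantaneously only from preplaces assigned to its own location. Any
        # transition with a remote preplace relies on synchronous interaction
        # across locations.

        def remote_synchronisations(pre, location):
            """pre:      transition -> set of its preplaces
               location: element (place or transition) -> location name"""
            problems = []
            for t, preplaces in pre.items():
                remote = {p for p in preplaces if location[p] != location[t]}
                if remote:
                    problems.append((t, remote))
            return problems

        # Example: transition 't2' synchronises on places living on two locations.
        pre = {"t1": {"p1"}, "t2": {"p2", "p3"}}
        location = {"p1": "A", "t1": "A", "p2": "A", "p3": "B", "t2": "A"}
        print(remote_synchronisations(pre, location))   # [('t2', {'p3'})]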