LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic [Extended Version]
CORRECTION: The authors for entry [4] in the references should have been "E. S. Chung,
J. C. Hoe, and K. Mai".
Developers accelerating applications on FPGAs or other reconfigurable logic have nothing but raw memory devices in their standard toolkits. Each project typically includes tedious development of single-use memory management. Software developers expect a programming environment to include automatic memory management. Virtual memory provides the illusion of very large arrays, and processor caches reduce access latency without explicit programmer instructions. LEAP scratchpads for reconfigurable logic dynamically allocate and manage multiple, independent memory arrays in a large backing store. Scratchpad accesses are cached automatically in multiple levels, ranging from shared on-board, RAM-based, set-associative caches to private caches stored in FPGA RAM blocks. In the LEAP framework, scratchpads share the same interface as on-die RAM blocks and are plug-in replacements. Additional libraries support heap management within a storage set. Like software developers, accelerator authors using scratchpads may focus more on core algorithms and less on memory management. Two uses of FPGA scratchpads are analyzed: buffer management in an H.264 decoder and memory management within a processor microarchitecture timing model.
Use of synchronous concurrent algorithms in the development of safety related software.
This thesis investigates the use of Synchronous Concurrent Algorithms (SCAs) in the development of safety related software, where a stricter adherence to mathematical correctness is required. The original model of SCAs is extended to produce abstract and concrete dynamic SCAs (dSCAs) that allow dynamic, but predictable, SCAs to be produced whose wiring may be different at different values of a program counter. A relaxed implementation of the Generalised Railroad Crossing Problem is used to demonstrate each of the SCA models. SCAs were originally defined by Tucker and Thompson and were restricted to unit delays between modules. Hobley investigated the introduction of non-unit delay SCAs and how non-unit delay SCAs may be represented as unit delay SCAs. Poole, Tucker and Thompson introduced the concept of hierarchies of Spatially Expanded Systems, of which SCAs are a form. All of these tools are used and expanded upon in this thesis to provide a mechanism enabling an SCA representation of an algorithm to be transformed into an SCA representation of a computing device that implements that algorithm, and to be able to demonstrate correctness. As each SCA model can be represented algebraically, this thesis provides the transformations as meta-algebras, i.e. algebras that can transform one algebra into another algebra.
Neural Networks and Dynamic Complex Systems
We describe the use of neural networks for optimization and inference associated with a variety of complex systems. We show how a string formalism can be used for parallel computer decomposition, message routing and sequential optimizing compilers. We extend these ideas to a general treatment of spatial assessment and distributed artificial intelligence.
Symmetric and Asymmetric Asynchronous Interaction
We investigate classes of systems based on different interaction patterns
with the aim of achieving distributability. As our system model we use Petri
nets. In Petri nets, an inherent concept of simultaneity is built in, since
when a transition has more than one preplace, it can be crucial that tokens are
removed instantaneously. When modelling a system which is intended to be
implemented in a distributed way by a Petri net, this built-in concept of
synchronous interaction may be problematic. To investigate this we consider
asynchronous implementations of nets, in which removing tokens from places can
no longer be considered as instantaneous. We model this by inserting silent
(unobservable) transitions between transitions and some of their preplaces. We
investigate three such implementations, differing in the selection of preplaces
of a transition from which the removal of a token is considered time consuming,
and the possibility of collecting the tokens in a given order.
We investigate the effect of these different transformations of instantaneous
interaction into asynchronous interaction patterns by comparing the behaviours
of nets before and after insertion of the silent transitions. We exhibit for
which classes of Petri nets we obtain equivalent behaviour with respect to
failures equivalence.
It turns out that the resulting hierarchy of Petri net classes can be
described by semi-structural properties. For two of the classes we obtain
precise characterisations; for the remaining class we obtain lower and upper
bounds.
We briefly comment on possible applications of our results to Message
Sequence Charts.
Comment: 27 pages. An extended abstract of this paper was presented at the
first Interaction and Concurrency Experience (ICE'08) on Synchronous and
Asynchronous Interactions in Concurrent Distributed Systems, and will appear
in Electronic Notes in Theoretical Computer Science, Elsevier.
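The transformation described in this abstract can be illustrated with a minimal sketch. It assumes the simplest of the variants, in which a silent transition is inserted between a transition and every one of its preplaces; the class `PetriNet`, the function `make_asynchronous`, and the `("tau", p, t)` / `("buf", p, t)` naming are hypothetical and not taken from the paper.

```python
from collections import Counter


class PetriNet:
    """Tiny place/transition net with arc weight 1 (illustrative only)."""

    def __init__(self, pre, post, marking):
        self.pre = pre            # transition -> list of preplaces
        self.post = post          # transition -> list of postplaces
        self.marking = Counter(marking)

    def enabled(self, t):
        # A transition is enabled when every preplace holds a token.
        return all(self.marking[p] >= 1 for p in self.pre[t])

    def fire(self, t):
        # Tokens are removed from all preplaces in one atomic step.
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.pre[t]:
            self.marking[p] -= 1
        for p in self.post[t]:
            self.marking[p] += 1


def make_asynchronous(net):
    """Insert a silent transition and a buffer place on every arc p -> t.

    Assumption: this is the variant that treats token removal from all
    preplaces as time consuming.
    """
    pre = {}
    post = {t: list(places) for t, places in net.post.items()}
    for t, places in net.pre.items():
        buffers = []
        for p in places:
            tau, buf = ("tau", p, t), ("buf", p, t)
            pre[tau] = [p]        # the silent step takes the token from p
            post[tau] = [buf]     # and parks it in a private buffer place
            buffers.append(buf)
        pre[t] = buffers          # t now consumes only from its buffers
    return PetriNet(pre, post, net.marking)


# Transition a needs tokens on p and q, transition b needs only q.
# Only q is marked, so in the original net b can fire and a cannot.
original = PetriNet(
    pre={"a": ["p", "q"], "b": ["q"]},
    post={"a": [], "b": []},
    marking={"q": 1},
)
asynchronous = make_asynchronous(original)

# The silent step on the arc q -> a may move the token away from q.
asynchronous.fire(("tau", "q", "a"))
b_can_still_start = asynchronous.enabled(("tau", "q", "b"))
a_can_fire = asynchronous.enabled("a")
```

In this example the silent step commits the token on q to a, which can never fire because p is empty, so b is refused even though the original net always offers it. This is the kind of behavioural difference that a comparison up to failures equivalence detects.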
Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors
In this work, we consider the reformulation of hierarchical ($\mathcal{H}$)
matrix algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). $\mathcal{H}$ matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of $\mathcal{H}$ matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing $\mathcal{H}$ matrix CPU implementations by many-core
processors, we here aim at relying entirely on that processor type. As the main
contribution, we introduce the necessary parallel algorithmic patterns that
allow mapping the full $\mathcal{H}$ matrix construction and the fast
matrix-vector product to many-core hardware. Here, crucial ingredients are
space filling curves, parallel tree traversal and batching of linear algebra
operations. The resulting model GPU implementation hmglib is, to the best of
the authors' knowledge, the first entirely GPU-based Open Source
$\mathcal{H}$ matrix library of this kind. We conclude this work with an
in-depth performance analysis and a comparative performance study against a
standard $\mathcal{H}$ matrix library, highlighting profound speedups of our
many-core parallel approach.
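As an illustration of the space filling curve ingredient, the sketch below orders 2D integer grid points along a Morton (Z-order) curve. The abstract does not say which curve is used, so the choice of Morton order is an assumption, and the helper names `spread_bits` and `morton_2d` are hypothetical rather than part of hmglib.

```python
def spread_bits(n):
    """Spread the lower 16 bits of n so that a zero bit separates each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_2d(x, y):
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    return spread_bits(x) | (spread_bits(y) << 1)


# Sorting by Morton code places spatially close points next to each other
# in memory, so contiguous index ranges correspond to spatial clusters.
points = [(1, 1), (0, 0), (1, 0), (0, 1)]
ordered = sorted(points, key=lambda p: morton_2d(*p))
```

Because each code depends only on its own point, the codes can be computed independently per point and the ordering reduces to a sort, which is one reason such curves are a common fit for many-core hardware.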
On Synchronous and Asynchronous Interaction in Distributed Systems
When considering distributed systems, it is a central issue how to deal with
interactions between components. In this paper, we investigate the paradigms of
synchronous and asynchronous interaction in the context of distributed systems.
We investigate to what extent or under which conditions synchronous interaction
is a valid concept for specification and implementation of such systems. We
choose Petri nets as our system model and consider different notions of
distribution by associating locations to elements of nets. First, we
investigate the concept of simultaneity which is inherent in the semantics of
Petri nets when transitions have multiple input places. We assume that tokens
may only be taken instantaneously by transitions on the same location. We
exhibit a hierarchy of `asynchronous' Petri net classes by different
assumptions on possible distributions. Alternatively, we assume that the
synchronisations specified in a Petri net are crucial system properties. Hence
transitions and their preplaces may no longer be placed on separate locations. We
then answer the question which systems may be implemented in a distributed way
without restricting concurrency, assuming that locations are inherently
sequential. It turns out that in both settings we find semi-structural
properties of Petri nets describing exactly the problematic situations for
interactions in distributed systems.
Comment: 26 pages. An extended abstract of this paper appeared in Proceedings
33rd International Symposium on Mathematical Foundations of Computer Science
(MFCS 2008), Torun, Poland, August 2008 (E. Ochmanski & J. Tyszkiewicz,
eds.), LNCS 5162, Springer, 2008, pp. 16-3