8,441 research outputs found
FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels
Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach
Practical Run-time Checking via Unobtrusive Property Caching
The use of annotations, referred to as assertions or contracts, to describe
program properties for which run-time tests are to be generated, has become
frequent in dynamic programing languages. However, the frameworks proposed to
support such run-time testing generally incur high time and/or space overheads
over standard program execution. We present an approach for reducing this
overhead that is based on the use of memoization to cache intermediate results
of check evaluation, avoiding repeated checking of previously verified
properties. Compared to approaches that reduce checking frequency, our proposal
has the advantage of being exhaustive (i.e., all tests are checked at all
points) while still being much more efficient than standard run-time checking.
Compared to the limited previous work on memoization, it performs the task
without requiring modifications to data structure representation or checking
code. While the approach is general and system-independent, we present it for
concreteness in the context of the Ciao run-time checking framework, which
allows us to provide an operational semantics with checks and caching. We also
report on a prototype implementation and provide some experimental results that
support that using a relatively small cache leads to significant decreases in
run-time checking overhead.Comment: 30 pages, 1 table, 170 figures; added appendix with plots; To appear
in Theory and Practice of Logic Programming (TPLP), Proceedings of ICLP 201
A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws
We report on the development of a computational framework for the parallel,
mesh-adaptive solution of systems of hyperbolic conservation laws like the
time-dependent Euler equations in compressible gas dynamics or
Magneto-Hydrodynamics (MHD) and similar models in plasma physics. Local mesh
refinement is realized by the recursive bisection of grid blocks along each
spatial dimension, implemented numerical schemes include standard
finite-differences as well as shock-capturing central schemes, both in
connection with Runge-Kutta type integrators. Parallel execution is achieved
through a configurable hybrid of POSIX-multi-threading and MPI-distribution
with dynamic load balancing. One- two- and three-dimensional test computations
for the Euler equations have been carried out and show good parallel scaling
behavior. The Racoon framework is currently used to study the formation of
singularities in plasmas and fluids.Comment: late submissio
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
advise the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared memory and one for non-shared memory concurrency. From our experimental
results, we derived a list of requirements for a full-grown experimental
environment for further research
- …