FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels
Using FPGAs as hardware accelerators that communicate with a central CPU is becoming common practice in the embedded design world, but there is not yet a standard methodology and toolset to facilitate this path. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphics Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low-power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach.
Data Provenance and Management in Radio Astronomy: A Stream Computing Approach
New approaches for data provenance and data management (DPDM) are required
for mega science projects like the Square Kilometer Array, characterized by
extremely large data volume and intense data rates, therefore demanding
innovative and highly efficient computational paradigms. In this context, we
explore a stream-computing approach with the emphasis on the use of
accelerators. In particular, we make use of a new generation of high
performance stream-based parallelization middleware known as InfoSphere
Streams. Its viability for managing and ensuring interoperability and integrity
of signal processing data pipelines is demonstrated in radio astronomy. IBM
InfoSphere Streams embraces the stream-computing paradigm. It is a shift from
conventional data mining techniques (involving analysis of existing data from
databases) towards real-time analytic processing. We discuss using InfoSphere
Streams for effective DPDM in radio astronomy and propose a way in which
InfoSphere Streams can be utilized for large antenna arrays. We present a
case study: the InfoSphere Streams implementation of an autocorrelating
spectrometer. Using this example, we discuss the advantages of the
stream-computing approach and the utilization of hardware accelerators.
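As a rough illustration of what an autocorrelating spectrometer computes, the sketch below estimates the autocorrelation of a sampled signal and transforms it into a power spectrum (the Wiener-Khinchin relation). It is a plain-Python sketch under stated assumptions, not the InfoSphere Streams implementation from the paper: the function names, the lag count, and the test tone are all hypothetical.

```python
import math


def autocorrelate(samples, max_lag):
    """Biased autocorrelation estimate r[k] for lags 0 .. max_lag - 1."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i + k] for i in range(n - k)) / n
        for k in range(max_lag)
    ]


def power_spectrum(acf):
    """Cosine transform of a real, even autocorrelation function.

    Bin f corresponds to a frequency of f / (2 * len(acf)) cycles per sample.
    """
    m = len(acf)
    spectrum = []
    for f in range(m):
        total = acf[0]
        for k in range(1, m):
            # Lags +k and -k contribute equally, hence the factor of 2.
            total += 2.0 * acf[k] * math.cos(math.pi * f * k / m)
        spectrum.append(total)
    return spectrum


def spectrometer(samples, max_lag):
    """Autocorrelate, then transform: the two stages of the instrument."""
    return power_spectrum(autocorrelate(samples, max_lag))


# Hypothetical test input: a pure tone at 0.125 cycles per sample.
signal = [math.cos(2 * math.pi * 0.125 * t) for t in range(1024)]
spectrum = spectrometer(signal, max_lag=64)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

With 64 lags, each bin spans 1/128 cycles per sample, so the test tone should peak at bin 16. In a streaming setting the two stages would typically run as separate pipeline operators over successive blocks of samples; here they are ordinary function calls for clarity.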
Multi-Agent Cooperation for Particle Accelerator Control
We present practical investigations in a real industrial controls environment
for justifying theoretical DAI (Distributed Artificial Intelligence) results,
and we discuss theoretical aspects of practical investigations for
accelerator control and operation. A generalized hypothesis is introduced,
based on a unified view of control, monitoring, diagnosis, maintenance and
repair tasks leading to a general method of cooperation for expert systems
by exchanging hypotheses. This has been tested for task and result sharing
cooperation scenarios. Generalized hypotheses also allow us to treat the
repetitive diagnosis-recovery cycle as task sharing cooperation. Problems
with such a loop or even recursive calls between the different agents are
discussed.
Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture
We introduce Stardust, a compiler that compiles sparse tensor algebra to
reconfigurable dataflow architectures (RDAs). Stardust introduces new
user-provided data representation and scheduling language constructs for
mapping to resource-constrained accelerated architectures. Stardust uses the
information provided by these constructs to determine on-chip memory placement
and to lower to the Capstan RDA through a parallel-patterns rewrite system that
targets the Spatial programming model. The Stardust compiler is implemented as
a new compilation path inside the TACO open-source system. Using cycle-accurate
simulation, we demonstrate that Stardust can generate more Capstan tensor
operations than its authors had implemented and that it results in 138×
better performance than generated CPU kernels and 41× better performance
than generated GPU kernels. Comment: 15 pages, 13 figures, 6 tables
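For readers unfamiliar with sparse tensor algebra, the sketch below shows the kind of kernel such compilers generate: a sparse matrix-vector product over a compressed sparse row (CSR) layout. It is only a hand-written illustration in Python, not output from Stardust or TACO, and the helper names `dense_to_csr` and `spmv` are hypothetical.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix into CSR arrays.

    Returns (values, col_idx, row_ptr); row i owns the entries in
    values[row_ptr[i]:row_ptr[i + 1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x, visiting only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc)
    return y


# Small hypothetical example with an empty middle row.
A = [
    [1, 0, 2],
    [0, 0, 0],
    [0, 3, 0],
]
values, col_idx, row_ptr = dense_to_csr(A)
y = spmv(values, col_idx, row_ptr, [1, 2, 3])
```

The choice of storage format (which arrays exist and where they live) is the sort of decision the abstract describes as user-provided data representation; the loop nest itself is what a compiler would derive from the tensor expression.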
Mainstream parallel array programming on cell
We present the E♯ compiler and runtime library for the "F" subset of
the Fortran 95 programming language. "F" provides first-class support for arrays,
allowing E♯ to implicitly evaluate array expressions in parallel using the SPU coprocessors
of the Cell Broadband Engine. We present performance results from
four benchmarks that all demonstrate absolute speedups over equivalent "C" or
Fortran versions running on the PPU host processor. A significant benefit of this
straightforward approach is that a serial implementation of any code is always
available, providing code longevity, and a familiar development paradigm.
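The idea of evaluating a whole-array expression implicitly in parallel, while keeping a serial version always available, can be sketched as follows. This is a loose Python analogy using a thread pool, not the compiler's actual code generation for the SPUs; `parallel_map2`, the chunking scheme, and the worker count are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_map2(op, a, b):
    """Reference serial evaluation of an elementwise array expression."""
    return [op(x, y) for x, y in zip(a, b)]


def parallel_map2(op, a, b, workers=4):
    """Evaluate the same expression by splitting the index range into chunks."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    n = len(a)
    chunk = max(1, -(-n // workers))  # ceiling division, at least 1
    bounds = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    def run(bound):
        lo, hi = bound
        return [op(a[i], b[i]) for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so results concatenate correctly.
        parts = list(pool.map(run, bounds))
    return [value for part in parts for value in part]


# Counterpart of a Fortran array assignment such as c = a + 2 * b.
a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
c_serial = serial_map2(lambda x, y: x + 2 * y, a, b)
c_parallel = parallel_map2(lambda x, y: x + 2 * y, a, b)
```

Because the elementwise expression has no dependencies between indices, the chunked and serial evaluations give identical results, which is what makes the serial fallback mentioned in the abstract straightforward.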