Empowering parallel computing with field programmable gate arrays
After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array (FPGA), a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumann-like architectures opens the way to eliminating instruction overhead and optimizing execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis (HLS) support as well as better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements.
Cellular Automata Applications in Shortest Path Problem
Cellular Automata (CAs) are computational models that can capture the
essential features of systems in which global behavior emerges from the
collective effect of simple components, which interact locally. During the last
decades, CAs have been extensively used for mimicking several natural processes
and systems to find good solutions to many complex, hard-to-solve computer
science and engineering problems. Among them, the shortest path problem is one
of the most pronounced and highly studied problems that scientists have been
trying to tackle by using a plethora of methodologies and even unconventional
approaches. The proposed solutions are mainly justified by their ability to
provide a correct solution with better time complexity than the renowned
Dijkstra's algorithm. Although there is a wide variety regarding the
algorithmic complexity of the algorithms suggested, spanning from simplistic
graph traversal algorithms to complex nature-inspired and biomimicking
algorithms, in this chapter we focus on the successful application of CAs to
the shortest path problem as found in diverse disciplines such as computer
science, swarm robotics, computer networks, decision science and biomimicking
of biological organisms' behaviour. In particular, the first CA-based
algorithm tackling the shortest path problem is introduced in detail.
After a short presentation of shortest path algorithms arising from the
relaxation of CA principles, the application of the CA-based shortest
path definition to the coordinated motion of swarm robots is also introduced.
Moreover, the CA-based application of shortest path finding in computer
networks is presented in brief. Finally, a CA is given that models exactly the
behavior of a biological organism, namely Physarum, finding the
minimum-length path between two points in a labyrinth.
Comment: To appear in the book: Adamatzky, A (Ed.) Shortest path solvers. From
software to wetware. Springer, 201
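The idea of a CA finding the minimum-length path in a labyrinth can be sketched as a distance wavefront that spreads one cell per synchronous step. The sketch below is only an illustration under stated assumptions and is not the algorithm from the chapter: the grid encoding (`#` for walls), the von Neumann neighborhood, and the names `ca_step` and `ca_shortest_path_length` are all hypothetical.

```python
INF = float("inf")

# Von Neumann neighborhood: each cell sees only its four direct neighbors.
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def ca_step(grid, dist):
    """Apply one synchronous update of the local rule to every free cell.

    Local rule (an assumption for this sketch): a cell's new state is the
    minimum of its own state and (neighbor state + 1) over free neighbors.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in dist]  # read from dist, write to new
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "#":
                continue  # wall cells never change state
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                    new[r][c] = min(new[r][c], dist[nr][nc] + 1)
    return new


def ca_shortest_path_length(grid, source, target):
    """Iterate the CA until it reaches a fixed point, then read the target cell."""
    rows, cols = len(grid), len(grid[0])
    dist = [[INF] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0
    while True:
        new = ca_step(grid, dist)
        if new == dist:  # no cell changed: the wavefront has settled
            return dist[target[0]][target[1]]
        dist = new


if __name__ == "__main__":
    maze = [
        "..#",
        ".##",
        "...",
    ]
    print(ca_shortest_path_length(maze, (0, 0), (2, 2)))
```

Each cell uses only local information, so all cells could be updated in parallel. The sequential loops here merely simulate that, and an unreachable target keeps the value `INF`.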
An Overlay Architecture for Pattern Matching
Deterministic and Non-deterministic Finite Automata (DFA and NFA) comprise the fundamental unit of work for many emerging big data applications, motivating recent efforts to develop Domain-Specific Architectures (DSAs) to exploit fine-grain parallelism available in automata workloads.
This dissertation presents NAPOLY (Non-Deterministic Automata Processor OverLaY), an overlay architecture and associated software that attempt to maximally exploit on-chip memory parallelism for NFA evaluation. In order to avoid the upper bound on NFA size that commonly affects prior efforts, NAPOLY is optimized for runtime reconfiguration, allowing for full reconfiguration in tens of microseconds. NAPOLY is also parameterizable, allowing for offline generation of a repertoire of overlay configurations with various trade-offs between state capacity and transition capacity.
In this dissertation, we evaluate NAPOLY on automata applications packaged in the ANMLZoo benchmarks using our proposed state mapping heuristic and an off-the-shelf SAT solver. We compare NAPOLY’s performance against existing CPU and GPU implementations. The results show that NAPOLY performs best for larger benchmarks with more active states and high report frequency. NAPOLY outperforms the best state-of-the-art CPU and GPU implementations on 10 of the 12 benchmarks in the suite. To the best of our knowledge, this is the first example of a runtime-reprogrammable FPGA-based automata processor overlay.
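For readers unfamiliar with the terms "active states" and "report" used above, NFA evaluation can be sketched in software as tracking a set of active states per input symbol. This is a generic, minimal illustration and not NAPOLY's design: the transition-table layout, the always-enabled start states, and the function name `nfa_reports` are assumptions made for this sketch.

```python
def nfa_reports(transitions, start_states, report_states, text):
    """Return the input offsets at which any reporting state becomes active.

    transitions: dict mapping (state, symbol) -> set of next states.
    Start states are re-enabled on every symbol (an assumption here), so a
    pattern can begin matching at any offset in the input stream.
    """
    start = set(start_states)
    active = set(start)
    reports = []
    for offset, symbol in enumerate(text):
        activated = set()
        # In hardware, all active states would be evaluated in parallel;
        # this loop visits them one at a time.
        for state in active:
            activated |= transitions.get((state, symbol), set())
        if activated & report_states:
            reports.append(offset)
        active = activated | start
    return reports


if __name__ == "__main__":
    # Hypothetical three-state NFA that reports every occurrence of "ab".
    table = {(0, "a"): {1}, (1, "b"): {2}}
    print(nfa_reports(table, {0}, {2}, "xabab"))
```

The work per input symbol grows with the number of active states, which is why workloads with many active states benefit from evaluating them in parallel.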
Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture
We introduce Stardust, a compiler that compiles sparse tensor algebra to
reconfigurable dataflow architectures (RDAs). Stardust introduces new
user-provided data representation and scheduling language constructs for
mapping to resource-constrained accelerated architectures. Stardust uses the
information provided by these constructs to determine on-chip memory placement
and to lower to the Capstan RDA through a parallel-patterns rewrite system that
targets the Spatial programming model. The Stardust compiler is implemented as
a new compilation path inside the TACO open-source system. Using cycle-accurate
simulation, we demonstrate that Stardust can generate more Capstan tensor
operations than its authors had implemented and that it results in 138×
better performance than generated CPU kernels and 41× better performance
than generated GPU kernels.
Comment: 15 pages, 13 figures, 6 tables
The IPS fidelity scale as a guideline to implement Supported Employment