Cooperative high-performance computing with FPGAs - matrix multiply case-study
In high-performance computing, there is a great opportunity for systems
that use FPGAs to handle communication while also performing
computation on data in transit in an "altruistic" manner, that is,
using resources for computation that might otherwise be used for
communication, and in a way that improves overall system performance
and efficiency. We provide a specific definition of "Computing
in the Network" that captures this opportunity. We then outline
overall requirements and guidelines for cooperative computing that
includes this ability, and suggest specific computing
capabilities to be added to the networking hardware of a system.
Finally, we explore algorithms running on a network so equipped
for a few specific computing tasks: dense matrix multiplication,
sparse matrix transposition, and sparse matrix multiplication. For the
first of these, we give limits on problem size and estimates of the
performance that should be attainable with present-day FPGA hardware.
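The abstract does not spell out the dense matrix-multiplication algorithm. Purely as an illustration of computing on data in transit, the sketch below simulates a generic ring-based block-row scheme (a well-known textbook technique, not necessarily the authors' method): each node multiplies and accumulates whichever block of B is passing through it before forwarding that block to the next node. The function names `ring_matmul` and `matmul_acc`, the ring topology, and the requirement that n be divisible by the node count p are hypothetical assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul_acc(C: Matrix, A: Matrix, B: Matrix, col_offset: int) -> None:
    """In place: C += A[:, col_offset:col_offset + len(B)] @ B."""
    for r in range(len(A)):
        row_c = C[r]
        for k, row_b in enumerate(B):
            a = A[r][col_offset + k]
            for c, bval in enumerate(row_b):
                row_c[c] += a * bval


def ring_matmul(A: Matrix, B: Matrix, p: int) -> Matrix:
    """Simulate C = A @ B on a hypothetical ring of p nodes (A is n x n, B is n x m)."""
    n, m = len(B), len(B[0])
    if len(A) != n or n % p != 0:
        raise ValueError("sketch assumes square A and n divisible by p")
    b = n // p
    # Node i permanently owns block row i of A and block row i of C.
    a_blocks = [A[i * b:(i + 1) * b] for i in range(p)]
    c_blocks = [[[0.0] * m for _ in range(b)] for _ in range(p)]
    # in_flight[i] is the block row of B currently passing through node i.
    in_flight = [B[i * b:(i + 1) * b] for i in range(p)]
    for step in range(p):
        for i in range(p):
            j = (i + step) % p  # node i now holds block row j of B
            matmul_acc(c_blocks[i], a_blocks[i], in_flight[i], j * b)
        if step < p - 1:
            # Every node forwards its B block to its left neighbour (i - 1).
            in_flight = [in_flight[(i + 1) % p] for i in range(p)]
    return [row for blk in c_blocks for row in blk]
```

With p = 1 the ring degenerates to an ordinary local product, which is a convenient sanity check; with larger p each node does 1/p of the arithmetic while B circulates around the ring.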
GraphStep: A System Architecture for Sparse-Graph Algorithms
Many important applications are organized around
long-lived, irregular sparse graphs (e.g., data and knowledge
bases, CAD optimization, numerical problems, simulations). The
graph structures are large, and the applications need regular
access to a large, data-dependent portion of the graph for each
operation (e.g., the algorithm may need to walk the graph, visiting
all nodes, or propagate changes through many nodes in the
graph). On conventional microprocessors, the graph structures
exceed on-chip cache capacities, making main-memory bandwidth
and latency the key performance limiters. To avoid this
“memory wall,” we introduce a concurrent system architecture
for sparse graph algorithms that places graph nodes in small
distributed memories paired with specialized graph processing
nodes interconnected by a lightweight network. This gives us a
scalable way to map these applications so that they can exploit
the high-bandwidth and low-latency capabilities of embedded
memories (e.g., FPGA Block RAMs). On typical spreading-activation
queries on the ConceptNet Knowledge Base, a sample
application, this translates into an order-of-magnitude speedup
per FPGA compared to a state-of-the-art Pentium processor.
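To make the architecture more concrete, here is a minimal sequential simulation, assuming a bulk-synchronous message-passing step. Graph nodes are spread over a few hypothetical processing elements (PEs), each with its own small local memory. In each step, nodes holding newly received activation send decayed messages along their edges, each PE sums the messages addressed to the nodes it owns, and an implicit barrier separates steps. The round-robin placement, the decay-weighted sum, the fixed step count, and the names `place_nodes` and `spreading_activation` are assumptions for illustration; the ConceptNet queries evaluated in the paper may use a different activation rule.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # node id -> out-neighbour ids (every node is a key)


def place_nodes(graph: Graph, num_pes: int) -> Dict[int, int]:
    """Hypothetical round-robin placement of graph nodes onto PEs."""
    return {v: i % num_pes for i, v in enumerate(sorted(graph))}


def spreading_activation(graph: Graph, sources: Dict[int, float],
                         steps: int, decay: float, num_pes: int) -> Dict[int, float]:
    owner = place_nodes(graph, num_pes)
    # Each PE's local memory holds the activation of the nodes it owns.
    local: List[Dict[int, float]] = [{} for _ in range(num_pes)]
    for v in graph:
        local[owner[v]][v] = 0.0
    # Newly received activation per node, grouped by owning PE.
    pending: List[Dict[int, float]] = [{} for _ in range(num_pes)]
    for v, a in sources.items():
        local[owner[v]][v] += a
        pending[owner[v]][v] = a

    for _ in range(steps):
        # Send phase: every PE emits (destination, value) messages over the network.
        network: List[List[Tuple[int, float]]] = [[] for _ in range(num_pes)]
        for pe in range(num_pes):
            for v, delta in pending[pe].items():
                for u in graph[v]:
                    network[owner[u]].append((u, delta * decay))
        # Receive phase: each PE sums incoming messages per node and updates its memory.
        pending = []
        for pe in range(num_pes):
            inbox: Dict[int, float] = defaultdict(float)
            for u, val in network[pe]:
                inbox[u] += val
            for u, val in inbox.items():
                local[pe][u] += val
            pending.append(dict(inbox))
        # Finishing this loop body plays the role of the global barrier between steps.
    return {v: a for mem in local for v, a in mem.items()}
```

Within each phase, a PE touches only its own memory and its own inbox, so the inner loops over `pe` are independent of one another; that independence is the kind of parallelism a distributed hardware implementation could exploit.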
Operating System Concepts for Reconfigurable Computing: Review and Survey
One of the key future challenges for reconfigurable computing is to enable higher design productivity and to make reconfigurable computing systems easier to use for users who are unfamiliar with the underlying concepts. One way of doing this is to provide standardization and abstraction, usually supported and enforced by an operating system. This article gives a historical review and a summary of the ideas and key concepts for including reconfigurable computing aspects in operating systems. It also presents an overview of published and available operating systems targeting reconfigurable computing. The purpose of the article is to identify and summarize common patterns among these systems that can be seen as de facto standards. Furthermore, it identifies open problems that the already available systems do not cover.
- …