4,137 research outputs found

    Cooperative high-performance computing with FPGAs - matrix multiply case-study

    Get PDF
    In high-performance computing, there is great opportunity for systems that use FPGAs to handle communication while also performing computation on data in transit in an ``altruistic'' manner--that is, using resources for computation that might otherwise be used for communication, and in a way that improves overall system performance and efficiency. We provide a specific definition of \textbf{Computing in the Network} that captures this opportunity. We then outline some overall requirements and guidelines for cooperative computing that include this ability, and make suggestions for specific computing capabilities to be added to the networking hardware in a system. We then explore some algorithms running on a network so equipped for a few specific computing tasks: dense matrix multiplication, sparse matrix transposition and sparse matrix multiplication. In the first instance we give limits of problem size and estimates of performance that should be attainable with present-day FPGA hardware

    GraphStep: A System Architecture for Sparse-Graph Algorithms

    Get PDF
    Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this “memory wall,” we introduce a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order of magnitude speedup per FPGA compared to a state-of-the-art Pentium processor

    Operating System Concepts for Reconfigurable Computing: Review and Survey

    Get PDF
    One of the key future challenges for reconfigurable computing is to enable higher design productivity and a more easy way to use reconfigurable computing systems for users that are unfamiliar with the underlying concepts. One way of doing this is to provide standardization and abstraction, usually supported and enforced by an operating system. This article gives historical review and a summary on ideas and key concepts to include reconfigurable computing aspects in operating systems. The article also presents an overview on published and available operating systems targeting the area of reconfigurable computing. The purpose of this article is to identify and summarize common patterns among those systems that can be seen as de facto standard. Furthermore, open problems, not covered by these already available systems, are identified
    • …
    corecore