131,856 research outputs found

    Distributed-Memory Breadth-First Search on Massive Graphs

    Full text link
    This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.Comment: arXiv admin note: text overlap with arXiv:1104.451

    A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

    Full text link
    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure

    Energy-efficient acceleration of MPEG-4 compression tools

    Get PDF
    We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art

    SkiMap: An Efficient Mapping Framework for Robot Navigation

    Full text link
    We present a novel mapping framework for robot navigation which features a multi-level querying system capable to obtain rapidly representations as diverse as a 3D voxel grid, a 2.5D height map and a 2D occupancy grid. These are inherently embedded into a memory and time efficient core data structure organized as a Tree of SkipLists. Compared to the well-known Octree representation, our approach exhibits a better time efficiency, thanks to its simple and highly parallelizable computational structure, and a similar memory footprint when mapping large workspaces. Peculiarly within the realm of mapping for robot navigation, our framework supports realtime erosion and re-integration of measurements upon reception of optimized poses from the sensor tracker, so as to improve continuously the accuracy of the map.Comment: Accepted by International Conference on Robotics and Automation (ICRA) 2017. This is the submitted version. The final published version may be slightly differen

    Block-Matching Optical Flow for Dynamic Vision Sensor- Algorithm and FPGA Implementation

    Full text link
    Rapid and low power computation of optical flow (OF) is potentially useful in robotics. The dynamic vision sensor (DVS) event camera produces quick and sparse output, and has high dynamic range, but conventional OF algorithms are frame-based and cannot be directly used with event-based cameras. Previous DVS OF methods do not work well with dense textured input and are designed for implementation in logic circuits. This paper proposes a new block-matching based DVS OF algorithm which is inspired by motion estimation methods used for MPEG video compression. The algorithm was implemented both in software and on FPGA. For each event, it computes the motion direction as one of 9 directions. The speed of the motion is set by the sample interval. Results show that the Average Angular Error can be improved by 30\% compared with previous methods. The OF can be calculated on FPGA with 50\,MHz clock in 0.2\,us per event (11 clock cycles), 20 times faster than a Java software implementation running on a desktop PC. Sample data is shown that the method works on scenes dominated by edges, sparse features, and dense texture.Comment: Published in ISCAS 201
    corecore