131,856 research outputs found
Distributed-Memory Breadth-First Search on Massive Graphs
This chapter studies the problem of traversing large graphs using the
breadth-first search order on distributed-memory supercomputers. We consider
both the traditional level-synchronous top-down algorithm as well as the
recently discovered direction optimizing algorithm. We analyze the performance
and scalability trade-offs in using different local data structures such as CSR
and DCSC, enabling in-node multithreading, and graph decompositions such as 1D
and 2D decomposition.Comment: arXiv admin note: text overlap with arXiv:1104.451
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
Neuromorphic computing systems comprise networks of neurons that use
asynchronous events for both computation and communication. This type of
representation offers several advantages in terms of bandwidth and power
consumption in neuromorphic electronic systems. However, managing the traffic
of asynchronous events in large scale systems is a daunting task, both in terms
of circuit complexity and memory requirements. Here we present a novel routing
methodology that employs both hierarchical and mesh routing strategies and
combines heterogeneous memory structures for minimizing both memory
requirements and latency, while maximizing programming flexibility to support a
wide range of event-based neural network architectures, through parameter
configuration. We validated the proposed scheme in a prototype multi-core
neuromorphic processor chip that employs hybrid analog/digital circuits for
emulating synapse and neuron dynamics together with asynchronous digital
circuits for managing the address-event traffic. We present a theoretical
analysis of the proposed connectivity scheme, describe the methods and circuits
used to implement such scheme, and characterize the prototype chip. Finally, we
demonstrate the use of the neuromorphic processor with a convolutional neural
network for the real-time classification of visual symbols being flashed to a
dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure
Energy-efficient acceleration of MPEG-4 compression tools
We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09
μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art
SkiMap: An Efficient Mapping Framework for Robot Navigation
We present a novel mapping framework for robot navigation which features a
multi-level querying system capable to obtain rapidly representations as
diverse as a 3D voxel grid, a 2.5D height map and a 2D occupancy grid. These
are inherently embedded into a memory and time efficient core data structure
organized as a Tree of SkipLists. Compared to the well-known Octree
representation, our approach exhibits a better time efficiency, thanks to its
simple and highly parallelizable computational structure, and a similar memory
footprint when mapping large workspaces. Peculiarly within the realm of mapping
for robot navigation, our framework supports realtime erosion and
re-integration of measurements upon reception of optimized poses from the
sensor tracker, so as to improve continuously the accuracy of the map.Comment: Accepted by International Conference on Robotics and Automation
(ICRA) 2017. This is the submitted version. The final published version may
be slightly differen
Block-Matching Optical Flow for Dynamic Vision Sensor- Algorithm and FPGA Implementation
Rapid and low power computation of optical flow (OF) is potentially useful in
robotics. The dynamic vision sensor (DVS) event camera produces quick and
sparse output, and has high dynamic range, but conventional OF algorithms are
frame-based and cannot be directly used with event-based cameras. Previous DVS
OF methods do not work well with dense textured input and are designed for
implementation in logic circuits. This paper proposes a new block-matching
based DVS OF algorithm which is inspired by motion estimation methods used for
MPEG video compression. The algorithm was implemented both in software and on
FPGA. For each event, it computes the motion direction as one of 9 directions.
The speed of the motion is set by the sample interval. Results show that the
Average Angular Error can be improved by 30\% compared with previous methods.
The OF can be calculated on FPGA with 50\,MHz clock in 0.2\,us per event (11
clock cycles), 20 times faster than a Java software implementation running on a
desktop PC. Sample data is shown that the method works on scenes dominated by
edges, sparse features, and dense texture.Comment: Published in ISCAS 201
- …