A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
Neuromorphic computing systems comprise networks of neurons that use
asynchronous events for both computation and communication. This type of
representation offers several advantages in terms of bandwidth and power
consumption in neuromorphic electronic systems. However, managing the traffic
of asynchronous events in large scale systems is a daunting task, both in terms
of circuit complexity and memory requirements. Here we present a novel routing
methodology that employs both hierarchical and mesh routing strategies and
combines heterogeneous memory structures to minimize memory requirements and
latency while maximizing programming flexibility, so that a wide range of
event-based neural network architectures can be supported through parameter
configuration. We validated the proposed scheme in a prototype multi-core
neuromorphic processor chip that employs hybrid analog/digital circuits for
emulating synapse and neuron dynamics together with asynchronous digital
circuits for managing the address-event traffic. We present a theoretical
analysis of the proposed connectivity scheme, describe the methods and circuits
used to implement such a scheme, and characterize the prototype chip. Finally, we
demonstrate the use of the neuromorphic processor with a convolutional neural
network for the real-time classification of visual symbols being flashed to a
dynamic vision sensor (DVS) at high speed. Comment: 17 pages, 14 figures
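The abstract does not spell out the routing details. As a rough illustration of how a mesh stage and a hierarchical stage can be combined, the sketch below assumes each event carries a source tag plus signed chip offsets (dx, dy). The offsets are consumed hop by hop with dimension-order (XY) routing on the inter-chip mesh; on arrival, the tag is broadcast to every core, and each core keeps only the neurons subscribed to it. The `Event` fields, `mesh_hops`, `deliver_on_chip` and the table layout are all hypothetical, not the chip's actual encoding or circuits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical address-event header; field names and widths are assumptions."""
    tag: int  # source tag matched against per-core subscription tables
    dx: int   # signed chip offset along x on the inter-chip mesh
    dy: int   # signed chip offset along y on the inter-chip mesh


def mesh_hops(event: Event) -> list[str]:
    """Dimension-order (XY) routing: step along x until dx == 0, then along y."""
    hops: list[str] = []
    dx, dy = event.dx, event.dy
    while dx != 0:
        hops.append("east" if dx > 0 else "west")
        dx += -1 if dx > 0 else 1
    while dy != 0:
        hops.append("north" if dy > 0 else "south")
        dy += -1 if dy > 0 else 1
    return hops


def deliver_on_chip(event: Event,
                    core_tables: dict[int, dict[int, set[int]]]) -> dict[int, list[int]]:
    """Hierarchical stage: broadcast the tag to all cores of the destination chip.

    Each core maps tags to the set of local neurons subscribed to that tag and
    keeps only the matching ones; cores with no subscription are skipped.
    """
    return {
        core: sorted(table[event.tag])
        for core, table in core_tables.items()
        if event.tag in table
    }


e = Event(tag=7, dx=2, dy=-1)
print(mesh_hops(e))  # ['east', 'east', 'south']
tables = {0: {7: {12, 3}}, 1: {5: {1}}, 2: {7: {0}}}
print(deliver_on_chip(e, tables))  # {0: [3, 12], 2: [0]}
```

Routing all of x before y keeps the per-hop decision trivial (compare one offset against zero), which suits small asynchronous router circuits; the tag broadcast means the sender needs no knowledge of individual destination neurons.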
Simple Load Balancing for Distributed Hash Tables
Distributed hash tables have recently become a useful building block for a variety of distributed applications. However, current schemes based upon consistent hashing require both considerable implementation complexity and substantial storage overhead to achieve desired load balancing goals. We argue in this paper that these goals can be achieved more simply and more cost-effectively. First, we suggest the direct application of the "power of two choices" paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing methods, including load-stealing or load-shedding schemes, as well as providing natural fault-tolerance mechanisms.
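A minimal single-process sketch of the "power of two choices" placement described above, assuming salted SHA-256 digests stand in for the d independent hash functions and that a node's load is simply its item count. `MultiChoiceTable` and `node_for` are hypothetical names; a real DHT would send each probe over the network rather than index a local list.

```python
import hashlib


def node_for(key: str, choice: int, num_nodes: int) -> int:
    """The choice-th hash function: salted SHA-256 reduced modulo the node count."""
    digest = hashlib.sha256(f"{choice}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class MultiChoiceTable:
    """Toy in-memory model of a DHT that places items using d hash choices."""

    def __init__(self, num_nodes: int, d: int = 2):
        self.d = d
        self.nodes = [{} for _ in range(num_nodes)]

    def candidates(self, key: str) -> list:
        return [node_for(key, i, len(self.nodes)) for i in range(self.d)]

    def put(self, key: str, value) -> int:
        cands = self.candidates(key)
        for n in cands:  # update in place if the key is already stored
            if key in self.nodes[n]:
                self.nodes[n][key] = value
                return n
        target = min(cands, key=lambda n: len(self.nodes[n]))  # less loaded node
        self.nodes[target][key] = value
        return target

    def get(self, key: str):
        for n in self.candidates(key):  # a lookup probes every candidate
            if key in self.nodes[n]:
                return self.nodes[n][key]
        raise KeyError(key)

    def max_load(self) -> int:
        return max(len(n) for n in self.nodes)


one, two = MultiChoiceTable(100, d=1), MultiChoiceTable(100, d=2)
for i in range(10_000):
    one.put(f"item-{i}", i)
    two.put(f"item-{i}", i)
# With d=2 the busiest node is typically much closer to the mean load of 100.
print(one.max_load(), two.max_load())
```

The price of the second choice is an extra probe per lookup, since `get` must check every candidate; the same small set of hash values is what the abstract proposes reusing for load stealing, load shedding and fault tolerance.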
Modeling Data-Plane Power Consumption of Future Internet Architectures
With current efforts to design Future Internet Architectures (FIAs), the
evaluation and comparison of different proposals is an interesting research
challenge. Previously, metrics such as bandwidth or latency have commonly been
used to compare FIAs to IP networks. We suggest the use of power consumption as
a metric to compare FIAs. While low power consumption is an important goal in
its own right (as lower energy use translates to smaller environmental impact
as well as lower operating costs), power consumption can also serve as a proxy
for other metrics such as bandwidth and processor load.
Lacking power consumption statistics about either commodity FIA routers or
widely deployed FIA testbeds, we propose models for power consumption of FIA
routers. Based on our models, we simulate scenarios for measuring power
consumption of content delivery in different FIAs. Specifically, we address two
questions: 1) which of the proposed FIA candidates achieves the lowest energy
footprint; and 2) which set of design choices yields a power-efficient network
architecture? Although the lack of real-world data makes numerous assumptions
necessary for our analysis, we explore the uncertainty of our calculations
through sensitivity analysis of input parameters.
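The abstract does not give the form of its router power models. Purely for illustration, the sketch below assumes a linear model (static power plus per-bit and per-packet energy) and performs a one-at-a-time sensitivity analysis by scaling each input by ±50% while holding the others fixed. The parameter names and the numbers in the usage lines are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RouterParams:
    p_idle_w: float  # static power in watts (hypothetical)
    e_bit_j: float   # energy per forwarded bit in joules (hypothetical)
    e_pkt_j: float   # energy per packet-processing operation in joules (hypothetical)


def router_power(p: RouterParams, bit_rate_bps: float, pkt_rate_pps: float) -> float:
    """Assumed linear model: P = P_idle + E_bit * bit rate + E_pkt * packet rate."""
    return p.p_idle_w + p.e_bit_j * bit_rate_bps + p.e_pkt_j * pkt_rate_pps


def one_at_a_time_sensitivity(p: RouterParams, bit_rate: float, pkt_rate: float,
                              rel: float = 0.5):
    """Scale each parameter by (1 - rel) and (1 + rel), others fixed; return output ranges."""
    base = router_power(p, bit_rate, pkt_rate)
    ranges = {}
    for field in ("p_idle_w", "e_bit_j", "e_pkt_j"):
        v = getattr(p, field)
        lo = router_power(replace(p, **{field: v * (1 - rel)}), bit_rate, pkt_rate)
        hi = router_power(replace(p, **{field: v * (1 + rel)}), bit_rate, pkt_rate)
        ranges[field] = (lo, hi)
    return base, ranges


params = RouterParams(p_idle_w=100.0, e_bit_j=1e-9, e_pkt_j=1e-6)  # illustrative only
base, ranges = one_at_a_time_sensitivity(params, bit_rate=10e9, pkt_rate=1e6)
print(f"base: {base:.1f} W")  # base: 111.0 W
for name, (lo, hi) in ranges.items():
    print(f"{name}: {lo:.1f} to {hi:.1f} W")
# p_idle_w: 61.0 to 161.0 W
# e_bit_j: 106.0 to 116.0 W
# e_pkt_j: 110.5 to 111.5 W
```

The width of each range shows which uncertain input dominates the result; with these placeholder values the static term does, so comparisons between architectures would hinge mostly on it.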
Overview of Swallow --- A Scalable 480-core System for Investigating the Performance and Energy Efficiency of Many-core Applications and Operating Systems
We present Swallow, a scalable many-core architecture, with a current
configuration of 480 × 32-bit processors.
Swallow is an open-source architecture, designed from the ground up to
deliver scalable increases in usable computational power to allow
experimentation with many-core applications and the operating systems that
support them.
Scalability is enabled by the creation of a tile-able system with a
low-latency interconnect, featuring an attractive communication-to-computation
ratio and the use of a distributed memory configuration.
We analyse the energy efficiency and the computational and communication
performance of Swallow. The system provides 240 GIPS, with each core consuming
71–193 mW depending on workload. Power consumption per instruction is lower than
that of almost all systems of comparable scale.
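The quoted figures allow a back-of-the-envelope estimate. Assuming the 240 GIPS is the aggregate of 480 cores each running at the same rate (500 MIPS), the energy per instruction and total system power at the two ends of the 71–193 mW range follow directly. Because lighter workloads may also retire fewer instructions per second, treat these as rough bounds under that equal-rate assumption, not measured values.

```python
CORES = 480
TOTAL_GIPS = 240.0
CORE_POWER_MW = (71.0, 193.0)  # workload-dependent range quoted in the abstract

# Assumption: the aggregate rate is split equally, so every core runs at this
# rate regardless of workload.
PER_CORE_MIPS = TOTAL_GIPS * 1000 / CORES  # 500.0


def energy_per_instruction_pj(core_power_mw: float, mips: float = PER_CORE_MIPS) -> float:
    # 1 mW / 1 MIPS = 1e-3 J/s / 1e6 instr/s = 1e-9 J = 1000 pJ per instruction
    return core_power_mw * 1000 / mips


def system_power_w(core_power_mw: float, cores: int = CORES) -> float:
    return core_power_mw * cores / 1000


for p in CORE_POWER_MW:
    print(f"{p:.0f} mW/core -> {energy_per_instruction_pj(p):.0f} pJ/instr, "
          f"{system_power_w(p):.2f} W total")
# 71 mW/core -> 142 pJ/instr, 34.08 W total
# 193 mW/core -> 386 pJ/instr, 92.64 W total
```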
We also show how the use of a distributed operating system (nOS) allows the
easy creation of scalable software to exploit Swallow's potential. Finally, we
show two use case studies: modelling neurons and the overlay of shared memory
on a distributed memory system. Comment: An open source release of the Swallow system design and code will
follow, and references to these will be added at a later date.
A Multifunctional Processing Board for the Fast Track Trigger of the H1 Experiment
The electron-proton collider HERA is being upgraded to provide higher
luminosity from the end of the year 2001. In order to enhance the selectivity
on exclusive processes a Fast Track Trigger (FTT) with high momentum resolution
is being built for the H1 Collaboration. The FTT will perform a 3-dimensional
reconstruction of curved tracks in a magnetic field of 1.1 Tesla down to 100
MeV in transverse momentum. It is able to reconstruct up to 48 tracks within 23
μs in a high track multiplicity environment. The FTT consists of two hardware
levels L1, L2 and a third software level. Analog signals of 450 wires are
digitized at the first level stage followed by a quick lookup of valid track
segment patterns.
For the main processing tasks at the second level such as linking, fitting
and deciding, a multifunctional processing board has been developed by the ETH
Zurich in collaboration with Supercomputing Systems (Zurich). It integrates a
high-density FPGA (Altera APEX 20K600E) and four floating point DSPs (Texas
Instruments TMS320C6701). This presentation mainly concentrates on the hardware
aspects of the second trigger level and on the implementation of the algorithms
used for linking and fitting. Particular emphasis is placed on the integrated CAM
(content addressable memory) functionality of the FPGA, which is ideally suited
for fast search tasks such as track segment linking. Comment: 6 pages, 4 figures, submitted to TN
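To illustrate why a CAM suits segment linking, the sketch below models one in software: a search returns every address whose stored word matches a key, optionally ignoring masked-out bits. Packing an azimuth bin and a curvature bin into a 16-bit word (`encode`) is a hypothetical choice for this example, not the FTT's actual data format, and the sequential loop stands in for the parallel compare that the FPGA performs in a single step.

```python
from __future__ import annotations


class SoftCAM:
    """Software model of a content-addressable memory (CAM).

    A search returns every address whose stored word matches the key. A hardware
    CAM compares all entries in parallel; this loop checks them one by one.
    """

    def __init__(self, size: int):
        self.words: list[int | None] = [None] * size

    def write(self, addr: int, word: int) -> None:
        self.words[addr] = word

    def search(self, key: int, mask: int | None = None) -> list[int]:
        """Return matching addresses; bits cleared in `mask` are don't-care."""
        hits = []
        for addr, word in enumerate(self.words):
            if word is None:  # empty slot never matches
                continue
            diff = word ^ key
            if (diff if mask is None else diff & mask) == 0:
                hits.append(addr)
        return hits


def encode(phi_bin: int, kappa_bin: int) -> int:
    """Hypothetical 16-bit segment word: azimuth bin in the high byte, curvature bin in the low byte."""
    return (phi_bin << 8) | kappa_bin


cam = SoftCAM(4)
for addr, (phi, kappa) in enumerate([(10, 3), (10, 4), (11, 3), (10, 3)]):
    cam.write(addr, encode(phi, kappa))
print(cam.search(encode(10, 3)))               # [0, 3]
print(cam.search(encode(10, 3), mask=0xFF00))  # [0, 1, 3]  (azimuth bin only)
```

Storing the segments of one detector layer group and searching with a key derived from another returns all link candidates at once, which is why the lookup cost does not grow with the number of stored segments in hardware.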