6,595 research outputs found
A burst compression and expansion technique for variable-rate users in satellite-switched TDMA networks
A burst compression and expansion technique is described for asynchronously interconnecting variable-data-rate users with cost-efficient ground terminals in a satellite-switched, time-division-multiple-access (SS/TDMA) network. Compression and expansion buffers in each ground terminal convert between lower rate, asynchronous, continuous-user data streams and higher-rate TDMA bursts synchronized with the satellite-switched timing. The technique described uses a first-in, first-out (FIFO) memory approach which enables the use of inexpensive clock sources by both the users and the ground terminals and obviates the need for elaborate user clock synchronization processes. A continous range of data rates from kilobits per second to that approaching the modulator burst rate (hundreds of megabits per second) can be accommodated. The technique was developed for use in the NASA Lewis Research Center System Integration, Test, and Evaluation (SITE) facility. Some key features of the technique have also been implemented in the gound terminals developed at NASA Lewis for use in on-orbit evaluation of the Advanced Communications Technology Satellite (ACTS) high burst rate (HBR) system
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
Neuromorphic computing systems comprise networks of neurons that use
asynchronous events for both computation and communication. This type of
representation offers several advantages in terms of bandwidth and power
consumption in neuromorphic electronic systems. However, managing the traffic
of asynchronous events in large scale systems is a daunting task, both in terms
of circuit complexity and memory requirements. Here we present a novel routing
methodology that employs both hierarchical and mesh routing strategies and
combines heterogeneous memory structures for minimizing both memory
requirements and latency, while maximizing programming flexibility to support a
wide range of event-based neural network architectures, through parameter
configuration. We validated the proposed scheme in a prototype multi-core
neuromorphic processor chip that employs hybrid analog/digital circuits for
emulating synapse and neuron dynamics together with asynchronous digital
circuits for managing the address-event traffic. We present a theoretical
analysis of the proposed connectivity scheme, describe the methods and circuits
used to implement such scheme, and characterize the prototype chip. Finally, we
demonstrate the use of the neuromorphic processor with a convolutional neural
network for the real-time classification of visual symbols being flashed to a
dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure
MGSim - Simulation tools for multi-core processor architectures
MGSim is an open source discrete event simulator for on-chip hardware
components, developed at the University of Amsterdam. It is intended to be a
research and teaching vehicle to study the fine-grained hardware/software
interactions on many-core and hardware multithreaded processors. It includes
support for core models with different instruction sets, a configurable
multi-core interconnect, multiple configurable cache and memory models, a
dedicated I/O subsystem, and comprehensive monitoring and interaction
facilities. The default model configuration shipped with MGSim implements
Microgrids, a many-core architecture with hardware concurrency management.
MGSim is furthermore written mostly in C++ and uses object classes to represent
chip components. It is optimized for architecture models that can be described
as process networks.Comment: 33 pages, 22 figures, 4 listings, 2 table
Neural Network Memory Architectures for Autonomous Robot Navigation
This paper highlights the significance of including memory structures in
neural networks when the latter are used to learn perception-action loops for
autonomous robot navigation. Traditional navigation approaches rely on global
maps of the environment to overcome cul-de-sacs and plan feasible motions. Yet,
maintaining an accurate global map may be challenging in real-world settings. A
possible way to mitigate this limitation is to use learning techniques that
forgo hand-engineered map representations and infer appropriate control
responses directly from sensed information. An important but unexplored aspect
of such approaches is the effect of memory on their performance. This work is a
first thorough study of memory structures for deep-neural-network-based robot
navigation, and offers novel tools to train such networks from supervision and
quantify their ability to generalize to unseen scenarios. We analyze the
separation and generalization abilities of feedforward, long short-term memory,
and differentiable neural computer networks. We introduce a new method to
evaluate the generalization ability by estimating the VC-dimension of networks
with a final linear readout layer. We validate that the VC estimates are good
predictors of actual test performance. The reported method can be applied to
deep learning problems beyond robotics
Asynchronous techniques for system-on-chip design
SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed
Quasi-perfect FIFO: Synchronous or asynchronous with application in controller design for the UNICON laser memory
The first-in-first-out memory buffer (FIFO), is an elastic digital memory whose main application is in data buffering between devices operating at different rates. Data written into the top is moved autonomously down toward the bottom of the FIFO to the lowest unoccupied location, and data read from the bottom of the FIFO will cause data from the top to move autonomously down toward the bottom. The FIFO is available in MOS LSI asynchronous form with data rate in the 1 MHz region. The FIFO described yields a simple high-speed iterative implementation, either synchronous of asynchronous. Because of this simple iterative structure, the FIFO is expandable in both number of words and bits per word, and it is attractive from the viewpoint of integrated-circuit production. For the synchronous FIFO, a model was built and successfully used in the controller for the UNICON laser memory. For the asynchronous FIFO, a model was built and also successfully used in a high-performance magnetic tape controller
Shared versus distributed memory multiprocessors
The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors
A Practical Hierarchial Model of Parallel Computation: The Model
We introduce a model of parallel computation that retains the ideal properties of the PRAM by using it as a sub-model, while simultaneously being more reflective of realistic parallel architectures by accounting for and providing abstract control over communication and synchronization costs. The Hierarchical PRAM (H-PRAM) model controls conceptual complexity in the face of asynchrony in two ways. First, by providing the simplifying assumption of synchronization to the design of algorithms, but allowing the algorithms to work asynchronously with each other; and organizing this control asynchrony via an implicit hierarchy relation. Second, by allowing the restriction of communication asynchrony in order to obtain determinate algorithms (thus greatly simplifying proofs of correctness). It is shown that the model is reflective of a variety of existing and proposed parallel architectures, particularly ones that can support massive parallelism. Relationships to programming languages are discussed. Since the PRAM is a sub-model, we can use PRAM algorithms as sub-algorithms in algorithms for the H-PRAM; thus results that have been established with respect to the PRAM are potentially transferable to this new model. The H-PRAM can be used as a flexible tool to investigate general degrees of locality (âneighborhoods of activity) in problems, considering communication and synchronization simultaneously. This gives the potential of obtaining algorithms that map more efficiently to architectures, and of increasing the number of processors that can efficiently be used on a problem (in comparison to a PRAM that charges for communication and synchronization). The model presents a framework in which to study the extent that general locality can be exploited in parallel computing. A companion paper demonstrates the usage of the H-PRAM via the design and analysis of various algorithms for computing the complete binary tree and the FFT/butterfly graph
- âŚ