MGSim - Simulation tools for multi-core processor architectures
MGSim is an open source discrete event simulator for on-chip hardware
components, developed at the University of Amsterdam. It is intended to be a
research and teaching vehicle to study the fine-grained hardware/software
interactions on many-core and hardware multithreaded processors. It includes
support for core models with different instruction sets, a configurable
multi-core interconnect, multiple configurable cache and memory models, a
dedicated I/O subsystem, and comprehensive monitoring and interaction
facilities. The default model configuration shipped with MGSim implements
Microgrids, a many-core architecture with hardware concurrency management.
MGSim is written mostly in C++ and uses object classes to represent chip
components. It is optimized for architecture models that can be described
as process networks.
Comment: 33 pages, 22 figures, 4 listings, 2 tables
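MGSim's internals are not described in this abstract. The following is a minimal, hypothetical Python sketch of the general discrete-event idea: a priority queue orders timestamped events, and simulated time jumps from one event to the next. A producer/consumer pair joined by a FIFO stands in for a two-node process network. The `EventQueue` class, its methods and the 3-cycle latency are illustrative assumptions, not MGSim's C++ API.

```python
import heapq
import itertools


class EventQueue:
    """Minimal discrete-event kernel (hypothetical, not MGSim's API)."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: same-time events run FIFO

    def schedule(self, delay, callback):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback))

    def run(self, until):
        # Pop events in timestamp order; simulated time jumps between events.
        while self._heap and self._heap[0][0] <= until:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()


# Two-node "process network": a producer feeding a consumer through a FIFO.
sim = EventQueue()
fifo, log = [], []


def produce(i):
    fifo.append(i)
    log.append((sim.now, "produce", i))
    if i < 2:
        sim.schedule(2, lambda: produce(i + 1))  # next token 2 cycles later
    sim.schedule(3, consume)  # assumed 3-cycle downstream latency


def consume():
    item = fifo.pop(0)
    log.append((sim.now, "consume", item))


sim.schedule(0, lambda: produce(0))
sim.run(until=100)
```

Because the kernel only visits timestamps at which something happens, idle cycles between events cost no simulation work in this sketch.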
Towards Comparing the Robustness of Synchronous and Asynchronous Circuits by Fault Injection
As transient error rates are growing due to smaller feature sizes,
designing reliable synchronous circuits becomes increasingly
challenging. Asynchronous logic design constitutes a promising
alternative with respect to robustness and stability. In particular,
delay-insensitive asynchronous circuits provide interesting properties,
such as an inherent resilience to delay faults.
An AER handshake-less modular infrastructure PCB with x8 2.5Gbps LVDS serial links
Spike-based emulation of brain processing is currently
taking off, as several EU and other projects worldwide
demonstrate, such as SpiNNaker, BrainScaleS, FACETS, and
NeuroGrid. The larger the brain process emulated on silicon,
the higher the communication performance required of the hosting
platforms. Often, the bottleneck in these system
implementations lies not in the performance within a chip or a
board, but in the communication between boards. This paper
describes a novel modular Address-Event-Representation (AER)
FPGA-based (Spartan6) infrastructure PCB (the AER-Node
board) with 2.5Gbps LVDS high speed serial links over SATA
cables that offers a peak performance of 32-bit 62.5Meps (Mega
events per second) on board-to-board communications. The
board allows backward compatibility with parallel AER devices
supporting up to x2 28-bit parallel data with asynchronous
handshake. These boards also allow their functionality to be
expanded modularly through several daughter boards. The paper
focuses on describing the LVDS serial interface in detail and
presenting its performance.
Funding: Ministerio de Ciencia e Innovación TEC2009-10639-C04-02/01; Ministerio de Economía y Competitividad TEC2012-37868-C04-02/01; Junta de Andalucía TIC-6091; Ministerio de Economía y Competitividad PRI-PIMCHI-2011-076
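As a sanity check on the quoted figures, the arithmetic below relates the 2.5 Gbps line rate to the 62.5 Meps peak. It assumes 8b/10b line coding (common on SATA-style serial links, but not stated in the abstract) and one 32-bit word per event. Under those assumptions, the payload of a single link matches the quoted peak exactly.

```python
LINE_RATE_BPS = 2.5e9       # serial line rate per link (from the abstract)
EVENT_BITS = 32             # bits per address event (from the abstract)
CODING_EFFICIENCY = 8 / 10  # ASSUMPTION: 8b/10b line coding, not stated

payload_bps = LINE_RATE_BPS * CODING_EFFICIENCY  # usable bits per second
events_per_s = payload_bps / EVENT_BITS          # one 32-bit word per event
print(f"{events_per_s / 1e6:.1f} Meps")          # prints "62.5 Meps"
```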
Six networks on a universal neuromorphic computing substrate
In this study, we present a highly configurable neuromorphic computing substrate and use it for emulating several types of neural networks. At the heart of this system lies a mixed-signal chip, with analog implementations of neurons and synapses and digital transmission of action potentials. Major advantages of this emulation device, which has been explicitly designed as a universal neural network emulator, are its inherent parallelism and high acceleration factor compared to conventional computers. Its configurability allows the realization of almost arbitrary network topologies and the use of widely varied neuronal and synaptic parameters. Fixed-pattern noise inherent to analog circuitry is reduced by calibration routines. An integrated development environment allows neuroscientists to operate the device without any prior knowledge of neuromorphic circuit design. As a showcase for the capabilities of the system, we describe the successful emulation of six different neural networks that cover a broad spectrum of both structure and functionality.
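The calibration routines mentioned above can be illustrated with a deliberately simple, hypothetical model. Each analog neuron realizes a parameter as `gain * setting + offset`, where gain and offset vary from neuron to neuron (fixed-pattern noise). A two-point measurement per neuron then lets software invert that mapping. The linear model, the function names and all numbers are assumptions for illustration, not the system's actual calibration routines.

```python
import random


def make_neurons(n, rng):
    """Hypothetical per-neuron (gain, offset) pairs mimicking fixed-pattern noise."""
    return [(rng.gauss(1.0, 0.1), rng.gauss(0.0, 0.05)) for _ in range(n)]


def realize(neuron, setting):
    """What the (simulated) analog circuit actually produces for a setting."""
    gain, offset = neuron
    return gain * setting + offset


def calibrate(neuron, lo=0.2, hi=0.8):
    """Measure at two settings and fit a per-neuron line (gain, offset)."""
    y_lo, y_hi = realize(neuron, lo), realize(neuron, hi)
    gain = (y_hi - y_lo) / (hi - lo)
    return gain, y_lo - gain * lo


def setting_for(target, fit):
    """Invert the fitted line to get the setting that yields `target`."""
    gain, offset = fit
    return (target - offset) / gain


rng = random.Random(1)
neurons = make_neurons(100, rng)
fits = [calibrate(n) for n in neurons]
raw = [realize(n, 0.5) for n in neurons]  # same setting everywhere: spread out
calibrated = [realize(n, setting_for(0.5, f)) for n, f in zip(neurons, fits)]
```

Real measurements are noisy and may be nonlinear, so a practical routine would need to average repeated measurements or fit a richer model than this noiseless two-point line.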
Hardware emulation of stochastic p-bits for invertible logic
The common feature of nearly all logic and memory devices is that they make
use of stable units to represent 0's and 1's. A completely different paradigm
is based on three-terminal stochastic units which could be called "p-bits",
where the output is a random telegraphic signal continuously fluctuating
between 0 and 1 with a tunable mean. p-bits can be interconnected to receive
weighted contributions from others in a network, and these weighted
contributions can be chosen to not only solve problems of optimization and
inference but also to implement precise Boolean functions in an inverted mode.
This inverted operation of Boolean gates is particularly striking: they provide
inputs consistent with a given output along with unique outputs for a given set of
inputs. The existing demonstrations of accurate invertible logic are
intriguing, but will these striking properties observed in computer simulations
carry over to hardware implementations? In this paper, we use individual
microcontrollers to emulate p-bits and present results for a 4-bit ripple-carry
adder with 48 p-bits and a 4-bit multiplier with 46 p-bits working in inverted
mode as a factorizer. Our results constitute a first step towards implementing
p-bits with nanodevices, such as stochastic magnetic tunnel junctions.
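The update rule below is the standard p-bit formulation from the wider p-bit literature, not something stated in this abstract. Each p-bit outputs m_i = sgn(tanh(I_i) - r), with r drawn uniformly from [-1, 1] and input I_i = beta * (h_i + sum_j J_ij m_j). Instead of the paper's adder and multiplier, the sketch uses a single three-p-bit AND gate. The weights J and biases h are one assumed choice, checked so that the four rows of the AND truth table are the lowest-energy states. Clamping the output p-bit runs the gate in inverted mode.

```python
import math
import random

# Bipolar encoding: +1 = logic 1, -1 = logic 0. Order: (A, B, C), C = A AND B.
# Hypothetical weights/biases; any (J, h) whose ground states are the AND
# truth table would do.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
h = [1, 1, -2]


def sample_gate(clamp, beta=2.0, sweeps=5000, seed=0):
    """Sequentially update unclamped p-bits; return visit counts per state.

    `clamp` maps a p-bit index to a fixed value (+1 or -1).
    """
    rng = random.Random(seed)
    m = [clamp.get(i, rng.choice((-1, 1))) for i in range(3)]
    counts = {}
    for _ in range(sweeps):
        for i in range(3):
            if i in clamp:
                continue
            # Input I_i = beta * (h_i + sum_j J_ij * m_j)
            I = beta * (h[i] + sum(J[i][j] * m[j] for j in range(3)))
            r = rng.uniform(-1.0, 1.0)
            m[i] = 1 if math.tanh(I) > r else -1  # m_i = sgn(tanh(I_i) - r)
        state = tuple(m)
        counts[state] = counts.get(state, 0) + 1
    return counts


# Inverted mode: fix the output C = 1 and let the inputs settle.
inverted = sample_gate({2: 1})
```

With C clamped to 1, the samples concentrate on A = B = 1. Clamping C to -1 instead spreads them roughly evenly over the three input pairs that produce 0, which is the "inputs consistent with a given output" behavior the abstract describes.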
A general technique for deterministic model-cycle-level debugging
Efficient use of FPGA resources requires FPGA-based performance models of complex hardware to implement one model cycle, i.e., one time-step of the original synchronous system, in several implementation cycles. In general, implementation cycles have no simple relationship with model cycles, and it is tricky to reconstruct the state of the synchronous system at model-cycle boundaries if only implementation-cycle-level control and information are provided. A good debugging facility needs to provide: complete control over the functioning of the target design being simulated; fast and easy access to all significant target-design state for both monitoring and modification; and some means of achieving deterministic execution when the target design is a multicore processor running a parallel application. Moreover, these features must be provided without incurring substantial resource and performance penalties. In this paper, we present a debugging technique based on the theory of Latency-Insensitive Bounded Dataflow Networks (LI-BDNs). We show how the technique facilitates deterministic model-cycle-level debugging. We used it to build the debugging infrastructure for Arete, an FPGA-based cycle-accurate multicore simulator. The resource and performance penalties of our debugging technique are minimal: in Arete, the debugging infrastructure has area and performance overheads of 5% and 6%, respectively.
IBM Researc
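The abstract does not spell out the LI-BDN machinery, so the following is only a toy Python illustration of the underlying idea. A node fires, advancing one model cycle, only when its input token is available. Arbitrary stalls at the implementation-cycle level therefore change how long a run takes, but not the sequence of states observed at model-cycle boundaries. The `FibNode` class, the Fibonacci next-state function and the random stall model are hypothetical; this is not Arete's debugging infrastructure.

```python
import random
from collections import deque


class FibNode:
    """Toy latency-insensitive node (hypothetical). Its state travels as a
    token through a self-loop FIFO; one firing equals one model cycle."""

    def __init__(self):
        self.loop = deque([(0, 1)])  # initial token = reset state (a, b)
        self.model_cycle = 0

    def try_fire(self, can_proceed):
        # `can_proceed` stands in for implementation-cycle effects such as a
        # multi-cycle functional unit or a stalled link; when False, wait.
        if not can_proceed or not self.loop:
            return None
        a, b = self.loop.popleft()
        nxt = (b, a + b)        # next-state function of the modeled system
        self.loop.append(nxt)   # the register becomes a token on the loop
        self.model_cycle += 1
        return nxt


def run(model_cycles, seed, stall_prob=0.6):
    """Advance `model_cycles` model cycles; return the boundary trace and the
    number of implementation cycles it took."""
    rng = random.Random(seed)
    node = FibNode()
    trace, impl_cycles = [], 0
    while node.model_cycle < model_cycles:
        impl_cycles += 1
        out = node.try_fire(rng.random() >= stall_prob)
        if out is not None:
            trace.append(out)  # state observed at a model-cycle boundary
    return trace, impl_cycles


trace_a, cycles_a = run(10, seed=1)
trace_b, cycles_b = run(10, seed=2)
```

The two runs stall differently, yet both traces are identical. In the same spirit, a debugger could pause after any model cycle and inspect a consistent state, regardless of how many implementation cycles were spent.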
Characterization and Compensation of Network-Level Anomalies in Mixed-Signal Neuromorphic Modeling Platforms
Increasing the size and complexity of neural network models leads to an
ever-increasing demand for computational resources for their simulation.
Neuromorphic devices offer a number of advantages over conventional computing
architectures, such as high emulation speed or low power consumption, but this
usually comes at the price of reduced configurability and precision. In this
article, we investigate the consequences of several such factors that are
common to neuromorphic devices, more specifically, limited hardware resources,
limited parameter configurability, and parameter variations. Our final aim is to
provide an array of methods for coping with such inevitable distortion
mechanisms. As a platform for testing our proposed strategies, we use an
executable system specification (ESS) of the BrainScaleS neuromorphic system,
which has been designed as a universal emulation back-end for neuroscientific
modeling. We address the most essential limitations of this device in detail
and study their effects on three prototypical benchmark network models within a
well-defined, systematic workflow. For each network model, we start by defining
quantifiable functionality measures by which we then assess the effects of
typical hardware-specific distortion mechanisms, both in idealized software
simulations and on the ESS. For those effects that cause unacceptable
deviations from the original network dynamics, we suggest generic compensation
mechanisms and demonstrate their effectiveness. Both the suggested workflow and
the investigated compensation mechanisms are largely back-end independent and
do not require additional hardware configurability beyond that required to
emulate the benchmark networks in the first place. We thereby provide a generic
methodological environment for configurable neuromorphic devices that are
targeted at emulating large-scale, functional neural networks.
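One illustration of a generic, back-end-independent compensation of the kind described is to handle synapse loss by rescaling the surviving weights, so that a neuron's expected total input is preserved. The rescaling rule, the loss model and the numbers below are assumptions for illustration. The abstract does not say which compensation methods the article actually uses.

```python
import random


def drop_synapses(weights, loss_frac, rng):
    """Remove each synapse independently with probability `loss_frac`
    (hypothetical stand-in for synapses that cannot be mapped to hardware)."""
    return [w for w in weights if rng.random() >= loss_frac]


def compensate_loss(weights, loss_frac):
    """Scale surviving weights so the expected summed input is unchanged."""
    return [w / (1.0 - loss_frac) for w in weights]


rng = random.Random(42)
original = [0.5] * 10_000                     # identical weights, for clarity
survivors = drop_synapses(original, 0.3, rng)
compensated = compensate_loss(survivors, 0.3)
```

This restores only the mean input. Fluctuations around that mean generally do not match the original network, so the effect still has to be checked against the functionality measures of each benchmark.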
Fast quasi-synchronous harmonic algorithm based on weighted window function and mixed-radix FFT
According to the requirements of IEC61850-9-2LE, digital energy metering devices mainly use a fixed sampling rate of 80×fr (80 samples per cycle at the rated frequency fr). When harmonic analysis is carried out under asynchronous sampling, spectral leakage produces large errors. The quasi-synchronous algorithm is highly accurate, but its computation is complicated and its hardware overhead is high. Based on the characteristics of digital energy metering devices, this paper proposes a fast quasi-synchronous harmonic algorithm that combines a weighted window function with a mixed-radix fast Fourier transform (FFT) algorithm, reducing the computational load by more than 94%. Compared with the Triangle/Hanning/Nuttall4(III)-windowed interpolated FFT algorithms, the proposed algorithm achieves better accuracy, and the difference in error becomes more apparent the more asynchronous the sampling is.
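With 80 samples per cycle, N = 80 = 2^4 × 5 is not a power of two, which is why a mixed-radix FFT suits this sampling rate. The sketch below is a generic recursive mixed-radix Cooley-Tukey FFT applied to a synchronously sampled test signal. It is not the paper's weighted-window quasi-synchronous algorithm, and the test signal is hypothetical.

```python
import cmath
import math


def smallest_factor(n):
    """Smallest prime factor of n (n itself if n is prime)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def mixed_radix_fft(x):
    """Recursive decimation-in-time Cooley-Tukey FFT for any length N,
    splitting by the smallest prime factor at each level (80 = 2*2*2*2*5)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    p = smallest_factor(n)
    if p == n:  # prime length: fall back to a direct DFT
        return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    m = n // p
    # Split into p interleaved subsequences x[r], x[r+p], ... of length m.
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]
    # Combine: X[k] = sum_r W_n^(r*k) * Sub_r[k mod m], with W_n = e^(-2*pi*i/n)
    return [sum(cmath.exp(-2j * math.pi * r * k / n) * subs[r][k % m]
                for r in range(p))
            for k in range(n)]


# One rated cycle sampled 80 times: fundamental (amplitude 1) plus a
# 3rd harmonic (amplitude 0.1), sampled synchronously.
N = 80
signal = [math.cos(2 * math.pi * t / N) + 0.1 * math.cos(2 * math.pi * 3 * t / N)
          for t in range(N)]
spectrum = mixed_radix_fft(signal)
amplitudes = [2 * abs(X) / N for X in spectrum[: N // 2]]
```

Here bins 1 and 3 recover the amplitudes exactly. If the signal frequency drifts so that the 80 samples no longer span exactly one cycle, energy leaks into neighboring bins. That asynchronous-sampling leakage is what the windowing in the proposed algorithm is meant to suppress.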
Synchronous handshake circuits
We present the synchronous implementation of handshake circuits as an extra feature in the otherwise asynchronous design flow based on Tangram. This synchronous option can be used when mapping onto FPGAs or as a fallback option to provide a circuit that is easier to test and to integrate into a synchronous environment. When single-rail and synchronous realizations of the same handshake circuit are compared, the synchronous versions typically require fewer state-holding elements, occupy less area, and have similar performance, but consume significantly more power (up to a factor of four in the examples studied). Synchronous handshake circuits also provide a means to study clock-gating techniques based on synthesis from a behavioral-level specification. In addition, the study provides hints as to where the asynchronous handshake circuits may be optimized further.
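To make the handshake idea concrete, the sketch below simulates one four-phase (return-to-zero) channel in which both the active (sending) and passive (receiving) sides react only on clock edges, roughly what a synchronous realization implies. The four-phase protocol, the cycle accounting and the function are assumptions for illustration. The sketch does not model Tangram's handshake components or the paper's synthesis flow.

```python
def four_phase_transfer(values):
    """Clocked four-phase handshake (hypothetical): the sender raises req with
    data, the receiver latches the data and raises ack, then both return to
    zero. Returns the received values and the clock cycles used."""
    req = ack = False
    pending, received = list(values), []
    data, cycles = None, 0
    while pending or req or ack:
        cycles += 1
        r, a = req, ack  # both sides see the signals as sampled at the edge
        # Active (sender) side
        if not r and not a and pending:
            data = pending.pop(0)
            req = True            # req+ with valid data
        elif r and a:
            req = False           # req- after seeing ack
        # Passive (receiver) side
        if r and not a:
            received.append(data)
            ack = True            # ack+ after latching data
        elif not r and a:
            ack = False           # ack- completes the return to zero
    return received, cycles


received, cycles = four_phase_transfer([1, 2, 3])
```

In this naive clocked version every transfer costs four clock cycles. How the paper's synchronous handshake circuits actually schedule these events is not described in the abstract.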