    MGSim - Simulation tools for multi-core processor architectures

    MGSim is an open-source discrete-event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended as a research and teaching vehicle for studying the fine-grained hardware/software interactions on many-core and hardware-multithreaded processors. It includes support for core models with different instruction sets, a configurable multi-core interconnect, multiple configurable cache and memory models, a dedicated I/O subsystem, and comprehensive monitoring and interaction facilities. The default model configuration shipped with MGSim implements Microgrids, a many-core architecture with hardware concurrency management. MGSim is written mostly in C++ and uses object classes to represent chip components. It is optimized for architecture models that can be described as process networks.
    Comment: 33 pages, 22 figures, 4 listings, 2 tables
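
    To make the "discrete-event simulator" idea concrete, here is a minimal event-driven sketch in Python. It is not MGSim's API (MGSim itself is C++); the `Kernel`, `Core` and `Memory` classes, the request interval and the memory latency are all hypothetical. Components never advance time themselves: they schedule callbacks on a time-ordered queue, and the kernel jumps simulated time straight to the next pending event.

```python
import heapq
import itertools


class Kernel:
    """Minimal discrete-event kernel: a time-ordered queue of pending callbacks."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker: same-time events run in FIFO order

    def schedule(self, delay, callback, *args):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), callback, args))

    def run(self, until=None):
        while self._queue:
            if until is not None and self._queue[0][0] > until:
                break  # leave later events queued
            time, _, callback, args = heapq.heappop(self._queue)
            self.now = time  # simulated time jumps straight to the next event
            callback(*args)


class Memory:
    """Hypothetical memory model that answers every request after a fixed latency."""

    def __init__(self, kernel, latency):
        self.kernel = kernel
        self.latency = latency

    def request(self, addr, reply_to):
        self.kernel.schedule(self.latency, reply_to, addr)


class Core:
    """Hypothetical core that issues one load every `interval` cycles."""

    def __init__(self, kernel, memory, n_requests, interval):
        self.kernel = kernel
        self.memory = memory
        self.n_requests = n_requests
        self.interval = interval
        self.completed = []  # (completion time, address)

    def start(self):
        self.kernel.schedule(0, self._issue, 0)

    def _issue(self, i):
        if i >= self.n_requests:
            return
        self.memory.request(i * 64, self._on_reply)
        self.kernel.schedule(self.interval, self._issue, i + 1)

    def _on_reply(self, addr):
        self.completed.append((self.kernel.now, addr))


kernel = Kernel()
core = Core(kernel, Memory(kernel, latency=5), n_requests=3, interval=2)
core.start()
kernel.run()
print(core.completed)  # [(5, 0), (7, 64), (9, 128)]
```

    With three requests issued every 2 cycles and a 5-cycle memory latency, the replies complete at cycles 5, 7 and 9. A full simulator layers component models, interconnect and monitoring on top of a loop of this kind.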

    Towards Comparing the Robustness of Synchronous and Asynchronous Circuits by Fault Injection

    As transient error rates grow with shrinking feature sizes, designing reliable synchronous circuits becomes increasingly challenging. Asynchronous logic design is a promising alternative with respect to robustness and stability. In particular, delay-insensitive asynchronous circuits offer interesting properties, such as an inherent resilience to delay faults
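
    The resilience argument can be illustrated with a toy fault-injection experiment. The sketch below is not the paper's method: it assumes a dual-rail encoding (one common delay-insensitive code, in which each bit travels on a "true" wire and a "false" wire) and flips every wire of every valid codeword once. In single-rail logic each such flip silently yields the other valid value; in dual-rail logic a single flip can only produce an empty (NULL) or illegal codeword, never a different valid value.

```python
# Dual-rail codewords (t, f): (1, 0) = logic 1, (0, 1) = logic 0,
# (0, 0) = NULL spacer, (1, 1) = illegal.
def dual_rail_decode(t, f):
    if (t, f) == (1, 0):
        return 1
    if (t, f) == (0, 1):
        return 0
    return "NULL" if (t, f) == (0, 0) else "ILLEGAL"


def campaign():
    """Flip each wire of each valid value once and classify the outcome."""
    results = {"single_rail_wrong_value": 0,
               "dual_rail_wrong_value": 0,
               "dual_rail_invalid_word": 0}
    for value in (0, 1):
        # Single-rail: one wire, so a flip always produces the other valid value.
        if (value ^ 1) != value:
            results["single_rail_wrong_value"] += 1
        # Dual-rail: two wires, flip each in turn.
        t, f = (1, 0) if value else (0, 1)
        for wire in (0, 1):
            faulty = [t, f]
            faulty[wire] ^= 1
            decoded = dual_rail_decode(*faulty)
            if decoded in (0, 1) and decoded != value:
                results["dual_rail_wrong_value"] += 1
            else:
                # NULL or ILLEGAL: not silently wrong data; how the circuit
                # reacts (stall, deadlock, error flag) depends on the design.
                results["dual_rail_invalid_word"] += 1
    return results


print(campaign())
# {'single_rail_wrong_value': 2, 'dual_rail_wrong_value': 0, 'dual_rail_invalid_word': 4}
```

    This covers only single flips on a static codeword. The timing of a fault relative to the handshake, and multiple flips (turning (1, 0) into (0, 1) needs two), are what a realistic fault-injection campaign would need to explore.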

    An AER handshake-less modular infrastructure PCB with x8 2.5Gbps LVDS serial links

    Spike-based emulation of brain processing is now taking off, as several EU and other worldwide projects demonstrate, such as SpiNNaker, BrainScaleS, FACETS and NeuroGrid. The larger the brain-process emulation on silicon, the higher the communication performance the hosting platforms must provide. Often the bottleneck of these system implementations is not the performance inside a chip or a board, but the communication between boards. This paper describes a novel modular Address-Event-Representation (AER) FPGA-based (Spartan6) infrastructure PCB (the AER-Node board) with 2.5 Gbps LVDS high-speed serial links over SATA cables that offers a peak performance of 62.5 Meps (mega events per second) of 32-bit events in board-to-board communication. The board provides backward compatibility with parallel AER devices, supporting up to two 28-bit parallel data buses with asynchronous handshake. These boards also allow modular expansion through several daughter boards. The paper focuses on describing the LVDS serial interface in detail and presenting its performance.
    Funding: Ministerio de Ciencia e Innovación TEC2009-10639-C04-02/01; Ministerio de Economía y Competitividad TEC2012-37868-C04-02/01; Junta de Andalucía TIC-6091; Ministerio de Economía y Competitividad PRI-PIMCHI-2011-076
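
    The quoted peak rate is consistent with a single 2.5 Gbps link if one assumes 8b/10b line coding (an assumption: the abstract does not name the line code) and exactly 32 payload bits per event with no further framing overhead:

```latex
R_{\text{events}}
  = \frac{2.5\ \text{Gbit/s} \times \tfrac{8}{10}}{32\ \text{bit/event}}
  = \frac{2.0 \times 10^{9}\ \text{bit/s}}{32\ \text{bit/event}}
  = 62.5 \times 10^{6}\ \text{events/s}
  = 62.5\ \text{Meps}
```

    Under the same assumptions, any per-event framing or idle symbols would lower the achievable event rate proportionally.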

    Six networks on a universal neuromorphic computing substrate

    In this study, we present a highly configurable neuromorphic computing substrate and use it for emulating several types of neural networks. At the heart of this system lies a mixed-signal chip, with analog implementations of neurons and synapses and digital transmission of action potentials. Major advantages of this emulation device, which has been explicitly designed as a universal neural network emulator, are its inherent parallelism and high acceleration factor compared to conventional computers. Its configurability allows the realization of almost arbitrary network topologies and the use of widely varied neuronal and synaptic parameters. Fixed-pattern noise inherent to analog circuitry is reduced by calibration routines. An integrated development environment allows neuroscientists to operate the device without any prior knowledge of neuromorphic circuit design. As a showcase for the capabilities of the system, we describe the successful emulation of six different neural networks which cover a broad spectrum of both structure and functionality

    Hardware emulation of stochastic p-bits for invertible logic

    Nearly all logic and memory devices share a common feature: they use stable units to represent 0s and 1s. A completely different paradigm is based on three-terminal stochastic units, which could be called "p-bits", whose output is a random telegraph signal continuously fluctuating between 0 and 1 with a tunable mean. p-bits can be interconnected to receive weighted contributions from others in a network, and these weighted contributions can be chosen not only to solve optimization and inference problems but also to implement precise Boolean functions in an inverted mode. This inverted operation of Boolean gates is particularly striking: such gates provide inputs consistent with a given output, as well as unique outputs for a given set of inputs. The existing demonstrations of accurate invertible logic are intriguing, but will these striking properties observed in computer simulations carry over to hardware implementations? This paper uses individual microcontrollers to emulate p-bits and presents results for a 4-bit ripple-carry adder with 48 p-bits and a 4-bit multiplier with 46 p-bits working in inverted mode as a factorizer. These results constitute a first step towards implementing p-bits with nanodevices, such as stochastic magnetic tunnel junctions.
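
    A p-bit update rule widely used in the p-bit literature is m_i = sgn(tanh(I_i) - r), with r drawn uniformly from (-1, 1), bipolar states m_i in {-1, +1}, and input I_i = I_0 (sum_j J_ij m_j + h_i). The sketch below applies it in software to a three-p-bit AND gate [A, B, C = A AND B]. The coupling matrix J, bias h and strength I_0 are illustrative values chosen so that the four lowest-energy states are exactly the AND truth-table rows; they are not the weights of the paper's adder or multiplier. Clamping the output C to 1 and letting A and B fluctuate runs the gate in inverted mode.

```python
import math
import random
from collections import Counter

# Hypothetical AND-gate weights over bipolar states m = [A, B, C], C = A AND B.
# With E(m) = -(sum_{i<j} J_ij m_i m_j + sum_i h_i m_i), the four truth-table
# rows have E = -3 and every other state has E >= 1.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
h = [1, 1, -2]


def sample(clamp, sweeps=4000, i0=1.0, seed=0):
    """Sequential p-bit updates; `clamp` maps p-bit index -> fixed value (+1/-1)."""
    rng = random.Random(seed)
    m = [rng.choice((-1, 1)) for _ in range(3)]
    for idx, val in clamp.items():
        m[idx] = val
    counts = Counter()
    for _ in range(sweeps):
        for i in range(3):
            if i in clamp:
                continue
            # Input I_i = I0 * (sum_j J_ij m_j + h_i)
            current = i0 * (sum(J[i][j] * m[j] for j in range(3)) + h[i])
            r = rng.uniform(-1.0, 1.0)
            # m_i = sgn(tanh(I_i) - r): P(m_i = +1) = (1 + tanh(I_i)) / 2
            m[i] = 1 if math.tanh(current) > r else -1
        counts[tuple(m)] += 1
    return counts


# Inverted mode: force the output C = 1 and see which inputs are consistent with it.
counts = sample({2: 1})
state, n = counts.most_common(1)[0]
print(state, n / sum(counts.values()))  # (1, 1, 1) dominates
```

    Each p-bit only needs the current values of its neighbours, so in a hardware emulation every update can run on a separate device that exchanges m_j values with the others; larger circuits change J and h but not the update rule.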

    A general technique for deterministic model-cycle-level debugging

    Efficient use of FPGA resources requires FPGA-based performance models of complex hardware to implement one model cycle (i.e., one time step of the original synchronous system) in several implementation cycles. Generally, implementation cycles have no simple relationship with model cycles, and it is tricky to reconstruct the state of the synchronous system at model-cycle boundaries if only implementation-cycle-level control and information are provided. A good debugging facility needs to provide: complete control over the functioning of the target design being simulated; fast and easy access to all significant target-design state, for both monitoring and modification; and a means of achieving deterministic execution when the target design is a multicore processor running a parallel application. Moreover, these features must be provided without incurring substantial resource and performance penalties. In this paper, we present a debugging technique based on the LI-BDN theory and show how it facilitates deterministic model-cycle-level debugging. We used it to build the debugging infrastructure for Arete, an FPGA-based cycle-accurate multicore simulator. The resource and performance penalties of our debugging technique are minimal: in Arete, the debugging infrastructure has area and performance overheads of 5% and 6%, respectively.
    IBM Researc
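
    The distinction between model cycles and implementation cycles can be made concrete with a small software model. The sketch below is a hypothetical illustration, not the Arete infrastructure or the LI-BDN formalism itself: two modules that communicate only through FIFOs each need a different number of implementation (host) cycles per model cycle, and a debugger freezes each module independently once it has completed a requested number of model cycles. Because the modules interact only through tokens, the state observed at that boundary does not depend on the per-module latencies.

```python
from collections import deque


class Module:
    """A model-level module that needs `latency` (>= 1) host cycles per model cycle."""

    def __init__(self, latency):
        self.latency = latency
        self.model_cycle = 0  # completed model cycles
        self.busy = 0         # host cycles spent on the current model cycle

    def host_step(self, stop_at):
        # A debugger-controlled module freezes once it reaches the requested boundary.
        if self.model_cycle >= stop_at or not self.can_fire():
            return
        self.busy += 1
        if self.busy == self.latency:
            self.finish_model_cycle()
            self.model_cycle += 1
            self.busy = 0


class Producer(Module):
    def __init__(self, latency, out_fifo):
        super().__init__(latency)
        self.out_fifo = out_fifo

    def can_fire(self):
        return True  # the FIFO is assumed unbounded in this sketch

    def finish_model_cycle(self):
        self.out_fifo.append(self.model_cycle)  # one token per model cycle


class Accumulator(Module):
    def __init__(self, latency, in_fifo):
        super().__init__(latency)
        self.in_fifo = in_fifo
        self.total = 0  # the target-design state we want to inspect

    def can_fire(self):
        return len(self.in_fifo) > 0  # fire only when an input token is present

    def finish_model_cycle(self):
        self.total += self.in_fifo.popleft()


def run_to_model_cycle(stop_at, producer_latency, accumulator_latency):
    """Advance host cycles until both modules sit at model cycle `stop_at`."""
    fifo = deque()
    producer = Producer(producer_latency, fifo)
    accumulator = Accumulator(accumulator_latency, fifo)
    host_cycles = 0
    while producer.model_cycle < stop_at or accumulator.model_cycle < stop_at:
        host_cycles += 1
        producer.host_step(stop_at)
        accumulator.host_step(stop_at)
    return accumulator.total, host_cycles


print(run_to_model_cycle(10, 1, 3))  # (45, 30)
print(run_to_model_cycle(10, 4, 1))  # (45, 40)
```

    Both runs stop at model cycle 10 with the same accumulator state (0 + 1 + ... + 9 = 45), even though they take different numbers of host cycles to get there.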

    Characterization and Compensation of Network-Level Anomalies in Mixed-Signal Neuromorphic Modeling Platforms

    Advancing the size and complexity of neural network models leads to an ever-increasing demand for computational resources for their simulation. Neuromorphic devices offer a number of advantages over conventional computing architectures, such as high emulation speed or low power consumption, but this usually comes at the price of reduced configurability and precision. In this article, we investigate the consequences of several such factors that are common to neuromorphic devices, more specifically limited hardware resources, limited parameter configurability and parameter variations. Our final aim is to provide an array of methods for coping with such inevitable distortion mechanisms. As a platform for testing our proposed strategies, we use an executable system specification (ESS) of the BrainScaleS neuromorphic system, which has been designed as a universal emulation back-end for neuroscientific modeling. We address the most essential limitations of this device in detail and study their effects on three prototypical benchmark network models within a well-defined, systematic workflow. For each network model, we start by defining quantifiable functionality measures, by which we then assess the effects of typical hardware-specific distortion mechanisms, both in idealized software simulations and on the ESS. For those effects that cause unacceptable deviations from the original network dynamics, we suggest generic compensation mechanisms and demonstrate their effectiveness. Both the suggested workflow and the investigated compensation mechanisms are largely back-end independent and do not require additional hardware configurability beyond that required to emulate the benchmark networks in the first place. We thereby provide a generic methodological environment for configurable neuromorphic devices that are targeted at emulating large-scale, functional neural networks.

    Fast quasi-synchronous harmonic algorithm based on weight window function-mixed radix FFT

    According to the requirements of IEC61850-9-2LE, digital energy metering devices mainly adopt a fixed sampling rate of 80×f_r (80 samples per nominal power-frequency cycle). When harmonic analysis is carried out under asynchronous sampling, spectral leakage produces large errors. The quasi-synchronous algorithm is highly accurate, but its calculation process is complicated and its hardware overhead is high. Based on the characteristics of digital energy metering devices, this paper proposes a fast quasi-synchronous harmonic algorithm that combines a weight window function with a mixed-radix Fast Fourier Transform (FFT) algorithm, reducing the amount of calculation by more than 94%. Compared with Triangle-, Hanning- and Nuttall4(III)-windowed interpolated FFT algorithms, the proposed algorithm performs better in accuracy and has the feature that the more asynchronous the sampling, the more obvious the error becomes
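
    The effect of asynchronous sampling on a fixed 80-samples-per-cycle DFT, and how a quasi-synchronous weight window suppresses it, can be sketched in a few lines. This uses the classic quasi-synchronous weighting (an n-fold convolution of N-point rectangular windows), not necessarily the paper's weight window; the 50 Hz rated frequency, the 50.5 Hz test signal and n = 3 are assumptions, and a single-bin DFT stands in for the paper's mixed-radix FFT (N = 80 = 2^4 × 5 would suit one).

```python
import cmath
import math


def quasi_sync_weights(N, n):
    """n-fold convolution of an N-point rectangular window, normalised to sum to 1."""
    w = [1.0]
    for _ in range(n):
        out = [0.0] * (len(w) + N - 1)
        for i, a in enumerate(w):
            for j in range(N):
                out[i + j] += a
        w = out
    total = sum(w)
    return [x / total for x in w]


def harmonic_amplitude(x, N, h, w):
    """Weighted single-bin DFT estimate of the amplitude of harmonic h."""
    acc = sum(wk * xk * cmath.exp(-2j * math.pi * h * k / N)
              for k, (wk, xk) in enumerate(zip(w, x)))
    return 2 * abs(acc)


def worst_amplitude_error(f_signal, n, f_rated=50.0, N=80, h=1, phases=16):
    """Largest error for a unit-amplitude cosine over several start phases."""
    fs = N * f_rated  # fixed 80 x f_r sampling, independent of the actual frequency
    w = quasi_sync_weights(N, n)
    worst = 0.0
    for p in range(phases):
        phi = 2 * math.pi * p / phases
        x = [math.cos(2 * math.pi * f_signal * k / fs + phi) for k in range(len(w))]
        worst = max(worst, abs(harmonic_amplitude(x, N, h, w) - 1.0))
    return worst


for n in (1, 3):
    print(n, worst_amplitude_error(50.0, n), worst_amplitude_error(50.5, n))
```

    Under these assumptions, the single-period DFT (n = 1) is exact at 50 Hz but shows a worst-case amplitude error on the order of 0.5% at 50.5 Hz, while n = 3 reduces it by roughly an order of magnitude at the cost of 238 weighted samples instead of 80. That extra work per estimate is the kind of overhead a fast variant aims to reduce.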

    Synchronous handshake circuits

    We present the synchronous implementation of handshake circuits as an extra feature in the otherwise asynchronous design flow based on Tangram. This synchronous option can be used when mapping onto FPGAs, or as a fallback option that provides a circuit that is easier to test and to integrate into a synchronous environment. When single-rail and synchronous realizations of the same handshake circuit are compared, the synchronous versions typically require fewer state-holding elements, occupy less area and have similar performance, but consume significantly more power (up to a factor of four in the examples studied). Synchronous handshake circuits provide a means to study clock-gating techniques based on synthesis from a behavioral-level specification. In addition, the study provides hints as to where the asynchronous handshake circuits may be optimized further
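
    As a rough picture of what a "synchronous handshake" can mean, the sketch below clocks a four-phase (return-to-zero) req/ack channel: both signals are registers updated once per clock edge from the previous cycle's values. It is a hypothetical model, not the circuits the Tangram flow generates; it only shows that each transfer costs four clock cycles when every handshake phase is registered.

```python
def clocked_four_phase(values):
    """Transfer `values` over a registered four-phase req/ack channel."""
    req = ack = False
    data = None
    pending = list(values)
    received = []
    cycles = 0
    while pending or req or ack:
        cycles += 1
        # Next-state values are computed from the current register outputs only.
        next_req, next_ack, next_data = req, ack, data
        # Active side (sender): start a transfer, or release req once acknowledged.
        if not req and not ack and pending:
            next_data = pending.pop(0)
            next_req = True
        elif req and ack:
            next_req = False
        # Passive side (receiver): latch data and acknowledge, then return to zero.
        if req and not ack:
            received.append(data)
            next_ack = True
        elif not req and ack:
            next_ack = False
        req, ack, data = next_req, next_ack, next_data  # clock edge
    return received, cycles


print(clocked_four_phase([7, 8, 9]))  # ([7, 8, 9], 12)
```

    In this model the registers of an idle channel are still clocked every cycle, which hints at why clock gating is a natural topic for synchronous handshake circuits.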