Search CORE

3,321 research outputs found

Asynchronous spiking neurons, the natural key to exploit temporal sparsity

Author: Cavalcante Holanda Priscila
Dhoedt Bart
Hoseini Sahar
Khoei Mina A.
Leroux Sam
Linares-Barranco Bernabe
Moreira Orlando
Serrano-Gotarredona Teresa
Simoens Pieter
Tapson Jonathan
Yousefzadeh Amirreza
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Inference of Deep Neural Networks for stream signal (Video/Audio) processing in edge devices is still challenging. Unlike the most state of the art inference engines which are efficient for static signals, our brain is optimized for real-time dynamic signal processing. We believe one important feature of the brain (asynchronous state-full processing) is the key to its excellence in this domain. In this work, we show how asynchronous processing with state-full neurons allows exploitation of the existing sparsity in natural signals. This paper explains three different types of sparsity and proposes an inference algorithm which exploits all types of sparsities in the execution of already trained networks. Our experiments in three different applications (Handwritten digit recognition, Autonomous Steering and Hand-Gesture recognition) show that this model of inference reduces the number of required operations for sparse input data by a factor of one to two orders of magnitudes. Additionally, due to fully asynchronous processing this type of inference can be run on fully distributed and scalable neuromorphic hardware platforms

Ghent University Academic Bibliography

Telemetry downlink interfaces and level-zero processing

Author: Horan S.
Pfeiffer J.
Taylor J.
Publication venue
Publication date
Field of study

The technical areas being investigated are as follows: (1) processing of space to ground data frames; (2) parallel architecture performance studies; and (3) parallel programming techniques. Additionally, the University administrative details and the technical liaison between New Mexico State University and Goddard Space Flight Center are addressed

NASA Technical Reports Server

AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing

Author: Chaychi Samira
Tawakuli Amal
Theobald Martin
Venugopal Vinu E.
Publication venue
Publication date: 03/01/2020
Field of study

Distributed Stream Processing Systems (DSPSs) are among the currently most emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. The major market players in this domain are clearly represented by Apache Spark and Flink, which provide a variety of frontend APIs for SQL, statistical inference, machine learning, stream processing, and many others. Yet rather few details are reported on the integration of these engines into the underlying High-Performance Computing (HPC) infrastructure and the communication protocols they use. Spark and Flink, for example, are implemented in Java and still rely on a dedicated master node for managing their control flow among the worker nodes in a compute cluster. In this paper, we describe the architecture of our AIR engine, which is designed from scratch in C++ using the Message Passing Interface (MPI), pthreads for multithreading, and is directly deployed on top of a common HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding protocol (referred to as "Asynchronous Iterative Routing"), which facilitates a direct and asynchronous communication among all client nodes and thereby completely avoids the overhead induced by the control flow with a master node that may otherwise form a performance bottleneck. Our experiments over a variety of benchmark settings confirm that AIR outperforms Spark and Flink in terms of latency and throughput by a factor of up to 15; moreover, we demonstrate that AIR scales out much better than existing DSPSs to clusters consisting of up to 8 nodes and 224 cores.Comment: 16 pages, 6 figures, 15 plot

arXiv.org e-Print Archive

Crossref

Open Repository and Bibliography - Luxembourg

Scalable High-Speed Communications for Neuromorphic Systems

Author: Young Aaron Reed
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2017
Field of study

Field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), and other chip/multi-chip level implementations can be used to implement Dynamic Adaptive Neural Network Arrays (DANNA). In some applications, DANNA interfaces with a traditional computing system to provide neural network configuration information, provide network input, process network outputs, and monitor the state of the network. The present host-to-DANNA network communication setup uses a Cypress USB 3.0 peripheral controller (FX3) to enable host-to-array communication over USB 3.0. This communications setup has to run commands in batches and does not have enough bandwidth to meet the maximum throughput requirements of the DANNA device, resulting in output packet loss. Also, the FX3 is unable to scale to support larger single-chip or multi-chip configurations. To alleviate communication limitations and to expand scalability, a new communications solution is presented which takes advantage of the GTX/GTH high-speed serial transceivers found on Xilinx FPGAs. A Xilinx VC707 evaluation kit is used to prototype the new communications board. The high-speed transceivers are used to communicate to the host computer via PCIe and to communicate to the DANNA arrays with the link layer protocol Aurora. The new communications board is able to outperform the FX3, reducing the latency in the communication and increasing the throughput of data. This new communications setup will be used to further DANNA research by allowing the DANNA arrays to scale to larger sizes and for multiple DANNA arrays to be connected to a single communication board

University of Tennessee, Knoxville: Trace

Pando: Personal Volunteer Computing in Browsers

Author: Anderson David P.
Balouek Daniel
Berry Kevin
Cherniack Mitch
Chorazyk Pawel
Dias David
Duda Jerzy
Jangda Abhinav
Lavoie Erick
Martınez Gonzalo J
Nakamoto Satoshi
Reginald Cushing
Ryza Sandy
Smolka Gert
Werner M. J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/09/2019
Field of study

The large penetration and continued growth in ownership of personal electronic devices represents a freely available and largely untapped source of computing power. To leverage those, we present Pando, a new volunteer computing tool based on a declarative concurrent programming model and implemented using JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying number of failure-prone personal devices contributed by volunteers to parallelize the application of a function on a stream of values, by using the devices' browsers. We show that Pando can provide throughput improvements compared to a single personal device, on a variety of compute-bound applications including animation rendering and image processing. We also show the flexibility of our approach by deploying Pando on personal devices connected over a local network, on Grid5000, a French-wide computing grid in a virtual private network, and seven PlanetLab nodes distributed in a wide area network over Europe.Comment: 14 pages, 12 figures, 2 table

arXiv.org e-Print Archive

Crossref