Asynchronous spiking neurons, the natural key to exploit temporal sparsity
Inference of deep neural networks for streaming signals (video/audio) on edge devices remains challenging. Unlike most state-of-the-art inference engines, which are efficient for static signals, the brain is optimized for real-time dynamic signal processing. We believe one important feature of the brain, asynchronous stateful processing, is key to its excellence in this domain. In this work, we show how asynchronous processing with stateful neurons allows exploitation of the sparsity present in natural signals. This paper describes three different types of sparsity and proposes an inference algorithm that exploits all of them in the execution of already-trained networks. Our experiments in three applications (handwritten digit recognition, autonomous steering, and hand-gesture recognition) show that this model of inference reduces the number of operations required for sparse input data by one to two orders of magnitude. Additionally, because the processing is fully asynchronous, this type of inference can run on fully distributed and scalable neuromorphic hardware platforms.
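The core idea of the abstract above, propagating only *changes* in the input through stateful accumulators instead of recomputing every frame, can be illustrated with a minimal sketch. This is not the paper's actual algorithm; the layer, weights, and event format are assumptions chosen to show why work scales with input activity rather than layer size.

```python
import numpy as np

def dense_step(W, x):
    """Baseline: a full matrix-vector product on every frame."""
    return W @ x

def event_driven_step(W, state, delta_events):
    """Update the running pre-activation using only the changed inputs.

    `state` holds the accumulated pre-activations (the 'stateful' part);
    `delta_events` is a list of (index, delta) pairs for inputs that
    changed since the previous frame.  Sparse natural signals yield few
    such events, so the cost is proportional to activity, not to the
    full input dimension.
    """
    for i, d in delta_events:
        state += d * W[:, i]   # touch only the affected weight columns
    return state

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
x_prev = rng.standard_normal(6)
state = dense_step(W, x_prev)          # initialise the state once

# Between frames, only one input component changes.
x_new = x_prev.copy()
x_new[2] += 0.5
state = event_driven_step(W, state, [(2, 0.5)])
```

After the event-driven update, `state` matches what a full dense recomputation on `x_new` would produce, but only one column of `W` was read.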
Telemetry downlink interfaces and level-zero processing
The technical areas being investigated are as follows: (1) processing of space-to-ground data frames; (2) parallel architecture performance studies; and (3) parallel programming techniques. Additionally, university administrative details and the technical liaison between New Mexico State University and Goddard Space Flight Center are addressed.
AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing
Distributed Stream Processing Systems (DSPSs) are currently among the most rapidly emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big-data analytics. The major market players in this domain are clearly Apache Spark and Flink, which provide a variety of frontend APIs for SQL, statistical inference, machine learning, stream processing, and many others. Yet rather few details are reported on the integration of these engines into the underlying High-Performance Computing (HPC) infrastructure and the communication protocols they use. Spark and Flink, for example, are implemented in Java and still rely on a dedicated master node for managing their control flow among the worker nodes in a compute cluster.
In this paper, we describe the architecture of our AIR engine, which is designed from scratch in C++ using the Message Passing Interface (MPI) and pthreads for multithreading, and is deployed directly on top of a common HPC workload manager such as SLURM. AIR implements a lightweight, dynamic sharding protocol (referred to as "Asynchronous Iterative Routing") that facilitates direct, asynchronous communication among all client nodes and thereby completely avoids the overhead induced by the control flow with a master node, which may otherwise form a performance bottleneck. Our experiments over a variety of benchmark settings confirm that AIR outperforms Spark and Flink in latency and throughput by a factor of up to 15; moreover, we demonstrate that AIR scales out much better than existing DSPSs to clusters of up to 8 nodes and 224 cores.
Comment: 16 pages, 6 figures, 15 plots
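The masterless routing idea described above, every node computing the owner of a record locally so it can be sent peer-to-peer without a coordinator, can be sketched as follows. This is an illustration of hash-partitioned direct routing in general, not AIR's actual protocol or wire format; the worker class, word-count aggregate, and round-robin ingestion are assumptions for the example.

```python
import hashlib

def owner(key: str, num_workers: int) -> int:
    """Deterministic key-to-worker mapping.  Every node evaluates this
    locally, so records can be routed directly between peers without
    consulting a master node for control flow."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

class Worker:
    """Toy peer in an all-to-all topology: it both routes the records
    it ingests and aggregates the records routed to it."""
    def __init__(self, rank: int, peers: list):
        self.rank = rank
        self.peers = peers   # shared list of all workers (filled below)
        self.counts = {}     # local partial aggregate (a word count)

    def ingest(self, key: str):
        # Route directly to the owning peer.  In a real engine this
        # would be an asynchronous send; here it is a direct call.
        self.peers[owner(key, len(self.peers))].receive(key)

    def receive(self, key: str):
        self.counts[key] = self.counts.get(key, 0) + 1

peers: list = []
workers = [Worker(r, peers) for r in range(4)]
peers.extend(workers)

stream = ["a", "b", "a", "c", "a", "b"]
for i, key in enumerate(stream):
    workers[i % len(workers)].ingest(key)   # any node may ingest input

# Every occurrence of a key lands on exactly one worker.
owners_of_a = [w.rank for w in workers if "a" in w.counts]
```

Because the ownership function is pure and known to all peers, adding or removing control-flow coordination is unnecessary: routing decisions never leave the data path.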
Scalable High-Speed Communications for Neuromorphic Systems
Field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and other chip/multi-chip level implementations can be used to implement Dynamic Adaptive Neural Network Arrays (DANNA). In some applications, DANNA interfaces with a traditional computing system to provide neural network configuration information, provide network input, process network outputs, and monitor the state of the network. The present host-to-DANNA network communication setup uses a Cypress USB 3.0 peripheral controller (FX3) to enable host-to-array communication over USB 3.0. This communications setup must run commands in batches and does not have enough bandwidth to meet the maximum throughput requirements of the DANNA device, resulting in output packet loss. The FX3 is also unable to scale to support larger single-chip or multi-chip configurations. To alleviate these communication limitations and to expand scalability, a new communications solution is presented which takes advantage of the GTX/GTH high-speed serial transceivers found on Xilinx FPGAs. A Xilinx VC707 evaluation kit is used to prototype the new communications board. The high-speed transceivers are used to communicate with the host computer via PCIe and with the DANNA arrays via the link-layer protocol Aurora. The new communications board outperforms the FX3, reducing communication latency and increasing data throughput. This new communications setup will be used to further DANNA research by allowing the DANNA arrays to scale to larger sizes and by allowing multiple DANNA arrays to be connected to a single communication board.
Pando: Personal Volunteer Computing in Browsers
The large penetration and continued growth in ownership of personal electronic devices represent a freely available and largely untapped source of computing power. To leverage it, we present Pando, a new volunteer computing tool based on a declarative concurrent programming model and implemented using JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying number of failure-prone personal devices contributed by volunteers to parallelize the application of a function on a stream of values, using the devices' browsers. We show that Pando can provide throughput improvements compared to a single personal device on a variety of compute-bound applications, including animation rendering and image processing. We also show the flexibility of our approach by deploying Pando on personal devices connected over a local network, on Grid5000, a France-wide computing grid in a virtual private network, and on seven PlanetLab nodes distributed in a wide-area network over Europe.
Comment: 14 pages, 12 figures, 2 tables
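The contract described above, applying a function to a stream of values across failure-prone volunteer devices while still delivering every result, can be modeled in a short sketch. Pando itself is implemented in JavaScript over WebRTC; the Python function below, its worker records, and the simulated failure rate are all assumptions used only to illustrate the retry-on-churn semantics.

```python
import random

def volunteer_map(func, values, workers, seed=0):
    """Apply `func` to each value using failure-prone workers,
    re-queuing any value whose worker drops out mid-task.

    A toy model of the fault-tolerance contract of volunteer
    computing: a result is eventually produced for every input even
    though individual devices may disconnect at any time.
    """
    rng = random.Random(seed)          # fixed seed: deterministic demo
    pending = list(enumerate(values))  # (index, value) work queue
    results = {}
    while pending:
        idx, v = pending.pop(0)
        w = rng.choice(workers)
        if rng.random() < w["failure_rate"]:
            pending.append((idx, v))   # device churned away: retry later
        else:
            results[idx] = func(v)
    return [results[i] for i in range(len(values))]

# Five simulated volunteer devices, each failing 30% of the time.
workers = [{"id": i, "failure_rate": 0.3} for i in range(5)]
out = volunteer_map(lambda x: x * x, range(8), workers)
```

Despite the simulated churn, the output is identical to a sequential map; only the completion time varies with the failure rate.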