Reaction–diffusion chemistry implementation of associative memory neural network
Unconventional computing paradigms are typically very difficult to program. By implementing efficient parallel control architectures such as artificial neural networks, we show that it is possible to program unconventional paradigms with relative ease. The work presented implements correlation matrix memories (a form of artificial neural network based on associative memory) in reaction–diffusion chemistry, and shows that such implementations can be trained and behave similarly to conventional implementations.
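As a point of reference for what a conventional correlation matrix memory does, here is a minimal software sketch, assuming the common binary (Willshaw-style) formulation: training ORs the outer product of each output/input pair into a binary matrix, and recall thresholds each row sum at the number of active input bits. The function names `train` and `recall` are illustrative and are not taken from the paper, which implements the memory in chemistry rather than code.

```python
def train(pairs, n_in, n_out):
    """Build a binary correlation matrix from (input, output) bit-vector pairs."""
    M = [[0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            if y[i]:
                for j in range(n_in):
                    if x[j]:
                        M[i][j] = 1  # logical OR of the outer product y x^T
    return M


def recall(M, x):
    """Recall an output pattern: threshold row sums at the number of set input bits."""
    threshold = sum(x)
    if threshold == 0:
        return [0] * len(M)  # an empty query recalls nothing
    return [
        1 if sum(m * xj for m, xj in zip(row, x)) >= threshold else 0
        for row in M
    ]


pairs = [([1, 0, 1, 0], [1, 0, 0]), ([0, 1, 0, 1], [0, 1, 0])]
M = train(pairs, n_in=4, n_out=3)
print(recall(M, [1, 0, 1, 0]))
```

With only two non-overlapping stored pairs, recall is exact; as more pairs are stored, the matrix saturates and spurious output bits can appear.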
DAMNED: A Distributed and Multithreaded Neural Event-Driven simulation framework
In Spiking Neural Networks (SNNs), spike emissions are sparsely and
irregularly distributed both in time and in the network architecture. Since a
common feature of SNNs is a low average activity, efficient implementations of
SNNs are usually based on an Event-Driven Simulation (EDS). On the other hand,
simulations of large scale neural networks can take advantage of distributing
the neurons on a set of processors (either a workstation cluster or a parallel
computer). This article presents DAMNED, a large scale SNN simulation framework
that combines the benefits of EDS and parallel computing. Two levels of
parallelism are combined: distributed mapping of the neural topology, at the
network level, and local multithreaded allocation of resources for simultaneous
processing of events, at the neuron level. Based on the causality of events, a
distributed solution is proposed for solving the complex problem of scheduling
without a synchronization barrier. Comment: 6 pages
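The event-driven idea described above can be illustrated with a single-threaded sketch: spikes are kept in a time-ordered priority queue, and work is done only when an event is delivered. This is not DAMNED's distributed, multithreaded scheduler. The neuron model (a non-leaky accumulator with a fixed threshold) and the names `simulate` and `synapses` are assumptions made for illustration.

```python
import heapq


def simulate(synapses, initial_events, threshold=1.0, t_max=100.0):
    """Event-driven simulation of a toy spiking network.

    synapses: {source: [(target, weight, delay), ...]}
    initial_events: [(time, neuron, weight), ...]
    Returns the list of (time, neuron) spikes in emission order.
    """
    queue = list(initial_events)
    heapq.heapify(queue)  # earliest event is always popped first
    potential = {}
    spikes = []
    while queue:
        t, neuron, weight = heapq.heappop(queue)
        if t > t_max:
            break
        potential[neuron] = potential.get(neuron, 0.0) + weight
        if potential[neuron] >= threshold:
            potential[neuron] = 0.0  # reset after firing
            spikes.append((t, neuron))
            for target, w, delay in synapses.get(neuron, []):
                heapq.heappush(queue, (t + delay, target, w))
    return spikes


synapses = {0: [(1, 1.0, 2.0)], 1: [(2, 0.5, 1.0)]}
spikes = simulate(synapses, [(0.0, 0, 1.0)])
print(spikes)
```

Because the cost scales with the number of events rather than with the number of neurons times the number of time steps, this style suits networks with low average activity.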
Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL
Recent technological advances have increased the available computing
power, memory, and speed of modern Central Processing Units (CPUs), Graphics
Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs).
Consequently, the performance and complexity of Artificial Neural Networks
(ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs)
currently offer state-of-the-art performance, they consume large amounts of
power. Training such networks on CPUs is inefficient, as data throughput and
parallel computation are limited. FPGAs are considered a suitable candidate for
performance-critical, low-power systems, e.g. Internet of Things (IoT) edge
devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development
environment, networks described using the high-level OpenCL framework can be
accelerated on heterogeneous platforms. Moreover, the resource utilization and
power consumption of DNNs can be further enhanced by utilizing regularization
techniques that binarize network weights. In this paper, we introduce, to the
best of our knowledge, the first FPGA-accelerated stochastically binarized DNN
implementations, and compare them to implementations accelerated using both
GPUs and FPGAs. Our developed networks are trained and benchmarked using the
popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art
performance, while offering a >16-fold improvement in power consumption,
compared to conventional GPU-accelerated networks. Both our FPGA-accelerated
deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10
by >9.89x and >9.91x, respectively. Comment: 4 pages, 3 figures, 1 table
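The abstract does not spell out the binarization rules, so the sketch below assumes the formulation commonly used for binarized networks: deterministic binarization takes the sign of each weight, while stochastic binarization sets a weight to +1 with probability given by a hard sigmoid, clip((w + 1) / 2, 0, 1), and to -1 otherwise. The function names are illustrative.

```python
import random


def hard_sigmoid(w):
    """Clip (w + 1) / 2 into [0, 1]; used as the probability of binarizing to +1."""
    return min(1.0, max(0.0, (w + 1.0) / 2.0))


def binarize_deterministic(weights):
    """Map each weight to +1 if it is non-negative, else to -1."""
    return [1 if w >= 0 else -1 for w in weights]


def binarize_stochastic(weights, rng):
    """Map each weight to +1 with probability hard_sigmoid(w), else to -1."""
    return [1 if rng.random() < hard_sigmoid(w) else -1 for w in weights]


rng = random.Random(0)  # seeded only so that runs are repeatable
weights = [0.3, -0.7, 0.0, 2.0, -2.0]
print(binarize_deterministic(weights))
print(binarize_stochastic(weights, rng))
```

Weights at or beyond +1 or -1 binarize the same way under both rules; only weights inside (-1, 1) are affected by the randomness.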
Memristor Neural Network Design
Neural networks, a powerful class of learning models, have achieved remarkable results. However, current implementations of neural networks on von Neumann computing systems suffer from the memory wall and communication bottleneck problems, which are attributable to Complementary Metal Oxide Semiconductor (CMOS) technology scaling and the communication gap. The memristor, a two-terminal nanoscale solid-state nonvolatile resistive switching device, can provide energy‐efficient neuromorphic computing through its synaptic behavior. A crossbar architecture can be used to perform neural computations because of its high density and parallel computation. Thus, neural networks based on memristor crossbars will perform better in real-world applications. In this chapter, the design of different neural network architectures based on memristors is introduced, including spiking neural networks, multilayer neural networks, convolutional neural networks, and recurrent neural networks. A brief introduction, the architecture, the computing circuits, and the training algorithm of each kind of neural network are presented through examples. The potential applications and prospects of memristor‐based neural network systems are discussed.
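To make the crossbar computation concrete, here is a sketch of an idealized crossbar performing a vector-matrix product. Each input voltage drives one row, each memristor contributes a current equal to voltage times conductance (Ohm's law), and each column wire sums its currents (Kirchhoff's current law). Wire resistance, device nonlinearity, and signed-weight encoding are ignored, and the function name is illustrative.

```python
def crossbar_currents(voltages, conductances):
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G[i][j].

    voltages: one input voltage per row (volts)
    conductances: conductances[i][j] of the device at row i, column j (siemens)
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


G = [[2.0, 0.0],
     [4.0, 2.0]]
currents = crossbar_currents([1.0, 0.5], G)
print(currents)
```

All column sums form at the same time in the analog array, which is the source of the parallelism the abstract refers to.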
Single stream parallelization of generalized LSTM-like RNNs on a GPU
Recurrent neural networks (RNNs) have shown outstanding performance on
processing sequence data. However, they suffer from long training time, which
demands parallel implementations of the training procedure. Parallelization of
the training algorithms for RNNs is very challenging because internal
recurrent paths form dependencies between two different time frames. In this
paper, we first propose a generalized graph-based RNN structure that covers the
most popular long short-term memory (LSTM) network. Then, we present a
parallelization approach that automatically explores parallelisms of arbitrary
RNNs by analyzing the graph structure. The experimental results show that the
proposed approach achieves a large speed-up even with a single training stream, and
further accelerates the training when combined with multiple parallel training
streams. Comment: Accepted by the 40th IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) 201
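The abstract does not describe the graph analysis in detail. As a generic illustration of how parallelism can be read off a dependency graph, the sketch below groups nodes into levels, where every node in a level depends only on nodes in earlier levels and can therefore run concurrently. This is a standard level-wise topological sort, not the authors' algorithm; the node names loosely mimic LSTM gates and are assumptions.

```python
def parallel_levels(deps):
    """Group graph nodes into levels of mutually independent operations.

    deps: {node: set of nodes it depends on}; every dependency must be a key.
    Returns a list of levels, each a sorted list of nodes that can run together.
    """
    remaining = {node: set(d) for node, d in deps.items()}
    levels = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        levels.append(ready)
        for n in ready:
            del remaining[n]
        for d in remaining.values():
            d.difference_update(ready)  # these dependencies are now satisfied
    return levels


# Hypothetical single-step cell: three gates and a candidate feed the cell
# state c, which together with the output gate o produces the hidden state h.
deps = {"i": set(), "f": set(), "o": set(), "g": set(),
        "c": {"i", "f", "g"}, "h": {"o", "c"}}
levels = parallel_levels(deps)
print(levels)
```

Recurrent edges between time frames would show up here as extra dependencies that lengthen the chain of levels, which is the difficulty the abstract points to.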
Improving the Expressiveness of Deep Learning Frameworks with Recursion
Recursive neural networks have been widely used by researchers to handle
applications with recursively or hierarchically structured data. However,
embedded control flow deep learning frameworks such as TensorFlow, Theano,
Caffe2, and MXNet fail to efficiently represent and execute such neural
networks, due to lack of support for recursion. In this paper, we add recursion
to the programming model of existing frameworks by complementing their design
with recursive execution of dataflow graphs as well as additional APIs for
recursive definitions. Unlike iterative implementations, which can only
understand the topological index of each node in recursive data structures, our
recursive implementation is able to exploit the recursive relationships between
nodes for efficient execution based on parallel computation. We present an
implementation on TensorFlow and evaluation results with various recursive
neural network models, showing that our recursive implementation not only
conveys the recursive nature of recursive neural networks better than other
implementations, but also uses given resources more effectively to reduce
training and inference time. Comment: Appeared in EuroSys 2018. 13 pages, 11 figures
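For readers unfamiliar with the model class, a recursive neural network applies the same composition function at every internal node of a tree. The toy sketch below uses scalar states and plain Python recursion; it does not use TensorFlow or the APIs added by the paper, and the tuple-based tree encoding and parameter names are assumptions.

```python
import math


def encode(tree, w_left=0.5, w_right=0.5, bias=0.0):
    """Recursively encode a binary tree into a scalar state.

    A leaf is a float; an internal node is a (left, right) tuple.
    """
    if not isinstance(tree, tuple):
        return float(tree)  # leaf: use its value directly
    left, right = tree
    h_left = encode(left, w_left, w_right, bias)
    h_right = encode(right, w_left, w_right, bias)
    return math.tanh(w_left * h_left + w_right * h_right + bias)


print(encode(((1.0, 1.0), 0.0)))
```

The two recursive calls at each node are independent of each other, which is the kind of sibling-level parallelism a recursive execution model can exploit.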
Parallel and pseudorandom discrete event system specification vs. networks of spiking neurons: Formalization and preliminary implementation results
The usual Parallel Discrete Event System Specification (P-DEVS) allows systems to be specified from modeling through to simulation. However, the framework does not incorporate parallel and stochastic simulations. This work intends to extend P-DEVS to parallel simulations and pseudorandom number generators in the context of a spiking neural network. The discrete event specification presented here makes the parallel computation of events, as well as their routing, explicit and centralized, which makes further implementations easier. It is thus expected to provide a well-defined mathematical and computational framework for dealing with networks of spiking neurons.
Photonic reservoir computing: a new approach to optical information processing
Despite ever-increasing computational power, recognition and classification problems remain challenging to solve. Recently, advances have been made through the introduction of the new concept of reservoir computing. This methodology comes from the field of machine learning and neural networks and has been successfully used in several pattern classification problems, such as speech and image recognition. The implementations have so far been in software, limiting their speed and power efficiency. Photonics could be an excellent platform for a hardware implementation of this concept because of its inherent parallelism and unique nonlinear behaviour. We propose using a network of coupled Semiconductor Optical Amplifiers (SOAs) and show in simulation that it could be used as a reservoir by comparing it on a benchmark speech recognition task to conventional software implementations. In spite of several differences, the photonic reservoirs perform as well as or better than conventional implementations. Moreover, a photonic implementation offers the promise of massively parallel information processing with low power and high speed. We will also address the role phase plays in the reservoir performance.
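For context, a conventional software reservoir of the kind used as the baseline can be sketched as a fixed random recurrent network whose states are driven by the input; only a linear readout on those states would be trained, and that step is omitted here. The tanh nonlinearity, the weight scaling, and all names below are assumptions, and nothing in this sketch models the SOA network.

```python
import math
import random


def run_reservoir(inputs, W, W_in):
    """Drive a fixed recurrent network and collect its states.

    inputs: sequence of scalar inputs u(t)
    W: n x n recurrent weights (fixed, not trained)
    W_in: n input weights (fixed, not trained)
    State update: x(t) = tanh(W x(t-1) + W_in u(t)), starting from x = 0.
    """
    n = len(W)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [
            math.tanh(sum(W[i][j] * x[j] for j in range(n)) + W_in[i] * u)
            for i in range(n)
        ]
        states.append(x)
    return states


rng = random.Random(0)  # seeded only so that runs are repeatable
n = 5
W = [[rng.uniform(-0.3, 0.3) for _ in range(n)] for _ in range(n)]
W_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
states = run_reservoir([0.0, 1.0, 0.5, -1.0], W, W_in)
print(len(states), len(states[0]))
```

A readout would then be fitted, for example by linear regression, from the collected states to the target labels.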