12,252 research outputs found
Developement of real time diagnostics and feedback algorithms for JET in view of the next step
Real time control of many plasma parameters will be an essential aspect in
the development of reliable high performance operation of Next Step Tokamaks.
The main prerequisites for any feedback scheme are the precise real-time
determination of the quantities to be controlled, requiring top quality and
highly reliable diagnostics, and the availability of robust control algorithms.
A new set of real time diagnostics was recently implemented on JET to prove the
feasibility of determining, with high accuracy and time resolution, the most
important plasma quantities. With regard to feedback algorithms, new
model–based controllers were developed to allow a more robust control of
several plasma parameters. Both diagnostics and algorithms were successfully
used in several experiments, ranging from H-mode plasmas to configuration with
ITBs. Since elaboration of computationally heavy measurements is often
required, significant attention was devoted to non-algorithmic methods like
Digital or Cellular Neural/Nonlinear Networks. The real time hardware and
software adopted architectures are also described with particular attention to
their relevance to ITER.Comment: 12th International Congress on Plasma Physics, 25-29 October 2004,
Nice (France
PZnet: Efficient 3D ConvNet Inference on Manycore CPUs
Convolutional nets have been shown to achieve state-of-the-art accuracy in
many biomedical image analysis tasks. Many tasks within biomedical analysis
domain involve analyzing volumetric (3D) data acquired by CT, MRI and
Microscopy acquisition methods. To deploy convolutional nets in practical
working systems, it is important to solve the efficient inference problem.
Namely, one should be able to apply an already-trained convolutional network to
many large images using limited computational resources. In this paper we
present PZnet, a CPU-only engine that can be used to perform inference for a
variety of 3D convolutional net architectures. PZNet outperforms MKL-based CPU
implementations of PyTorch and Tensorflow by more than 3.5x for the popular
U-net architecture. Moreover, for 3D convolutions with low featuremap numbers,
cloud CPU inference with PZnet outperfroms cloud GPU inference in terms of cost
efficiency
Large-Scale Optical Neural Networks based on Photoelectric Multiplication
Recent success in deep neural networks has generated strong interest in
hardware accelerators to improve speed and energy consumption. This paper
presents a new type of photonic accelerator based on coherent detection that is
scalable to large () networks and can be operated at high (GHz)
speeds and very low (sub-aJ) energies per multiply-and-accumulate (MAC), using
the massive spatial multiplexing enabled by standard free-space optical
components. In contrast to previous approaches, both weights and inputs are
optically encoded so that the network can be reprogrammed and trained on the
fly. Simulations of the network using models for digit- and
image-classification reveal a "standard quantum limit" for optical neural
networks, set by photodetector shot noise. This bound, which can be as low as
50 zJ/MAC, suggests performance below the thermodynamic (Landauer) limit for
digital irreversible computation is theoretically possible in this device. The
proposed accelerator can implement both fully-connected and convolutional
networks. We also present a scheme for back-propagation and training that can
be performed in the same hardware. This architecture will enable a new class of
ultra-low-energy processors for deep learning.Comment: Text: 10 pages, 5 figures, 1 table. Supplementary: 8 pages, 5,
figures, 2 table
Recommended from our members
Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization.
The key operation in stochastic neural networks, which have become the state-of-the-art approach for solving problems in machine learning, information theory, and statistics, is a stochastic dot-product. While there have been many demonstrations of dot-product circuits and, separately, of stochastic neurons, the efficient hardware implementation combining both functionalities is still missing. Here we report compact, fast, energy-efficient, and scalable stochastic dot-product circuits based on either passively integrated metal-oxide memristors or embedded floating-gate memories. The circuit's high performance is due to mixed-signal implementation, while the efficient stochastic operation is achieved by utilizing circuit's noise, intrinsic and/or extrinsic to the memory cell array. The dynamic scaling of weights, enabled by analog memory devices, allows for efficient realization of different annealing approaches to improve functionality. The proposed approach is experimentally verified for two representative applications, namely by implementing neural network for solving a four-node graph-partitioning problem, and a Boltzmann machine with 10-input and 8-hidden neurons
Bio-Inspired Stereo Vision Calibration for Dynamic Vision Sensors
Many advances have been made in the eld of computer vision. Several recent research trends
have focused on mimicking human vision by using a stereo vision system. In multi-camera systems, a
calibration process is usually implemented to improve the results accuracy. However, these systems generate
a large amount of data to be processed; therefore, a powerful computer is required and, in many cases,
this cannot be done in real time. Neuromorphic Engineering attempts to create bio-inspired systems that
mimic the information processing that takes place in the human brain. This information is encoded using
pulses (or spikes) and the generated systems are much simpler (in computational operations and resources),
which allows them to perform similar tasks with much lower power consumption, thus these processes
can be developed over specialized hardware with real-time processing. In this work, a bio-inspired stereovision
system is presented, where a calibration mechanism for this system is implemented and evaluated
using several tests. The result is a novel calibration technique for a neuromorphic stereo vision system,
implemented over specialized hardware (FPGA - Field-Programmable Gate Array), which allows obtaining
reduced latencies on hardware implementation for stand-alone systems, and working in real time.Ministerio de Economía y Competitividad TEC2016-77785-PMinisterio de Economía y Competitividad TIN2016-80644-
Scalable Emulation of Sign-ProblemFree Hamiltonians with Room Temperature p-bits
The growing field of quantum computing is based on the concept of a q-bit
which is a delicate superposition of 0 and 1, requiring cryogenic temperatures
for its physical realization along with challenging coherent coupling
techniques for entangling them. By contrast, a probabilistic bit or a p-bit is
a robust classical entity that fluctuates between 0 and 1, and can be
implemented at room temperature using present-day technology. Here, we show
that a probabilistic coprocessor built out of room temperature p-bits can be
used to accelerate simulations of a special class of quantum many-body systems
that are sign-problemfree or stoquastic, leveraging the well-known
Suzuki-Trotter decomposition that maps a -dimensional quantum many body
Hamiltonian to a +1-dimensional classical Hamiltonian. This mapping allows
an efficient emulation of a quantum system by classical computers and is
commonly used in software to perform Quantum Monte Carlo (QMC) algorithms. By
contrast, we show that a compact, embedded MTJ-based coprocessor can serve as a
highly efficient hardware-accelerator for such QMC algorithms providing several
orders of magnitude improvement in speed compared to optimized CPU
implementations. Using realistic device-level SPICE simulations we demonstrate
that the correct quantum correlations can be obtained using a classical
p-circuit built with existing technology and operating at room temperature. The
proposed coprocessor can serve as a tool to study stoquastic quantum many-body
systems, overcoming challenges associated with physical quantum annealers.Comment: Fixed minor typos and expanded Appendi
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
Research has shown that convolutional neural networks contain significant
redundancy, and high classification accuracy can be obtained even when weights
and activations are reduced from floating point to binary values. In this
paper, we present FINN, a framework for building fast and flexible FPGA
accelerators using a flexible heterogeneous streaming architecture. By
utilizing a novel set of optimizations that enable efficient mapping of
binarized neural networks to hardware, we implement fully connected,
convolutional and pooling layers, with per-layer compute resources being
tailored to user-provided throughput requirements. On a ZC706 embedded FPGA
platform drawing less than 25 W total system power, we demonstrate up to 12.3
million image classifications per second with 0.31 {\mu}s latency on the MNIST
dataset with 95.8% accuracy, and 21906 image classifications per second with
283 {\mu}s latency on the CIFAR-10 and SVHN datasets with respectively 80.1%
and 94.9% accuracy. To the best of our knowledge, ours are the fastest
classification rates reported to date on these benchmarks.Comment: To appear in the 25th International Symposium on Field-Programmable
Gate Arrays, February 201
- …