A user configurable data acquisition and signal processing system for high-rate, high channel count applications
Real-time signal processing in plasma fusion experiments is required for control and for data reduction as plasma pulse times grow longer. The development time and cost for these high-rate, multichannel signal processing systems can be significant. This paper proposes a new digital signal processing (DSP) platform for the data acquisition system that will allow users to easily customize real-time signal processing systems to meet their individual requirements. The D-TACQ reconfigurable user in-line DSP (DRUID) system carries out the signal processing tasks in hardware co-processors (CPs) implemented in an FPGA, with an embedded microprocessor (μP) for control. In the fully developed platform, users will be able to choose co-processors from a library and configure programmable parameters through the μP to meet their requirements. The DRUID system is implemented on a Spartan-6 FPGA, on the new rear transition module (RTM-T), a field upgrade to existing D-TACQ digitizers. As proof of concept, a multiply-accumulate (MAC) co-processor has been developed, which can be configured as a digital chopper-integrator for long pulse magnetic fusion devices. The DRUID platform allows users to set options for the integrator, such as the number of masking samples. Results from the digital integrator are presented for a data acquisition system with 96 channels simultaneously acquiring data at 500 kSamples/s per channel.
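To make the chopper-integrator idea concrete, here is a minimal software sketch of one multiply-accumulate loop with masking samples. It is not the DRUID implementation: the function names, the ±1/0 coefficient scheme, the placement of masked samples at the start of each chopper half-period, and the rescaling factor are all assumptions chosen for illustration.

```python
def chopper_integrate(samples, half_period, n_mask, dt):
    """Integrate a chopped signal with a multiply-accumulate (MAC) loop.

    Assumptions (hypothetical, not taken from the paper):
    - the chopper flips signal polarity every `half_period` samples;
    - the first `n_mask` samples after each flip are masked (weight 0)
      to skip switching transients;
    - the result is rescaled to compensate for the masked samples.
    """
    if not 0 <= n_mask < half_period:
        raise ValueError("n_mask must satisfy 0 <= n_mask < half_period")
    acc = 0.0
    for n, x in enumerate(samples):
        # Demodulation sign follows the chopper phase.
        sign = 1.0 if (n // half_period) % 2 == 0 else -1.0
        # Masked samples contribute nothing to the accumulator.
        coeff = 0.0 if (n % half_period) < n_mask else sign
        acc += coeff * x  # the MAC step
    scale = half_period / (half_period - n_mask)
    return acc * dt * scale


def make_chopped(signal, offset, half_period, n_samples):
    """Build a test input: constant signal chopped to +/-, plus an unchopped offset."""
    return [
        (signal if (n // half_period) % 2 == 0 else -signal) + offset
        for n in range(n_samples)
    ]


# A constant signal of 2.0 with a 0.5 amplifier offset, over 40 samples.
result = chopper_integrate(make_chopped(2.0, 0.5, 10, 40), 10, 2, 1.0)
print(result)  # 80.0, i.e. 2.0 * 40 samples; the offset cancels
```

Because the offset is not chopped, it is multiplied by +1 and -1 equally often and cancels over whole chopper periods, which is the reason for chopping in a long-pulse integrator.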
Learning Transferable Architectures for Scalable Image Recognition
Developing neural network image classification models often requires
significant architecture engineering. In this paper, we study a method to learn
the model architectures directly on the dataset of interest. As this approach
is expensive when the dataset is large, we propose to search for an
architectural building block on a small dataset and then transfer the block to
a larger dataset. The key contribution of this work is the design of a new
search space (the "NASNet search space") which enables transferability. In our
experiments, we search for the best convolutional layer (or "cell") on the
CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking
together more copies of this cell, each with its own parameters, to design a
convolutional architecture, named the "NASNet architecture". We also introduce a
new regularization technique called ScheduledDropPath that significantly
improves generalization in the NASNet models. On CIFAR-10 itself, NASNet
achieves a 2.4% error rate, which is state-of-the-art. On ImageNet, NASNet
achieves, among the published works, state-of-the-art accuracy of 82.7% top-1
and 96.2% top-5. Our model is 1.2% better in top-1 accuracy than
the best human-invented architectures while having 9 billion fewer FLOPS - a
reduction of 28% in computational demand from the previous state-of-the-art
model. When evaluated at different levels of computational cost, accuracies of
NASNets exceed those of the state-of-the-art human-designed models. For
instance, a small version of NASNet also achieves 74% top-1 accuracy, which is
3.1% better than equivalently-sized, state-of-the-art models for mobile
platforms. Finally, the features learned by NASNet used with the Faster-RCNN
framework surpass the state of the art by 4.0%, achieving 43.1% mAP on the COCO
dataset.
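The abstract names ScheduledDropPath without describing it. The sketch below assumes the commonly described form, in which a whole path's output is dropped with a probability that rises linearly over training; the function names, the linear schedule, and the rescaling by the keep probability are illustrative assumptions, not the authors' code.

```python
import random


def scheduled_drop_prob(step, total_steps, final_drop_prob):
    """Drop probability that grows linearly from 0 to final_drop_prob (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_drop_prob * frac


def drop_path(values, drop_prob, rng):
    """Zero an entire path with probability drop_prob; otherwise rescale it.

    Rescaling by 1 / keep_prob keeps the expected output unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if drop_prob == 0.0:
        return list(values)
    keep_prob = 1.0 - drop_prob
    if rng.random() < drop_prob:
        return [0.0 for _ in values]
    return [v / keep_prob for v in values]


rng = random.Random(0)  # seeded for repeatability
path_output = [1.0, 2.0]
for step in (0, 50, 100):
    p = scheduled_drop_prob(step, total_steps=100, final_drop_prob=0.3)
    print(step, p, drop_path(path_output, p, rng))
```

Early in training almost every path is kept, and the regularization strengthens as the drop probability approaches its final value.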
A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones
Fully autonomous miniaturized robots (e.g., drones) with artificial
intelligence (AI) based visual navigation capabilities are extremely
challenging drivers of Internet-of-Things edge intelligence capabilities.
Visual navigation based on AI approaches, such as deep neural networks (DNNs),
is becoming pervasive for standard-size drones, but is considered out of
reach for nano-drones with a size of a few cm. In this work, we
present the first (to the best of our knowledge) demonstration of a navigation
engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based
visual navigation. To achieve this goal we developed a complete methodology for
parallel execution of complex DNNs directly on-board of resource-constrained
milliwatt-scale nodes. Our system is based on GAP8, a novel parallel
ultra-low-power computing platform, and a 27 g commercial, open-source
CrazyFlie 2.0 nano-quadrotor. As part of our general methodology we discuss the
software mapping techniques that enable the state-of-the-art deep convolutional
neural network presented in [1] to be fully executed on-board within a strict 6
fps real-time constraint with no compromise in terms of flight results, while
all processing is done with only 64 mW on average. Our navigation engine is
flexible and can be used to span a wide performance range: at its peak
performance corner it achieves 18 fps while still consuming on average just
3.5% of the power envelope of the deployed nano-aircraft.
Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication
in the IEEE Internet of Things Journal (IEEE IOTJ)
Low-Power and Reconfigurable Asynchronous ASIC Design Implementing Recurrent Neural Networks
Artificial intelligence (AI) has experienced a tremendous surge in recent years, resulting in high demand for a wide array of implementations of algorithms in the field. With the rise of Internet-of-Things devices, the need for artificial intelligence algorithms implemented in hardware with tight design restrictions has become even more pressing. In terms of low power and area, ASIC implementations are the best case. However, these implementations suffer from high non-recurring engineering costs, long time-to-market, and a complete lack of flexibility, which significantly hurts their appeal in an environment where time-to-market is so critical. The time-to-market gap can be shortened through the use of reconfigurable solutions, such as FPGAs, but these come with high cost per unit and significant power and area deficiencies compared with their ASIC counterparts. To bridge these gaps, this dissertation work develops two methodologies to improve the usability of ASIC implementations of neural networks in these applications.
The first method demonstrates substantial reductions in design time for asynchronous implementations of a set of AI algorithms known as Recurrent Neural Networks (RNNs) by analyzing the possible architectures and implementing a library of generic or easily altered components that can be used to quickly implement a chosen RNN architecture. A tapeout using this method was completed with as few as 112 hours of labor by the designer, from RNN selection to a DRC/LVS-clean chip layout ready for fabrication.
The second method develops a flow to implement a set of RNNs in a single reconfigurable ASIC, offering a middle ground between fully reconfigurable solutions and completely application-specific implementations. This reconfigurable design is capable of representing thousands of possible RNN configurations in a single IC. A tapeout of this design was also completed, with both tapeouts using the TSMC 65 nm bulk CMOS process.
Spatially Varying Nanophotonic Neural Networks
The explosive growth in the computation and energy cost of artificial
intelligence has spurred strong interest in new computing modalities as
potential alternatives to conventional electronic processors. Photonic
processors, which execute operations using photons instead of electrons,
promise to enable optical neural networks with ultra-low latency and power
consumption. However, existing optical neural networks, limited by the
underlying network designs, have achieved image recognition accuracy much lower
than state-of-the-art electronic neural networks. In this work, we close this
gap by introducing a large-kernel spatially-varying convolutional neural
network learned via low-dimensional reparameterization techniques. We
experimentally instantiate the network with a flat meta-optical system that
encompasses an array of nanophotonic structures designed to induce
angle-dependent responses. Combined with an extremely lightweight electronic
backend with approximately 2K parameters, we demonstrate that a nanophotonic neural
network reaches 73.80% blind test classification accuracy on the CIFAR-10 dataset.
As such, for the first time, an optical neural network outperforms the first
modern digital neural network, AlexNet (72.64%) with 57M parameters,
bringing optical neural networks into the modern deep learning era.