258 research outputs found

    Work-in-Progress: Quantized NNs as the Definitive solution for inference on low-power ARM MCUs?

    High energy efficiency and low memory footprint are the key requirements for the deployment of deep learning based analytics on low-power microcontrollers. Here we present work-in-progress results with Q-bit Quantized Neural Networks (QNNs) deployed on a commercial Cortex-M7 class microcontroller by means of an extension to the ARM CMSIS-NN library. We show that i) for Q=4 and Q=2 low memory footprint QNNs can be deployed with an energy overhead of 30% and 36% respectively against the 8-bit CMSIS-NN due to the lack of quantization support in the ISA; ii) for Q=1 native instructions can be used, yielding an energy and latency reduction of ∼3.8× with respect to CMSIS-NN. Our initial results suggest that a small set of QNN-related specialized instructions could improve performance by as much as 7.5× for Q=4, 13.6× for Q=2 and 6.5× for binary NNs.

    Leveraging Automated Mixed-Low-Precision Quantization for Tiny Edge Microcontrollers

    Severe on-chip memory limitations currently prevent the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even when leveraging an effective 8-bit quantization scheme. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors, under the tight constraints on RAM and FLASH embedded memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2 and MNasNet models for ImageNet classification. Concerning the quantization policy search, the RL agent selects quantization policies that maximize the memory utilization. Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as the state-of-the-art solutions quantized with a non-uniform function, which is not tailored for CPUs featuring integer-only arithmetic. This indicates the viability of uniform quantization, required for MCU deployments, for deep weight compression. When the activation memory budget is also limited to 512 kB, the best MobileNetV1 model scores up to 68.4% on ImageNet thanks to the quantization policy found, making it 4% more accurate than the other 8-bit networks fitting the same memory constraints.

    Magneto-transport in high g-factor, low-density two-dimensional electron systems confined in In_0.75Ga_0.25As/In_0.75Al_0.25As quantum wells

    We report magneto-transport measurements on high-mobility two-dimensional electron systems (2DESs) confined in In_0.75Ga_0.25As/In_0.75Al_0.25As single quantum wells. Several quantum Hall states are observed in a wide range of temperatures and electron densities, the latter controlled by a gate voltage down to values of 1×10^11 cm^-2. A tilted-field configuration is used to induce Landau level crossings and magnetic transitions between quantum Hall states with different spin polarizations. A large filling-factor-dependent effective electronic g-factor is determined by the coincidence method and cyclotron resonance measurements. From these measurements the change in exchange-correlation energy at the magnetic transition is deduced. These results demonstrate the impact of many-body effects in tilted-field magneto-transport of high-mobility 2DESs confined in In_0.75Ga_0.25As/In_0.75Al_0.25As quantum wells. In addition, the large tunability of electron density and effective g-factor makes this material system a promising candidate for the observation of a large variety of spin-related phenomena. Comment: 7 pages, 5 figures

    Anti-crossings of spin-split Landau levels in an InAs two-dimensional electron gas with spin-orbit coupling

    We report tilted-field transport measurements in the quantum-Hall regime in an InAs/In_0.75Ga_0.25As/In_0.75Al_0.25As quantum well. We observe anti-crossings of spin-split Landau levels, which suggest a mixing of spin states at Landau level coincidence. We propose that the level repulsion is due to the presence of spin-orbit and band-non-parabolicity terms, which are relevant in narrow-gap systems. Furthermore, electron-electron interaction is significant in our structure, as demonstrated by the large values of the interaction-induced enhancement of the electronic g-factor. Comment: 4 pages, 3 figures

    Electron-phonon coupling in the two phonon mode ternary alloy Al_{0.25}In_{0.75}As/Ga_{0.25}In_{0.75}As quantum well

    We have investigated the infrared transmission of a two-dimensional electron gas (2DEG) confined in an Al_{0.25}In_{0.75}As/Ga_{0.25}In_{0.75}As single quantum well in order to study the electron-optical phonon interaction in a two phonon mode system. Infrared transmission experiments have been performed in both the perpendicular Faraday (PF) and tilted Faraday (TF) configurations for which the growth axis of the sample is tilted with respect to the incident light propagation direction and to the magnetic field direction. The experimental results call into question the validity of the concept of polaron mass in a real material. Comment: 7 pages, 3 figures

    NEURAghe: Exploiting CPU-FPGA synergies for efficient and flexible CNN inference acceleration on Zynq SoCs

    Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe leverages the synergistic usage of Zynq ARM cores and of a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, releasing the ARM processors from most of the supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator takes care of the bulk of the CNN workload, the ARM cores can seamlessly execute hard-to-accelerate parts of the computational graph, taking advantage of the NEON vector engines to further speed up computation. Through the companion NeuDNN SW stack, NEURAghe supports end-to-end CNN-based classification with a peak performance of 169 GOps/s and an energy efficiency of 17 GOps/W. Thanks to our heterogeneous computing model, our platform improves upon the state-of-the-art, achieving a frame rate of 5.5 frames per second (fps) on the end-to-end execution of VGG-16 and 6.6 fps on ResNet-18.

    Mixed-data-model heterogeneous compilation and OpenMP offloading

    Heterogeneous computers combine a general-purpose host processor with domain-specific programmable many-core accelerators, uniting high versatility with high performance and energy efficiency. While the host manages ever-more application memory, accelerators are designed to work mainly on their local memory. This difference in addressed memory leads to a discrepancy between the optimal address width of the host and the accelerator. Today 64-bit host processors are commonplace, but few accelerators exceed 32-bit addressable local memory, a difference expected to increase with 128-bit hosts in the exascale era. Managing this discrepancy requires support for multiple data models in heterogeneous compilers. So far, compiler support for multiple data models has not been explored, which hampers the programmability of such systems and inhibits their adoption. In this work, we perform the first exploration of the feasibility and performance of implementing a mixed-data-model heterogeneous system. To support this, we present and evaluate the first mixed-data-model compiler, supporting arbitrary address widths on host and accelerator. To hide the inherent complexity and to enable high programmer productivity, we implement transparent offloading on top of OpenMP. The proposed compiler techniques are implemented in LLVM and evaluated on a 64+32-bit heterogeneous SoC. Results on benchmarks from the PolyBench-ACC suite show that memory can be transparently shared between host and accelerator at overheads below 0.7% compared to 32-bit-only execution, enabling mixed-data-model computers to execute at near-native performance.

    Benthic foraminifera as indicators of hydrologic and environmental conditions in the Ross Sea (Antarctica)

    This study presents data on benthic foraminiferal assemblages from four box cores collected in different areas of the Ross Sea during the 2005 oceanographic cruise in the framework of the Italian Antarctic Research National Programme (PNRA).

    Natural versus anthropic influence on North Adriatic coast detected by geochemical analyses

    This study focused on the geochemical and sedimentological characterization of recent sediments from two marine sites (S1 and E1) located in the North Adriatic Sea, between the Po River prodelta and the Rimini coast. Major and trace metal concentrations reflect the drainage area of the Po River and its tributaries, considered one of the most polluted areas in Europe. Sediment geochemistry of the two investigated sites denotes distinct catchment areas. High values of Cr, Ni, Pb and Zn detected in sediments collected in the Po River prodelta (S1 site) suggest the Po River supply, while lower levels of these elements characterize sediments collected in front of the Rimini coast (E1 site), an indication of Northern Apennines provenance. Historical trends of Pb and Zn reconstructed from the sedimentary record around the E1 site document several changes that can be correlated with the industrialization subsequent to World War II, the implementation of the environmental policy in 1976 and the effects of the Comacchio dumping at the end of 1980. At the S1 site, the down-core distributions of trace elements indicate a reduction of contaminants due to the introduction of the Italian Law 319/76 and the implementation of anti-pollution policies on automotive Pb (unleaded fuels) in the second half of the 1980s.

    Two-dimensional electron gas formation in undoped In[0.75]Ga[0.25]As/In[0.75]Al[0.25]As quantum wells

    We report on the achievement of a two-dimensional electron gas in completely undoped In[0.75]Al[0.25]As/In[0.75]Ga[0.25]As metamorphic quantum wells. Using these structures we were able to reduce the carrier density with respect to reported values in similar modulation-doped structures. We found experimentally that the electronic charge in the quantum well is likely due to a deep-level donor state in the In[0.75]Al[0.25]As barrier band gap, whose energy lies within the In[0.75]Ga[0.25]As/In[0.75]Al[0.25]As conduction band discontinuity. This result is further confirmed through a Poisson-Schroedinger simulation of the two-dimensional electron gas structure. Comment: 17 pages, 6 figures, to be published in J. Vac. Sci. Technol.