    Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

    Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, which removes insignificant synapses, and quantization, which reduces the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra-low-latency applications targeting high energy physics use cases; the techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similarly to or better than other neural architecture search techniques, such as Bayesian optimization, in terms of computational efficiency. Surprisingly, while networks with different training configurations can have similar performance for the benchmark application, the information content in the network can vary significantly, affecting its generalizability. Comment: 22 pages, 7 figures, 1 table
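    The two compressions this abstract combines can be illustrated with a minimal NumPy sketch. This is illustrative only: the paper performs quantization-aware training on real networks, whereas the function below simply applies magnitude pruning followed by a symmetric uniform "fake" quantizer to a weight tensor; the sparsity and bit-width values are arbitrary examples, not the paper's settings.

    ```python
    import numpy as np

    def prune_and_quantize(w, sparsity=0.5, bits=4):
        """Magnitude-prune a weight tensor, then uniformly quantize the survivors.

        sparsity: fraction of smallest-magnitude weights to zero out.
        bits: total bits for a symmetric signed uniform quantizer.
        """
        flat = np.sort(np.abs(w).ravel())
        k = int(sparsity * flat.size)
        threshold = flat[k] if k > 0 else -np.inf
        mask = np.abs(w) >= threshold            # keep large-magnitude weights
        levels = 2 ** (bits - 1) - 1             # e.g. 7 levels per sign for 4 bits
        max_abs = np.max(np.abs(w))
        scale = max_abs / levels if max_abs > 0 else 1.0
        q = np.round(w / scale) * scale          # snap weights to the quantizer grid
        return q * mask

    w = np.array([0.9, -0.05, 0.4, 0.02, -0.7, 0.1])
    wq = prune_and_quantize(w, sparsity=0.5, bits=4)
    ```

    In quantization-aware training, a step like this runs in the forward pass while gradients update the underlying full-precision weights (the straight-through trick); here it is shown as a one-shot transformation for clarity.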

    hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

    Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends, including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery. Comment: 10 pages, 8 figures, TinyML Research Symposium 2021
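    A key step in translating a trained model to an FPGA or ASIC, as hls4ml does, is mapping floating-point weights and activations onto fixed-point arithmetic. The sketch below mimics the behavior of an HLS `ap_fixed<W, I>` type in plain Python; it is an illustration of the rounding-and-saturation idea, not hls4ml's actual implementation, and the default bit widths are arbitrary.

    ```python
    def to_fixed(x, total_bits=16, int_bits=6):
        """Round x onto a signed fixed-point grid, like HLS ap_fixed<total_bits, int_bits>.

        int_bits counts the sign and integer bits; the remainder are fractional bits.
        Values outside the representable range saturate to the nearest endpoint.
        """
        frac_bits = total_bits - int_bits
        step = 2.0 ** -frac_bits                 # smallest representable increment
        lo = -(2.0 ** (int_bits - 1))            # most negative value
        hi = (2.0 ** (int_bits - 1)) - step      # most positive value
        q = round(x / step) * step               # round to the nearest grid point
        return min(max(q, lo), hi)               # saturate out-of-range values

    # A weight of 0.1 lands on the nearest multiple of 2**-10 (close to, but not
    # exactly, 0.1); a value of 100.0 saturates at the top of the range.
    w_q = to_fixed(0.1)
    ```

    Choosing the smallest `total_bits` and `int_bits` that preserve model accuracy is precisely the kind of precision tuning that quantization-aware training and pruning make tractable.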

    Electronic Structure and Transition Energies in Polymer–Fullerene Bulk Heterojunctions

    © 2014 American Chemical Society. Photocurrent spectroscopy is used to measure both the charge transfer and exciton optical absorption spectra of various bulk heterojunction organic solar cells. The energy difference between the polymer HOMO energy and the fullerene LUMO energy is obtained from the spectra, along with the disorder energy. Combining information from cells with several different polymers and fullerenes allows measurement of the energy differences between HOMO or LUMO energies for about 10 different polymers and fullerenes, with an estimated uncertainty of 50 meV. Heterojunction band offsets are obtained for the various cells, distinguishing between the excitonic and the single-carrier band offsets. The cell open-circuit voltage is shown to be closely correlated with the interface band gap. The exciton disorder energy is directly correlated with the band-tail disorder, and we also consider the effects of exciton thermalization on the charge generation mechanism. The data indicate that an energy offset between the polymer exciton and the charge transfer ground state below about 0.25 eV adversely affects cell performance, while a HOMO band offset below about 0.2-0.3 eV also degrades cell performance, but by a different mechanism.
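    The central energy relations in this abstract can be written compactly. This is a sketch of the standard definitions only; the loss term \(\Delta\) between the interface gap and the open-circuit voltage is introduced here for illustration and is not quantified in the abstract.

    ```latex
    % Interface band gap: donor polymer HOMO to acceptor fullerene LUMO
    E_I = E_{\mathrm{LUMO}}^{\mathrm{fullerene}} - E_{\mathrm{HOMO}}^{\mathrm{polymer}}
    % Open-circuit voltage tracks the interface gap, up to a loss term Delta
    q V_{\mathrm{oc}} \approx E_I - \Delta
    ```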

    GPU-Accelerated Machine Learning Inference as a Service for Computing in Neutrino Experiments

    Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the data volumes of such experiments are rapidly increasing. The demand to process billions of neutrino events with many machine learning algorithm inferences creates a computing challenge. We explore a computing model in which heterogeneous computing with GPU coprocessors is made available as a web service. The coprocessors can be efficiently and elastically deployed to provide the right amount of computing for a given processing task. With our approach, Services for Optimized Network Inference on Coprocessors (SONIC), we integrate GPU acceleration specifically for the ProtoDUNE-SP reconstruction chain without disrupting the native computing workflow. With our integrated framework, we accelerate the most time-consuming task, track and particle shower hit identification, by a factor of 17. This results in a factor of 2.7 reduction in the total processing time when compared with CPU-only production. For this particular task, only 1 GPU is required for every 68 CPU threads, providing a cost-effective solution.
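    The relationship between the factor-of-17 task speedup and the factor-of-2.7 overall reduction follows Amdahl's law. The sketch below checks the arithmetic; the ~2/3 runtime fraction for hit identification is an assumed figure chosen to reproduce the reported overall number, not a value stated in the abstract.

    ```python
    def overall_speedup(accelerated_fraction, task_speedup):
        """Amdahl's law: overall speedup when only a fraction of runtime is accelerated."""
        serial = 1.0 - accelerated_fraction          # unaccelerated share of runtime
        return 1.0 / (serial + accelerated_fraction / task_speedup)

    # Assumption: hit identification is ~67% of CPU-only runtime. A 17x speedup
    # on that task then yields roughly the reported ~2.7x overall reduction.
    s = overall_speedup(0.67, 17.0)
    ```

    The takeaway is that once the dominant task is accelerated this heavily, the remaining CPU-bound work caps the end-to-end gain, which is why the overall factor (2.7) is far below the per-task factor (17).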