Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference
Efficient machine learning implementations optimized for inference in
hardware have wide-ranging benefits, depending on the application, from lower
inference latency to higher data throughput and reduced energy consumption. Two
popular techniques for reducing computation in neural networks are pruning,
removing insignificant synapses, and quantization, reducing the precision of
the calculations. In this work, we explore the interplay between pruning and
quantization during the training of neural networks for ultra-low-latency
applications targeting high energy physics use cases. Techniques developed for
this study have potential applications across many other domains. We study
various configurations of pruning during quantization-aware training, which we
term quantization-aware pruning, and the effect of techniques like
regularization, batch normalization, and different pruning schemes on
performance, computational complexity, and information content metrics. We find
that quantization-aware pruning yields more computationally efficient models
than either pruning or quantization alone for our task. Further,
quantization-aware pruning typically performs similarly to, or better than, other
neural architecture search techniques such as Bayesian optimization in terms of
computational efficiency. Surprisingly, while networks with
different training configurations can have similar performance for the
benchmark application, the information content in the network can vary
significantly, affecting its generalizability.
Comment: 22 pages, 7 figures, 1 table
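The abstract above describes pruning applied inside a quantization-aware training loop. As a rough illustration of the idea, the hedged PyTorch sketch below combines straight-through fake quantization of the weights with iterative magnitude pruning during training; the layer sizes, bit width, pruning schedule, and random stand-in data are our own illustrative choices, not those of the paper.

```python
# Illustrative sketch of quantization-aware pruning (QAP): magnitude pruning
# applied periodically inside a quantization-aware training loop.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def fake_quant(w, bits=6):
    """Straight-through fake quantization: quantize in the forward pass,
    pass gradients through unchanged in the backward pass."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return w + (q - w).detach()  # identity gradient w.r.t. w

class QuantLinear(nn.Linear):
    def forward(self, x):
        # Pruning masks (if any) are already applied to self.weight by the
        # pruning pre-hook; fake quantization is layered on top.
        return nn.functional.linear(x, fake_quant(self.weight), self.bias)

model = nn.Sequential(QuantLinear(16, 64), nn.ReLU(), QuantLinear(64, 5))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(30):
    # Random tensors stand in for one pass over real training data.
    x, y = torch.randn(128, 16), torch.randint(0, 5, (128,))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    # Every 10 epochs, prune the smallest-magnitude weights; training then
    # continues with quantization applied on top of the accumulated mask.
    if epoch % 10 == 9:
        for m in model:
            if isinstance(m, QuantLinear):
                prune.l1_unstructured(m, name="weight", amount=0.3)
```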
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Accessible machine learning algorithms, software, and diagnostic tools for
energy-efficient devices and systems are extremely valuable across a broad
range of application domains. In scientific domains, real-time near-sensor
processing can drastically improve experimental design and accelerate
scientific discoveries. To support domain scientists, we have developed hls4ml,
an open-source software-hardware codesign workflow to interpret and translate
machine learning algorithms for implementation with both FPGA and ASIC
technologies. We expand on previous hls4ml work by extending capabilities and
techniques towards low-power implementations and increased usability: new
Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long
pipeline kernels for low power, and new device backends, including an ASIC
workflow. Taken together, these and continued efforts in hls4ml will arm a new
generation of domain scientists with accessible, efficient, and powerful tools
for machine-learning-accelerated discovery.
Comment: 10 pages, 8 figures, TinyML Research Symposium 2021
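As a concrete taste of the Python API mentioned above, the sketch below converts a small Keras model into an HLS project with hls4ml; the toy model, precision, reuse factor, and FPGA part string are illustrative placeholders, not recommendations from the paper.

```python
# Minimal hls4ml conversion flow: Keras model -> HLS project.
import hls4ml
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy fully connected model standing in for a trained network.
model = Sequential([
    Dense(64, input_shape=(16,), activation="relu"),
    Dense(5, activation="softmax"),
])

# Generate a per-model configuration (precision, parallelism, etc.).
config = hls4ml.utils.config_from_keras_model(model, granularity="model")
config["Model"]["Precision"] = "ap_fixed<16,6>"  # fixed-point types for FPGA
config["Model"]["ReuseFactor"] = 1               # fully parallel pipeline

# Translate the model into an HLS project targeting an example FPGA part.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls4ml_prj",
    part="xcu250-figd2104-2L-e",
)
hls_model.compile()  # builds a C-simulation library for bit-accurate checks
# hls_model.build()  # runs HLS synthesis (requires Vivado/Vitis HLS installed)
```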
Electronic Structure and Transition Energies in Polymer–Fullerene Bulk Heterojunctions
Photocurrent spectroscopy is used to measure both the charge transfer and exciton optical absorption spectra of various bulk heterojunction organic solar cells. The energy difference between the polymer HOMO energy and the fullerene LUMO energy is obtained from the spectra, along with the disorder energy. Combining information from cells with several different polymers and fullerenes allows measurements of the energy differences between HOMO or LUMO energies for about 10 different polymers and fullerenes, with an estimated uncertainty of 50 meV. Heterojunction band offsets are obtained for the various cells, distinguishing between the excitonic and the single-carrier band offsets. The cell open-circuit voltage is shown to be closely correlated with the interface band gap. The exciton disorder energy is directly correlated to the band-tail disorder, and we also consider the effects of exciton thermalization on the charge generation mechanism. The data indicate that an energy offset between the polymer exciton and the charge transfer ground state below about 0.25 eV adversely affects cell performance, while a HOMO band offset below about 0.2-0.3 eV also degrades cell performance, but by a different mechanism.
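For readers unfamiliar with the quantity, a minimal statement of the interface band gap and its relation to the open-circuit voltage, in our own notation (the symbols and the loss term are not taken from the paper):

```latex
% E_LUMO^A: fullerene (acceptor) LUMO energy; E_HOMO^D: polymer (donor) HOMO
% energy; Delta is an empirical loss term. All notation is ours, not the paper's.
\begin{align}
  E_{\mathrm{gap}}^{\mathrm{int}} &= E_{\mathrm{LUMO}}^{A} - E_{\mathrm{HOMO}}^{D}
      && \text{(interface band gap)} \\
  q V_{\mathrm{oc}} &\approx E_{\mathrm{gap}}^{\mathrm{int}} - \Delta
      && \text{(correlation with open-circuit voltage)}
\end{align}
```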
Synthesis of Porphyrins for Use in Metal-Organic Frameworks (MOFs)
Metal-organic frameworks (MOFs), crystalline structures consisting of metal ions and organic ligands, are an active topic of research. The porous structure of MOFs creates a high surface area, making them useful for the adsorption and purification of gases and as reaction catalysts. To preserve the porous structure of MOFs, it is useful to employ rigid ligands that consistently bind to the metal ions in the same manner. This project focused on synthesizing two porphyrins. The aromatic structure of porphyrins provides a stable latticework for MOFs. Metal ions can bind to the external substituent groups and in the center of the porphyrin ring, allowing porphyrins to form an array of two-dimensional and three-dimensional MOFs. This report describes the methods used to synthesize TCPP and TPyP.
Sustainable Movement: Developing a Mobile Environmental Education Curriculum for Rural Schools in Namibia
The national curriculum of Namibia includes environmental education in Grades 1-4 only, despite research indicating that environmental education is more effective in secondary schools. The goal of this project was to assist the EduVentures Trust with the integration of environmental education into rural Namibian secondary schools through the development of interactive SMART technology lessons for the EduMobile project, a mobile classroom truck that travels to rural schools to educate learners about environmental topics. Our assessment revealed gaps in Namibia's environmental education curriculum and showed that it could be improved through the implementation of hands-on learning, the method preferred by Namibian learners.
GPU-Accelerated Machine Learning Inference as a Service for Computing in Neutrino Experiments
Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the data volumes of such experiments are rapidly increasing. The demand to process billions of neutrino events with many machine learning algorithm inferences creates a computing challenge. We explore a computing model in which heterogeneous computing with GPU coprocessors is made available as a web service. The coprocessors can be efficiently and elastically deployed to provide the right amount of computing for a given processing task. With our approach, Services for Optimized Network Inference on Coprocessors (SONIC), we integrate GPU acceleration specifically for the ProtoDUNE-SP reconstruction chain without disrupting the native computing workflow. With our integrated framework, we accelerate the most time-consuming task, track and particle shower hit identification, by a factor of 17. This results in a factor of 2.7 reduction in the total processing time when compared with CPU-only production. For this particular task, only 1 GPU is required for every 68 CPU threads, providing a cost-effective solution.
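The reported numbers are mutually consistent: a quick back-of-the-envelope Amdahl's-law check (our arithmetic, not from the abstract) shows what fraction of CPU-only processing time the accelerated task must occupy for a 17x task speedup to yield a 2.7x overall speedup.

```python
# Amdahl's law: overall = 1 / ((1 - f) + f / task_speedup), where f is the
# fraction of CPU-only time spent in the accelerated task. Solve for the f
# consistent with the reported 17x task and 2.7x overall speedups.
task_speedup = 17.0
overall_speedup = 2.7

f = (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / task_speedup)
print(f"implied fraction of CPU-only time in hit identification: {f:.2f}")
# ~0.67, i.e. hit identification accounted for roughly two-thirds of the
# CPU-only processing time, consistent with it being the most time-consuming task.
```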