59 research outputs found

    Run-Time Efficient RNN Compression for Inference on Edge Devices

    Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. This scheme divides the weight matrix into two parts: an unconstrained upper half and a lower half composed of rank-1 blocks. This results in output features where the upper sub-vector has "richer" features while the lower sub-vector has "constrained" features. HMD can compress RNNs by a factor of 2-4x while having a faster run-time than pruning (Zhu & Gupta, 2017) and retaining more model accuracy than matrix factorization (Grachev et al., 2017). We evaluate this technique on 5 benchmarks spanning 3 different applications, illustrating its generality in the domain of edge computing. Comment: Published at the 4th edition of the Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications at the International Symposium on Computer Architecture 2019, Phoenix, Arizona (https://www.emc2-workshop.com/isca-19), co-located with ISCA 2019
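    The split described in the abstract above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the block layout (the lower half stacked from k row-blocks, each an outer product of a short vector and a full-width vector) and the names `hmd_matvec` and `hmd_dense` are assumptions made for illustration only.

```python
import numpy as np

def hmd_matvec(A, U, V, x):
    """Multiply a hybrid matrix by x without forming the lower half.

    A: (m_top, n) dense, unconstrained upper half.
    U: (k, b), V: (k, n); lower row-block i is assumed to be outer(U[i], V[i]).
    """
    top = A @ x                       # "richer" features from the dense half
    s = V @ x                         # one scalar per rank-1 block
    bottom = (U * s[:, None]).reshape(-1)  # each block contributes s[i] * U[i]
    return np.concatenate([top, bottom])

def hmd_dense(A, U, V):
    """Materialize the equivalent dense matrix (for checking only)."""
    blocks = [np.outer(U[i], V[i]) for i in range(U.shape[0])]
    return np.vstack([A] + blocks)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
U = rng.standard_normal((2, 2))
V = rng.standard_normal((2, 6))
x = rng.standard_normal(6)

# The structured product matches the dense product.
assert np.allclose(hmd_matvec(A, U, V, x), hmd_dense(A, U, V) @ x)

# Stored parameters: structured form versus the equivalent dense matrix.
structured = A.size + U.size + V.size
dense = hmd_dense(A, U, V).size
```

    In this toy layout the lower half stores k(b + n) values instead of k·b·n, which is where the savings would come from; the actual block sizes used in the paper are not given in the abstract.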

    Model Compression via Generalized Kronecker Product Decomposition

    Modern convolutional neural network (CNN) architectures, despite their superiority in solving various problems, are generally too large to be deployed on resource-constrained edge devices. In practice, this limits many real-world applications by requiring them to off-load computations to cloud-based systems. Such a limitation introduces concerns related to privacy as well as bandwidth capabilities. The design of efficient models as well as automated compression methodologies such as quantization, pruning, knowledge distillation and tensor decomposition have been proposed to allow models to operate in such resource-constrained environments. In particular, tensor decomposition approaches have gained interest in recent years as they can achieve a wide variety of compression rates while maintaining efficient memory access patterns. However, they typically cause significant reduction in model performance on classification tasks after compression. To address this challenge, a new method that improves performance of decomposition-based model compression has been designed and tested on a variety of classification tasks. Specifically, we compress convolutional layers by generalizing the Kronecker product decomposition to apply to multidimensional tensors, leading to the Generalized Kronecker Product Decomposition (GKPD). Our approach yields a plug-and-play module that can be used as a drop-in replacement for any convolutional layer to simultaneously reduce its memory usage and number of floating-point operations. Experimental results for image classification on CIFAR-10 and ImageNet datasets using ResNet, MobileNetv2 and SeNet architectures as well as action recognition on HMDB-51 using I3D-ResNet50 substantiate the effectiveness of our proposed approach. We find that GKPD outperforms state-of-the-art decomposition methods including Tensor-Train and Tensor-Ring as well as other relevant compression methods such as pruning and knowledge distillation.
The proposed GKPD method serves as a means of deploying state-of-the-art CNN models without significant accuracy degradation. Furthermore, the capability of utilizing GKPD as a drop-in replacement for convolutional layers allows its use for CNN model compression with minimal development time, in contrast to approaches such as efficient architecture design.
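    As a rough illustration of why Kronecker-structured weights save both memory and operations, the sketch below handles only the plain matrix case, a sum of Kronecker products, and not the multidimensional tensor generalization the abstract describes. The function name `kron_sum_matvec` and all shapes are hypothetical.

```python
import numpy as np

def kron_sum_matvec(As, Bs, x):
    """Compute (sum_r kron(As[r], Bs[r])) @ x without forming the big matrix.

    Uses the identity kron(A, B) @ X.reshape(-1) == (A @ X @ B.T).reshape(-1)
    for row-major reshapes, where X has shape (A.shape[1], B.shape[1]).
    """
    n1 = As[0].shape[1]
    n2 = Bs[0].shape[1]
    X = x.reshape(n1, n2)
    return sum((A @ X @ B.T).reshape(-1) for A, B in zip(As, Bs))

rng = np.random.default_rng(0)
As = [rng.standard_normal((3, 4)) for _ in range(2)]
Bs = [rng.standard_normal((5, 2)) for _ in range(2)]
x = rng.standard_normal(4 * 2)

# Reference: build the full (15 x 8) matrix explicitly and multiply.
W = sum(np.kron(A, B) for A, B in zip(As, Bs))
assert np.allclose(kron_sum_matvec(As, Bs, x), W @ x)

# Stored parameters: the factors versus the full matrix.
factor_params = sum(A.size + B.size for A, B in zip(As, Bs))
full_params = W.size
```

    The factors are stored and applied directly, so the full matrix never has to be materialized at inference time.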

    The Tensor Networks Anthology: Simulation techniques for many-body quantum lattice systems

    We present a compendium of numerical simulation techniques, based on tensor network methods, aiming to address problems of many-body quantum mechanics on a classical computer. The core setting of this anthology is lattice problems in low spatial dimension at finite size, a physical scenario where tensor network methods, both Density Matrix Renormalization Group and beyond, have long proven to be winning strategies. Here we explore in detail the numerical frameworks and methods employed to deal with low-dimension physical setups, from a computational physics perspective. We focus on symmetries and closed-system simulations in arbitrary boundary conditions, while discussing the numerical data structures and linear algebra manipulation routines involved, which form the core libraries of any tensor network code. At a higher level, we put the spotlight on loop-free network geometries, discussing their advantages, and presenting in detail algorithms to simulate low-energy equilibrium states. Accompanied by discussions of data structures, numerical techniques and performance, this anthology serves as a programmer's companion, as well as a self-contained introduction and review of the basic and selected advanced concepts in tensor networks, including examples of their applications. Comment: 115 pages, 56 figures

    Machine Learning for Microcontroller-Class Hardware -- A Review

    Advancements in machine learning have opened a new opportunity to bring intelligence to low-end Internet-of-Things nodes such as microcontrollers. Conventional machine learning deployment has a high memory and compute footprint, hindering its direct use on ultra resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller-class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure the compute and latency budget is within the device limits while still maintaining the desired performance. We characterize a closed-loop, widely applicable workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. Finally, we identify the open research challenges and unsolved questions demanding careful consideration moving forward. Comment: Accepted for publication at IEEE Sensors Journal

    Stato dell’arte del machine learning su microcontrollori (State of the art of machine learning on microcontrollers)

    Artificial intelligence and embedded systems are two technologies that are becoming increasingly important today. This work is a review of the state of the art of machine learning on microcontrollers, a field known as TinyML. In particular, it considers the software and libraries used, the most widespread methods and algorithms, and the possible applications.

    Advances in Hydrogen Energy

    This book, which is a reprint of articles published in the Special Issue "Advances in Hydrogen Energy" in Energies, seeks to contribute to disseminating the most recent advancements in the field of hydrogen energy. It does so by presenting scientific works from around the world covering both modeling and experimental analysis. The focus is placed on research covering all aspects of hydrogen energy, from production to storage and final use, including the development of other easy-to-transport and versatile hydrogen-based energy carriers via the power-to-x (PtX) route, such as ammonia and methanol. Hydrogen energy research and development has attracted growing attention as one of the key solutions for clean future energy systems. In order to reduce greenhouse gas emissions, governments across the world are developing ambitious policies to support hydrogen technology, and an increasing level of funding has been allocated for projects of research, development, and demonstration of these technologies. At the same time, the private sector is capitalizing on the opportunity with larger investments in hydrogen technology solutions. While intense research activities have been dedicated to this field, several issues require further research prior to achieving full commercialization of hydrogen technology solutions. This book addresses some of these issues by presenting detailed models to optimize design strategies and operating conditions for the entire hydrogen value chain, covering production via electrolysis, storage and use in different types of fuel cells and in different forms of energy carriers.

    Current distribution measurements and modeling of mass transfer in polymer electrolyte fuel cells

    The polymer electrolyte fuel cell (PEFC) is considered an attractive option for producing electric power in many applications, ranging from portable applications of a few watts up to automotive applications of several kilowatts. The advantage of the PEFC in these applications stems from its high efficiency, low emissions, silent operation and possible low production costs in the future. However, the main factor hindering the market penetration of PEFC applications is the present high production cost of the cell. To allow lower costs for the PEFC, the cell area has to be used efficiently in order to minimize the material usage. This requires the maximization of the cell performance by enhancing the current production at low potential losses. At high current densities, mass transfer losses become the dominating loss mechanism. The mass transfer losses usually produce uneven current production throughout the active area of the cell. The local current production can be studied by experimental and computational methods. For the experimental characterization of the local current production, two different measurement systems based on segmented current collectors have been constructed. One is for a small PEFC operating with natural convection and the other is for a large PEFC operating with forced convection. In addition to the experimental methods, two different theoretical PEFC models have been developed, one for the free-breathing PEFC and the other for the forced convection PEFC. The current distribution studies were conducted for the free-breathing PEFC in order to determine the feasibility of using natural convection as an air supply method for the cathode reaction at different cell temperatures and ambient conditions. It was observed that the cell performance is highly dependent on the operating conditions and that the current distribution is uneven in most cases.
The current distribution measurements conducted with the large PEFC were used mainly for model validation. It was shown that under certain operating conditions the current distribution was uniform and thus a one-dimensional PEFC model could be used. The results showed that two-phase and non-isothermal conditions are likely to exist when a PEFC is operated at high current densities and with well-humidified gases.

    The theory and practice of resonance Raman spectroscopy

    The thesis proposes a new method of obtaining molecular structural data from resonance Raman spectroscopy: the experimental, theoretical and numerical aspects of the method are presented, and some of the results are included. The first chapter describes Raman and resonance Raman scattering and introduces the instrumentation, methods and procedures used in obtaining Raman data. The sources of error in measuring band intensities and excitation profiles are analysed, and corrections to the errors are proposed. Some original experimental results are presented in Appendix 2 for the purpose of illustrating the technique and the sources of errors. Chapter 2 describes an advanced theoretical model of secondary radiation, and its interpretation in terms of Raman and fluorescence radiation; in Chapter 3 the model is applied to relating resonance Raman data to molecular structure, by using the physical assumptions of the model and the corresponding mathematical approximations. The result is a set of equations relating microscopic parameters describing the molecular structure to the macroscopic quantities to be measured experimentally. A new mathematical procedure for solving the equation set obtained at the end of Chapter 3 is proposed in Chapter 4; the numerical and computational implementation is described in this chapter and the computer programs used in practical applications are presented in Appendices 4 and 5. The results of applying the new method are presented in Chapter 5 in the form of tables containing the calculated parameters and of graphs comparing the experimental and the simulated excitation profiles; chemical systems belonging to three different geometries have been investigated.