20 research outputs found

    Optimum Selection of DNN Model and Framework for Edge Inference

    This paper describes a methodology to select the optimum combination of deep neural network and software framework for visual inference on embedded systems. As a first step, benchmarking is required. In particular, we have benchmarked six popular network models running on four deep learning frameworks implemented on a low-cost embedded platform. Three key performance metrics have been measured and compared with the resulting 24 combinations: accuracy, throughput, and power consumption. Then, application-level specifications come into play. We propose a figure of merit enabling the evaluation of each network/framework pair in terms of relative importance of the aforementioned metrics for a targeted application. We prove through numerical analysis and meaningful graphical representations that only a reduced subset of the combinations must actually be considered for real deployment. Our approach can be extended to other networks, frameworks, and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of embedded deep learning technology. Funding: Ministerio de Economía y Competitividad (TEC2015-66878-C3-1-R); Junta de Andalucía (TIC 2338-2013); European Union Horizon 2020 (Grant 765866).
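
    The abstract above does not give the formula of the figure of merit, so the following is only a minimal sketch under stated assumptions: the figure of merit is taken to be a weighted geometric mean of normalized accuracy, throughput, and inverse power, and the weights are swept over a grid to see which candidates ever come out on top. The names `Candidate`, `figure_of_merit`, `best_candidate`, and `sweep_winners` are hypothetical, and the numbers in the demo are placeholders, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One network/framework pair (all metric values must be positive)."""
    name: str
    accuracy: float    # e.g. top-1 accuracy in (0, 1]
    throughput: float  # frames per second
    power: float       # watts


def normalize(cands):
    """Scale each metric to (0, 1], where 1 is the best value in the set."""
    max_acc = max(c.accuracy for c in cands)
    max_thr = max(c.throughput for c in cands)
    min_pow = min(c.power for c in cands)
    return {
        c.name: (c.accuracy / max_acc, c.throughput / max_thr, min_pow / c.power)
        for c in cands
    }


def figure_of_merit(norm, weights):
    """Assumed form: weighted geometric mean of the normalized metrics."""
    acc, thr, pwr = norm
    w_acc, w_thr, w_pow = weights
    return (acc ** w_acc) * (thr ** w_thr) * (pwr ** w_pow)


def best_candidate(cands, weights):
    """Return the name of the candidate with the highest figure of merit."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    norm = normalize(cands)
    return max(cands, key=lambda c: figure_of_merit(norm[c.name], weights)).name


def sweep_winners(cands, steps=10):
    """Sweep all weight triples on a grid and collect every winner."""
    winners = set()
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            winners.add(best_candidate(cands, (i / steps, j / steps, k / steps)))
    return winners


if __name__ == "__main__":
    # Placeholder values for illustration only.
    demo = [
        Candidate("net_a/fw_x", 0.90, 2.0, 4.0),
        Candidate("net_b/fw_y", 0.70, 10.0, 3.0),
        Candidate("net_c/fw_z", 0.60, 1.5, 5.0),
    ]
    print(sorted(sweep_winners(demo)))
```

    A candidate that is worse than another on all three metrics can never win under any weighting, which is one simple way a sweep of this kind reduces the set of combinations worth considering.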

    Impact of Thermal Throttling on Long-Term Visual Inference in a CPU-based Edge Device

    Many application scenarios of edge visual inference, e.g., robotics or environmental monitoring, eventually require long periods of continuous operation. In such periods, the processor temperature plays a critical role in maintaining a prescribed frame rate. In particular, the heavy computational load of convolutional neural networks (CNNs) may lead to thermal throttling and hence performance degradation within a few seconds. In this paper, we report and analyze the long-term performance of 80 different cases resulting from running 5 CNN models on 4 software frameworks and 2 operating systems without and with active cooling. This comprehensive study was conducted on a low-cost edge platform, namely Raspberry Pi 4B (RPi4B), under stable indoor conditions. The results show that hysteresis-based active cooling prevented thermal throttling in all cases, thereby improving the throughput by up to approximately 90% versus no cooling. Interestingly, the range of fan usage during active cooling varied from 33% to 65%. Given the impact of the fan on the power consumption of the system as a whole, these results stress the importance of a suitable selection of CNN model and software components. To assess the performance in outdoor applications, we integrated an external temperature sensor with the RPi4B and conducted a set of experiments with no active cooling over a wide interval of ambient temperature, ranging from 22 °C to 36 °C. Variations of up to 27.7% were measured with respect to the maximum throughput achieved in that interval. This demonstrates that ambient temperature is a critical parameter when active cooling cannot be applied. Comment: 14 pages, 11 figures
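
    The hysteresis-based cooling mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state the thresholds or the control code used in the study, so the class `HysteresisFan`, the helper `fan_usage`, and the 70 °C / 60 °C switching points below are assumptions chosen only for illustration. The fan turns on at the upper threshold and stays on until the temperature falls to the lower one, which avoids rapid toggling around a single set point.

```python
class HysteresisFan:
    """Two-threshold (hysteresis) on/off fan controller.

    The thresholds are hypothetical defaults, not values from the study.
    """

    def __init__(self, on_temp_c=70.0, off_temp_c=60.0):
        if off_temp_c >= on_temp_c:
            raise ValueError("off_temp_c must be lower than on_temp_c")
        self.on_temp_c = on_temp_c
        self.off_temp_c = off_temp_c
        self.is_on = False

    def update(self, temp_c):
        """Feed one temperature sample and return the new fan state."""
        if not self.is_on and temp_c >= self.on_temp_c:
            self.is_on = True    # too hot: switch the fan on
        elif self.is_on and temp_c <= self.off_temp_c:
            self.is_on = False   # cooled down enough: switch it off
        return self.is_on


def fan_usage(controller, temps_c):
    """Fraction of samples during which the fan was running."""
    if not temps_c:
        raise ValueError("temps_c must not be empty")
    states = [controller.update(t) for t in temps_c]
    return sum(states) / len(states)


if __name__ == "__main__":
    # Made-up temperature trace in degrees Celsius, for illustration only.
    trace = [55, 65, 71, 68, 62, 60, 58, 66, 72]
    print(f"fan usage: {fan_usage(HysteresisFan(), trace):.0%}")
```

    In a real deployment the samples would come from the processor temperature sensor and the returned state would drive the fan hardware; both are left out here so that the sketch stays self-contained.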

    Impact of Thermal Throttling on Long-Term Visual Inference in a CPU-Based Edge Device

    Many application scenarios of edge visual inference, e.g., robotics or environmental monitoring, eventually require long periods of continuous operation. In such periods, the processor temperature plays a critical role in maintaining a prescribed frame rate. In particular, the heavy computational load of convolutional neural networks (CNNs) may lead to thermal throttling and hence performance degradation within a few seconds. In this paper, we report and analyze the long-term performance of 80 different cases resulting from running five CNN models on four software frameworks and two operating systems without and with active cooling. This comprehensive study was conducted on a low-cost edge platform, namely Raspberry Pi 4B (RPi4B), under stable indoor conditions. The results show that hysteresis-based active cooling prevented thermal throttling in all cases, thereby improving the throughput by up to approximately 90% versus no cooling. Interestingly, the range of fan usage during active cooling varied from 33% to 65%. Given the impact of the fan on the power consumption of the system as a whole, these results stress the importance of a suitable selection of CNN model and software components. To assess the performance in outdoor applications, we integrated an external temperature sensor with the RPi4B and conducted a set of experiments with no active cooling over a wide interval of ambient temperature, ranging from 22 °C to 36 °C. Variations of up to 27.7% were measured with respect to the maximum throughput achieved in that interval. This demonstrates that ambient temperature is a critical parameter when active cooling cannot be applied. Peer reviewed

    PreVIous: A Methodology for Prediction of Visual Inference Performance on IoT Devices

    This paper presents PreVIous, a methodology to predict the performance of convolutional neural networks (CNNs) in terms of throughput and energy consumption on vision-enabled devices for the Internet of Things. CNNs typically constitute a massive computational load for such devices, which are characterized by scarce hardware resources to be shared among multiple concurrent tasks. Therefore, it is critical to select the optimal CNN architecture for a particular hardware platform according to prescribed application requirements. However, the zoo of CNN models is already vast and rapidly growing. To facilitate a suitable selection, we introduce a prediction framework that allows the performance of CNNs to be evaluated prior to their actual implementation. The proposed methodology is based on PreVIousNet, a neural network specifically designed to build accurate per-layer performance predictive models. PreVIousNet incorporates the most usual parameters found in state-of-the-art network architectures. The resulting predictive models for inference time and energy have been tested against comprehensive characterizations of seven well-known CNN models running on two different software frameworks and two different embedded platforms. To the best of our knowledge, this is the most extensive study in the literature concerning CNN performance prediction on low-power low-cost devices. The average deviation between predictions and real measurements is remarkably low, ranging from 3% to 10%. This represents state-of-the-art modeling accuracy. As an additional asset, the fine-grained a priori analysis provided by PreVIous could also be exploited by neural architecture search engines. Comment: 18 pages, 7 figures
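
    The idea of per-layer predictive models in the abstract above can be sketched as follows. The abstract does not specify the regression model or its input features, so this example assumes the simplest possible stand-in: a one-feature linear model (layer time as a function of multiply-accumulate operations) fitted by ordinary least squares on profiled layers, with the network-level prediction obtained by summing the per-layer predictions. The function names and the choice of feature are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_network_time(layer_macs, model):
    """Sum the per-layer predictions to estimate the whole-network time."""
    slope, intercept = model
    return sum(slope * macs + intercept for macs in layer_macs)


def relative_deviation(predicted, measured):
    """Absolute deviation of a prediction relative to the measurement."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Synthetic profiling data (arbitrary units), for illustration only.
    profiled_macs = [1.0, 2.0, 3.0]
    profiled_time = [3.0, 5.0, 7.0]
    model = fit_line(profiled_macs, profiled_time)
    print(predict_network_time([4.0, 5.0], model))
```

    A separate model per layer type and per target metric (time or energy) would be a natural extension of this sketch, but that design is an assumption here rather than something the abstract states.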

    Performance Assessment of Deep Learning Frameworks through Metrics of CPU Hardware Exploitation on an Embedded Platform

    In this paper, we analyze the heterogeneous performance exhibited by some popular deep learning software frameworks for visual inference on a resource-constrained hardware platform. Benchmarking of Caffe, OpenCV, TensorFlow, and Caffe2 is performed on the same set of convolutional neural networks in terms of instantaneous throughput, power consumption, memory footprint, and CPU utilization. To understand the resulting dissimilar behavior, we thoroughly examine how the resources in the processor are differently exploited by these frameworks. We demonstrate that a strong correlation exists between hardware events occurring in the processor and inference performance. The proposed hardware-aware analysis aims to find limitations and bottlenecks emerging from the joint interaction of frameworks and networks on a particular CPU-based platform. This provides insight into introducing suitable modifications in both types of components to enhance their global performance. It also facilitates the selection of frameworks and networks among the large diversity of such components currently available for visual understanding.

    Clonal chromosomal mosaicism and loss of chromosome Y in elderly men increase vulnerability for SARS-CoV-2

    The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, COVID-19) had an estimated overall case fatality ratio of 1.38% (pre-vaccination), being 53% higher in males and increasing exponentially with age. Among 9578 individuals diagnosed with COVID-19 in the SCOURGE study, we found 133 cases (1.42%) with detectable clonal mosaicism for chromosome alterations (mCA) and 226 males (5.08%) with acquired loss of chromosome Y (LOY). Individuals with clonal mosaic events (mCA and/or LOY) showed a 54% increase in the risk of COVID-19 lethality. LOY is associated with transcriptomic biomarkers of immune dysfunction, pro-coagulation activity and cardiovascular risk. Interferon-induced genes involved in the initial immune response to SARS-CoV-2 are also down-regulated in LOY. Thus, mCA and LOY underlie at least part of the sex-biased severity and mortality of COVID-19 in aging patients. Given its potential therapeutic and prognostic relevance, evaluation of clonal mosaicism should be implemented as a biomarker of COVID-19 severity in elderly people. Among 9578 individuals diagnosed with COVID-19 in the SCOURGE study, individuals with clonal mosaic events (clonal mosaicism for chromosome alterations and/or loss of chromosome Y) showed an increased risk of COVID-19 lethality.

    Contributions to the realization of DNN-based visual inference on embedded systems

    This thesis comprises a set of contributions to the state of the art of embedded computer vision systems. CNNs constitute an accurate and flexible approach for artificial vision. They significantly outperform traditional algorithms based on prescribed features. This has prompted the development of a myriad of specific hardware and software components tailored for these neural networks. However, CNNs are memory-hungry and computationally heavy, which notably hinders their integration in embedded devices for field deployments. Therefore, a primary goal of this thesis was to explore system architectures and configurations optimized in terms of power consumption, frame rate, compactness, and cost. In addition, flexibility and programmability have been two design principles that we have kept in mind throughout the research conducted in this doctoral dissertation. This is why we employed widespread software libraries to endow low-cost low-power embedded commercial platforms with visual inference capabilities. The active development of these libraries will continuously improve the performance obtained from the underlying hardware. The implementation of visual inference on edge devices has been addressed from different perspectives, and a vast set of experimental results has been collected to validate the methodologies introduced. This has been done on diverse embedded hardware platforms (RPi 3B/4B, Odroid XU4, Jetson TX2, etc.), software frameworks (Caffe, TF, OpenCV, TVM, etc.), and CNN models (GoogLeNet, MobileNet, ResNet, etc.). We have also introduced figures of merit (FoMs) adapted to the nature of the targeted evaluation in order to support application-level decisions on the basis of meaningful system parameters. A variety of tools and lab equipment have been employed for the comprehensive characterizations performed.
    From all this work, a major conclusion that can be drawn is that low-cost DNN embedding under real-time operation conditions with moderate-to-high accuracy is currently possible, but the implementation must be thoroughly planned in advance, system components must be carefully selected, and long battery lifetime should not be expected yet. The procedures proposed in this thesis assist in these tasks and constitute guidelines for future enhanced realizations of embedded vision. Another relevant conclusion is that all abstraction levels, i.e., application, algorithm, software, and hardware, must be jointly considered, and the corresponding performance metrics vertically conveyed during the design, in order to accomplish competitive systems useful for real scenarios.

    Visual Inference for IoT Systems: A Practical Approach

    This book presents a systematic approach to the implementation of Internet of Things (IoT) devices achieving visual inference through deep neural networks. Practical aspects are covered, with a focus on providing guidelines to optimally select hardware and software components as well as network architectures according to prescribed application requirements. The monograph includes a remarkable set of experimental results and functional procedures supporting the theoretical concepts and methodologies introduced. A case study on animal recognition based on smart camera traps is also presented and thoroughly analyzed. In this case study, different system alternatives are explored and a particular realization is completely developed. Illustrations, numerous plots from simulations and experiments, and supporting information in the form of charts and tables make Visual Inference for IoT Systems: A Practical Approach a clear and detailed guide to the topic. It will be of interest to researchers, industrial practitioners, and graduate students in the fields of computer vision and IoT. Peer reviewed

    Optimum Network/Framework Selection from High-Level Specifications in Embedd

    In Advanced Concepts for Intelligent Vision Systems (ACIVS), Poitiers, France, September 2018, ISBN 978-3-030-01448-3. This paper benchmarks 16 combinations of popular Deep Neural Networks for 1000-category image recognition and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology. Peer reviewed