983 research outputs found

    Deep Learning-Based Multiple Object Visual Tracking on Embedded System for IoT and Mobile Edge Computing Applications

    Compute and memory demands of state-of-the-art deep learning methods are still a shortcoming that must be addressed to make them useful at IoT end-nodes. In particular, recent results show promise for image processing using Convolutional Neural Networks (CNNs), but the gap between software and hardware implementations is already considerable for IoT and mobile edge computing applications because of their high power consumption. This proposal performs low-power, real-time, deep learning-based multiple object visual tracking on an NVIDIA Jetson TX2 development kit. The kit includes a camera and wireless connectivity, and it is battery powered for mobile and outdoor applications. A collection of representative sequences captured with the on-board camera, the dETRUSC video dataset, is used to exemplify the performance of the proposed algorithm and to facilitate benchmarking. The results in terms of power consumption and frame rate demonstrate the feasibility of deep learning algorithms on embedded platforms, although more effort on joint algorithm and hardware design of CNNs is needed.
    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

    ASCR/HEP Exascale Requirements Review Report

    This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude, and in some cases more, greater than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability of both facilities and researchers to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources, the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, and e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.
    Comment: 77 pages, 13 figures; draft report, subject to further revision.

    Easy implementation of advanced tomography algorithms using the ASTRA toolbox with Spot operators

    Mathematical scripting languages are commonly used to develop new tomographic reconstruction algorithms. For large experimental datasets, high-performance parallel (GPU) implementations are essential, which requires re-implementing the algorithm in a language that is closer to the computing hardware. In this paper, we introduce a new Matlab interface to the ASTRA toolbox, a high-performance toolbox for building tomographic reconstruction algorithms. By exposing the ASTRA linear tomography operators through standard Matlab matrix syntax, existing and new reconstruction algorithms implemented in Matlab can now be applied directly to large experimental datasets. This is achieved by using the Spot toolbox, which wraps external code for linear operations into Matlab objects that can be used as matrices. We provide a series of examples that demonstrate how this Spot operator can be used in combination with existing algorithms implemented in Matlab and how it can be used for rapid development of new algorithms, resulting in direct applicability to large-scale experimental datasets.
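
    The idea of wrapping external linear operations in an object that behaves like a matrix can be illustrated with a minimal sketch. It is written in Python rather than Matlab and does not use the ASTRA or Spot APIs: the class `LinearOp`, the toy `project`/`backproject` pair (row and column sums of a 2x2 image), and the `landweber` solver are all hypothetical names chosen for illustration. The point is only that an algorithm written against matrix syntax (`A @ x`, `A.T @ r`) can run unchanged on a matrix-free operator.

```python
class LinearOp:
    """Matrix-free linear operator usable with `@` and `.T` like a matrix."""

    def __init__(self, shape, forward, adjoint):
        self.shape = shape          # (rows, cols) of the implied matrix
        self._forward = forward     # computes A x without storing A
        self._adjoint = adjoint     # computes A^T y without storing A

    def __matmul__(self, x):
        if len(x) != self.shape[1]:
            raise ValueError("dimension mismatch")
        return self._forward(x)

    @property
    def T(self):
        # The transpose swaps the roles of the forward and adjoint functions.
        rows, cols = self.shape
        return LinearOp((cols, rows), self._adjoint, self._forward)


def project(x):
    """Toy projector: row sums, then column sums, of a row-major 2x2 image."""
    return [x[0] + x[1], x[2] + x[3], x[0] + x[2], x[1] + x[3]]


def backproject(y):
    """Adjoint of `project`: smear each measurement back over its pixels."""
    return [y[0] + y[2], y[0] + y[3], y[1] + y[2], y[1] + y[3]]


def landweber(A, b, step, iterations):
    """Landweber iteration x <- x + step * A^T (b - A x), using matrix syntax only."""
    x = [0.0] * A.shape[1]
    for _ in range(iterations):
        residual = [bi - yi for bi, yi in zip(b, A @ x)]
        gradient = A.T @ residual
        x = [xi + step * gi for xi, gi in zip(x, gradient)]
    return x


A = LinearOp((4, 4), project, backproject)
measurements = A @ [1.0, 2.0, 3.0, 4.0]
reconstruction = landweber(A, measurements, step=0.25, iterations=50)
```

    The solver never inspects how `A` is implemented, so the toy functions could in principle be replaced by calls into a GPU projector. The step size 0.25 is chosen for this toy operator, whose largest squared singular value is 4.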

    Computation scheduling in neural network inference on embedded hardware

    This thesis examines state-of-the-art approaches to object detection with convolutional neural networks as used in autonomous driving. Optimizing their execution on embedded systems requires an in-depth understanding of the network structure and of how a given framework carries out the computation. The main goal of the thesis is to compare several machine learning frameworks and to describe the undocumented internal architecture of the TensorFlow framework, so that future researchers can modify the executed code to improve the scheduling of individual processes. To allow future optimizations to be evaluated on the target platform, the NVIDIA Jetson Tegra X2, the thesis introduces a simple benchmark and describes how to read the power consumption and temperature of the board components.