
    FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices

    Deep neural networks show great potential as solutions to many sensing application problems, but their excessive resource demand slows down execution, posing a serious impediment to deployment on low-end devices. To address this challenge, recent literature has focused on compressing neural network size to improve performance. We show that changing neural network size does not proportionally affect performance attributes of interest, such as execution time. Rather, extreme run-time nonlinearities exist over the network configuration space. Hence, we propose a novel framework, called FastDeepIoT, that uncovers the non-linear relation between neural network structure and execution time, then exploits that understanding to find network configurations that significantly improve the trade-off between execution time and accuracy on mobile and embedded devices. FastDeepIoT makes two key contributions. First, FastDeepIoT automatically learns an accurate and highly interpretable execution time model for deep neural networks on the target device. This is done without prior knowledge of either the hardware specifications or the detailed implementation of the deep learning library used. Second, FastDeepIoT informs a compression algorithm how to minimize execution time on the profiled device without impacting accuracy. We evaluate FastDeepIoT using three different sensing-related tasks on two mobile devices: Nexus 5 and Galaxy Nexus. FastDeepIoT further reduces the neural network execution time by 48% to 78% and energy consumption by 37% to 69% compared with the state-of-the-art compression algorithms. Comment: Accepted by SenSys '1

    FPGA-Based Bandwidth Selection for Kernel Density Estimation Using High Level Synthesis Approach

    FPGA technology can offer significantly higher performance at much lower power consumption than is available from CPUs and GPUs in many computational problems. Unfortunately, programming for FPGAs using hardware description languages (HDL) is a difficult, non-trivial task and is not intuitive for C/C++/Java programmers. To bridge the gap between programming effectiveness and difficulty, the High Level Synthesis (HLS) approach is being promoted by the main FPGA vendors. Nowadays, time-intensive calculations are mainly performed on GPU/CPU architectures, but they can also be performed successfully using the HLS approach. In this paper we implement a bandwidth selection algorithm for kernel density estimation (KDE) using HLS and show the techniques used to optimize the final FPGA implementation. We also show that the FPGA speedups, compared to highly optimized CPU and GPU implementations, are quite substantial. Moreover, the power consumption of FPGA devices is usually much lower than the typical power consumption of present CPUs and GPUs. Comment: 23 pages, 6 figures, extended version of initial pape

    FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels

    Using FPGAs as hardware accelerators that communicate with a central CPU is becoming common practice in the embedded design world, but there is not yet a standard methodology and toolset to facilitate this path. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphics Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low-power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach

    Positioning in time and space: cost-effective exterior orientation for airborne archaeological photographs

    Since manned, airborne reconnaissance for archaeological purposes is often characterised by more-or-less random photographing of archaeological features on the Earth, the exact position and orientation of the camera during image acquisition becomes very important for an effective inventorying and interpretation workflow of these aerial photographs. Although the positioning is generally achieved by simultaneously logging the flight path or directly recording the camera's position with a GNSS receiver, this approach does not allow recording the necessary roll, pitch and yaw angles of the camera. The latter are essential elements of the complete exterior orientation of the camera, which, together with the inner orientation of the camera, allows the portion of the Earth recorded in the photograph to be defined accurately. This paper proposes a cost-effective, accurate and precise GNSS/IMU solution (image position: 2.5 m and orientation: 2°, both at 1σ) to record all essential exterior orientation parameters for the direct georeferencing of the images. After introducing the utilised hardware, this paper presents the developed software that allows recording and estimating these parameters. Furthermore, this direct georeferencing information can be embedded into the image's metadata. Subsequently, the first results of the estimation of the mounting calibration (i.e. the misalignment between the camera and GNSS/IMU coordinate frame) are provided. Furthermore, a comparison with a dedicated commercial photographic GNSS/IMU solution will prove the superiority of the introduced solution. Finally, an outlook on future tests and improvements finalises this article

    Promoting Public Health and Safety: A Predictive Modeling Software Analysis on Perceived Road Fatality Contributory Factors

    An extensive literature search was conducted to computationally analyze the relationship between key perceived road fatality factors and public health impacts, in terms of mortality and morbidity. Heterogeneous data were sourced: road fatality data for 1970-2005 and data from an interview questionnaire on European road drivers' perceptions. Computational analysis was performed on these data using the Multilayer Perceptron model within the dtreg predictive modeling software. Driver factors had the highest relative significance. Drivers played a significant role as causative agents of road accidents. A good degree of correlation was also observed when compared with results obtained by previous researchers. Sweden, UK, Finland, Denmark, Germany, France, Netherlands, and Austria, where road safety targets were set and EU targets adopted, experienced a faster and sharper reduction of road fatalities. However, Belgium, Ireland, Italy, Greece and Portugal experienced only a slow, slight reduction in road fatalities. Spain experienced an increase in road fatalities, possibly due to factors that increase road fatalities. Estonia, Slovenia, Cyprus, Hungary, Czech Republic, Slovakia and Poland experienced a fluctuating but decreasing trend. Enforcement of road safety principles and regulations is needed to decrease the incidence of fatal accidents. Adoption of the EU target of a 50% reduction in fatalities in all countries will help promote public health and safety

    Smart Grid Technologies in Europe: An Overview

    The old electricity network infrastructure has proven to be inadequate with respect to modern challenges such as alternative energy sources, electricity demand and energy saving policies. Moreover, Information and Communication Technologies (ICT) seem to have reached an adequate level of reliability and flexibility to support a new concept of electricity network: the smart grid. In this work, we will analyse the state-of-the-art of smart grids, in their technical, management, security, and optimization aspects. We will also provide a brief overview of the regulatory aspects involved in the development of a smart grid, mainly from the viewpoint of the European Unio

    Prototype system development for wireless vehicle speed monitoring

    Monitoring vehicle speed and managing the associated data in an intelligent and efficient way is an important issue in modern transportation systems in order to reduce road accidents. The aim of this work is to develop an automatic wireless system for monitoring vehicle speed on the road, identifying speeding vehicles and imposing penalties on the speeding offenders. In this work, a prototype system has been developed in a laboratory environment to generate random speed data using a mechanical wheel, measure the speed data with a Shimmer wireless sensor and transfer the data wirelessly to a client computer for further analysis. Software has been developed using a Java-based socket programming technique to monitor the vehicle speed on a server computer and to send the data associated with a speeding vehicle to a remotely placed client computer. The graphical user interface (GUI) can visually display the speed of a vehicle at any particular time. The functionality of the software has been tested by simulating different traffic scenarios with low and high speed limits (40 and 60 km/h respectively). To do that, a high or low speed limit can be set in the GUI. The mechanical wheel is run at different speeds and the GUI continuously displays the speed. If the vehicle speed is higher than the set speed limit for the road, the system automatically detects it and generates a report with the time of speeding, vehicle number, vehicle speed etc. to be saved on the client computer in order to take further necessary actions against the speeding offender
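    The detection rule described in this abstract (compare a measured speed against the configured limit and, if it is exceeded, produce a report with time, vehicle number and speed) could be sketched roughly as follows. This is only an illustration in Python, not the authors' Java software; the names `SpeedingReport` and `check_speed` and the sample vehicle number are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SpeedingReport:
    """Hypothetical record holding the report fields named in the abstract."""
    time: datetime
    vehicle_number: str
    speed_kmh: float
    limit_kmh: float


def check_speed(vehicle_number: str, speed_kmh: float,
                limit_kmh: float, now: datetime) -> Optional[SpeedingReport]:
    """Return a report only when the speed is strictly above the limit."""
    if speed_kmh > limit_kmh:
        return SpeedingReport(now, vehicle_number, speed_kmh, limit_kmh)
    return None


# Example with the high limit from the abstract (60 km/h); the vehicle
# number and timestamp are made-up sample values.
stamp = datetime(2020, 1, 1, 12, 0, 0)
over = check_speed("ABC123", 65.0, 60.0, stamp)
at_limit = check_speed("ABC123", 60.0, 60.0, stamp)
```

    A speed exactly at the limit yields no report here, which matches the abstract's wording "higher than the set speed limit".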

    High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened: the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages (two scans, coding, and lag packing) and is implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on a massively parallel GPU architecture. Both exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, a speedup of more than 70 times can be obtained for STORM and over 50 times for GPU. The encoder implementation on STORM achieves real-time processing for 1080p @30fps, and the GPU-based version satisfies the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms

    A Multi-GPU Programming Library for Real-Time Applications

    We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general. Comment: 15 pages, 10 figure
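    To put the quoted speed-ups in perspective, one common derived measure is parallel efficiency, i.e. speed-up divided by the number of devices. The short Python sketch below applies it to the two figures from the abstract; the function name `parallel_efficiency` is an illustrative assumption, and the abstract itself does not report efficiency.

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Fraction of ideal linear scaling achieved: speed-up / device count."""
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    return speedup / n_devices


# GPUs -> speed-up, as quoted in the abstract ("about 1.7" and 2.1).
reported = {2: 1.7, 4: 2.1}
efficiencies = {n: parallel_efficiency(s, n) for n, s in reported.items()}

for n, e in efficiencies.items():
    print(f"{n} GPUs: {e:.1%} of ideal linear scaling")
```

    Under this measure the 2-GPU run reaches roughly 85% of ideal scaling and the 4-GPU run roughly 52%, so the gain from each additional device shrinks as devices are added.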