
    FFT-Based Deep Learning Deployment in Embedded Systems

    Deep learning has demonstrated its power in many application domains, especially in image and speech recognition. As the backbone of deep learning, deep neural networks (DNNs) consist of multiple layers of various types with hundreds to thousands of neurons. Embedded platforms are now becoming essential for deep learning deployment due to their portability, versatility, and energy efficiency. The large model size of DNNs, while providing excellent accuracy, also burdens embedded platforms with intensive computation and storage. Researchers have investigated reducing DNN model size with negligible accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN training and inference model suitable for embedded platforms, with reduced asymptotic complexity of both computation and storage, which distinguishes our approach from existing ones. We develop training and inference algorithms with FFT as the computing kernel and deploy the FFT-based inference model on embedded platforms, achieving extraordinary processing speed. Comment: Design, Automation, and Test in Europe (DATE). For source code, please contact Mahdi Nazemi at <[email protected]
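    The key property behind this line of work is the convolution theorem: convolution (the dominant operation in DNN layers) becomes a pointwise product in the frequency domain, dropping the cost from quadratic to O(n log n). The snippet below is a minimal NumPy sketch of that idea only, not the paper's actual training or inference kernels.

```python
# Minimal sketch of the convolution-theorem idea behind FFT-based layers:
# a linear convolution computed in O(n log n) via FFTs instead of O(n^2) directly.
# Illustration only; sizes and data are arbitrary.
import numpy as np

def fft_conv1d(x, w):
    """Linear convolution of x and w via zero-padded real FFTs."""
    n = len(x) + len(w) - 1
    X = np.fft.rfft(x, n)          # transform input, padded to full output length
    W = np.fft.rfft(w, n)          # transform kernel
    return np.fft.irfft(X * W, n)  # pointwise multiply, then inverse transform

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)      # e.g., one row of a feature map
w = rng.standard_normal(16)        # e.g., one filter tap vector
assert np.allclose(fft_conv1d(x, w), np.convolve(x, w), atol=1e-8)
```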

    Multi-task models based on deep learning neural networks for deployment in embedded, real-time systems

    Multitask Learning (MTL) was conceived as an approach to improve the generalization ability of machine learning models. When applied to neural networks, multitask models take advantage of shared resources to reduce total inference time, memory footprint and model size. We propose MTL as a way to speed up deep learning models for applications in which multiple tasks need to be solved simultaneously, which is particularly useful in embedded, real-time systems such as the ones found in autonomous cars or UAVs. In order to study this approach, we apply MTL to a Computer Vision problem in which both Object Detection and Semantic Segmentation tasks are solved, based on the Single Shot Multibox Detector and Fully Convolutional Networks with skip connections respectively, using a ResNet-50 as the base network. We train multitask models for two different datasets: Pascal VOC, which is used to validate the decisions made, and a combination of datasets with aerial-view images captured from UAVs. Finally, we analyse the challenges that appear during the training of multitask networks and try to overcome them. These challenges hinder the capacity of our multitask models to reach the performance of the best single-task models trained without the limitations imposed by MTL. Nevertheless, multitask networks benefit from sharing resources and are 1.6x faster, lighter and use less memory compared to deploying the single-task models in parallel, which becomes essential when running them on a Jetson TX1 SoC, as the parallel approach does not fit into memory. We conclude that MTL has the potential to give superior performance on the object detection and semantic segmentation tasks, in exchange for a more complex training process that requires overcoming challenges not present when training single-task models.
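    As a rough illustration of the resource-sharing idea (not the thesis' actual SSD/FCN-on-ResNet-50 architecture), the hypothetical PyTorch sketch below runs one shared backbone pass and feeds it to both a detection-style head and a segmentation head; because the backbone dominates the cost, sharing it is what makes the multitask model faster and lighter than deploying two single-task models in parallel.

```python
# Toy multitask network: one shared backbone, two task heads.
# The backbone here is a stand-in for ResNet-50; heads are deliberately minimal.
import torch
import torch.nn as nn

class TinyMultiTaskNet(nn.Module):
    def __init__(self, num_classes=21, num_anchors=4):
        super().__init__()
        # Shared backbone: features are computed once per image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: per-location class scores plus box offsets (SSD-style).
        self.det_head = nn.Conv2d(64, num_anchors * (num_classes + 4), 1)
        # Segmentation head: per-pixel class logits upsampled to input size (FCN-style).
        self.seg_head = nn.Sequential(
            nn.Conv2d(64, num_classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        feats = self.backbone(x)            # shared computation
        return self.det_head(feats), self.seg_head(feats)

det_out, seg_out = TinyMultiTaskNet()(torch.randn(1, 3, 128, 128))
print(det_out.shape, seg_out.shape)         # both heads reuse one backbone pass
```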

    Edge Device Deployment of Multi-Tasking Network for Self-Driving Operations

    A safe and robust autonomous driving system relies on accurate perception of the environment for application-oriented scenarios. This paper proposes the deployment of the three most crucial tasks (i.e., object detection, drivable area segmentation and lane detection) on an embedded system for self-driving operations. To achieve this research objective, a multi-tasking network with a simple encoder-decoder architecture is utilized. Comprehensive and extensive comparisons of two models based on different backbone networks are performed. All training experiments are performed on a server, while an Nvidia Jetson Xavier NX is chosen as the deployment device. Comment: arXiv admin note: text overlap with arXiv:1908.08926 by other authors
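    Since the backbone comparison ultimately matters through its on-device latency, the following hedged timing harness (placeholder models and an illustrative input size, not the paper's networks) shows one common way to benchmark two backbone variants on the deployment device:

```python
# Simple latency benchmark for comparing two model variants on the target device.
# Models below are placeholders; only the warm-up/synchronize/timing pattern is the point.
import time
import torch

@torch.no_grad()
def mean_latency_ms(model, device, input_shape=(1, 3, 384, 640), iters=50, warmup=10):
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    for _ in range(warmup):                 # warm-up: caches, kernel selection
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()            # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) * 1000 / iters

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
small = torch.nn.Conv2d(3, 16, 3, padding=1)    # stand-ins for two backbone variants
large = torch.nn.Conv2d(3, 64, 7, padding=3)
print(f"small backbone: {mean_latency_ms(small, device):.2f} ms")
print(f"large backbone: {mean_latency_ms(large, device):.2f} ms")
```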

    Advances in Sentiment Analysis in Deep Learning Models and Techniques

    The article investigates the advantages, disadvantages, and areas of research that need more exploration regarding deep learning architectures used in sentiment analysis. These architectures let models learn complex language features from data without explicit feature engineering, transforming sentiment analysis. The models' capacity to capture long-range dependencies has improved their interpretation of context and nuanced expression, especially in long or metaphorical texts. Deep learning sentiment analysis algorithms have improved, yet they still face obstacles. The complexity of these models raises ethical questions about bias and transparency. They also require huge annotated datasets and substantial computational resources, which limits their use in resource-constrained contexts. Adopting deep learning models requires balancing performance and practicality. The article also explores critical research gaps in deep learning sentiment analysis. Cross-domain and cross-lingual sentiment analysis requires context- and language-specific models. Multimodal sentiment analysis, combining textual and non-textual signals, offers untapped potential for complex sentiment interpretation. Responsible AI deployment requires model interpretability, robustness against adversarial attacks, and domain consistency. Finally, deep learning and sentiment analysis have changed our understanding of human emotion. Accuracy and contextual comprehension have improved, but model transparency, data prerequisites, and practical applicability remain issues. Overcoming these restrictions and exploring the research gaps will enable responsible AI innovation in sentiment analysis.

    A multi-microcontroller-based hardware for deploying Tiny machine learning model

    Tiny machine learning (TinyML) is increasingly applied on edge devices built around resource-constrained micro-controller units (MCUs). Finding a good platform to deploy TinyML models effectively is therefore crucial. This paper proposes a multi-microcontroller hardware platform for efficiently running a TinyML model. The proposed hardware consists of two dual-core MCUs. The first MCU is utilized for acquiring and pre-processing input data, while the second is responsible for executing the trained TinyML network. The two MCUs communicate with each other using the universal asynchronous receiver-transmitter (UART) protocol. A multi-tasking programming technique is applied on the first MCU to optimize the pre-processing of new data. A TinyML model for three-phase motor fault classification was deployed on the proposed system to evaluate its effectiveness. The experimental results show that the proposed hardware platform improves the total inference time of the TinyML model, including data pre-processing, by 34.8% compared with a single micro-controller hardware platform.
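    The gain comes from pipelining: the acquisition MCU pre-processes the next data window while the inference MCU classifies the current one, with the UART link decoupling the two. The snippet below is a host-side Python simulation of that producer-consumer split (a queue in place of UART, sleeps in place of real work), not the authors' firmware.

```python
# Two "MCUs" modelled as a producer thread and a consumer, joined by a bounded queue
# that stands in for the UART link. Timings and names are illustrative only.
import queue
import threading
import time

link = queue.Queue(maxsize=2)          # stand-in for the UART connection

def acquisition_mcu(num_windows):
    for i in range(num_windows):
        time.sleep(0.02)               # acquire + pre-process one window of motor samples
        link.put(f"features_{i}")      # send features to the inference MCU
    link.put(None)                     # end-of-stream marker

def inference_mcu():
    while (features := link.get()) is not None:
        time.sleep(0.03)               # run the fault classifier on the received window
        print(f"classified {features}")

producer = threading.Thread(target=acquisition_mcu, args=(5,))
producer.start()
inference_mcu()                        # overlaps with acquisition of the next window
producer.join()
```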