
    A model-based design flow for embedded vision applications on heterogeneous architectures

    The ability to gather information from images comes naturally to humans and is one of our principal means of understanding the external world. Computer vision (CV) is the process of extracting such knowledge from the visual domain in an algorithmic fashion. The computational power required to process this information is very high. Until recently, the only feasible way to meet non-functional requirements such as performance was to develop custom hardware, which is costly, time-consuming, and cannot be reused for general purposes. The recent introduction of low-power, low-cost heterogeneous embedded boards, in which CPUs are combined with accelerators such as GPUs, DSPs, and FPGAs, pairs the hardware efficiency needed to meet non-functional requirements with the flexibility of software development. Embedded vision is the term used to identify the application of such CV algorithms in the embedded field, which usually requires satisfying, in addition to functional requirements, non-functional requirements such as real-time performance, power, and energy efficiency. Rapid prototyping, early algorithm parametrization, testing, and validation of complex embedded video applications for such heterogeneous architectures is a very challenging task. This thesis presents a comprehensive framework that: 1) is based on a model-based paradigm. Differently from standard state-of-the-art approaches, which require designers to manually model the algorithm in a programming language, the proposed approach allows for rapid prototyping, algorithm validation, and parametrization in a model-based design environment (i.e., Matlab/Simulink). The framework relies on a multi-level design and verification flow by which the high-level model is semi-automatically refined towards the final automatic synthesis onto the target hardware device. 2) relies on a polyglot parallel programming model. The proposed model combines different programming languages and environments, such as C/C++, OpenMP, PThreads, OpenVX, OpenCV, and CUDA, to best exploit different levels of parallelism while guaranteeing semi-automatic customization. 3) optimizes application performance and energy efficiency through a novel algorithm for mapping and scheduling the application tasks onto the heterogeneous computing elements of the device. This algorithm, called exclusive earliest finish time (XEFT), takes into consideration the possible multiple implementations of tasks for different computing elements (e.g., a task primitive for CPU and an equivalent parallel implementation for GPU). It introduces and takes advantage of the notion of exclusive overlap between primitives to improve the load balancing. This thesis is the result of three years of research activity, during which all the incremental steps made to compose the framework have been tested on real case studies.
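
    To make the scheduling problem concrete, the following is a hypothetical Python sketch (names and the toy pipeline are illustrative, not the thesis' actual API) of the input an XEFT-style mapper/scheduler reasons about: a DAG of tasks, each with alternative implementations and cost estimates on different computing elements (CEs).

        # Hypothetical sketch of the data structure an XEFT-style
        # mapper/scheduler consumes: tasks with per-CE implementation costs.
        from dataclasses import dataclass, field

        @dataclass
        class Task:
            name: str
            cost: dict                 # CE -> estimated time, e.g. {"CPU": 8.0, "GPU": 3.5}
            deps: list = field(default_factory=list)  # predecessor tasks

        # A toy vision pipeline: grayscale conversion has both CPU and GPU
        # implementations, while feature extraction is GPU-only.
        gray  = Task("grayscale", {"CPU": 8.0, "GPU": 3.5})
        feats = Task("features",  {"GPU": 6.0}, deps=[gray])
        match = Task("matching",  {"CPU": 5.0, "GPU": 4.0}, deps=[feats])

        for t in (gray, feats, match):
            print(t.name, "can run on", sorted(t.cost))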

    Rapid Prototyping of Embedded Vision Systems: Embedding Computer Vision Applications into Low-Power Heterogeneous Architectures

    Embedded vision is a disruptive new technology in the vision industry. It is a revolutionary concept with far-reaching implications, and it is opening up new applications and shaping the future of entire industries. It is applied in self-driving cars, autonomous vehicles in agriculture, digital dermascopes that help specialists make more accurate diagnoses, and many other unique and cutting-edge applications. The design of such systems raises new challenges for embedded software developers. Embedded vision applications are characterized by stringent performance constraints, to guarantee real-time behaviours, and, at the same time, by energy constraints, to save battery on mobile platforms. In this paper, we address such challenges by proposing an overall view of the problem and by analysing current solutions. We present our latest results on embedded vision design automation along two main aspects: the adoption of the model-based paradigm for embedded vision rapid prototyping, and the application of heterogeneous programming languages to improve system performance. The paper presents our recent results on the design of a localization and mapping application, combined with image recognition based on deep learning, optimized for an NVIDIA Jetson TX2.

    Enhancing Performance of Computer Vision Applications on Low-Power Embedded Systems Through Heterogeneous Parallel Programming

    Enabling computer vision applications on low-power embedded systems gives rise to new challenges for embedded software developers. Such applications implement different functionalities, such as image recognition based on deep learning and simultaneous localization and mapping tasks. They are characterized by stringent performance constraints, to guarantee real-time behaviors, and, at the same time, by energy constraints, to save battery on the mobile platform. Even though heterogeneous embedded boards are becoming pervasive thanks to their high computational power at low power cost, they require a time-consuming customization of the whole application (i.e., the mapping of application blocks to CPU/GPU processing elements and their synchronization) to efficiently exploit their potential. Different languages and environments have been proposed for such embedded software customization. Nevertheless, they often show limitations on complex real cases, as their adoption is mutually exclusive. This paper presents a comprehensive framework that relies on a heterogeneous parallel programming model, which combines OpenMP, PThreads, OpenVX, OpenCV, and CUDA to best exploit different levels of parallelism while guaranteeing semi-automatic customization. The paper shows how such languages and API platforms have been interfaced, synchronized, and applied to customize an ORB-SLAM application for an NVIDIA Jetson TX2 board.
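
    The following is a minimal Python sketch (all names illustrative) of the kind of synchronization such a framework must provide between heterogeneous runtimes: a CPU-side stage, which in the real framework would be an OpenMP/PThreads region, hands frames to an accelerator-side stage (OpenVX/CUDA) through a bounded queue that provides back-pressure.

        # Bounded producer/consumer hand-off between two "runtimes":
        # the queue stands in for the cross-language synchronization layer.
        import queue, threading

        frames = queue.Queue(maxsize=4)  # bounded: the producer blocks when full

        def cpu_stage(n_frames):
            for i in range(n_frames):
                frames.put(f"preprocessed-frame-{i}")  # stand-in for CPU-parallel work
            frames.put(None)                           # end-of-stream marker

        def accelerator_stage():
            while (item := frames.get()) is not None:
                print("accelerator consumed", item)    # stand-in for a kernel launch

        t = threading.Thread(target=cpu_stage, args=(8,))
        t.start()
        accelerator_stage()
        t.join()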

    On the Task Mapping and Scheduling for DAG-based Embedded Vision Applications on Heterogeneous Multi/Many-core Architectures

    Embedded vision applications have stringent performance constraints that must be satisfied when they run on low-power embedded systems. OpenVX has emerged as the de-facto reference standard for developing such applications. Starting from a DAG representation of the application and relying on a primitive-based programming model, it allows for automatic system-level optimizations and for the synthesis of an implementation onto the target heterogeneous multi-core architecture. However, the state-of-the-art algorithm for task mapping and scheduling in OpenVX does not provide the performance necessary for such applications when they are deployed on embedded multi-/many-core architectures. Our work addresses this challenge with the following three contributions. First, we implemented a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT allows us to improve the system performance by up to 70% on one of the most widespread embedded vision systems (i.e., NVIDIA VisionWorks on the NVIDIA Jetson TX2). Second, we show that HEFT, in the context of an embedded vision application in which some primitives may have multiple implementations (e.g., for CPU and for GPU), can lead to a load imbalance among heterogeneous computing elements (CEs), and thereby suffer from degraded performance. Third, we propose an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single-implementation primitives to improve the load balancing. We show that XEFT can further improve the system performance by up to 33% over HEFT, and by up to 82% over OpenVX. We present results on different benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with the NVIDIA image recognition application based on deep learning.
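
    The following is a compact, illustrative Python sketch of the HEFT heuristic that the first contribution builds on (XEFT additionally exploits exclusive overlap between single-implementation primitives, which is not shown here). The task costs and the tiny DAG are invented for the example; this is not the paper's implementation.

        # HEFT in miniature: rank tasks by upward rank, then greedily place
        # each task on the computing element (CE) with the earliest finish time.
        from functools import lru_cache

        cost = {  # task -> {CE: execution time}; a missing CE means no implementation
            "A": {"CPU": 8, "GPU": 4},
            "B": {"CPU": 6, "GPU": 5},
            "C": {"CPU": 7},            # single-implementation (CPU-only) primitive
            "D": {"CPU": 5, "GPU": 2},
        }
        succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
        pred = {t: [p for p in succ if t in succ[p]] for t in succ}
        COMM = 1.0  # uniform inter-CE data-transfer cost, for simplicity

        @lru_cache(maxsize=None)
        def upward_rank(t):
            avg = sum(cost[t].values()) / len(cost[t])
            return avg + max((COMM + upward_rank(s) for s in succ[t]), default=0.0)

        ce_free = {"CPU": 0.0, "GPU": 0.0}  # earliest idle time per CE
        placed = {}                          # task -> (CE, start, finish)

        # Decreasing upward rank guarantees predecessors are scheduled first.
        for t in sorted(succ, key=upward_rank, reverse=True):
            best = None
            for ce, w in cost[t].items():
                est = max([ce_free[ce]] +
                          [placed[p][2] + (COMM if placed[p][0] != ce else 0.0)
                           for p in pred[t]])
                if best is None or est + w < best[2]:
                    best = (ce, est, est + w)
            placed[t] = best
            ce_free[best[0]] = best[2]

        for t, (ce, s, f) in placed.items():
            print(f"{t}: {ce} [{s:.1f}, {f:.1f}]")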

    Data Flow ORB-SLAM for Real-time Performance on Embedded GPU Boards

    The use of embedded boards on robots, including unmanned aerial and ground vehicles, is increasing thanks to the availability of GPU-equipped low-cost embedded boards on the market. Porting algorithms originally designed for desktop CPUs to those boards is not straightforward due to hardware limitations. In this paper, we present how we modified and customized the open-source SLAM algorithm ORB-SLAM2 to run in real time on the NVIDIA Jetson TX2. We adopted a data flow paradigm to process the images, obtaining an efficient CPU/GPU load distribution that results in a processing speed of about 30 frames per second. Quantitative experimental results on four different sequences of the KITTI dataset demonstrate the effectiveness of the proposed approach. The source code of our data flow ORB-SLAM2 algorithm is publicly available on GitHub.
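
    The following is an illustrative Python sketch of why the data flow paradigm helps here: per-frame work is split into pipelined stages run by separate workers, so throughput is bounded by the slowest stage rather than by the sum of all stages. The stage roles loosely mirror a SLAM front end, and the timings are invented for the example.

        # Two-stage frame pipeline: throughput ~ 1 / slowest-stage latency.
        import queue, threading, time

        def stage(delay, inp, out):
            while (frame := inp.get()) is not None:
                time.sleep(delay)      # stand-in for real CPU/GPU work
                out.put(frame)
            out.put(None)              # propagate the end-of-stream marker

        q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
        workers = [
            threading.Thread(target=stage, args=(0.03, q0, q1)),  # e.g., feature extraction
            threading.Thread(target=stage, args=(0.02, q1, q2)),  # e.g., tracking
        ]
        for w in workers:
            w.start()

        start, n = time.time(), 20
        for i in range(n):
            q0.put(i)
        q0.put(None)
        while q2.get() is not None:
            pass
        for w in workers:
            w.join()
        # Pipelined: ~1/0.03 ≈ 33 frames/s; run serially it would be ~20 frames/s.
        print(f"{n / (time.time() - start):.1f} frames/s")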

    On the Query Strategies for Efficient Online Active Distillation

    Deep Learning (DL) requires large amounts of time and data, resulting in high computational demands. Recently, researchers have employed Active Learning (AL) and online distillation to enhance training efficiency and real-time model adaptation. This paper evaluates a set of query strategies to achieve the best training results. It focuses on Human Pose Estimation (HPE) applications, assessing the impact of the selected frames during training using two approaches: a classical offline method and an online evaluation through a continual learning approach employing knowledge distillation, on a popular state-of-the-art HPE dataset. The paper demonstrates the possibility of training lightweight models at the edge, adapting them effectively to new contexts in real time.
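
    The following is a minimal Python sketch of one classical query strategy in the space this paper evaluates: uncertainty sampling, which queries the frames the current model is least confident about. The confidence scores are placeholders, not outputs of the paper's HPE models.

        # Uncertainty sampling: pick the frames with the lowest model confidence.
        import numpy as np

        rng = np.random.default_rng(0)
        # Placeholder per-frame scores, e.g., mean keypoint confidence
        # produced by a lightweight pose estimator on an incoming stream.
        confidence = rng.uniform(size=100)

        def query_least_confident(scores, budget):
            """Return the indices of the `budget` least-confident frames."""
            return np.argsort(scores)[:budget]

        selected = query_least_confident(confidence, budget=10)
        print("frames selected for supervision/distillation:", selected)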

    A Framework for the Design and Simulation of Embedded Vision Applications Based on OpenVX and ROS

    Customizing computer vision applications for embedded systems is a common and widespread problem in the cyber-physical systems community. Such a customization means parametrizing the algorithm by considering the external environment and mapping the software application onto the heterogeneous hardware resources while satisfying non-functional constraints such as performance, power, and energy consumption. This work presents a framework for the design and simulation of embedded vision applications that integrates the OpenVX standard platform with the Robot Operating System (ROS). The paper shows how the framework has been applied to tune the ORB-SLAM application for an NVIDIA Jetson TX2 board by considering different environment contexts and different design constraints.
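
    The following is a hedged Python sketch of the ROS side of such an integration, using the standard ROS 1 Python API (it needs a ROS environment to run). The topic name and the processing body are assumptions for illustration; in the actual framework the callback would feed the OpenVX application.

        # A ROS 1 node subscribing to a camera topic and handing each frame
        # to a vision-processing callback.
        import rospy
        from cv_bridge import CvBridge
        from sensor_msgs.msg import Image

        bridge = CvBridge()

        def on_frame(msg):
            # Convert the ROS image message to an OpenCV-compatible array.
            frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
            # ... hand `frame` to the vision pipeline (e.g., an OpenVX graph) ...
            rospy.loginfo("received frame %dx%d", msg.width, msg.height)

        rospy.init_node("embedded_vision_sim")
        rospy.Subscriber("/camera/image_raw", Image, on_frame)
        rospy.spin()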

    Parametric Multi-Step Scheme for GPU-Accelerated Graph Decomposition into Strongly Connected Components

    The problem of decomposing a directed graph into strongly connected components (SCCs) is a fundamental graph problem that is inherently present in many scientific and commercial applications. Clearly, there is a strong need for good high-performance, e.g., GPU-accelerated, algorithms to solve it. Unfortunately, among the existing GPU-enabled algorithms for the problem, there is none that can be considered the best on every graph, regardless of the graph's characteristics. Indeed, the choice of the most appropriate algorithm is often left to inexperienced users. In this paper, we introduce a novel parametric multi-step scheme to evaluate existing GPU-accelerated algorithms for SCC decomposition, in order to alleviate the burden of this choice and to help the user identify which combination of existing techniques for SCC decomposition would best fit an expected use case. We support our scheme with an extensive experimental evaluation that dissects the correlations between the internal structure of GPU-based algorithms and their performance on various classes of graphs. The measurements confirm that no single algorithm beats all the others on all classes of graphs. Our contribution thus represents an important step towards an ultimate, automatically adjusted scheme for GPU-accelerated SCC decomposition.
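
    The following is a Python sketch of the forward-backward (FB) reachability step that many GPU-enabled SCC algorithms in this space build on: the SCC of a pivot is the intersection of its forward- and backward-reachable sets, and the three remaining partitions can be processed independently. It is written sequentially for clarity; real implementations run the reachability closures as data-parallel GPU kernels and combine them with steps such as trimming.

        # Forward-backward SCC decomposition on an adjacency-list graph.
        def reach(vertices, edges, src):
            """Vertices of `vertices` reachable from `src` along `edges`."""
            seen, stack = {src}, [src]
            while stack:
                v = stack.pop()
                for w in edges.get(v, ()):
                    if w in vertices and w not in seen:
                        seen.add(w)
                        stack.append(w)
            return seen

        def fb_scc(vertices, fwd, bwd):
            if not vertices:
                return []
            pivot = next(iter(vertices))
            f, b = reach(vertices, fwd, pivot), reach(vertices, bwd, pivot)
            scc = f & b  # the pivot's SCC
            return ([scc]
                    + fb_scc(f - scc, fwd, bwd)
                    + fb_scc(b - scc, fwd, bwd)
                    + fb_scc(vertices - f - b, fwd, bwd))

        fwd = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
        bwd = {}
        for v, ws in fwd.items():
            for w in ws:
                bwd.setdefault(w, []).append(v)
        print(fb_scc({1, 2, 3, 4, 5}, fwd, bwd))  # [{1, 2, 3}, {4, 5}] (order may vary)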

    Camera- and Viewpoint-Agnostic Evaluation of Axial Postural Abnormalities in People with Parkinson’s Disease through Augmented Human Pose Estimation

    Axial postural abnormalities (aPA) are common features of Parkinson’s disease (PD) and manifest in over 20% of patients during the course of the disease. aPA form a spectrum of functional trunk misalignment, ranging from the typical Parkinsonian stooped posture to progressively greater degrees of spine deviation. Current research has not yet led to a sufficient understanding of the pathophysiology and management of aPA in PD, partially due to the lack of agreement on validated, user-friendly, automatic tools for measuring and analysing differences in the degree of aPA according to patients’ therapeutic conditions and tasks. In this context, human pose estimation (HPE) software based on deep learning could be a valid support, as it automatically extrapolates the spatial coordinates of human skeleton keypoints from images or videos. Nevertheless, standard HPE platforms have two limitations that prevent their adoption in such clinical practice. First, standard HPE keypoints are inconsistent with the keypoints needed to assess aPA (degrees and fulcrum). Second, aPA assessment either requires advanced RGB-D sensors or, when based on the processing of RGB images, is most likely sensitive to the adopted camera and to the scene (e.g., sensor–subject distance, lighting, background–subject clothing contrast). This article presents software that augments the human skeleton extrapolated by state-of-the-art HPE software from RGB pictures with exact bone points for posture evaluation through computer vision post-processing primitives. The article shows the software’s robustness and accuracy on the processing of 76 RGB images, with different resolutions and sensor–subject distances, from 55 PD patients with different degrees of anterior and lateral trunk flexion.
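
    The following is an illustrative Python post-processing primitive of the kind the article describes: given 2D keypoints (e.g., augmented bone points along the trunk), it estimates a trunk-flexion angle as the deviation of the hip-to-shoulder axis from the image vertical. The coordinates are invented, and the actual software's fulcrum and angle definitions for aPA grading are more involved.

        # Trunk-flexion angle from two 2D keypoints.
        import numpy as np

        def trunk_angle_deg(hip_mid, shoulder_mid):
            """Angle between the trunk axis and the image vertical, in degrees."""
            v = np.asarray(shoulder_mid, float) - np.asarray(hip_mid, float)
            vertical = np.array([0.0, -1.0])   # image y axis grows downwards
            cos = v @ vertical / np.linalg.norm(v)
            return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

        print(trunk_angle_deg(hip_mid=(320, 400), shoulder_mid=(290, 250)))  # ~11.3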

    Integrating Simulink, OpenVX, and ROS for Model-Based Design of Embedded Vision Applications

    OpenVX is increasingly gaining consensus as a standard platform for developing portable, optimized, and power-efficient embedded vision applications. Nevertheless, adopting OpenVX for rapid prototyping, early algorithm parametrization, and validation of complex embedded applications is a very challenging task. This paper presents a comprehensive framework that integrates Simulink, OpenVX, and ROS for the model-based design of embedded vision applications. The framework allows applying Matlab-Simulink for the model-based design, parametrization, and validation of computer vision applications. It then allows for the automatic synthesis of the application model into an OpenVX description for hardware- and constraints-aware application tuning. Finally, the methodology allows integrating the OpenVX application with the Robot Operating System (ROS), which is the de-facto reference standard for developing robotic software applications. The OpenVX-ROS interface allows co-simulating and parametrizing the application by considering the actual robotic environment, and it enables the application’s reuse in any ROS-compliant system. Experimental results have been obtained on two real case studies: an application for digital image stabilization and the ORB descriptor for simultaneous localization and mapping (SLAM), both developed in Simulink and then automatically synthesized into OpenVX-VisionWorks code for an NVIDIA Jetson TX2 board.