1,960 research outputs found
A Framework for the Design and Simulation of Embedded Vision Applications Based on OpenVX and ROS
Customizing computer vision applications for embedded systems is a common and widespread problem in the cyber-physical systems community. Such a customization means parametrizing the algorithm by considering the external environment and mapping the Software application to the heterogeneous Hardware resources by satisfying non-functional constraints like performance, power, and energy consumption. This work presents a framework for the design and simulation of embedded vision applications that integrates the OpenVX standard platform with the Robot Operating System (ROS). The paper shows how the framework has been applied to tune the ORB-SLAM application for an NVIDIA Jetson TX2 board by considering different environment contexts and different design constraints
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Extending OpenVX for Model-based Design of Embedded Vision Applications
Developing computer vision applications for lowpower heterogeneous systems is increasingly gaining interest in the embedded systems community. Even more interesting is the tuning of such embedded software for the target architecture when this is driven by multiple constraints (e.g., performance, peak power, energy consumption). Indeed, developers frequently run into system-level inefficiencies and bottlenecks that can not be quickly addressed by traditional methods. In this context OpenVX has been proposed as the standard platform to develop portable, optimized and powerefficient applications for vision algorithms targeting embedded systems. Nevertheless, adopting OpenVX for rapid prototyping, early algorithm parametrization and validation of complex embedded applications is a very challenging task. This paper presents a methodology to integrate a model-based design environment to OpenVX. The methodology allows applying Matlab/Simulink for the model-based design, parametrization, and validation of computer vision applications. Then, it allows for the automatic synthesis of the application model into an OpenVX description for the hardware and constraints-aware application tuning. Experimental results have been conducted with an application for digital image stabilization developed through Simulink and, then, automatically synthesized into OpenVX-VisionWorks code for an NVIDIA Jetson TX1 boar
MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability.Comment: ISCA 202
Integrating Simulink, OpenVX, and ROS for Model-Based Design of Embedded Vision Applications
OpenVX is increasingly gaining consensus as standard platform to develop portable, optimized and power-efficient embedded vision applications. Nevertheless, adopting OpenVX for rapid prototyping, early algorithm parametrization and validation of complex embedded applications is a very challenging task. This paper presents a comprehensive framework that integrates Simulink, OpenVX, and ROS for model-based design of embedded vision applications. The framework allows applying Matlab-Simulink for the model-based design, parametrization, and validation of computer vision applications. Then, it allows for the automatic synthesis of the application model into an OpenVX description for the hardware and constraints-aware application tuning. Finally, the methodology allows integrating the OpenVX application with Robot Operating System (ROS), which is the de-facto reference standard for developing robotic software applications. The OpenVX-ROS interface allows co-simulating and parametrizing the application by considering the actual robotic environment and the application reuse in any ROS-compliant system. Experimental results have been conducted with two real case studies: An application for digital image stabilization and the ORB descriptor for simultaneous localization and mapping (SLAM), which have been developed through Simulink and, then, automatically synthesized into OpenVX-VisionWorks code for an NVIDIA Jetson TX2 boar
Enhancing Performance of Computer Vision Applications on Low-Power Embedded Systems Through Heterogeneous Parallel Programming
Enabling computer vision applications on low-power embedded systems gives rise to new challenges for embedded SW developers. Such applications implement different functionalities, like image recognition based on deep learning, simultaneous localization and mapping tasks. They are characterized by stringent performance constraints to guarantee real-time behaviors and, at the same time, energy constraints to save battery on the mobile platform. Even though heterogeneous embedded boards are getting pervasive for their high computational power at low power costs, they need a time consuming customization of the whole application (i.e., mapping of application blocks to CPUGPU processing elements and their synchronization) to efficiently exploit their potentiality. Different languages and environments have been proposed for such an embedded SW customization. Nevertheless, they often find limitations on complex real cases, as their application is mutual exclusive. This paper presents a comprehensive framework that relies on a heterogeneous parallel programming model, which combines OpenMP, PThreads, OpenVX, OpenCV, and CUDA to best exploit different levels of parallelism while guaranteeing a semi-automatic customization. The paper shows how such languages and API platforms have been interfaced, synchronized, and applied to customize an ORBSLAM application for an NVIDIA Jetson TX2 board
- …