
    Hardware-Aware Affordance Detection for Application in Portable Embedded Systems

    Affordance detection in computer vision segments an object into parts according to the functions that those parts afford. Most solutions for affordance detection are developed in robotics using deep learning architectures that require substantial computing power, which makes them unsuitable for embedded systems with limited resources. For instance, computer vision is used in smart prosthetic limbs, where affordance detection could be employed to determine the graspable segments of an object, critical information for selecting a grasping strategy. This work proposes an affordance detection strategy based on hardware-aware deep learning solutions. Experimental results confirmed that the proposed solution achieves accuracy comparable to state-of-the-art approaches. In addition, the model was implemented on real-time embedded devices, obtaining a high frame rate with limited power consumption. Finally, the experimental assessment in realistic conditions demonstrated that the developed method is robust and reliable. As a major outcome, the paper proposes and characterizes the first complete solution for affordance detection on embedded devices. Such a solution could substantially improve computer-vision-based prosthesis control and is also highly relevant for other applications (e.g., resource-constrained robotic systems).
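
    The following sketch illustrates the kind of embedded inference loop and frame-rate measurement such an evaluation involves. It is a minimal sketch in PyTorch assuming an off-the-shelf lightweight segmentation network (LR-ASPP on MobileNetV3) and three hypothetical affordance classes; none of these choices are taken from the paper itself.

    import time
    import torch
    import torchvision

    # Hypothetical setup: a lightweight segmentation network with one output
    # channel per affordance class (e.g., "grasp", "cut", "contain").
    NUM_AFFORDANCES = 3
    model = torchvision.models.segmentation.lraspp_mobilenet_v3_large(
        weights=None, num_classes=NUM_AFFORDANCES
    ).eval()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    frame = torch.rand(1, 3, 224, 224, device=device)  # stand-in camera frame

    with torch.no_grad():
        for _ in range(10):                    # warm-up iterations
            model(frame)
        start = time.time()
        for _ in range(100):
            out = model(frame)["out"]          # (1, NUM_AFFORDANCES, H, W)
            masks = out.argmax(dim=1)          # per-pixel affordance labels
        if device == "cuda":
            torch.cuda.synchronize()           # make GPU timing meaningful
    print(f"approx. FPS: {100 / (time.time() - start):.1f}")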

    Deep learning in edge: evaluation of models and frameworks in ARM architecture

    The boom and popularization of edge devices have molded their market through stiff competition that provides better functionality at low energy cost. The ARM architecture is unanimously unopposed in the huge market segment of smartphones and is present well beyond it: in drones, surveillance systems, cars, and robots. It has also been used successfully to develop solutions for chains that supply food, fuel, and other services. Until recently, ARM did not show much promise for high-level computation: owing to its limited RISC instruction set, it was considered power efficient but weak in performance compared to the x86 architecture. However, recent advancements in ARM architecture have shifted that inflection point thanks to the introduction of embedded GPUs with DMA into LPDDR memory boards. Since this development in boards such as the NVIDIA TK1, NVIDIA Jetson TX1, and NVIDIA TX2, it has perhaps finally become feasible to study and run more challenging parallel and distributed workloads directly on a RISC-based architecture. On the other hand, the novelty of this technology raises the fundamental question of whether these boards achieve a meaningful ratio between processing power and power consumption compared to conventional architectures, or whether they have already reached their limitations. This work explores the parallel processing of deep learning on the embedded GPU of the NVIDIA Jetson TX2 to evaluate that question comprehensively. It uses 4 ARM boards, 2 deep learning frameworks, 7 CNN models, and one medium-sized dataset, combined into six board settings, to conduct the experiments. The experiments were conducted under similar environments, all built from source. Altogether, the experiments ran for a total of 4,804 hours and revealed a slight advantage for MxNet in GPU-reliant training and an overall advantage for PyTorch in total execution time and power, especially for CPU-only executions. The experiments also showed that the NVIDIA Jetson TX2 already makes some complex workloads feasible directly on its SoC.
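
    A minimal sketch of the per-framework training-time measurement such a comparison relies on is shown below. PyTorch stands in for either framework here, and the ResNet-18 model, synthetic CIFAR-10-like data, and epoch count are illustrative assumptions rather than the configurations actually benchmarked.

    import time
    import torch
    import torch.nn as nn
    import torchvision

    device = "cuda" if torch.cuda.is_available() else "cpu"  # GPU vs CPU-only run

    model = torchvision.models.resnet18(num_classes=10).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # Stand-in for a medium-sized dataset (CIFAR-10-like shapes).
    data = torch.randn(512, 3, 32, 32)
    labels = torch.randint(0, 10, (512,))
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(data, labels), batch_size=64
    )

    start = time.time()
    model.train()
    for epoch in range(3):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"total training time on {device}: {time.time() - start:.1f}s")
    # Power draw on a Jetson-class board would be logged separately, e.g. by
    # sampling a utility such as tegrastats while this loop runs.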

    Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

    Convolutional Neural Networks (CNNs) are becoming a common presence in many applications and services due to their superior recognition accuracy. They are increasingly being used on mobile devices, often simply by porting large models designed for the server space, although several model compression techniques have been considered. One model compression technique intended to reduce computation is channel pruning. Mobile and embedded systems now have GPUs, which are ideal for the parallel computations of neural networks and have a lower energy cost per operation. Specialized libraries perform these neural network computations through highly optimized routines. As we find in our experiments, these libraries are optimized for the most common network shapes, making uninstructed channel pruning inefficient. We evaluate higher-level libraries, which analyze the input characteristics of a convolutional layer and, based on those, produce optimized OpenCL (Arm Compute Library and TVM) and CUDA (cuDNN) code. In reality, however, these characteristics and the subsequent choices intended for optimization can have the opposite effect. We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to a 2× slowdown. On the other hand, we also find examples where performance-aware pruning achieves the intended results, with speedups of 3× with cuDNN and above 10× with Arm Compute Library and TVM. Our findings expose the need for hardware-instructed neural network pruning.
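
    The sketch below shows one common form of structured channel pruning and the kind of layer-level timing used to check whether pruning actually pays off. It assumes PyTorch, an L1-norm ranking criterion, and an arbitrary 256-channel layer; the 12% ratio mirrors the figure quoted in the abstract, but everything else is illustrative.

    import time
    import torch
    import torch.nn as nn

    def prune_output_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
        """Rebuild a conv layer keeping only the output channels with the
        largest L1 weight norm (a common structured-pruning criterion)."""
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        keep = torch.topk(norms, n_keep).indices.sort().values
        pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                           stride=conv.stride, padding=conv.padding,
                           bias=conv.bias is not None)
        pruned.weight.data = conv.weight.data[keep].clone()
        if conv.bias is not None:
            pruned.bias.data = conv.bias.data[keep].clone()
        return pruned

    def time_layer(layer, x, iters=50):
        with torch.no_grad():
            for _ in range(5):             # warm-up
                layer(x)
            start = time.time()
            for _ in range(iters):
                layer(x)
        return (time.time() - start) / iters

    conv = nn.Conv2d(256, 256, kernel_size=3, padding=1)
    x = torch.randn(1, 256, 56, 56)
    pruned = prune_output_channels(conv, keep_ratio=0.88)  # prune 12% of channels

    print(f"original: {time_layer(conv, x) * 1e3:.2f} ms, "
          f"pruned: {time_layer(pruned, x) * 1e3:.2f} ms")
    # On embedded GPU libraries tuned for common channel counts, such a pruned
    # layer can run no faster, or even slower, than the original: the effect
    # that motivates performance-aware pruning.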

    BRAINSTACK – A Platform for Artificial Intelligence & Machine Learning Collaborative Experiments on a Nano-Satellite

    Space missions have become more ambitious, with exploration targets growing ever more distant while simultaneously requiring larger guidance and communication budgets. These conflicting desires of distance and control drive the need for in-situ intelligent decision making to reduce communication and control limitations. While ground-based research on Artificial Intelligence and Machine Learning (AI/ML) software modules has grown exponentially, the capacity to experimentally validate such software modules in space in a rapid and inexpensive format has not. To this end, the Nano Orbital Workshop (NOW) group at NASA Ames Research Center is performing flight evaluation tests of ‘commercially’ available bleeding-edge computational platforms via what is programmatically referred to as the BrainStack on the TechEdSat (TES-n) flight series. Processors selected as part of the BrainStack are of ideal size, packaging, and power consumption for easy integration into a cube satellite structure. These experiments have included the evaluation of small, high-performance GPUs and, more recently, neuromorphic processors in LEO operations. Additionally, it is planned to measure the radiation environment these processors experience in order to understand any degradation or computational artifacts caused by long-term space radiation exposure on these novel architectures. This evolving, flexible, and collaborative environment, involving various research teams across NASA and other organizations, is intended to be a convenient orbital test platform on which many anticipated future space automation applications may be initially tested.

    Deep models optimization on embedded devices to improve the orientation estimation task at sea

    The environmental monitoring task has greatly benefited from improvements achieved in the robotics field. The enhancement of navigation and control algorithms, together with the use of performant, small, and low-cost sensors, in fact makes it possible to reduce implementation costs while improving system reliability. This is strongly supported by developments in embedded hardware: smart computing devices able to collect and process data in real time and in low-resource settings. Following the results obtained by DOES, this work takes another step towards its deployment in live scenarios: we propose a study of the performance of DOES on embedded systems, using lighter backbone architectures and model optimization techniques.
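
    As a rough illustration of the kind of backbone lightening and model optimization such a study compares, the sketch below builds a small orientation-regression network in PyTorch and applies post-training dynamic quantization and TorchScript export. The MobileNetV3-Small backbone, three-value output head, and input size are assumptions and do not reproduce the DOES architecture.

    import torch
    import torch.nn as nn
    import torchvision

    class TinyOrientationNet(nn.Module):
        """Lightweight backbone plus a small head predicting roll/pitch/yaw
        (a hypothetical stand-in, not the DOES model)."""
        def __init__(self):
            super().__init__()
            backbone = torchvision.models.mobilenet_v3_small(weights=None)
            self.features = backbone.features          # 576-channel output
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.head = nn.Sequential(nn.Flatten(), nn.Linear(576, 3))

        def forward(self, x):
            return self.head(self.pool(self.features(x)))

    model = TinyOrientationNet().eval()

    # Post-training dynamic quantization of the linear head: a common, cheap
    # optimization for CPU-bound embedded targets.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # TorchScript export so the model can run without a Python interpreter.
    scripted = torch.jit.trace(quantized, torch.randn(1, 3, 224, 224))
    scripted.save("orientation_embedded.pt")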