618 research outputs found

    Performance evaluation of convolutional networks on heterogeneous architectures for applications in autonomous robotics

    Humanoid robots find application in human-robot interaction tasks. However, despite their capabilities, their sequential computing systems limit the execution of computationally expensive algorithms such as convolutional neural networks, which have demonstrated good performance in recognition tasks. As an alternative to sequential computing units, Field-Programmable Gate Arrays and Graphics Processing Units offer a high degree of parallelism and low power consumption. This study aims to improve the visual perception of a humanoid robot called NAO using these embedded systems to run a convolutional neural network. The methodology adopted here is based on image acquisition and transmission using simulation software: Webots and Choregraphe. In each embedded system, an object recognition stage is performed using commercial convolutional neural network acceleration frameworks. Xilinx® Ultra96™, Intel® Cyclone® V-SoC and NVIDIA® Jetson™ TX2 boards were used, and the Tinier-YOLO, AlexNet, Inception V1 and Inception V3 transfer-learning networks were executed. Real-time performance was obtained when Inception V1, Inception V3 transfer-learning and AlexNet were run on the Ultra96 and Jetson TX2 boards, with frame rates between 28 and 30 frames per second. The results demonstrate that the use of these embedded systems and convolutional neural networks can provide humanoid robots such as NAO with greater visual recognition capability in tasks that require high accuracy and autonomy.
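    A minimal timing sketch of the kind of frame-rate measurement reported above. The model file, input resolution, and the use of OpenCV's DNN module are placeholders chosen for illustration; the study itself relies on the vendors' acceleration frameworks running on the Ultra96, Cyclone V-SoC and Jetson TX2 boards.

    ```python
    # Sketch: estimate sustained inference throughput (FPS) for a CNN on one board.
    # "model.onnx" and the 224x224 input are assumptions, not the paper's artifacts.
    import time
    import cv2
    import numpy as np

    net = cv2.dnn.readNet("model.onnx")  # hypothetical exported classification CNN
    frame = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)  # stand-in camera frame

    n_frames = 300
    start = time.perf_counter()
    for _ in range(n_frames):
        blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 255.0, size=(224, 224))
        net.setInput(blob)
        _ = net.forward()  # class scores for the frame
    elapsed = time.perf_counter() - start
    print(f"Average throughput: {n_frames / elapsed:.1f} FPS")
    ```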

    Using System-on-a-Programmable-Chip Technology to Design Embedded Systems

    This paper describes the tools, techniques, and devices used to design embedded products with system-on-a-chip (SoC) type solutions using a large Field Programmable Gate Array (FPGA) with an internal processor core. This new FPGA-based approach is called system-on-a-programmable-chip (SoPC). The performance tradeoffs present in SoPC systems are compared to those of more traditional design approaches. Commercial devices, processor cores, and CAD tool flows are described. The issues in SoPC hardware/software design tradeoffs are examined, and three example SoPC designs are presented as case studies.

    Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

    In the modern era of technology, a paradigm shift has been witnessed in areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, and robotics. In the context of mature digital technologies and the availability of authentic data and data-handling infrastructure, DNNs have become a credible choice for solving more complex real-life problems. The performance and accuracy of a DNN can surpass human intelligence in certain situations. However, DNNs are computationally cumbersome in terms of the resources and time required to handle their computations. Furthermore, general-purpose architectures such as CPUs struggle with such computationally intensive algorithms. Therefore, considerable interest and effort have been invested by the research community in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) for the effective implementation of computationally intensive algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review gives a detailed description of the specialized hardware-based accelerators used in the training and/or inference of DNNs. A comparative study based on factors such as power, area, and throughput is also made of the various accelerators discussed. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.
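    As an illustration of the comparison factors named above (power, area, throughput), a small sketch that normalizes accelerator figures into throughput per watt and per unit area. The numbers below are placeholders for the sake of the example, not values taken from the reviewed works.

    ```python
    # Sketch: derive efficiency metrics from the survey's comparison factors.
    accelerators = {
        # name: (throughput in GOP/s, power in W, area in mm^2) -- placeholder values
        "GPU":  (1000.0, 250.0, 600.0),
        "FPGA": (300.0, 25.0, 400.0),
        "ASIC": (2000.0, 40.0, 50.0),
    }

    for name, (gops, watts, area_mm2) in accelerators.items():
        print(f"{name:5s}  {gops / watts:7.1f} GOP/s/W   {gops / area_mm2:6.2f} GOP/s/mm^2")
    ```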

    FPGA-based module for SURF extraction

    We present a complete hardware and software solution: an FPGA-based embedded computer vision module capable of carrying out the SURF image feature extraction algorithm. Aside from image analysis, the module embeds a Linux distribution that makes it possible to run programs specifically tailored for particular applications. The module is based on a Virtex-5 FXT FPGA, which features powerful configurable logic and an embedded PowerPC processor. We describe the module hardware as well as the custom FPGA image processing cores that implement the algorithm's most computationally expensive stage, interest point detection. The module's overall performance is evaluated and compared to CPU- and GPU-based solutions. Results show that the embedded module achieves distinctiveness comparable to the SURF software implementation running on a standard CPU while being faster and consuming significantly less power and space. It therefore allows the SURF algorithm to be used in applications with power and space constraints, such as autonomous navigation of small mobile robots.
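    For context, a CPU reference sketch of SURF interest point detection of the kind the module is compared against. It assumes opencv-contrib-python built with the non-free xfeatures2d module; the image path and Hessian threshold are placeholders, and this is not the FPGA implementation described above.

    ```python
    # Sketch: CPU-based SURF interest point detection with OpenCV contrib.
    import cv2

    img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder input image
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # threshold is an assumption
    keypoints, descriptors = surf.detectAndCompute(img, None)
    print(f"{len(keypoints)} interest points detected")
    ```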

    Dracon: An Open-Hardware Based Platform for Single-Chip Low-Cost Reconfigurable IoT Devices

    The development of devices for the Internet of Things (IoT) requires the rapid prototyping of different hardware configurations. In this paper, a modular hardware platform that allows IoT appliances to be prototyped, tested and even implemented on low-cost reconfigurable devices is presented. The proposed platform, named Dracon, includes a Z80-clone microprocessor, up to 64 KB of RAM, and 256 inputs/outputs (I/Os). These I/Os can be used to connect additional co-processors within the same FPGA, external co-processors, communication modules, sensors and actuators. Dracon also includes, as default peripherals, a UART for programming and accessing the microprocessor, a Real Time Clock, and an Interrupt Timer. The use of an 8-bit microprocessor allows the internal memory of the reconfigurable device to serve as program memory, thereby enabling the implementation of a complete IoT device within a single low-cost chip. Indeed, results on a Spartan-7 FPGA show that Dracon can be implemented with only 1515 6-input LUTs while operating at a maximum frequency of 80 MHz, a better trade-off between area and performance than other less powerful and less versatile alternatives in the literature. Moreover, the presented platform allows embedded software applications to be developed independently of the selected FPGA device, enabling rapid prototyping and implementation on devices from different manufacturers.
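    A hypothetical sketch of accessing such a platform over its default UART peripheral using pyserial. The serial port, baud rate, and command byte are assumptions made for illustration; Dracon's actual programming and access protocol is defined by the platform and is not reproduced here.

    ```python
    # Sketch: exchange a byte with the board over UART (protocol details are assumed).
    import serial

    with serial.Serial("/dev/ttyUSB0", 115200, timeout=1.0) as uart:  # port/baud are placeholders
        uart.write(b"\x01")          # placeholder command byte (e.g., query a status register)
        reply = uart.read(1)         # read a single response byte, if any
        print(f"reply: {reply.hex() if reply else 'timeout'}")
    ```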