16 research outputs found

FPGA accelerated model predictive control for autonomous driving

Author: Shengbo Eben Li
Shulin Zeng
Xingheng Jia
Yu Wang
Yunfei Li
Publication venue: Tsinghua University Press
Publication date: 01/05/2022

Purpose – The purpose of this paper is to reduce the difficulty of deploying model predictive control (MPC) on FPGAs so that researchers can make better use of FPGA technology for academic research. Design/methodology/approach – In this paper, the MPC algorithm is implemented on an FPGA by combining hardware and software, and the method is verified experimentally. Findings – This paper implements a ZYNQ-based design method that significantly reduces development difficulty. A comparison with the CPU solution shows that, with this method, the FPGA significantly accelerates the solution of the MPC problem. Research limitations/implications – Owing to practical constraints, a hardware-in-the-loop experiment could not be carried out for the time being; an open-loop experiment was performed instead. Originality/value – This paper proposes a new design method for deploying the MPC algorithm on an FPGA, reducing the development difficulty of implementing the algorithm on FPGA. It makes it considerably easier for researchers in the field of autonomous driving to carry out research on FPGA hardware acceleration of algorithms.

Directory of Open Access Journals

Heterogeneous architecture to process swarm optimization algorithms

Author: Alfonso-Morales Wilfredo
Caicedo-Bravo Eduardo F.
Dávila-Guzmán Maria A.
Publication venue: 'Instituto Tecnologico Metropolitano (ITM)'
Publication date: 01/01/2014

In recent years, parallel processing has become part of personal computer architecture through co-processing units such as graphics processing units (GPUs), resulting in a heterogeneous platform. This paper presents the implementation of swarm algorithms on this platform to solve function optimization problems, highlighting their inherently parallel structure and distributed control properties. In these algorithms, both the individuals of the population and the dimensions of the problem are parallelized thanks to the granularity of the processing system, which also provides low communication latency between individuals because of the embedded processing. To evaluate the potential of swarm algorithms on the heterogeneous platform, two of them are implemented: particle swarm optimization and bacterial foraging optimization. Speedup is used as the metric to compare the algorithms on a heterogeneous architecture composed of an NVIDIA GeForce GTX480 GPU and a sequential processing unit; particle swarm optimization achieves a speedup of up to 36.82x and bacterial foraging optimization up to 9.26x. Finally, the effect of increasing the population size is evaluated: the speedup differs significantly, but both the dispersion and the quality of the solutions decrease, since the initial distribution of the individuals can converge to a local optimum.
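As a rough illustration of the particle swarm algorithm named in this abstract, the following is a minimal sequential sketch in Python. It is not the authors' GPU implementation: the objective function (`sphere`), the coefficients `w`, `c1`, `c2`, the search bounds, and the helper `speedup` are all assumptions chosen for illustration. The two nested loops over individuals and over dimensions mark the work that the abstract describes as being parallelized.

```python
import random


def sphere(x):
    """Benchmark objective (assumed here): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=5, n_particles=20, iters=200, seed=0,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Sequential particle swarm optimization; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]     # personal best values
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):            # loop over individuals
            for d in range(dim):                # loop over problem dimensions
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:             # global best updated immediately
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def speedup(t_sequential, t_parallel):
    """Speedup metric: sequential run time divided by parallel run time."""
    return t_sequential / t_parallel
```

Calling `pso(sphere)` returns the best position found and its objective value. Timing a sequential run and a parallel run of the same configuration and passing both times to `speedup` gives the kind of ratio that the abstract reports as 36.82x and 9.26x.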

Portal de Revistas Academicas del ITM (Institución Universitaria adscrita al Municipio de Medellín)

Crossref

Directory of Open Access Journals

Repositorio Institucional ITM

DIALNET

Asynchronous inter-core communication in the inferior olive simulator for the Intel SCC

Author: Pantelopoulos Andreas
Παντελόπουλος Ανδρέας
Publication date: 09/12/2014

DSpace at NTUA

Design and application of reconfigurable circuits and systems

Author: Cheung Peter
Publication venue: Electrical & Electronic Engineering, Imperial College London
Publication date: 01/12/2015

Open Access

Spiral - Imperial College Digital Repository

Molecular Dynamics Simulation of Iron — A Review

Author: Chui CP
Liu WQ
Xu YB
Zhou Y
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2015


HKU Scholars Hub

Metodología para el planeamiento automático de rutas en vehículos aéreos no tripulados usando algoritmos bio-inspirados sobre sistemas embebidos

Author: Góez Sánchez Germán David
Publication venue: 'Instituto Tecnologico Metropolitano (ITM)'
Publication date: 01/01/2016

This work proposes a method for online route planning using bio-inspired optimization techniques, intended for implementation on unmanned aerial vehicles, that plans the route without intervention from a ground controller. Particle swarm optimization (PSO) and Cuckoo search (CK) were evaluated together with the proposed route planner. The evaluation ran the route planner with PSO as the underlying optimizer and compared it with the same planner using Cuckoo search. Experiments were performed on five different routes, five times per route, which verified that the method works. The results showed that the route planner covered a shorter distance with PSO as its optimizer, was more stable, and found the route more easily. In terms of execution time, the planner was faster with the CK optimizer, taking 89% less time than the same run with PSO. The route planner was also implemented on three microcontroller-based embedded systems, with ARM Cortex M0+, ARM Cortex M4, and ColdFire V1 (32-bit Flexis family) architectures. The results confirm that the proposed method is versatile and has a low enough computational demand to plan routes on the microcontrollers; on the ARM Cortex M4 it delivers new waypoints along the route in under six seconds. This work shows that implementing this type of methodology is entirely viable, allowing unmanned aircraft to navigate safely over unknown areas, respecting commercial airways and making the most of the airspace.

Magister en Automatización y Control

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional ITM

Implementació OpenCL per a FPGA

Author: Nieto Taló Guillem
Publication venue: Universitat Politècnica de Catalunya
Publication date: 24/01/2013

This project aims to build a set of tools for running a program written in OpenCL on an FPGA, reducing execution time and energy consumption. To achieve this, programs must be developed that adapt the original program into one that makes use of the FPGA. A set of tools has been developed to run code written in OpenCL on FPGAs with MicroBlaze microprocessors.

UPCommons. Portal del coneixement obert de la UPC

Dynamically reconfigurable management of energy, performance, and accuracy applied to digital signal, image, and video Processing Applications

Author: Llamocca Obregon Daniel Rolando
Publication venue: UNM Digital Repository
Publication date: 05/07/2012

There is strong interest in the development of dynamically reconfigurable systems that can meet real-time constraints in energy/power-performance-accuracy (EPA/PPA). In this dissertation, I introduce a framework for implementing dynamically reconfigurable digital signal, image, and video processing systems. The basic idea is to first generate a collection of Pareto-optimal realizations in the EPA/PPA space. Dynamic EPA/PPA management is then achieved by selecting the Pareto-optimal implementations that can meet the real-time constraints. The systems are then demonstrated using Dynamic Partial Reconfiguration (DPR) and dynamic frequency control on FPGAs. The framework is demonstrated on: i) a dynamic pixel processor, ii) a dynamically reconfigurable 1-D digital filtering architecture, and iii) a dynamically reconfigurable 2-D separable digital filtering system. Efficient implementations of the pixel processor are based on the use of look-up tables and local-multiplexes to minimize FPGA resources. For the pixel processor, different realizations are generated based on the number of input bits, the number of cores, the number of output bits, and the frequency of operation. Each parameter combination yields a different pixel-processor realization. Pareto-optimal realizations are selected based on measurements of energy per frame, PSNR accuracy, and performance in terms of frames per second. Dynamic EPA/PPA management is demonstrated for a sequential list of real-time constraints by selecting optimal realizations and implementing them using DPR and dynamic frequency control. Efficient FPGA implementations of the 1-D and 2-D FIR filters are based on the use of a distributed arithmetic technique. Different realizations are generated by varying the number of coefficients, coefficient bitwidth, and output bitwidth. Pareto-optimal realizations are selected in the EPA space. Dynamic EPA management is demonstrated by applying real-time EPA constraints to a digital video. The results suggest that the general framework can be applied to a variety of digital signal, image, and video processing systems. It relies on offline processing to determine the Pareto-optimal realizations. Real-time constraints are met by selecting Pareto-optimal realizations pre-loaded in memory, which are then implemented efficiently using DPR and/or dynamic frequency control.
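The two-step idea in this abstract (compute the Pareto-optimal realizations offline, then pick one that meets the current real-time constraints) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Realization` record, its field names and units, the sample values in `CANDIDATES`, and the tie-breaking rule (lowest energy among feasible realizations) are hypothetical and do not come from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Realization:
    """One hardware configuration with its measured metrics (hypothetical fields)."""
    name: str
    energy_mj: float  # energy per frame, lower is better
    fps: float        # frames per second, higher is better
    psnr_db: float    # accuracy, higher is better


def dominates(a, b):
    """True if a is no worse than b on every metric and strictly better on one."""
    no_worse = (a.energy_mj <= b.energy_mj and a.fps >= b.fps
                and a.psnr_db >= b.psnr_db)
    better = (a.energy_mj < b.energy_mj or a.fps > b.fps
              or a.psnr_db > b.psnr_db)
    return no_worse and better


def pareto_front(items):
    """Offline step: keep only realizations that no other realization dominates."""
    return [a for a in items if not any(dominates(b, a) for b in items)]


def select(front, max_energy_mj, min_fps, min_psnr_db):
    """Run-time step: lowest-energy realization meeting all constraints, or None."""
    feasible = [r for r in front
                if r.energy_mj <= max_energy_mj
                and r.fps >= min_fps
                and r.psnr_db >= min_psnr_db]
    return min(feasible, key=lambda r: r.energy_mj) if feasible else None


# Made-up sample measurements, for illustration only.
CANDIDATES = [
    Realization("A", energy_mj=2.0, fps=30.0, psnr_db=40.0),
    Realization("B", energy_mj=1.0, fps=20.0, psnr_db=35.0),
    Realization("C", energy_mj=2.5, fps=25.0, psnr_db=38.0),
    Realization("D", energy_mj=0.5, fps=10.0, psnr_db=30.0),
]
```

In this sketch, `pareto_front(CANDIDATES)` would be computed once offline, and `select` would be called each time the constraints change; in the framework described above, the chosen realization would then be loaded through DPR and/or dynamic frequency control, which this sketch does not model.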

Application de techniques parcimonieuses et hiérarchiques en reconnaissance de la parole

Author: Brodeur Simon
Publication venue: 'Universite de Sherbrooke'
Publication date: 01/01/2013

Speech recognition systems are fundamentally derived from the fields of signal processing and statistical signal modeling. In recent years, however, important innovations from related fields such as image processing and computational neuroscience have been slow to improve the performance of current speech recognition systems. The literature review suggested that a speech recognition system integrating hierarchy, sparsity, and high dimensionality would combine the advantages of each. The general objective is to understand how integrating all these aspects could improve the robustness of a speech recognition system to additive noise. The TI46 database (isolated words, small vocabulary) is used for unsupervised learning and classification tests. The additive noises come from the NOISEX-92 database and allow robustness to be evaluated under realistic noise conditions. Feature extraction in the proposed system is performed by successive linear projections onto bases, covering progressively more temporal and spectral context. Various thresholding methods produce a multi-scale, binary, and sparse representation of speech. For the dictionary of bases, unsupervised learning yields, under certain conditions, bases that reflect phonetic and syllabic characteristics of speech, thus aiming at an object-based representation of a signal. The independent component analysis (ICA) algorithm proved better suited to extracting such bases, mainly because of its redundancy reduction criterion. Theoretical and experimental analyses showed how sparsity can circumvent the problems of distance discrimination and probability density estimation in high-dimensional spaces. It is observed that a sparse, high-dimensional feature space can define a parameter space (e.g., a statistical model) with the same properties. This reduces the disparity between the representations of the feature extraction stage and those of the classification stage. Moreover, the feature extraction stage can help reduce the complexity of the classification stage. A simple linear classifier can complement a hidden Markov model (HMM), combining increased discriminative power with the versatility of segmenting a signal into states. The results show that the developed architecture offers better recognition rates in clean and noisy conditions than a conventional architecture using cepstral coefficients (MFCC) and a support vector machine (SVM) as the discriminative classifier. Unlike speech coding techniques, where the transformation must be invertible, reconstruction is not important in speech recognition. This justified considerably reducing the complexity of the feature and parameter spaces without diminishing discriminative power or robustness.

Savoirs UdeS

Towards Power- and Energy-Efficient Datacenters

Author: Hsu Chang-Hong

As the Internet evolves, cloud computing is now a dominant form of computation in modern lives. Warehouse-scale computers (WSCs), or datacenters, comprising the foundation of this cloud-centric web have been able to deliver satisfactory performance to both the Internet companies and the customers. With the increased focus on and popularity of the cloud, however, datacenter loads grow rapidly, and Internet companies need more computing capacity to serve such demand. Unfortunately, power and energy are often the major limiting factors prohibiting datacenter growth: it is often the case that no more servers can be added to datacenters without surpassing the capacity of the existing power infrastructure. This dissertation aims to investigate the issues of power and energy usage in a modern datacenter environment. We identify the sources of power and energy inefficiency at three levels in a modern datacenter environment and provide insights and solutions to address each of these problems, aiming to prepare datacenters for critical future growth. We start at the datacenter level and find that peak provisioning and improper service placement in multi-level power delivery infrastructures fragment the power budget inside production datacenters, degrading the compute capacity the existing infrastructure can support. We find that the heterogeneity among datacenter workloads is key to addressing this issue and design systematic methods to reduce the fragmentation and improve the utilization of the power budget. This dissertation then narrows the focus to examine the energy usage of individual servers running cloud workloads. In particular, we examine the power management mechanisms employed in these servers and find that the coarse time granularity of these mechanisms is one critical factor that leads to excessive energy consumption. We propose an intelligent, low-overhead solution on top of the emerging finer-granularity voltage/frequency boosting circuit to effectively pinpoint and boost queries that are likely to increase the tail distribution and can reap more benefit from the voltage/frequency boost, improving energy efficiency without sacrificing quality of service. The final focus of this dissertation takes a further step to investigate how using a fundamentally more efficient computing substrate, field programmable gate arrays (FPGAs), benefits datacenter power and energy efficiency. Unlike other types of hardware accelerators, FPGAs can be reconfigured on the fly to provide fine-grained control over hardware resource allocation, and they present a unique set of challenges for optimal workload scheduling and resource allocation. We aim to design a set of coordinated algorithms to manage these two key factors simultaneously and fully explore the benefit of deploying FPGAs in the highly varying cloud environment.

PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/144043/1/hsuch_1.pd

Deep Blue Documents at the University of Michigan