23 research outputs found

    Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device

    Get PDF
    Currently, most designers face a daunting task to research different design flows and learn the intricacies of specific software from various manufacturers in hardware/software co-design. An urgent need of creating a scalable hardware/software co-design platform has become a key strategic element for developing hardware/software integrated systems. In this paper, we propose a new design flow for building a scalable co-design platform on FPGA-based system-on-chip. We employ an integrated approach to implement a histogram oriented gradients (HOG) and a support vector machine (SVM) classification on a programmable device for pedestrian tracking. Not only was hardware resource analysis reported, but the precision and success rates of pedestrian tracking on nine open access image data sets are also analysed. Finally, our proposed design flow can be used for any real-time image processingrelated products on programmable ZYNQ-based embedded systems, which benefits from a reduced design time and provide a scalable solution for embedded image processing products

    Kodizajn arhitekture i algoritama za lokalizacijumobilnih robota i detekciju prepreka baziranih namodelu

    No full text
    This thesis proposes SoPC (System on a Programmable Chip) architectures for efficient embedding of vison-based localization and obstacle detection tasks in a navigational pipeline on autonomous mobile robots. The obtained results are equivalent or better in comparison to state-ofthe- art. For localization, an efficient hardware architecture that supports EKF-SLAM's local map management with seven-dimensional landmarks in real time is developed. For obstacle detection a novel method of object recognition is proposed - detection by identification framework based on single detection window scale. This framework allows adequate algorithmic precision and execution speeds on embedded hardware platforms.Ova teza bavi se dizajnom SoPC (engl. System on a Programmable Chip) arhitektura i algoritama za efikasnu implementaciju zadataka lokalizacije i detekcije prepreka baziranih na viziji u kontekstu autonomne robotske navigacije. Za lokalizaciju, razvijena je efikasna računarska arhitektura za EKF-SLAM algoritam, koja podržava skladištenje i obradu sedmodimenzionalnih orijentira lokalne mape u realnom vremenu. Za detekciju prepreka je predložena nova metoda prepoznavanja objekata u slici putem prozora detekcije fiksne dimenzije, koja omogućava veću brzinu izvršavanja algoritma detekcije na namenskim računarskim platformama

    Design and Programming Methods for Reconfigurable Multi-Core Architectures using a Network-on-Chip-Centric Approach

    Get PDF
    A current trend in the semiconductor industry is the use of Multi-Processor Systems-on-Chip (MPSoCs) for a wide variety of applications such as image processing, automotive, multimedia, and robotic systems. Most applications gain performance advantages by executing parallel tasks on multiple processors due to the inherent parallelism. Moreover, heterogeneous structures provide high performance/energy efficiency, since application-specific processing elements (PEs) can be exploited. The increasing number of heterogeneous PEs leads to challenging communication requirements. To overcome this challenge, Networks-on-Chip (NoCs) have emerged as scalable on-chip interconnect. Nevertheless, NoCs have to deal with many design parameters such as virtual channels, routing algorithms and buffering techniques to fulfill the system requirements. This thesis highly contributes to the state-of-the-art of FPGA-based MPSoCs and NoCs. In the following, the three major contributions are introduced. As a first major contribution, a novel router concept is presented that efficiently utilizes communication times by performing sequences of arithmetic operations on the data that is transferred. The internal input buffers of the routers are exchanged with processing units that are capable of executing operations. Two different architectures of such processing units are presented. The first architecture provides multiply and accumulate operations which are often used in signal processing applications. The second architecture introduced as Application-Specific Instruction Set Routers (ASIRs) contains a processing unit capable of executing any operation and hence, it is not limited to multiply and accumulate operations. An internal processing core located in ASIRs can be developed in C/C++ using high-level synthesis. The second major contribution comprises application and performance explorations of the novel router concept. Models that approximate the achievable speedup and the end-to-end latency of ASIRs are derived and discussed to show the benefits in terms of performance. Furthermore, two applications using an ASIR-based MPSoC are implemented and evaluated on a Xilinx Zynq SoC. The first application is an image processing algorithm consisting of a Sobel filter, an RGB-to-Grayscale conversion, and a threshold operation. The second application is a system that helps visually impaired people by navigating them through unknown indoor environments. A Light Detection and Ranging (LIDAR) sensor scans the environment, while Inertial Measurement Units (IMUs) measure the orientation of the user to generate an audio signal that makes the distance as well as the orientation of obstacles audible. This application consists of multiple parallel tasks that are mapped to an ASIR-based MPSoC. Both applications show the performance advantages of ASIRs compared to a conventional NoC-based MPSoC. Furthermore, dynamic partial reconfiguration in terms of relocation and security aspects are investigated. The third major contribution refers to development and programming methodologies of NoC-based MPSoCs. A software-defined approach is presented that combines the design and programming of heterogeneous MPSoCs. In addition, a Kahn-Process-Network (KPN) –based model is designed to describe parallel applications for MPSoCs using ASIRs. The KPN-based model is extended to support not only the mapping of tasks to NoC-based MPSoCs but also the mapping to ASIR-based MPSoCs. A static mapping methodology is presented that assigns tasks to ASIRs and processors for a given KPN-model. The impact of external hardware components such as sensors, actuators and accelerators connected to the processors is also discussed which makes the approach of high interest for embedded systems

    Co-design hardware/software of real time vision system on FPGA for obstacle detection

    Get PDF
    La détection, localisation d'obstacles et la reconstruction de carte d'occupation 2D sont des fonctions de base pour un robot navigant dans un environnement intérieure lorsque l'intervention avec les objets se fait dans un environnement encombré. Les solutions fondées sur la vision artificielle et couramment utilisées comme SLAM (simultaneous localization and mapping) ou le flux optique ont tendance a être des calculs intensifs. Ces solutions nécessitent des ressources de calcul puissantes pour répondre à faible vitesse en temps réel aux contraintes. Nous présentons une architecture matérielle pour la détection, localisation d'obstacles et la reconstruction de cartes d'occupation 2D en temps réel. Le système proposé est réalisé en utilisant une architecture de vision sur FPGA (field programmable gates array) et des capteurs d'odométrie pour la détection, localisation des obstacles et la cartographie. De la fusion de ces deux sources d'information complémentaires résulte un modèle amelioré de l'environnement autour des robots. L'architecture proposé est un système à faible coût avec un temps de calcul réduit, un débit d'images élevé, et une faible consommation d'énergieObstacle detection, localization and occupancy map reconstruction are essential abilities for a mobile robot to navigate in an environment. Solutions based on passive monocular vision such as simultaneous localization and mapping (SLAM) or optical flow (OF) require intensive computation. Systems based on these methods often rely on over-sized computation resources to meet real-time constraints. Inverse perspective mapping allows for obstacles detection at a low computational cost under the hypothesis of a flat ground observed during motion. It is thus possible to build an occupancy grid map by integrating obstacle detection over the course of the sensor. In this work we propose hardware/software system for obstacle detection, localization and 2D occupancy map reconstruction in real-time. The proposed system uses a FPGA-based design for vision and proprioceptive sensors for localization. Fusing this information allows for the construction of a simple environment model of the sensor surrounding. The resulting architecture is a low-cost, low-latency, high-throughput and low-power system

    Review: Recent Directions in ECG-FPGA Researches

    Get PDF
    لقد شهدت السنوات القليلة الماضية اهتماماً متزايداً نحو استخدام مصفوفة البوابات المنطقية القابلة للبرمجة FPGA في التطبيقات المختلفة. لقد أدى التقدم الحاصل في مرونة التعامل مع الموارد بالاضافة الى الزيادة في سرعة الاداء وانخفاض الثمن للـ FPGA وكذلك الاستهلاك القليل للطاقة الى هذا الاهتمام المتزايد بالـ FPGA. ان استخدام الـ FPGA في مجالات الطب والصحة يهدف بشكل عام الى استبدال اجهزة المراقبة الطبية كبيرة الحجم وغالية الثمن باخرى أصغر حجماً مع امكانية تصميمها لكي تكون اجهزة محمولة اعتماداً على مرونة التصميم التي يوفرها الـ FPGA. إنصب الاهتمام في العديد من البحوث الحالية على استخدام نظام FPGA لمعالجة الجوانب المتعلقة بإشارة تخطيط القلب وذلك لتوفير التحسينات في الاداء وزيادة السرعة بالاضافة الى أيجاد وإقتراح افكار جديدة لمثل هذه التطبيقات. ان هذا البحث يوفر نظرة عامة عن الاتجاهات الحالية في انظمة ECG-FPGA.The last few years witnessed an increased interest in utilizing field programmable gate array (FPGA) for a variety of applications. This utilizing derived mostly by the advances in the FPGA flexible resource configuration, increased speed, relatively low cost and low energy consumption. The introduction of FPGA in medicine and health care field aim generally to replace costly and usually bigger medical monitoring and diagnostic equipment with much smaller and possibly portable systems based on FPGA that make use of the design flexibility of FPGA. Many recent researches focus on FPGA systems to deal with the well-known yet very important electrocardiogram (ECG) signal aspects to provide acceleration and improvement in the performance as well as finding and proposing new ideas for such implementations. The recent directions in ECG-FPGA are introduced in this paper

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems

    Polymorphic computing abstraction for heterogeneous architectures

    Get PDF
    Integration of multiple computing paradigms onto system on chip (SoC) has pushed the boundaries of design space exploration for hardware architectures and computing system software stack. The heterogeneity of computing styles in SoC has created a new class of architectures referred to as Heterogeneous Architectures. Novel applications developed to exploit the different computing styles are user centric for embedded SoC. Software and hardware designers are faced with several challenges to harness the full potential of heterogeneous architectures. Applications have to execute on more than one compute style to increase overall SoC resource utilization. The implication of such an abstraction is that application threads need to be polymorphic. Operating system layer is thus faced with the problem of scheduling polymorphic threads. Resource allocation is also an important problem to be dealt by the OS. Morphism evolution of application threads is constrained by the availability of heterogeneous computing resources. Traditional design optimization goals such as computational power and lower energy per computation are inadequate to satisfy user centric application resource needs. Resource allocation decisions at application layer need to permeate to the architectural layer to avoid conflicting demands which may affect energy-delay characteristics of application threads. We propose Polymorphic computing abstraction as a unified computing model for heterogeneous architectures to address the above issues. Simulation environment for polymorphic applications is developed and evaluated under various scheduling strategies to determine the effectiveness of polymorphism abstraction on resource allocation. User satisfaction model is also developed to complement polymorphism and used for optimization of resource utilization at application and network layer of embedded systems

    A Comparison of FPGA and GPGPU Designs for Bayesian Occupancy Filters

    Get PDF
    Grid-based perception techniques in the automotive sector based on fusing information from different sensors and their robust perceptions of the environment are proliferating in the industry. However, one of the main drawbacks of these techniques is the traditionally prohibitive, high computing performance that is required for embedded automotive systems. In this work, the capabilities of new computing architectures that embed these algorithms are assessed in a real car. The paper compares two ad hoc optimized designs of the Bayesian Occupancy Filter; one for General Purpose Graphics Processing Unit (GPGPU) and the other for Field-Programmable Gate Array (FPGA). The resulting implementations are compared in terms of development effort, accuracy and performance, using datasets from a realistic simulator and from a real automated vehicle.This work has been partially funded by the Spanish Ministry of Economy and Competitiveness with the National Projects TCAP-AUTO (RTC-2015-3942-4) and NAVEGASE (DPI2014-53525-C3-1-R)

    Heterogeneous Architectures For Parallel Acceleration

    Get PDF
    To enable a new generation of digital computing applications, the greatest challenge is to provide a better level of energy efficiency (intended as the performance that a system can provide within a certain power budget) without giving up a systems's flexibility. This constraint applies to digital system across all scales, starting from ultra-low power implanted devices up to datacenters for high-performance computing and for the "cloud". In this thesis, we show that architectural heterogeneity is the key to provide this efficiency and to respond to many of the challenges of tomorrow's computer architecture - and at the same time we show methodologies to introduce it with little or no loss in terms of flexibility. In particular, we show that heterogeneity can be employed to tackle the "walls" that impede further development of new computing applications: the utilization wall, i.e. the impossibility to keep all transistors on in deeply integrated chips, and the "data deluge", i.e. the amount of data to be processed that is scaling up much faster than the computing performance and efficiency. We introduce a methodology to improve heterogeneous design exploration of tightly coupled clusters; moreover we propose a fractal heterogeneity architecture that is a parallel accelerator for low-power sensor nodes, and is itself internally heterogeneous thanks to an heterogeneous coprocessor for brain-inspired computing. This platform, which is silicon-proven, can lead to more than 100x improvement in terms of energy efficiency with respect to typical computing nodes used within the same domain, enabling the application of complex algorithms, vastly more performance-hungry than the current state-of-the-art in the ULP computing domain

    Modelling and characterisation of distributed hardware acceleration

    Get PDF
    Hardware acceleration has become more commonly utilised in networked computing systems. The growing complexity of applications mean that traditional CPU architectures can no longer meet stringent latency constraints. Alternative computing architectures such as GPUs and FPGAs are increasingly available, along with simpler, more software-like development flows. The work presented in this thesis characterises the overheads associated with these accelerator architectures. A holistic view encompassing both computation and communication latency must be considered. Experimental results obtained through this work show that networkattached accelerators scale better than server-hosted deployments, and that host ingestion overheads are comparable to network traversal times in some cases. Along with the choice of processing platforms, it is becoming more important to consider how workloads are partitioned and where in the network tasks are being performed. Manual allocation and evaluation of tasks to network nodes does not scale with network and workload complexity. A mathematical formulation of this problem is presented within this thesis that takes into account all relevant performance metrics. Unlike other works, this model takes into account growing hardware heterogeneity and workload complexity, and is generalisable to a range of scenarios. This model can be used in an optimisation that generates lower cost results with latency performance close to theoretical maximums compared to naive placement approaches. With the mathematical formulation and experimental results that characterise hardware accelerator overheads, the work presented in this thesis can be used to make informed design decisions about both where to allocate tasks and deploy accelerators in the network, and the associated costs
    corecore