34 research outputs found

    Worst Case Execution Time Analysis for Synthesized Hardware

    Get PDF
    Abstract -We propose a hardware performance estimation flow for fast design space exploration, based on worst-case execution time analysis algorithms for software analysis. Test cases on some real-world applications show that our flow provides a tight upper bound of the execution time, and many useful hints to the designer

    Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

    Get PDF
    Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing is used to serve both non-critical and critical tasks. As example, in automotive, cameras are becoming key sensors in increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms brings a challenge in real-time image processing systems, requiring high computing capacity, usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand of computational capabilities. These devices can assist embedded processors by significantly speeding-up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performances, (ii) enhancing algorithm robustness, and (iii) increasing overall system's dependability. In order to ensure real-time performances, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. Functionalities offered by the target technology, and algorithm's characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor algorithm's operations to available hardware resources. On the other hand, the key idea for increasing image processing algorithms' robustness is to introduce self-adaptivity features at algorithm level, in order to maintain constant, or improve, the quality of results for a wide range of input conditions, that are not always fully predictable at design-time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations. Dynamic reconfiguration features of modern reconfigurable FPGA have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have been also developed in order to increase the overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults

    Herramienta de software para el desarrollo de descripciones de hardware utilizando VHDL a partir de modelos gráficos

    Get PDF
    With the purpose of making the process of hardware designquicker and easier, a number of investigations have been conducted concerning the automatic generation of VHDL descriptions based on graphic models. In this document, a software tool that generates VHDL descriptions based on either ASM diagrams or components diagrams will be presented. The characteristics of this tool allow aVHDL description already created to be reused like a newcomponent. Additionally, the information contained in an ASM diagram can be reused as part of new ASM diagrams. The tool is used to generate the VHDL description of 15 well-known circuits from both their component diagrams and their ASM diagrams, obtaining an equivalent description in 53% of the cases and approximate descriptions in the remaining cases.Con el objetivo de agilizar y facilitar el proceso de diseño de hardware se han realizado investigaciones relacionadas con la generación automática de descripciones VHDL a partir de modelos gráficos. En este documento se describe una herramienta de software que permite obtener la descripción VHDL a partir de un diagrama ASM o de un diagrama de componentes. Esta herramienta tiene la característica de que pueden reutilizarse descripciones ya creadas como nuevos componentes y la información contenida en diagramas ASM como parte de nuevos diagramas ASM. La herramienta es utilizada para generar la descripción VHDL de 15 circuitos comunes a partir de sus diagramas de componentes y ASMs, obteniéndose en un 53% de los casos descripciones equivalentes y en el resto de casos aproximadas

    Efficient reconfigurable architectures for 3D medical image compression

    Get PDF
    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures exible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volumes data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs) has been proposed as viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. Comparative study for both non-partial and partial reconfiguration implementation has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT)with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and block random access memory (BRAM)- based method. An analysis with various medical imaging modalities has been carried out. Results obtained for image de-noising implementation using FRAT exhibits promising results in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveal that 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research is tackling the issues of massive 3-D medical volumes data that requires compression as well as hardware implementation to accelerate the slowest operations in the system. Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance.Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Counci

    Implementation and Characterization of Mixed-Signal Neuromorphic ASICs

    Get PDF
    Accelerated neuromorphic hardware allows the emulation of spiking neural networks with a high speed-up factor compared to classical computer simulation approaches. However, realizing a high degree of versatility and configurability in the implemented models is challenging. In this thesis, we present two mixed-signal ASICs that improve upon previous architectures by augmenting the versatility of the modeled synapses and neurons. In the first part, we present the integration of an analog multi-compartment neuron model into the Multi-Compartment Chip. We characterize the properties of this neuron model and describe methods to compensate for deviations from ideal behavior introduced by the physical implementation. The implemented features of the multi-compartment neurons are demonstrated with a compact prototype setup. In the second part, the integration of a general-purpose microprocessor with analog models of neurons and synapses is described. This allows to define learning rules that go beyond spike-timing dependent plasticity in software without decreasing the speed-up of the underlying network emulation. In the third part, the importance of testability and pre-tapeout verification is discussed and exemplified by the design process of both chips

    High-level synthesis of VLSI circuits

    Get PDF

    The 1991 3rd NASA Symposium on VLSI Design

    Get PDF
    Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2

    The hArtes Tool Chain

    Get PDF
    This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform

    Flexible Hardware Architectures for Retinal Image Analysis

    Get PDF
    RÉSUMÉ Des millions de personnes autour du monde sont touchées par le diabète. Plusieurs complications oculaires telle que la rétinopathie diabétique sont causées par le diabète, ce qui peut conduire à une perte de vision irréversible ou même la cécité si elles ne sont pas traitées. Des examens oculaires complets et réguliers par les ophtalmologues sont nécessaires pour une détection précoce des maladies et pour permettre leur traitement. Comme solution préventive, un protocole de dépistage impliquant l'utilisation d'images numériques du fond de l'œil a été adopté. Cela permet aux ophtalmologistes de surveiller les changements sur la rétine pour détecter toute présence d'une maladie oculaire. Cette solution a permis d'obtenir des examens réguliers, même pour les populations des régions éloignées et défavorisées. Avec la grande quantité d'images rétiniennes obtenues, des techniques automatisées pour les traiter sont devenues indispensables. Les techniques automatisées de détection des maladies des yeux ont été largement abordées par la communauté scientifique. Les techniques développées ont atteint un haut niveau de maturité, ce qui a permis entre autre le déploiement de solutions en télémédecine. Dans cette thèse, nous abordons le problème du traitement de volumes élevés d'images rétiniennes dans un temps raisonnable dans un contexte de dépistage en télémédecine. Ceci est requis pour permettre l'utilisation pratique des techniques développées dans le contexte clinique. Dans cette thèse, nous nous concentrons sur deux étapes du pipeline de traitement des images rétiniennes. La première étape est l'évaluation de la qualité de l'image rétinienne. La deuxième étape est la segmentation des vaisseaux sanguins rétiniens. L’évaluation de la qualité des images rétinienne après acquisition est une tâche primordiale au bon fonctionnement de tout système de traitement automatique des images de la rétine. Le rôle de cette étape est de classifier les images acquises selon leurs qualités, et demander une nouvelle acquisition en cas d’image de mauvaise qualité. Plusieurs algorithmes pour évaluer la qualité des images rétiniennes ont été proposés dans la littérature. Cependant, même si l'accélération de cette tâche est requise en particulier pour permettre la création de systèmes mobiles de capture d'images rétiniennes, ce sujet n'a pas encore été abordé dans la littérature. Dans cette thèse, nous ciblons un algorithme qui calcule les caractéristiques des images pour permettre leur classification en mauvaise, moyenne ou bonne qualité. Nous avons identifié le calcul des caractéristiques de l'image comme une tâche répétitive qui nécessite une accélération. Nous nous sommes intéressés plus particulièrement à l’accélération de l’algorithme d’encodage à longueur de séquence (Run-Length Matrix – RLM). Nous avons proposé une première implémentation complètement logicielle mise en œuvre sous forme d’un système embarqué basé sur la technologie Zynq de Xilinx. Pour accélérer le calcul des caractéristiques, nous avons conçu un co-processeur capable de calculer les caractéristiques en parallèle implémenté sur la logique programmable du FPGA Zynq. Nous avons obtenu une accélération de 30,1 × pour la tâche de calcul des caractéristiques de l’algorithme RLM par rapport à son implémentation logicielle sur la plateforme Zynq. La segmentation des vaisseaux sanguins rétiniens est une tâche clé dans le pipeline du traitement des images de la rétine. Les vaisseaux sanguins et leurs caractéristiques sont de bons indicateurs de la santé de la rétine. En outre, leur segmentation peut également aider à segmenter les lésions rouges, indicatrices de la rétinopathie diabétique. Plusieurs techniques de segmentation des vaisseaux sanguins rétiniens ont été proposées dans la littérature. Des architectures matérielles ont également été proposées pour accélérer certaines de ces techniques. Les architectures existantes manquent de performances et de flexibilité de programmation, notamment pour les images de haute résolution. Dans cette thèse, nous nous sommes intéressés à deux techniques de segmentation du réseau vasculaire rétinien, la technique du filtrage adapté et la technique des opérateurs de ligne. La technique de filtrage adapté a été ciblée principalement en raison de sa popularité. Pour cette technique, nous avons proposé deux architectures différentes, une architecture matérielle personnalisée mise en œuvre sur FPGA et une architecture basée sur un ASIP. L'architecture matérielle personnalisée a été optimisée en termes de surface et de débit de traitement pour obtenir des performances supérieures par rapport aux implémentations existantes dans la littérature. Cette implémentation est plus efficace que toutes les implémentations existantes en termes de débit. Pour l'architecture basée sur un processeur à jeu d’instructions spécialisé (Application-Specific Instruction-set Processor – ASIP), nous avons identifié deux goulets d'étranglement liés à l'accès aux données et à la complexité des calculs de l'algorithme. Nous avons conçu des instructions spécifiques ajoutées au chemin de données du processeur. L'ASIP a été rendu 7.7 × plus rapide par rapport à son architecture de base. La deuxième technique pour la segmentation des vaisseaux sanguins est l'algorithme détecteur de ligne multi-échelle (Multi-Scale Ligne Detector – MSLD). L'algorithme MSLD est choisi en raison de ses performances et de son potentiel à détecter les petits vaisseaux sanguins. Cependant, l'algorithme fonctionne en multi-échelle, ce qui rend l’algorithme gourmand en mémoire. Pour résoudre ce problème et permettre l'accélération de son exécution, nous avons proposé un algorithme efficace en terme de mémoire, conçu et implémenté sur FPGA. L'architecture proposée a réduit de façon drastique les exigences de l’algorithme en terme de mémoire en réutilisant les calculs et la co-conception logicielle/matérielle. Les deux architectures matérielles proposées pour la segmentation du réseau vasculaire rétinien ont été rendues flexibles pour pouvoir traiter des images de basse et de haute résolution. Ceci a été réalisé par le développement d'un compilateur spécifique capable de générer une description HDL de bas niveau de l'algorithme à partir d'un ensemble de paramètres. Le compilateur nous a permis d’optimiser les performances et le temps de développement. Dans cette thèse, nous avons introduit deux architectures qui sont, au meilleur de nos connaissances, les seules capables de traiter des images à la fois de basse et de haute résolution.----------ABSTRACT Millions of people all around the world are affected by diabetes. Several ocular complications such as diabetic retinopathy are caused by diabetes, which can lead to irreversible vision loss or even blindness if not treated. Regular comprehensive eye exams by eye doctors are required to detect the diseases at earlier stages and permit their treatment. As a preventing solution, a screening protocol involving the use of digital fundus images was adopted. This allows eye doctors to monitor changes in the retina to detect any presence of eye disease. This solution made regular examinations widely available, even to populations in remote and underserved areas. With the resulting large amount of retinal images, automated techniques to process them are required. Automated eye detection techniques are largely addressed by the research community, and now they reached a high level of maturity, which allows the deployment of telemedicine solutions. In this thesis, we are addressing the problem of processing a high volume of retinal images in a reasonable time. This is mandatory to allow the practical use of the developed techniques in a clinical context. In this thesis, we focus on two steps of the retinal image pipeline. The first step is the retinal image quality assessment. The second step is the retinal blood vessel segmentation. The evaluation of the quality of the retinal images after acquisition is a primary task for the proper functioning of any automated retinal image processing system. The role of this step is to classify the acquired images according to their quality, which will allow an automated system to request a new acquisition in case of poor quality image. Several algorithms to evaluate the quality of retinal images were proposed in the literature. However, even if the acceleration of this task is required, especially to allow the creation of mobile systems for capturing retinal images, this task has not yet been addressed in the literature. In this thesis, we target an algorithm that computes image features to allow their classification to bad, medium or good quality. We identified the computation of image features as a repetitive task that necessitates acceleration. We were particularly interested in accelerating the Run-Length Matrix (RLM) algorithm. We proposed a first fully software implementation in the form of an embedded system based on Xilinx's Zynq technology. To accelerate the features computation, we designed a co-processor able to compute the features in parallel, implemented on the programmable logic of the Zynq FPGA. We achieved an acceleration of 30.1× over its software implementation for the features computation part of the RLM algorithm. Retinal blood vessel segmentation is a key task in the pipeline of retinal image processing. Blood vessels and their characteristics are good indicators of retina health. In addition, their segmentation can also help to segment the red lesions, indicators of diabetic retinopathy. Several techniques have been proposed in the literature to segment retinal blood vessels. Hardware architectures have also been proposed to accelerate blood vessel segmentation. The existing architectures lack in terms of performance and programming flexibility, especially for high resolution images. In this thesis, we targeted two techniques, matched filtering and line operators. The matched filtering technique was targeted mainly because of its popularity. For this technique, we proposed two different architectures, a custom hardware architecture implemented on FPGA, and an Application Specific Instruction-set Processor (ASIP) based architecture. The custom hardware architecture area and timing were optimized to achieve higher performances in comparison to existing implementations. Our custom hardware implementation outperforms all existing implementations in terms of throughput. For the ASIP based architecture, we identified two bottlenecks related to data access and computation intensity of the algorithm. We designed two specific instructions added to the processor datapath. The ASIP was made 7.7× more efficient in terms of execution time compared to its basic architecture. The second technique for blood vessel segmentation is the Multi-Scale Line Detector (MSLD) algorithm. The MSLD algorithm is selected because of its performance and its potential to detect small blood vessels. However, the algorithm works at multiple scales which makes it memory intensive. To solve this problem and allow the acceleration of its execution, we proposed a memory-efficient algorithm designed and implemented on FPGA. The proposed architecture reduces drastically the memory requirements of the algorithm by reusing the computations and SW/HW co-design. The two hardware architectures proposed for retinal blood vessel segmentation were made flexible to be able to process low and high resolution images. This was achieved by the development of a specific compiler able to generate low-level HDL descriptions of the algorithm from a set of the algorithm parameters. The compiler enabled us to optimize performance and development time. In this thesis, we introduce two novel architectures which are, to the best of our knowledge, the only ones able to process both low and high resolution images

    FPGA-based High Throughput Regular Expression Pattern Matching for Network Intrusion Detection Systems

    Get PDF
    Network speeds and bandwidths have improved over time. However, the frequency of network attacks and illegal accesses have also increased as the network speeds and bandwidths improved over time. Such attacks are capable of compromising the privacy and confidentiality of network resources belonging to even the most secure networks. Currently, general-purpose processor based software solutions used for detecting network attacks have become inadequate in coping with the current network speeds. Hardware-based platforms are designed to cope with the rising network speeds measured in several gigabits per seconds (Gbps). Such hardware-based platforms are capable of detecting several attacks at once, and a good candidate is the Field-programmable Gate Array (FPGA). The FPGA is a hardware platform that can be used to perform deep packet inspection of network packet contents at high speed. As such, this thesis focused on studying designs that were implemented with Field-programmable Gate Arrays (FPGAs). Furthermore, all the FPGA-based designs studied in this thesis have attempted to sustain a more steady growth in throughput and throughput efficiency. Throughput efficiency is defined as the concurrent throughput of a regular expression matching engine circuit divided by the average number of look up tables (LUTs) utilised by each state of the engine"s automata. The implemented FPGA-based design was built upon the concept of equivalence classification. The concept helped to reduce the overall table size of the inputs needed to drive the various Nondeterministic Finite Automata (NFA) matching engines. Compared with other approaches, the design sustained a throughput of up to 11.48 Gbps, and recorded an overall reduction in the number of pattern matching engines required by up to 75%. Also, the overall memory required by the design was reduced by about 90% when synthesised on the target FPGA platform
    corecore