85 research outputs found

    Study on Low-Power Image Processing for Gastrointestinal Endoscopy

    Get PDF

    Efficient reconfigurable architectures for 3D medical image compression

    Get PDF
    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures exible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volumes data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs) has been proposed as viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. Comparative study for both non-partial and partial reconfiguration implementation has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT)with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and block random access memory (BRAM)- based method. An analysis with various medical imaging modalities has been carried out. Results obtained for image de-noising implementation using FRAT exhibits promising results in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveal that 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research is tackling the issues of massive 3-D medical volumes data that requires compression as well as hardware implementation to accelerate the slowest operations in the system. Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance.Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Counci

    Flexible Hardware Architectures for Retinal Image Analysis

    Get PDF
    RÉSUMÉ Des millions de personnes autour du monde sont touchées par le diabète. Plusieurs complications oculaires telle que la rétinopathie diabétique sont causées par le diabète, ce qui peut conduire à une perte de vision irréversible ou même la cécité si elles ne sont pas traitées. Des examens oculaires complets et réguliers par les ophtalmologues sont nécessaires pour une détection précoce des maladies et pour permettre leur traitement. Comme solution préventive, un protocole de dépistage impliquant l'utilisation d'images numériques du fond de l'œil a été adopté. Cela permet aux ophtalmologistes de surveiller les changements sur la rétine pour détecter toute présence d'une maladie oculaire. Cette solution a permis d'obtenir des examens réguliers, même pour les populations des régions éloignées et défavorisées. Avec la grande quantité d'images rétiniennes obtenues, des techniques automatisées pour les traiter sont devenues indispensables. Les techniques automatisées de détection des maladies des yeux ont été largement abordées par la communauté scientifique. Les techniques développées ont atteint un haut niveau de maturité, ce qui a permis entre autre le déploiement de solutions en télémédecine. Dans cette thèse, nous abordons le problème du traitement de volumes élevés d'images rétiniennes dans un temps raisonnable dans un contexte de dépistage en télémédecine. Ceci est requis pour permettre l'utilisation pratique des techniques développées dans le contexte clinique. Dans cette thèse, nous nous concentrons sur deux étapes du pipeline de traitement des images rétiniennes. La première étape est l'évaluation de la qualité de l'image rétinienne. La deuxième étape est la segmentation des vaisseaux sanguins rétiniens. L’évaluation de la qualité des images rétinienne après acquisition est une tâche primordiale au bon fonctionnement de tout système de traitement automatique des images de la rétine. Le rôle de cette étape est de classifier les images acquises selon leurs qualités, et demander une nouvelle acquisition en cas d’image de mauvaise qualité. Plusieurs algorithmes pour évaluer la qualité des images rétiniennes ont été proposés dans la littérature. Cependant, même si l'accélération de cette tâche est requise en particulier pour permettre la création de systèmes mobiles de capture d'images rétiniennes, ce sujet n'a pas encore été abordé dans la littérature. Dans cette thèse, nous ciblons un algorithme qui calcule les caractéristiques des images pour permettre leur classification en mauvaise, moyenne ou bonne qualité. Nous avons identifié le calcul des caractéristiques de l'image comme une tâche répétitive qui nécessite une accélération. Nous nous sommes intéressés plus particulièrement à l’accélération de l’algorithme d’encodage à longueur de séquence (Run-Length Matrix – RLM). Nous avons proposé une première implémentation complètement logicielle mise en œuvre sous forme d’un système embarqué basé sur la technologie Zynq de Xilinx. Pour accélérer le calcul des caractéristiques, nous avons conçu un co-processeur capable de calculer les caractéristiques en parallèle implémenté sur la logique programmable du FPGA Zynq. Nous avons obtenu une accélération de 30,1 × pour la tâche de calcul des caractéristiques de l’algorithme RLM par rapport à son implémentation logicielle sur la plateforme Zynq. La segmentation des vaisseaux sanguins rétiniens est une tâche clé dans le pipeline du traitement des images de la rétine. Les vaisseaux sanguins et leurs caractéristiques sont de bons indicateurs de la santé de la rétine. En outre, leur segmentation peut également aider à segmenter les lésions rouges, indicatrices de la rétinopathie diabétique. Plusieurs techniques de segmentation des vaisseaux sanguins rétiniens ont été proposées dans la littérature. Des architectures matérielles ont également été proposées pour accélérer certaines de ces techniques. Les architectures existantes manquent de performances et de flexibilité de programmation, notamment pour les images de haute résolution. Dans cette thèse, nous nous sommes intéressés à deux techniques de segmentation du réseau vasculaire rétinien, la technique du filtrage adapté et la technique des opérateurs de ligne. La technique de filtrage adapté a été ciblée principalement en raison de sa popularité. Pour cette technique, nous avons proposé deux architectures différentes, une architecture matérielle personnalisée mise en œuvre sur FPGA et une architecture basée sur un ASIP. L'architecture matérielle personnalisée a été optimisée en termes de surface et de débit de traitement pour obtenir des performances supérieures par rapport aux implémentations existantes dans la littérature. Cette implémentation est plus efficace que toutes les implémentations existantes en termes de débit. Pour l'architecture basée sur un processeur à jeu d’instructions spécialisé (Application-Specific Instruction-set Processor – ASIP), nous avons identifié deux goulets d'étranglement liés à l'accès aux données et à la complexité des calculs de l'algorithme. Nous avons conçu des instructions spécifiques ajoutées au chemin de données du processeur. L'ASIP a été rendu 7.7 × plus rapide par rapport à son architecture de base. La deuxième technique pour la segmentation des vaisseaux sanguins est l'algorithme détecteur de ligne multi-échelle (Multi-Scale Ligne Detector – MSLD). L'algorithme MSLD est choisi en raison de ses performances et de son potentiel à détecter les petits vaisseaux sanguins. Cependant, l'algorithme fonctionne en multi-échelle, ce qui rend l’algorithme gourmand en mémoire. Pour résoudre ce problème et permettre l'accélération de son exécution, nous avons proposé un algorithme efficace en terme de mémoire, conçu et implémenté sur FPGA. L'architecture proposée a réduit de façon drastique les exigences de l’algorithme en terme de mémoire en réutilisant les calculs et la co-conception logicielle/matérielle. Les deux architectures matérielles proposées pour la segmentation du réseau vasculaire rétinien ont été rendues flexibles pour pouvoir traiter des images de basse et de haute résolution. Ceci a été réalisé par le développement d'un compilateur spécifique capable de générer une description HDL de bas niveau de l'algorithme à partir d'un ensemble de paramètres. Le compilateur nous a permis d’optimiser les performances et le temps de développement. Dans cette thèse, nous avons introduit deux architectures qui sont, au meilleur de nos connaissances, les seules capables de traiter des images à la fois de basse et de haute résolution.----------ABSTRACT Millions of people all around the world are affected by diabetes. Several ocular complications such as diabetic retinopathy are caused by diabetes, which can lead to irreversible vision loss or even blindness if not treated. Regular comprehensive eye exams by eye doctors are required to detect the diseases at earlier stages and permit their treatment. As a preventing solution, a screening protocol involving the use of digital fundus images was adopted. This allows eye doctors to monitor changes in the retina to detect any presence of eye disease. This solution made regular examinations widely available, even to populations in remote and underserved areas. With the resulting large amount of retinal images, automated techniques to process them are required. Automated eye detection techniques are largely addressed by the research community, and now they reached a high level of maturity, which allows the deployment of telemedicine solutions. In this thesis, we are addressing the problem of processing a high volume of retinal images in a reasonable time. This is mandatory to allow the practical use of the developed techniques in a clinical context. In this thesis, we focus on two steps of the retinal image pipeline. The first step is the retinal image quality assessment. The second step is the retinal blood vessel segmentation. The evaluation of the quality of the retinal images after acquisition is a primary task for the proper functioning of any automated retinal image processing system. The role of this step is to classify the acquired images according to their quality, which will allow an automated system to request a new acquisition in case of poor quality image. Several algorithms to evaluate the quality of retinal images were proposed in the literature. However, even if the acceleration of this task is required, especially to allow the creation of mobile systems for capturing retinal images, this task has not yet been addressed in the literature. In this thesis, we target an algorithm that computes image features to allow their classification to bad, medium or good quality. We identified the computation of image features as a repetitive task that necessitates acceleration. We were particularly interested in accelerating the Run-Length Matrix (RLM) algorithm. We proposed a first fully software implementation in the form of an embedded system based on Xilinx's Zynq technology. To accelerate the features computation, we designed a co-processor able to compute the features in parallel, implemented on the programmable logic of the Zynq FPGA. We achieved an acceleration of 30.1× over its software implementation for the features computation part of the RLM algorithm. Retinal blood vessel segmentation is a key task in the pipeline of retinal image processing. Blood vessels and their characteristics are good indicators of retina health. In addition, their segmentation can also help to segment the red lesions, indicators of diabetic retinopathy. Several techniques have been proposed in the literature to segment retinal blood vessels. Hardware architectures have also been proposed to accelerate blood vessel segmentation. The existing architectures lack in terms of performance and programming flexibility, especially for high resolution images. In this thesis, we targeted two techniques, matched filtering and line operators. The matched filtering technique was targeted mainly because of its popularity. For this technique, we proposed two different architectures, a custom hardware architecture implemented on FPGA, and an Application Specific Instruction-set Processor (ASIP) based architecture. The custom hardware architecture area and timing were optimized to achieve higher performances in comparison to existing implementations. Our custom hardware implementation outperforms all existing implementations in terms of throughput. For the ASIP based architecture, we identified two bottlenecks related to data access and computation intensity of the algorithm. We designed two specific instructions added to the processor datapath. The ASIP was made 7.7× more efficient in terms of execution time compared to its basic architecture. The second technique for blood vessel segmentation is the Multi-Scale Line Detector (MSLD) algorithm. The MSLD algorithm is selected because of its performance and its potential to detect small blood vessels. However, the algorithm works at multiple scales which makes it memory intensive. To solve this problem and allow the acceleration of its execution, we proposed a memory-efficient algorithm designed and implemented on FPGA. The proposed architecture reduces drastically the memory requirements of the algorithm by reusing the computations and SW/HW co-design. The two hardware architectures proposed for retinal blood vessel segmentation were made flexible to be able to process low and high resolution images. This was achieved by the development of a specific compiler able to generate low-level HDL descriptions of the algorithm from a set of the algorithm parameters. The compiler enabled us to optimize performance and development time. In this thesis, we introduce two novel architectures which are, to the best of our knowledge, the only ones able to process both low and high resolution images

    Dynamically reconfigurable architecture for embedded computer vision systems

    Get PDF
    The objective of this research work is to design, develop and implement a new architecture which integrates on the same chip all the processing levels of a complete Computer Vision system, so that the execution is efficient without compromising the power consumption while keeping a reduced cost. For this purpose, an analysis and classification of different mathematical operations and algorithms commonly used in Computer Vision are carried out, as well as a in-depth review of the image processing capabilities of current-generation hardware devices. This permits to determine the requirements and the key aspects for an efficient architecture. A representative set of algorithms is employed as benchmark to evaluate the proposed architecture, which is implemented on an FPGA-based system-on-chip. Finally, the prototype is compared to other related approaches in order to determine its advantages and weaknesses

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Get PDF
    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report).; 2017.Since their introduction, FPGAs can be seen in more and more different fields of applications. The key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field introduces special requirements to the used computational architecture. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units like e.g. CPUs

    VLSI Design

    Get PDF
    This book provides some recent advances in design nanometer VLSI chips. The selected topics try to present some open problems and challenges with important topics ranging from design tools, new post-silicon devices, GPU-based parallel computing, emerging 3D integration, and antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc

    Dynamically reconfigurable management of energy, performance, and accuracy applied to digital signal, image, and video Processing Applications

    Get PDF
    There is strong interest in the development of dynamically reconfigurable systems that can meet real-time constraints in energy/power-performance-accuracy (EPA/PPA). In this dissertation, I introduce a framework for implementing dynamically reconfigurable digital signal, image, and video processing systems. The basic idea is to first generate a collection of Pareto-optimal realizations in the EPA/PPA space. Dynamic EPA/PPA management is then achieved by selecting the Pareto-optimal implementations that can meet the real-time constraints. The systems are then demonstrated using Dynamic Partial Reconfiguration (DPR) and dynamic frequency control on FPGAs. The framework is demonstrated on: i) a dynamic pixel processor, ii) a dynamically reconfigurable 1-D digital filtering architecture, and iii) a dynamically reconfigurable 2-D separable digital filtering system. Efficient implementations of the pixel processor are based on the use of look-up tables and local-multiplexes to minimize FPGA resources. For the pixel-processor, different realizations are generated based on the number of input bits, the number of cores, the number of output bits, and the frequency of operation. For each parameters combination, there is a different pixel-processor realization. Pareto-optimal realizations are selected based on measurements of energy per frame, PSNR accuracy, and performance in terms of frames per second. Dynamic EPA/PPA management is demonstrated for a sequential list of real-time constraints by selecting optimal realizations and implementing using DPR and dynamic frequency control. Efficient FPGA implementations for the 1-D and 2-D FIR filters are based on the use a distributed arithmetic technique. Different realizations are generated by varying the number of coefficients, coefficient bitwidth, and output bitwidth. Pareto-optimal realizations are selected in the EPA space. Dynamic EPA management is demonstrated on the application of real-time EPA constraints on a digital video. The results suggest that the general framework can be applied to a variety of digital signal, image, and video processing systems. It is based on the use of offline-processing that is used to determine the Pareto-optimal realizations. Real-time constraints are met by selecting Pareto-optimal realizations pre-loaded in memory that are then implemented efficiently using DPR and/or dynamic frequency control
    corecore