5 research outputs found

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF


    Flexible Hardware Architectures for Retinal Image Analysis

    Get PDF
    ABSTRACT Millions of people around the world are affected by diabetes. Several ocular complications, such as diabetic retinopathy, are caused by diabetes and can lead to irreversible vision loss or even blindness if left untreated. Regular comprehensive eye exams by eye doctors are required to detect these diseases at an early stage and permit their treatment. As a preventive solution, a screening protocol involving digital fundus images was adopted. This allows eye doctors to monitor changes in the retina to detect any presence of eye disease. This solution made regular examinations widely available, even to populations in remote and underserved areas.
With the resulting large volume of retinal images, automated techniques to process them have become indispensable. Automated eye-disease detection techniques have been widely addressed by the research community and have reached a high level of maturity, enabling the deployment of telemedicine solutions. In this thesis, we address the problem of processing a high volume of retinal images in a reasonable time in a telemedicine screening context. This is mandatory to allow the practical use of the developed techniques in a clinical setting. We focus on two steps of the retinal image processing pipeline: retinal image quality assessment and retinal blood vessel segmentation. Evaluating the quality of retinal images after acquisition is a primary task for the proper functioning of any automated retinal image processing system. The role of this step is to classify the acquired images according to their quality, which allows an automated system to request a new acquisition when an image is of poor quality. Several algorithms to evaluate the quality of retinal images have been proposed in the literature. However, even though accelerating this task is required, in particular to allow the creation of mobile retinal image capture systems, it has not yet been addressed in the literature. In this thesis, we target an algorithm that computes image features to classify images as bad, medium, or good quality. We identified the computation of image features as a repetitive task that necessitates acceleration, and we were particularly interested in accelerating the Run-Length Matrix (RLM) algorithm. We proposed a first, fully software implementation in the form of an embedded system based on Xilinx's Zynq technology. To accelerate the feature computation, we designed a co-processor able to compute the features in parallel, implemented on the programmable logic of the Zynq FPGA.
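As a concrete illustration of the kind of repetitive computation involved, here is a minimal software sketch of a horizontal gray-level run-length matrix and one classical feature derived from it. The number of gray levels, the run-length cap, and the feature shown are illustrative assumptions, not the thesis's exact configuration.

```c
#define LEVELS 2   /* illustrative: number of quantised gray levels */
#define MAX_RUN 3  /* illustrative: runs longer than this are clamped */

/* Horizontal gray-level run-length matrix: rlm[g][r-1] counts the runs of
   gray level g having length r when scanning each row left to right. */
void run_length_matrix(const int *image, int rows, int cols,
                       int rlm[LEVELS][MAX_RUN])
{
    for (int g = 0; g < LEVELS; g++)
        for (int r = 0; r < MAX_RUN; r++)
            rlm[g][r] = 0;

    for (int i = 0; i < rows; i++) {
        int run_val = image[i * cols];
        int run_len = 1;
        for (int j = 1; j < cols; j++) {
            int p = image[i * cols + j];
            if (p == run_val) {
                run_len++;
            } else {
                rlm[run_val][(run_len < MAX_RUN ? run_len : MAX_RUN) - 1]++;
                run_val = p;
                run_len = 1;
            }
        }
        rlm[run_val][(run_len < MAX_RUN ? run_len : MAX_RUN) - 1]++;
    }
}

/* One classical RLM texture feature: short-run emphasis, a reduction over
   the matrix that weights short runs more heavily. */
double short_run_emphasis(const int rlm[LEVELS][MAX_RUN])
{
    double num = 0.0;
    int total = 0;
    for (int g = 0; g < LEVELS; g++)
        for (int r = 0; r < MAX_RUN; r++) {
            num += rlm[g][r] / (double)((r + 1) * (r + 1));
            total += rlm[g][r];
        }
    return num / total;
}
```

Each gray level's row of the matrix is independent of the others, which is what makes the feature reductions amenable to the kind of parallel co-processor described above.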
We achieved an acceleration of 30.1× over the software implementation for the feature computation part of the RLM algorithm. Retinal blood vessel segmentation is a key task in the retinal image processing pipeline. Blood vessels and their characteristics are good indicators of retinal health, and their segmentation can also help to segment red lesions, which are indicators of diabetic retinopathy. Several techniques have been proposed in the literature to segment retinal blood vessels, and hardware architectures have also been proposed to accelerate some of them. The existing architectures, however, lack performance and programming flexibility, especially for high-resolution images. In this thesis, we targeted two techniques: matched filtering and line operators. The matched filtering technique was targeted mainly because of its popularity. For this technique, we proposed two different architectures: a custom hardware architecture implemented on FPGA, and an Application-Specific Instruction-set Processor (ASIP) based architecture. The custom hardware architecture was optimized for area and timing to achieve higher performance than existing implementations; it outperforms all of them in terms of throughput. For the ASIP-based architecture, we identified two bottlenecks related to data access and to the computational intensity of the algorithm, and designed two specific instructions added to the processor datapath. The ASIP was made 7.7× faster in execution time compared to its base architecture. The second technique for blood vessel segmentation is the Multi-Scale Line Detector (MSLD) algorithm, selected for its performance and its potential to detect small blood vessels. However, the algorithm works at multiple scales, which makes it memory-intensive.
To reduce the MSLD algorithm's memory requirements and allow the acceleration of its execution, we proposed a memory-efficient version of the algorithm, designed and implemented on FPGA. The proposed architecture drastically reduces the memory requirements of the algorithm through computation reuse and SW/HW co-design. The two hardware architectures proposed for retinal blood vessel segmentation were made flexible enough to process both low- and high-resolution images. This was achieved by developing a specific compiler able to generate a low-level HDL description of the algorithm from a set of algorithm parameters. The compiler enabled us to optimize both performance and development time. In this thesis, we introduce two novel architectures that are, to the best of our knowledge, the only ones able to process both low- and high-resolution images.

    Compilation Techniques for High-Performance Embedded Systems with Multiple Processors

    Get PDF
    Institute for Computing Systems Architecture

Despite the progress made in developing more advanced compilers for embedded systems, programming embedded high-performance computing systems based on Digital Signal Processors (DSPs) is still a highly skilled manual task. This is true for single-processor systems, and even more so for embedded systems based on multiple DSPs. Compilers often fail to optimise existing DSP codes written in C because of the programming style employed. Parallelisation is hampered by the complex multiple-address-space memory architecture found in most commercial multi-DSP configurations. This thesis develops an integrated optimisation and parallelisation strategy that can deal with low-level C codes and produces optimised parallel code for a homogeneous multi-DSP architecture with distributed physical memory and multiple logical address spaces. In a first step, low-level programming idioms are identified and recovered. This enables the application of high-level code and data transformations well known in the field of scientific computing. Iterative, feedback-driven search for “good” transformation sequences is investigated. A novel approach to parallelisation based on a unified data and loop transformation framework is presented and evaluated. Performance optimisation is achieved through exploitation of data locality on the one hand, and utilisation of DSP-specific architectural features such as Direct Memory Access (DMA) transfers on the other. The proposed methodology is evaluated against two benchmark suites (DSPstone & UTDSP) and four different high-performance DSPs, one of which is part of a commercial four-processor multi-DSP board also used for evaluation. Experiments confirm the effectiveness of the program recovery techniques as enablers of high-level transformations and automatic parallelisation.
Source-to-source transformations of DSP codes yield an average speedup of 2.21 across four different DSP architectures. The parallelisation scheme, in conjunction with a set of locality optimisations, is able to produce linear and even super-linear speedups on a number of relevant DSP kernels and applications.
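As a hypothetical illustration of the idiom-recovery step (not an example taken from the thesis), consider a dot product written in the pointer-walking style common in hand-tuned DSP code, and the semantically equivalent array form that recovery would produce. Explicit subscripts expose the access pattern to dependence analysis, which is what enables the high-level loop and data transformations described above.

```c
/* DSP-style source: pointer post-increment idiom. The induction variable
   is hidden inside the pointers, which defeats many dependence analyses. */
float dot_ptr(const float *a, const float *b, int n)
{
    float acc = 0.0f;
    while (n-- > 0)
        acc += *a++ * *b++;
    return acc;
}

/* Recovered form: an explicit counted loop with array subscripts, the
   shape that high-level loop/data transformations and automatic
   parallelisation expect. */
float dot_arr(const float a[], const float b[], int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

Both functions compute the same result; only the recovered form makes the iteration space and array accesses explicit enough for a compiler to partition the loop across processors.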