59 research outputs found

    Embedded Vision Systems: A Review of the Literature

    Over the past two decades, the use of low-power Field Programmable Gate Arrays (FPGAs) to accelerate vision systems, mainly on embedded devices, has become widespread. The reconfigurable and parallel nature of the FPGA opens up new opportunities to speed up computationally intensive vision and neural algorithms on embedded and portable devices. This paper presents a comprehensive review of embedded vision algorithms and applications over the past decade. The review discusses vision-based systems and approaches, and how they have been implemented on embedded devices. Topics covered include image acquisition, preprocessing, object detection and tracking, recognition, and high-level classification. This is followed by an outline of the advantages and disadvantages of the various embedded implementations. Finally, an overview of the challenges in the field and future research trends is presented. This review is expected to serve as a tutorial and reference source for embedded computer vision systems.
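    To make the pipeline stages concrete, the sketch below shows the kind of per-pixel preprocessing kernel (grayscale conversion followed by thresholding) that reviews like this typically identify as a candidate for FPGA acceleration. The function and its parameters are illustrative, not taken from the paper.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical example of a typical preprocessing stage: converting an
// interleaved RGB frame to grayscale and binarizing it. Kernels like this,
// applied independently per pixel, are prime candidates for FPGA
// acceleration because every pixel can be processed in parallel.
std::vector<uint8_t> grayscale_threshold(const std::vector<uint8_t>& rgb,
                                         int width, int height,
                                         uint8_t threshold) {
    std::vector<uint8_t> out(static_cast<size_t>(width) * height);
    for (int i = 0; i < width * height; ++i) {
        // Integer luma approximation of ITU-R BT.601 (77/150/29 out of 256).
        int luma = (77 * rgb[3 * i] + 150 * rgb[3 * i + 1] +
                    29 * rgb[3 * i + 2]) >> 8;
        out[i] = luma >= threshold ? 255 : 0;
    }
    return out;
}
```

    Because each output pixel depends only on its own input pixel, a stage like this maps directly onto the FPGA's parallel fabric, which is one reason preprocessing stages are often the first to be offloaded.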

    Research and Application of Parallel Computing Algorithms for Statistical Phylogenetic Inference

    Estimating the evolutionary history of organisms, phylogenetic inference, is a critical step in many analyses involving biological sequence data such as DNA. The likelihood calculations at the heart of the most effective methods for statistical phylogenetic analyses are extremely computationally intensive, and hence these analyses become a bottleneck in many studies. Recent progress in computer hardware, specifically the increasing pervasiveness of highly parallel, many-core processors, has created opportunities for new approaches to computationally intensive methods, such as those in phylogenetic inference. We have developed an open source library, BEAGLE, which uses parallel computing methods to greatly accelerate statistical phylogenetic inference, for both maximum likelihood and Bayesian approaches. BEAGLE defines a uniform application programming interface and includes a collection of efficient implementations that use NVIDIA CUDA, OpenCL, and C++ threading frameworks for evaluating likelihoods under a wide variety of evolutionary models, on GPUs as well as on multi-core CPUs. BEAGLE employs a number of different parallelization techniques for phylogenetic inference, at different granularity levels and for distinct processor architectures. On CUDA and OpenCL devices, the library enables concurrent computation of site likelihoods, data subsets, and independent subtrees. The general design features of the library also provide a model for software development using parallel computing frameworks that is applicable to other domains. BEAGLE has been integrated with some of the leading programs in the field, such as MrBayes and BEAST, and is used in a diverse range of evolutionary studies, including those of disease-causing viruses. The library can provide significant performance gains, with the exact increase in performance depending on the specific properties of the data set, evolutionary model, and hardware. In general, nucleotide analyses are accelerated on the order of 10-fold and codon analyses on the order of 100-fold.
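    The core operation behind these concurrent site-likelihood computations can be sketched as follows. This is a simplified illustration of per-site parallelism under a 4-state nucleotide model using C++ threads, not BEAGLE's actual API; all names here are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <thread>
#include <vector>

// One step of Felsenstein's pruning algorithm: combine the partial
// likelihoods of two child nodes into the parent's partials. Each alignment
// site is independent, so sites can be split across CPU threads (or across
// GPU threads in a CUDA/OpenCL backend).
using Partials = std::vector<std::array<double, 4>>;  // one 4-state vector per site
using TransMat = std::array<std::array<double, 4>, 4>;

void combine_children(const Partials& left, const TransMat& pLeft,
                      const Partials& right, const TransMat& pRight,
                      Partials& parent, unsigned nThreads) {
    // Caller guarantees left, right, and parent all have one entry per site.
    const size_t nSites = parent.size();
    if (nThreads == 0) nThreads = 1;
    auto worker = [&](size_t begin, size_t end) {
        for (size_t s = begin; s < end; ++s)
            for (int i = 0; i < 4; ++i) {
                double l = 0.0, r = 0.0;
                for (int j = 0; j < 4; ++j) {
                    l += pLeft[i][j] * left[s][j];    // likelihood of left subtree given state i
                    r += pRight[i][j] * right[s][j];  // likelihood of right subtree given state i
                }
                parent[s][i] = l * r;
            }
    };
    std::vector<std::thread> pool;
    const size_t chunk = (nSites + nThreads - 1) / nThreads;
    for (unsigned t = 0; t < nThreads; ++t) {
        size_t b = t * chunk, e = std::min(nSites, b + chunk);
        if (b < e) pool.emplace_back(worker, b, e);
    }
    for (auto& th : pool) th.join();
}
```

    Repeating this operation up the tree, over thousands to millions of sites, is what dominates the runtime of likelihood-based inference and what makes the data-parallel approach pay off.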

    Automatic Loop Tuning and Memory Management for Stencil Computations

    The Texas Instruments C66x Digital Signal Processor (DSP) is an embedded processor technology targeted at real-time signal processing. It has also been developed with high potential to become the new generation of coprocessor technology for high-performance embedded computing. Of particular interest is its performance for stencil computations, such as those found in signal processing and computer vision tasks. A stencil is a loop in which the output value at each position of an array is updated by taking a weighted function of its neighbors. Efficiently mapping stencil-based kernels to the C66x device presents two challenges. The first is how to efficiently optimize loops in order to facilitate the use of Single Instruction Multiple Data (SIMD) instructions. On this architecture, as on most others, SIMD instructions are not directly generated by the compiler. The second is how to manage on-chip memory in a way that minimizes off-chip memory access. Although this could theoretically be achieved by using a highly associative cache, the high rate of data reuse in stencil loops causes a high conflict miss rate. One way to solve this problem is to configure the on-chip memory as a program-controlled scratchpad, which allows the user to buffer a 2D block of data and minimizes off-chip data access. For this dissertation, we have accomplished two goals: (1) developed a methodology for the optimization of arbitrary 2D stencils that fully utilizes SIMD instructions through microarchitecture-aware loop unrolling, and (2) delivered an easy-to-use scratchpad buffer management system and used it to improve the memory efficiency of 2D stencils. We show in the results and analysis section that our stencil compiler achieves up to a 2x speedup compared with the code generated by the industry-standard compiler developed by Texas Instruments, and our memory management system achieves up to a 10x speedup compared with the cache.
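    As a rough illustration of the first goal, the sketch below computes a 3x3 weighted 2D stencil with the inner loop blocked over four adjacent outputs, the shape that lets adjacent loads and multiply-accumulates be packed into SIMD lanes. The weights, the unroll factor of 4, and the plain portable C++ form are illustrative stand-ins for the dissertation's microarchitecture-aware, intrinsic-based code.

```cpp
// Portable illustration of unrolling a 2D stencil loop: four adjacent
// outputs are computed per main-loop iteration, so their overlapping
// neighbor loads can be shared and vectorized (on the C66x, via hand-written
// SIMD intrinsics; here, left to the compiler). Keeping the processed block
// in a scratchpad buffer keeps this heavily reused data on-chip.
void stencil3x3_unroll4(const float* in, float* out,
                        int width, int height, const float w[9]) {
    for (int y = 1; y < height - 1; ++y) {
        int x = 1;
        // Main loop: four outputs per iteration.
        for (; x + 4 <= width - 1; x += 4) {
            for (int u = 0; u < 4; ++u) {  // fixed trip count: unrollable
                float acc = 0.0f;
                for (int ky = -1; ky <= 1; ++ky)
                    for (int kx = -1; kx <= 1; ++kx)
                        acc += w[(ky + 1) * 3 + (kx + 1)] *
                               in[(y + ky) * width + (x + u + kx)];
                out[y * width + x + u] = acc;
            }
        }
        // Remainder columns that do not fill a block of four.
        for (; x < width - 1; ++x) {
            float acc = 0.0f;
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx)
                    acc += w[(ky + 1) * 3 + (kx + 1)] *
                           in[(y + ky) * width + (x + kx)];
            out[y * width + x] = acc;
        }
    }
}
```

    Each input element here is read up to nine times, which is exactly the reuse pattern that thrashes a low-associativity cache and motivates the program-controlled scratchpad described above.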

    Non-Visual Representation of Complex Documents for Use in Digital Talking Books

    Essential written information such as textbooks, bills, and catalogues needs to be accessible by everyone. However, such access is not always available to vision-impaired people, as they require electronic documents to be available in specific formats. In order to address the accessibility issues of electronic documents, this research aims to design an affordable, portable, standalone, and simple-to-use complete reading system that will convert and describe complex components in electronic documents for print-disabled users.

    Phobos: The design and implementation of embedded software for a low cost radar warning receiver

    This portfolio thesis describes work undertaken by the author under the Engineering Doctorate program of the Institute for System Level Integration. It was carried out in conjunction with the sponsor company Teledyne Defence Limited. A radar warning receiver is a device used to detect and identify the emissions of radars. Radar warning receivers were originally developed during the Second World War and are found today on a variety of military platforms as part of the platform’s defensive systems. Teledyne Defence has designed and built components and electronic subsystems for the defence industry since the 1970s. This thesis documents part of the work carried out to create Phobos, Teledyne Defence’s first complete radar warning receiver. Phobos was designed to be the first low-cost radar warning receiver. This was made possible by the reuse of existing Teledyne Defence products, commercial off-the-shelf hardware, and advanced UK government algorithms. The challenges of this integration are described and discussed, with detail given of the software architecture and the development of the embedded application. The performance of the embedded system as a whole is described and qualified within the context of a low-cost system.
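    For readers unfamiliar with the domain, the sketch below illustrates the basic identification task a radar warning receiver performs: matching a measured pulse against a library of known emitter types. The types, field names, and tolerance windows are hypothetical and do not reflect Teledyne Defence's design or the UK government algorithms mentioned above.

```cpp
#include <string>
#include <vector>

// Purely illustrative: a measured pulse description and a library of
// emitter parameter windows. Real receivers work with richer pulse
// descriptor words and far more sophisticated deinterleaving logic.
struct PulseDescriptor {
    double freq_MHz;  // carrier frequency
    double pw_us;     // pulse width
    double pri_us;    // pulse repetition interval
};

struct EmitterType {
    std::string name;
    double freq_lo, freq_hi;  // MHz
    double pw_lo, pw_hi;      // microseconds
    double pri_lo, pri_hi;    // microseconds
};

// Return the names of all library entries whose parameter windows contain
// the measured pulse; an empty result means "unknown emitter".
std::vector<std::string> identify(const PulseDescriptor& p,
                                  const std::vector<EmitterType>& library) {
    std::vector<std::string> matches;
    for (const auto& e : library)
        if (p.freq_MHz >= e.freq_lo && p.freq_MHz <= e.freq_hi &&
            p.pw_us   >= e.pw_lo   && p.pw_us   <= e.pw_hi &&
            p.pri_us  >= e.pri_lo  && p.pri_us  <= e.pri_hi)
            matches.push_back(e.name);
    return matches;
}
```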

    Scalable Computing Methods for High-Throughput Sequencing Data Analytics in Population Genomics

    High-throughput sequencing (HTS) technologies have enabled rapid DNA sequencing of whole genomes collected from various organisms and environments, including human tissues, plants, soil, water, and air. As a result, sequencing data volumes have grown by several orders of magnitude, and the number of assembled whole genomes is increasing rapidly as well. This whole-genome sequencing (WGS) data has revealed the genetic variation in humans and other species, and advanced various fields from human and microbial genomics to drug design and personalized medicine. The amount of sequencing data has almost doubled every six months, creating new possibilities but also big data challenges in genomics. Diverse methods used in modern computational biology require a vast amount of computational power, and advances in HTS technology are further widening the gap between the analysis input data and the analysis outcome. Currently, many of the existing genomic analysis tools, algorithms, and pipelines do not fully exploit the power of distributed and high-performance computing, which in turn limits the analysis throughput and hinders the deployment of the applications to clinical practice in the long run. Thus, the relevance of harnessing distributed and cloud computing in bioinformatics is more significant than ever before. In addition, efficient data compression and storage methods for genomic data processing and retrieval, integrated with conventional bioinformatics tools, are essential. These vast datasets have to be stored and structured in formats that can be managed, processed, searched, and analyzed efficiently in distributed systems. Genomic data contain repetitive sequences, a key property exploited in developing efficient compression algorithms to alleviate the data storage burden. Moreover, indexing compressed sequences appropriately for bioinformatics tools, such as read aligners, offers direct sequence search and alignment capabilities with compressed indexes. Relative Lempel-Ziv (RLZ) has been found to be an efficient compression method for repetitive genomes that is compatible with the data-parallel computing approach. RLZ has recently been used to build hybrid indexes compatible with read aligners, and we focus on extending it with distributed computing. Data structures found in genomic data formats have properties suitable for parallelizing routine bioinformatics methods, e.g., sequence matching, read alignment, genome assembly, genotype imputation, and variant calling. Compressed indexing fused with routine bioinformatics methods and data-parallel computing seems a promising approach to building population-scale genome analysis pipelines. Various data decomposition and transformation strategies are studied for optimizing data-parallel computing performance when such routine bioinformatics methods are executed in a complex pipeline. These novel distributed methods are studied in this dissertation and demonstrated in a generalized scalable bioinformatics analysis pipeline design. The dissertation starts from the main concepts of genomics and DNA sequencing technologies and builds routine bioinformatics methods on the principles of distributed and parallel computing. This dissertation advances towards designing fully distributed and scalable bioinformatics pipelines, focusing on population genomic problems where the input data sets are vast and the analysis results are hard to achieve with conventional computing.
Finally, the methods studied are applied in scalable population genomics applications using real WGS data and evaluated in a high-performance computing cluster. The experiments include mining virus sequences from human metagenomes, imputing genotypes from large-scale human populations, sequence alignment with compressed pan-genomic indexes, and assembling reference genomes for pan-genomic variant calling.
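    As an illustration of the RLZ idea discussed above, the sketch below parses a target sequence into phrases that copy substrings of a reference. The greedy quadratic matcher is for clarity only; practical implementations find longest matches with a suffix array or FM-index over the reference.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal Relative Lempel-Ziv (RLZ) parse: encode the target as
// (position, length) phrases copying from the reference, with a literal
// character as fallback when no match exists. Repetitive genomes compress
// well because most of the target is covered by a few long phrases.
struct Phrase {
    size_t ref_pos;  // start of the copied substring in the reference
    size_t length;   // number of characters copied (0 => literal)
    char   literal;  // used only when length == 0
};

std::vector<Phrase> rlz_parse(const std::string& ref, const std::string& target) {
    std::vector<Phrase> phrases;
    size_t i = 0;
    while (i < target.size()) {
        size_t best_pos = 0, best_len = 0;
        for (size_t r = 0; r < ref.size(); ++r) {  // naive longest-match scan
            size_t len = 0;
            while (r + len < ref.size() && i + len < target.size() &&
                   ref[r + len] == target[i + len])
                ++len;
            if (len > best_len) { best_len = len; best_pos = r; }
        }
        if (best_len == 0) {
            phrases.push_back({0, 0, target[i]});  // literal fallback
            ++i;
        } else {
            phrases.push_back({best_pos, best_len, '\0'});
            i += best_len;
        }
    }
    return phrases;
}
```

    Because each phrase is decoded independently of the others, a parse like this also decompresses in parallel, which is what makes RLZ a natural fit for the data-parallel pipelines described above.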

    Formal Verification and Fault Mitigation for Small Avionics Platforms using Programmable Logic

    As commercial and personal unmanned aircraft gain popularity and begin to account for more traffic in the sky, the reliability and integrity of their flight controllers become increasingly important. As these aircraft get larger and start operating over longer distances and at higher altitudes, they will start to interact with other controlled air traffic, and the risk of a failure in the control system becomes much more severe. As any engineer who has investigated space-bound technology will know, digital systems do not always behave exactly as they are supposed to. This can be attributed to the effects of high-energy particles in the atmosphere that can deposit energy randomly throughout a digital circuit. These single event effects are capable of producing transient logic levels and altering the state of registers in a circuit, corrupting data and possibly leading to a failure of the flight controller. These effects become more common as altitude increases, as well as with an increase in the number of registers in a digital system. High-integrity flight controllers also require more development effort to show that they meet the required standard. Formal methods can be used to verify digital systems and prove that they meet certain specifications. For traditional software systems that perform many tasks on shared computational resources, formal methods can be quite difficult if not impossible to apply. The use of discrete logic controllers in the form of FPGAs greatly simplifies multitasking by removing the need for shared resources. This simplicity allows formal methods to be applied during the development of the flight control algorithms and device drivers. In this thesis we propose and demonstrate a flight controller implemented entirely within an FPGA to investigate the differences and difficulties compared with traditional CPU software implementations. We go further to provide examples of formal verification of specific parts of the flight control firmware to demonstrate the ease with which this can be achieved. We also make efforts to protect the flight controller from the effects of radiation at higher altitudes using both passive hardware design and active register-transfer-level algorithms.
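    One widely used mitigation of the kind alluded to here is triple modular redundancy (TMR), in which every register is stored three times and a bitwise majority vote masks a single upset copy. The sketch below is a behavioral C++ model of a TMR-protected register with voting and scrubbing; in the actual work such logic would live in the FPGA fabric, and this model is only illustrative.

```cpp
#include <cstdint>

// Behavioral model of a TMR-protected 32-bit register. A single event
// upset may flip bits in one copy; the bitwise 2-of-3 majority vote in
// read() masks it, and scrub() repairs the corrupted copy so the register
// survives a later, second upset.
struct TmrReg {
    uint32_t a, b, c;  // three redundant copies of the same value

    void write(uint32_t v) { a = b = c = v; }

    // Bitwise majority: each output bit is 1 iff at least two copies agree on 1.
    uint32_t read() const { return (a & b) | (a & c) | (b & c); }

    // Scrubbing: rewrite all copies from the voted value so corruption
    // does not accumulate between upsets.
    void scrub() { write(read()); }
};
```

    A property well suited to formal verification is that for any corruption of a single copy, read() still returns the value originally written; it is exactly this kind of small, self-contained invariant that the FPGA setting makes tractable to prove.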

    Advanced Applications of Rapid Prototyping Technology in Modern Engineering

    Rapid prototyping (RP) technology is widely known and appreciated for its flexible and customized manufacturing capabilities. Widely studied RP techniques include stereolithography apparatus (SLA), selective laser sintering (SLS), three-dimensional printing (3DP), fused deposition modeling (FDM), 3D plotting, solid ground curing (SGC), multiphase jet solidification (MJS), and laminated object manufacturing (LOM). Different techniques are associated with different materials and/or processing principles and are thus devoted to specific applications. RP technology is no longer used only for prototype building; rather, it has been extended to real industrial manufacturing solutions. Today, RP technology has contributed to almost all engineering areas, including mechanical, materials, industrial, aerospace, electrical, and, most recently, biomedical engineering. This book aims to present the advanced development of RP technologies in various engineering areas as solutions to real-world engineering problems.

    Operating System Kernels on Multi-core Architectures

    Operating System (OS) kernels have been under research and development for decades, mainly assuming single-processor and distributed hardware systems. With the recent rise of multi-core chips that may incorporate a network on chip (NoC), new challenges have appeared that were not considered before. Given that a complete multi-core system working on a single system on chip (SoC) is now the normal case, different cores on a single SoC may share other physical resources and data. This new sharing scheme on a SoC affects crucial aspects of the overall system, such as correctness, performance, predictability, scalability, and security. Both hardware and OSs need to cooperate flexibly in order to provide solutions for such challenges. A SoC now somewhat mimics the internet, with different cores acting as computer nodes and the network medium provided by advanced digital fabrics such as buses or NoCs, which are a current research area. However, OSs still assume certain hardware features, such as a single physical memory, memory sharing for inter-process communication, page-based protection, and cache operations, even when evolving from uniprocessor to multi-core processors. Such features not only may degrade performance and other system aspects, but some of them also make no sense for a multi-core SoC and introduce barriers and limitations. While new OS research is considering different kernel designs to cope with multi-core systems, it is still limited by current commercial hardware architectures. The objective of this thesis is to assess different kernel designs and implementations on multi-core hardware architectures. Part of the contribution of the thesis is porting RTEMS (an RTOS) and the seL4 microkernel to the Epiphany and RISC-V hardware architectures respectively, trading off the design and implementation decisions. This hands-on experience gave a better understanding of the real-world challenges regarding kernel designs and implementations.
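    As a sketch of the message-passing alternative to shared-memory IPC discussed above, the code below models a single-producer/single-consumer mailbox of the kind a NoC channel between two cores could implement. The class and its parameters are illustrative assumptions, not part of RTEMS or seL4.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Illustrative inter-core mailbox: one core sends, another receives, with
// no locks and no assumption of globally shared, coherent data beyond the
// channel itself. On a NoC-based SoC, a hardware FIFO between the two
// cores would play the role of this ring buffer.
template <typename T, size_t N>
class Mailbox {
    std::array<T, N> slots_;
    std::atomic<size_t> head_{0};  // next slot the consumer reads
    std::atomic<size_t> tail_{0};  // next slot the producer writes

public:
    bool send(const T& msg) {  // called by the producer core only
        size_t t = tail_.load(std::memory_order_relaxed);
        size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;      // mailbox full; caller may retry
        slots_[t] = msg;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> receive() {  // called by the consumer core only
        size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return std::nullopt;  // mailbox empty
        T msg = slots_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return msg;
    }
};
```

    Making the channel the only point of contact between cores is what removes the single-physical-memory assumption the thesis questions: each core's state stays private, and correctness arguments reduce to the ordering guarantees of the channel.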