163 research outputs found

    FPGAを用いた最節約法による進化系統樹構築アルゴリズムの高速化

    Get PDF
    筑波大学 (University of Tsukuba)201

    On FPGA implementations for bioinformatics, neural prosthetics and reinforcement learning problems.

    Get PDF
    Mak Sui Tung Terrence.Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.Includes bibliographical references (leaves 132-142).Abstracts in English and Chinese.Abstract --- p.iList of Tables --- p.ivList of Figures --- p.vAcknowledgements --- p.ixChapter 1. --- Introduction --- p.1Chapter 1.1 --- Bioinformatics --- p.1Chapter 1.2 --- Neural Prosthetics --- p.4Chapter 1.3 --- Learning in Uncertainty --- p.5Chapter 1.4 --- The Field Programmable Gate Array (FPGAs) --- p.7Chapter 1.5 --- Scope of the Thesis --- p.10Chapter 2. --- A Hybrid GA-DP Approach for Searching Equivalence Sets --- p.14Chapter 2.1 --- Introduction --- p.16Chapter 2.2 --- Equivalence Set Criterion --- p.18Chapter 2.3 --- Genetic Algorithm and Dynamic Programming --- p.19Chapter 2.3.1 --- Genetic Algorithm Formulation --- p.20Chapter 2.3.2 --- Bounded Mutation --- p.21Chapter 2.3.3 --- Conditioned Crossover --- p.22Chapter 2.3.4 --- Implementation --- p.22Chapter 2.4 --- FPGAs Implementation of GA-DP --- p.24Chapter 2.4.1 --- System Overview --- p.25Chapter 2.4.2 --- Parallel Computation for Transitive Closure --- p.26Chapter 2.4.3 --- Genetic Operation Realization --- p.28Chapter 2.5 --- Discussion --- p.30Chapter 2.6 --- Limitation and Future Work --- p.33Chapter 2.7 --- Conclusion --- p.34Chapter 3. --- An FPGA-based Architecture for Maximum-Likelihood Phylogeny Evaluation --- p.35Chapter 3.1 --- Introduction --- p.36Chapter 3.2 --- Maximum-Likelihood Model --- p.39Chapter 3.3 --- Hardware Mapping for Pruning Algorithm --- p.41Chapter 3.3.1 --- Related Works --- p.41Chapter 3.3.2 --- Number Representation --- p.42Chapter 3.3.3 --- Binary Tree Representation --- p.43Chapter 3.3.4 --- Binary Tree Traversal --- p.45Chapter 3.3.5 --- Maximum-Likelihood Evaluation Algorithm --- p.46Chapter 3.4 --- System Architecture --- p.49Chapter 3.4.1 --- Transition Probability Unit --- p.50Chapter 3.4.2 --- State-Parallel Computation Unit --- p.51Chapter 3.4.3 --- Error Computation --- p.54Chapter 3.5 --- Discussion --- p.56Chapter 3.5.1 --- Hardware Resource Consumption --- p.56Chapter 3.5.2 --- Delay Evaluation --- p.57Chapter 3.6 --- Conclusion --- p.59Chapter 4. --- Field Programmable Gate Array Implementation of Neuronal Ion Channel Dynamics --- p.61Chapter 4.1 --- Introduction --- p.62Chapter 4.2 --- Background --- p.63Chapter 4.2.1 --- Analog VLSI Model for Hebbian Synapse --- p.63Chapter 4.2.2 --- A Unifying Model of Bi-directional Synaptic Plasticity --- p.64Chapter 4.2.3 --- Non-NMDA Receptor Channel Regulation --- p.65Chapter 4.3 --- FPGAs Implementation --- p.65Chapter 4.3.1 --- FPGA Design Flow --- p.65Chapter 4.3.2 --- Digital Model of NMD A and AMPA receptors --- p.65Chapter 4.3.3 --- Synapse Modification --- p.67Chapter 4.4 --- Results --- p.68Chapter 4.4.1 --- Simulation Results --- p.68Chapter 4.5 --- Discussion --- p.70Chapter 4.6 --- Conclusion --- p.71Chapter 5. --- Continuous-Time and Discrete-Time Inference Networks for Distributed Dynamic Programming --- p.72Chapter 5.1 --- Introduction --- p.74Chapter 5.2 --- Background --- p.77Chapter 5.2.1 --- Markov decision process (MDPs) --- p.78Chapter 5.2.2 --- Learning in the MDPs --- p.80Chapter 5.2.3 --- Bellman Optimal Criterion --- p.80Chapter 5.2.4 --- Value Iteration --- p.81Chapter 5.3 --- A Computational Framework for Continuous-Time Inference Network --- p.82Chapter 5.3.1 --- Binary Relation Inference Network --- p.83Chapter 5.3.2 --- Binary Relation Inference Network for MDPs --- p.85Chapter 5.3.3 --- Continuous-Time Inference Network for MDPs --- p.87Chapter 5.4 --- Convergence Consideration --- p.88Chapter 5.5 --- Numerical Simulation --- p.90Chapter 5.5.1 --- Example 1: Random Walk --- p.90Chapter 5.5.2 --- Example 2: Random Walk on a Grid --- p.94Chapter 5.5.3 --- Example 3: Stochastic Shortest Path Problem --- p.97Chapter 5.5.4 --- Relationships Between λ and γ --- p.99Chapter 5.6 --- Discrete-Time Inference Network --- p.100Chapter 5.6.1 --- Results --- p.101Chapter 5.7 --- Conclusion --- p.102Chapter 6. --- On Distributed g-Learning Network --- p.104Chapter 6.1 --- Introduction --- p.105Chapter 6.2 --- Distributed Q-Learniing Network --- p.108Chapter 6.2.1 --- Distributed Q-Learning Network --- p.109Chapter 6.2.2 --- Q-Learning Network Architecture --- p.111Chapter 6.3 --- Experimental Results --- p.114Chapter 6.3.1 --- Random Walk --- p.114Chapter 6.3.2 --- The Shortest Path Problem --- p.116Chapter 6.4 --- Discussion --- p.120Chapter 6.4.1 --- Related Work --- p.121Chapter 6.5 --- FPGAs Implementation --- p.122Chapter 6.5.1 --- Distributed Registering Approach --- p.123Chapter 6.5.2 --- Serial BRAM Storing Approach --- p.124Chapter 6.5.3 --- Comparison --- p.125Chapter 6.5.4 --- Discussion --- p.127Chapter 6.6 --- Conclusion --- p.128Chapter 7. --- Summary --- p.129Bibliography --- p.132AppendixChapter A. --- Simplified Floating-Point Arithmetic --- p.143Chapter B. --- "Logarithm, Exponential and Division Implementation" --- p.144Chapter B.1 --- Introduction --- p.144Chapter B.2 --- Approximation Scheme --- p.145Chapter B.2.1 --- Logarithm --- p.145Chapter B.2.2 --- Exponentiation --- p.147Chapter B.2.3 --- Division --- p.148Chapter C. --- Analog VLSI Implementation --- p.150Chapter C.1 --- Site Function --- p.150Chapter C.1.1 --- Multiplication Cell --- p.150Chapter C.2 --- The Unit Function --- p.153Chapter C.3 --- The Inference Network Computation --- p.154Chapter C.4 --- Layout --- p.157Chapter C.5 --- Fabrication --- p.159Chapter C.5.1 --- Testing and Characterization --- p.16

    Molecular phylogenetics: principles and practice

    Get PDF
    Phylogenies are important for addressing various biological questions such as relationships among species or genes, the origin and spread of viral infection and the demographic changes and migration patterns of species. The advancement of sequencing technologies has taken phylogenetic analysis to a new height. Phylogenies have permeated nearly every branch of biology, and the plethora of phylogenetic methods and software packages that are now available may seem daunting to an experimental biologist. Here, we review the major methods of phylogenetic analysis, including parsimony, distance, likelihood and Bayesian methods. We discuss their strengths and weaknesses and provide guidance for their use

    Phylogeny-Aware Placement and Alignment Methods for Short Reads

    Get PDF
    In recent years bioinformatics has entered a new phase: New sequencing methods, generally referred to as Next Generation Sequencing (NGS) have become widely available. This thesis introduces algorithms for phylogeny aware analysis of short sequence reads, as generated by NGS methods in the context of metagenomic studies. A considerable part of this work focuses on the technical (w.r.t. performance) challenges of these new algorithms, which have been developed specifically to exploit parallelism

    Homology sequence analysis using GPU acceleration

    Get PDF
    A number of problems in bioinformatics, systems biology and computational biology field require abstracting physical entities to mathematical or computational models. In such studies, the computational paradigms often involve algorithms that can be solved by the Central Processing Unit (CPU). Historically, those algorithms benefit from the advancements of computing power in the serial processing capabilities of individual CPU cores. However, the growth has slowed down over recent years, as scaling out CPU has been shown to be both cost-prohibitive and insecure. To overcome this problem, parallel computing approaches that employ the Graphics Processing Unit (GPU) have gained attention as complementing or replacing traditional CPU approaches. The premise of this research is to investigate the applicability of various parallel computing platforms to several problems in the detection and analysis of homology in biological sequence. I hypothesize that by exploiting the sheer amount of computation power and sequencing data, it is possible to deduce information from raw sequences without supplying the underlying prior knowledge to come up with an answer. I have developed such tools to perform analysis at scales that are traditionally unattainable with general-purpose CPU platforms. I have developed a method to accelerate sequence alignment on the GPU, and I used the method to investigate whether the Operational Taxonomic Unit (OTU) classification problem can be improved with such sheer amount of computational power. I have developed a method to accelerate pairwise k-mer comparison on the GPU, and I used the method to further develop PolyHomology, a framework to scaffold shared sequence motifs across large numbers of genomes to illuminate the structure of the regulatory network in yeasts. The results suggest that such approach to heterogeneous computing could help to answer questions in biology and is a viable path to new discoveries in the present and the future.Includes bibliographical reference

    Dynamically reconfigurable bio-inspired hardware

    Get PDF
    During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility, while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and different general purpose functions. On the other hand we have bio-inspired hardware, a large research field taking inspiration from living beings in order to design hardware systems, which includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. These custom platforms are specifically designed for supporting bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities; an example is their partial and dynamic reconfigurability. These aspects are very well appreciated for providing the performance and the high architectural flexibility required by bio-inspired systems. However, the availability and the very high costs of such custom devices make them only accessible to a very few research groups. Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, their utilization is still in its early stages and they are not well supported by FPGA vendors, thus making their use difficult to include in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures there are neural networks, spiking neuron models, fuzzy systems, cellular automata and random boolean networks. For these architectures, I propose several adaptation techniques for parametric and topological adaptation, such as hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as case study I consider the implementation of bio-inspired hardware systems in two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems); the development of both platforms having been co-supervised in the framework of this thesis

    Skaalautuvat laskentamenetelmät suuren kapasiteetin sekvensointidatan analytiikkaan populaatiogenomiikassa

    Get PDF
    High-throughput sequencing (HTS) technologies have enabled rapid DNA sequencing of whole-genomes collected from various organisms and environments, including human tissues, plants, soil, water, and air. As a result, sequencing data volumes have grown by several orders of magnitude, and the number of assembled whole-genomes is increasing rapidly as well. This whole-genome sequencing (WGS) data has revealed the genetic variation in humans and other species, and advanced various fields from human and microbial genomics to drug design and personalized medicine. The amount of sequencing data has almost doubled every six months, creating new possibilities but also big data challenges in genomics. Diverse methods used in modern computational biology require a vast amount of computational power, and advances in HTS technology are even widening the gap between the analysis input data and the analysis outcome. Currently, many of the existing genomic analysis tools, algorithms, and pipelines are not fully exploiting the power of distributed and high-performance computing, which in turn limits the analysis throughput and restrains the deployment of the applications to clinical practice in the long run. Thus, the relevance of harnessing distributed and cloud computing in bioinformatics is more significant than ever before. Besides, efficient data compression and storage methods for genomic data processing and retrieval integrated with conventional bioinformatics tools are essential. These vast datasets have to be stored and structured in formats that can be managed, processed, searched, and analyzed efficiently in distributed systems. Genomic data contain repetitive sequences, which is one key property in developing efficient compression algorithms to alleviate the data storage burden. Moreover, indexing compressed sequences appropriately for bioinformatics tools, such as read aligners, offers direct sequence search and alignment capabilities with compressed indexes. Relative Lempel-Ziv (RLZ) has been found to be an efficient compression method for repetitive genomes that complies with the data-parallel computing approach. RLZ has recently been used to build hybrid-indexes compatible with read aligners, and we focus on extending it with distributed computing. Data structures found in genomic data formats have properties suitable for parallelizing routine bioinformatics methods, e.g., sequence matching, read alignment, genome assembly, genotype imputation, and variant calling. Compressed indexing fused with the routine bioinformatics methods and data-parallel computing seems a promising approach to building population-scale genome analysis pipelines. Various data decomposition and transformation strategies are studied for optimizing data-parallel computing performance when such routine bioinformatics methods are executed in a complex pipeline. These novel distributed methods are studied in this dissertation and demonstrated in a generalized scalable bioinformatics analysis pipeline design. The dissertation starts from the main concepts of genomics and DNA sequencing technologies and builds routine bioinformatics methods on the principles of distributed and parallel computing. This dissertation advances towards designing fully distributed and scalable bioinformatics pipelines focusing on population genomic problems where the input data sets are vast and the analysis results are hard to achieve with conventional computing. Finally, the methods studied are applied in scalable population genomics applications using real WGS data and experimented with in a high performance computing cluster. The experiments include mining virus sequences from human metagenomes, imputing genotypes from large-scale human populations, sequence alignment with compressed pan-genomic indexes, and assembling reference genomes for pan-genomic variant calling.Suuren kapasiteetin sekvensointimenetelmät (High-Throughput Sequencing, HTS) ovat mahdollistaneet kokonaisten genomien nopean ja huokean sekvensoinnin eri organismeista ja ympäristöistä, mukaan lukien kudos-, maaperä-, vesistö- ja ilmastonäytteet. Tämän seurauksena sekvensointidatan ja koostettujen kokogenomien määrät ovat kasvaneet nopeasti. Kokogenomin sekvensointi on lisännyt ihmisen ja muiden lajien geneettisen perimän tietämystä ja edistänyt eri tieteenaloja ympäristötieteistä lääkesuunnitteluun ja yksilölliseen lääketieteeseen. Sekvensointidatan määrä on lähes kaksinkertaistunut puolivuosittain, mikä on luonut uusia mahdollisuuksia läpimurtoihin, mutta myös suuria datankäsittelyn haasteita. Nykyaikaisessa laskennallisessa biologiassa käytettävät monimutkaiset analyysimenetelmät vaativat yhä enemmän laskentatehoa HTS-datan kasvaessa, ja siksi HTS-menetelmien edistyminen kasvattaa kuilua raakadatasta lopullisiin analyysituloksiin. Useat tällä hetkellä käytetyistä genomianalyysityökaluista, algoritmeista ja ohjelmistoista eivät hyödynnä hajautetun laskennan tehoa kokonaisvaltaisesti, mikä puolestaan ​​hidastaa uusimpien analyysitulosten saamista ja rajoittaa tieteellisten ohjelmistojen käyttöönottoa kliinisessä lääketieteessä pitkällä aikavälillä. Näin ollen hajautetun ja pilvilaskennan hyödyntämisen merkitys bioinformatiikassa on tärkeämpää kuin koskaan ennen. Genomitiedon suoraa hakua ja käsittelyä tukevat pakkaus- ja tallennusmenetelmät mahdollistavat nopean ja tilatehokkaan genomianalytiikan. Uusia hajautettuihin järjestelmiin soveltuvia tietorakenteita tarvitaan, jotta näitä suuria datamääriä voidaan hallita, käsitellä, hakea ja analysoida tehokkaasti. Genomidata sisältää runsaasti toistuvia sekvenssejä, mikä on yksi keskeinen ominaisuus kehitettäessä tehokkaita pakkausalgoritmeja tiedontallennustaakkaa ja analysointia keventämään. Lisäksi pakattujen sekvenssien indeksointi yhdistettynä sekvenssilinjausmenetelmiin mahdollistaa sekvenssien satunnaishaun ja suoran linjauksen pakattuihin sekvensseihin. Relative Lempel-Ziv (RLZ) pakkausmenetelmä on todettu tehokkaaksi toistuville genomisekvensseille rinnakkaislaskentaa hyödyntäen. RLZ-menetelmää on viime aikoina sovellettu sekvenssilinjaukseen yhteensopiviin hybridi-indekseihin, joita tässä työssä on nopeutettu hajautetulla laskennalla. Genomiikan dataformaateista löytyvillä tietorakenteilla on ominaisuuksia, jotka soveltuvat hajautettuun sekvenssihakuun, sekvenssilinjaukseen, genomien koostamiseen, genotyyppien imputointiin ja varianttien havaitsemiseen. Pakattu indeksointi sovellettuna hajautetulla laskennalla tehostettuihin menetelmiin vaikuttaa lupaavalta lähestymistavalta populaatiogenomiikan analyysiohjelmistojen mukauttamiseksi suuriin datamääriin. Erilaisia ​​tiedon osittamis- ja muunnosstrategioita hyödynnetään suorituskyvyn tehostamiseen monivaiheisessa hajautetussa genomidatan prosessoinnissa. Näitä uusia skaalautuvia hajautettuja laskentamenetelmiä tutkitaan tässä väitöskirjassa ja demonstroidaan yleisluontoisella bioinformatiikan analyysiohjelmiston arkkitehtuurilla. Tässä työssä johdatellaan genomiikan ja DNA-sekvensointitekniikoiden peruskäsitteisiin ja esitellään rutiininomaisia ​​bioinformatiikan menetelmiä perustuen hajautetun ja rinnakkaislaskennan periaatteille. Väitöskirjassa edetään kohti täysin hajautettujen ja skaalautuvien bioinformatiikan ohjelmistojen suunnittelua keskittyen populaatiogenomiikan ongelmiin, joissa syötedatan määrät ovat suuria ja analyysitulosten saavuttaminen on hidasta tai jopa mahdotonta tavanomaisella laskennalla. Lopuksi tutkittuja menetelmiä sovelletaan tässä työssä kehitettyihin skaalautuviin populaatiogenomiikan sovelluksiin, joita koestetaan kokogenomidatalla supertietokoneen laskentaklusterissa. Kokeet sisältävät virussekvenssien louhintaa ihmisten metagenominäytteistä, genotyyppien täydentämistä (imputointia) suurista ihmispopulaatioista ja pan-genomisen indeksin pakkaamista sekvenssilinjauksen nopeuttamista varten. Lisäksi pakattua pan-genomia kokeillaan referenssigenomin koostamiseen populaatioon perustuvien varianttien havaitsemista varten
    corecore