32 research outputs found

    Haplotype-aware Diplotyping from Noisy Long Reads

    Haplotype phasing with ASP from long reads: a flexible optimization approach (Phasage d’haplotypes par ASP à partir de longues lectures : une approche d’optimisation flexible)

    Uncorrected version. A new version will be available by March 2023. Each chromosome of a di- or polyploid organism has several haplotypes, which are highly similar but diverge at a certain number of positions. However, most reference genomes provide only a single sequence for each chromosome, and therefore do not reflect the biological reality. Yet it is crucial to have access to this information, which is useful in medicine, agronomy and population studies. The recent development of third-generation technologies, especially PacBio and Oxford Nanopore Technologies sequencers, has allowed for the production of long reads that facilitate haplotype sequence reconstruction. Bioinformatics methods exist for this task, but they provide only a single solution. This thesis introduces an approach for haplotype phasing based on the search for connected components in a read similarity graph to identify haplotypes. This method uses Answer Set Programming to work on the set of optimal solutions. This phasing algorithm has been used to reconstruct the haplotypes of the diploid rotifer Adineta vaga.

    Technology dictates algorithms: Recent developments in read alignment

    Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step of the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations driving various human diseases and complex traits as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review provides a survey of algorithmic foundations and methodologies across 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on the speed and efficiency of read aligners. We separately discuss how longer read lengths produce unique advantages and limitations for read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.

    Computational haplotyping : theory and practice

    Genomics has paved a new way to comprehend life and its evolution, and also to investigate the causes of diseases and their treatment. One of the important problems in genomic analyses is haplotype assembly. Constructing complete and accurate haplotypes plays an essential role in understanding population genetics and how species evolve. In this thesis, we focus on computational approaches to haplotype assembly from third-generation sequencing technologies. This involves huge amounts of sequencing data, and such data contain errors due to the single-molecule sequencing protocols employed. Taking advantage of combinatorial formulations helps to correct for these errors to solve the haplotyping problem. Various computational techniques such as dynamic programming, parameterized algorithms, and graph algorithms are used to solve this problem. This thesis presents several contributions to the area of haplotyping. First, a novel algorithm based on dynamic programming is proposed to provide approximation guarantees for phasing a single individual. Second, an integrative approach is introduced that combines multiple sequencing datasets to generate complete and accurate haplotypes. The effectiveness of this integrative approach is demonstrated on a real human genome. Third, we provide a novel efficient approach to phasing pedigrees and demonstrate its advantages in comparison to phasing a single individual. Fourth, we present a generalized graph-based framework for performing haplotype-aware de novo assembly. Specifically, this generalized framework consists of a hybrid pipeline for generating accurate and complete haplotypes from data stemming from multiple sequencing technologies, one that provides accurate reads and another that provides long reads.

    Using Simulation to Evaluate Phylogenetic Inference of Allopolyploids

    Polyploidy is a biological condition in which an organism contains more than two copies of each chromosome. Polyploidy is particularly common in plants, but is found throughout eukaryotes. Allopolyploidy results from hybridization between two different species, while autopolyploidy results from genome duplication without hybridization. Fast and inexpensive computational methods that identify the mode of polyploidy from short-read genomic data would be useful for studies of poorly characterized organisms. Allopolyploids have been distinguished from autopolyploids in the literature by constructing phylogenies of conserved loci from polyploids and closely related species (Barrier et al., 1999). I set out to explore the limits of this methodology with short-read genomic data. Here, using simulations, I evaluate how the accuracy of this method depends on the phylogenetic branch lengths and the fidelity of haplotype estimation. Bachelor of Science

    Ant Colony Optimization

    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.

    AIRO 2016. 46th Annual Conference of the Italian Operational Research Society. Emerging Advances in Logistics Systems Trieste, September 6-9, 2016 - Abstracts Book

    The AIRO 2016 book of abstracts collects the contributions from the conference participants. The AIRO 2016 Conference is a special occasion for the Italian Operations Research community, as the AIRO annual conference reaches its 46th edition in 2016. To reflect this special occasion, the Programme and Organizing Committee, chaired by Walter Ukovich, prepared a high-quality Scientific Programme including the first initiative of AIRO Young, the new AIRO poster section that aims to promote the work of students, PhD students, and postdocs with an interest in Operations Research. The Scientific Programme of the Conference offers a broad spectrum of contributions covering the variety of OR topics and research areas, with an emphasis on “Emerging Advances in Logistics Systems”. The event aims at stimulating integration of existing methods and systems, fostering communication amongst different research groups, and laying the foundations for integrated OR research projects in the next decade. Distinct thematic sections follow the AIRO 2016 days, starting with an initial presentation of the objectives and features of the Conference. In addition, three internationally known invited speakers, Gianni Di Pillo, Frédéric Semet and Stefan Nickel, will present Plenary Lectures, gathering AIRO 2016 participants together to offer key presentations on the latest advances and developments in OR research.