319 research outputs found

    Introducing deep learning-based methods into the variant calling analysis pipeline

    Biological interpretation of genetic variation enhances our understanding of normal and pathological phenotypes, and may lead to the development of new therapeutics. However, it depends heavily on genomic data analysis, which may be inaccurate because of sequencing errors and the inconsistencies they cause. Modern analysis pipelines already utilize heuristic and statistical techniques, but the rate of falsely identified mutations remains high and varies with the particular sequencing technology, settings and variant type. Recently, several tools based on deep neural networks have been published. The neural networks are supposed to find motifs in the data that were not previously seen. The performance of these novel tools is assessed in terms of precision and recall, as well as computational efficiency. Following the established best practices in both variant detection and benchmarking, the discussed tools demonstrate accuracy metrics and computational efficiency that spur further discussion.
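As an illustration of the precision and recall assessment mentioned in this abstract, the following is a minimal sketch, not the benchmarking procedure used in the work itself. It assumes that each variant can be represented as a hypothetical `(chrom, pos, ref, alt)` tuple and that calls are matched to a truth set by exact equality; real benchmarking tools also normalize variant representations before comparison.

```python
def precision_recall(called, truth):
    """Compare a call set against a truth set.

    Both arguments are sets of (chrom, pos, ref, alt) tuples
    (a hypothetical representation chosen for this sketch).
    """
    tp = len(called & truth)   # calls that are in the truth set
    fp = len(called - truth)   # calls that are not in the truth set
    fn = len(truth - called)   # true variants that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented purely for illustration.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "GA"),
    ("chr2", 90, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 41, "G", "A"),   # false positive
}

p, r, f1 = precision_recall(called, truth)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

With this toy data, two of the three calls are correct and two of the four true variants are found, so precision is 2/3 and recall is 1/2.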

    Comparative genomics of recent adaptation in Candida pathogens

    [eng] Fungal infections pose a serious health threat, affecting >1,000 million people and causing ~1.5 million deaths each year. The problem is growing due to insufficient diagnostic and therapeutic options, an increased number of susceptible patients, the expansion of pathogens partly linked to climate change and the rise of antifungal drug resistance. Among other fungal pathogens, Candida species are a major cause of severe hospital-acquired infections, with high mortality in immunocompromised patients. Various Candida pathogens constitute a public health issue, which requires further efforts to develop new drugs, optimize currently available treatments and improve diagnostics. Given the high dynamism of Candida genomes, a promising strategy to improve current therapies and diagnostics is to understand the evolutionary mechanisms of adaptation to antifungal drugs and to the human host. Previous work using in vitro evolution, population genomics, selection inferences and Genome Wide Association Studies (GWAS) has partially clarified such recent adaptation, but various open questions remain. In the three research articles that make up this PhD thesis we addressed some of these gaps from the perspective of comparative genomics. First, we addressed methodological issues regarding the analysis of Candida genomes. Studying recent adaptation in these pathogens requires adequate bioinformatic tools for variant calling, filtering and functional annotation. Among other reasons, current methods are suboptimal due to their limited accuracy in identifying structural variants from short read sequencing data. In addition, there is a need for easy-to-use, reproducible variant calling pipelines. To address these gaps we developed the “personalized Structural Variation detection” pipeline (perSVade), a framework to call, filter and annotate several variant types, including structural variants, directly from reads.
PerSVade enables accurate identification of structural variants in any species of interest, such as Candida pathogens. In addition, our tool automatically predicts the structural variant calling accuracy on simulated genomes, which informs about the reliability of the calling process. Furthermore, perSVade can be used to analyze single nucleotide polymorphisms and copy-number variants, so that it facilitates multi-variant, reproducible genomic studies. This tool will likely boost variant analyses in Candida pathogens and beyond. Second, we addressed open questions about recent adaptation in Candida, using perSVade for variant identification. On the one hand, we investigated the evolutionary mechanisms of drug resistance in Candida glabrata. For this, we used a large-scale in vitro evolution experiment to study adaptation to two commonly used antifungals: fluconazole and anidulafungin. Our results show rapid adaptation to one or both drugs, with moderate fitness costs and through few mutations in a narrow set of genes. In addition, we characterize a novel role of ERG3 mutations in cross-resistance towards fluconazole in anidulafungin-adapted strains. These findings illuminate the mutational paths leading to drug resistance and cross-resistance in Candida pathogens. On the other hand, we reanalyzed ~2,000 public genomes and phenotypes to understand the signs of recent selection and drug resistance in six major Candida species: C. auris, C. glabrata, C. albicans, C. tropicalis, C. parapsilosis and C. orthopsilosis. We found hundreds of genes under recent selection, suggesting that clinical adaptation is diverse and complex. These involve species-specific but also convergently affected processes, such as cell adhesion, which could underlie conserved adaptive mechanisms. In addition, using GWAS we predicted known drivers of antifungal resistance alongside potentially novel players.
Furthermore, our analyses reveal an important role of generally overlooked structural variants, and suggest an unexpected involvement of (para)sexual recombination in the spread of resistance. Taken together, our findings provide novel insights on how Candida pathogens adapt to human-related environments and suggest candidate genes that deserve future attention. In summary, the results of this thesis improve our knowledge about the mechanisms of recent adaptation in Candida pathogens, which may enable improved therapeutic and diagnostic applications.

    Bioinformatic approaches for genome finishing

    Husemann P, Tauch A. Bioinformatic approaches for genome finishing. Bielefeld: Universitätsbibliothek Bielefeld; 2011

    Whole-genome sequence analysis for pathogen detection and diagnostics

    This dissertation focuses on computational methods for improving the accuracy of commonly used nucleic acid tests for pathogen detection and diagnostics. Three specific biomolecular techniques are addressed: polymerase chain reaction, microarray comparative genomic hybridization, and whole-genome sequencing. These methods are potentially the future of diagnostics, but each requires sophisticated computational design or analysis to operate effectively. This dissertation presents novel computational methods that unlock the potential of these diagnostics by efficiently analyzing whole-genome DNA sequences. Improvements in the accuracy and resolution of each of these diagnostic tests promise more effective diagnosis of illness and rapid detection of pathogens in the environment. For designing real-time detection assays, an efficient data structure and search algorithm are presented to identify the most distinguishing sequences of a pathogen that are absent from all other sequenced genomes. Results are presented that show these "signature" sequences can be used to detect pathogens in complex samples and differentiate them from their non-pathogenic, phylogenetic near neighbors. For microarrays, novel pan-genomic design and analysis methods are presented for the characterization of unknown microbial isolates. To demonstrate the effectiveness of these methods, pan-genomic arrays are applied to the study of multiple strains of the foodborne pathogen, Listeria monocytogenes, revealing new insights into the diversity and evolution of the species. Finally, multiple methods are presented for the validation of whole-genome sequence assemblies, which are capable of identifying assembly errors in even finished genomes. These validated assemblies provide the ultimate nucleic acid diagnostic, revealing the entire sequence of a genome.
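The idea of "signature" sequences described in this abstract, subsequences of a pathogen genome that are absent from all other genomes, can be sketched with plain k-mer sets. This is a naive stand-in for illustration only, not the efficient data structure and search algorithm the dissertation presents. The function names, the toy sequences and the choice of k are assumptions, and reverse complements are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(target, background, k):
    """Return k-mers present in target but in none of the background genomes.

    Naive set-difference approach; memory grows with genome size, so it
    is only suitable for small examples.
    """
    sig = kmers(target, k)
    for genome in background:
        sig -= kmers(genome, k)
    return sig


# Toy sequences, invented purely for illustration.
target = "ACGTACGGA"
background = ["ACGTACGT", "TTACGGTT"]

signatures = signature_kmers(target, background, k=4)
print(signatures)  # {'CGGA'}
```

Here only `CGGA` survives, because every other 4-mer of the target also occurs in at least one background sequence.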

    From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics

    Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models or synthetic biology approaches being notable examples. This work takes a critical and constructive look at our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis. Comment: 111 pages, 11 figures uses elsarticle latex clas

    QuASeR -- Quantum Accelerated De Novo DNA Sequence Reconstruction

    In this article, we present QuASeR, a reference-free DNA sequence reconstruction implementation via de novo assembly on both gate-based and quantum annealing platforms. Each one of the four steps of the implementation (TSP, QUBO, Hamiltonians and QAOA) is explained with simple proof-of-concept examples to target both the genomics research community and quantum application developers in a self-contained manner. The details of the implementation are discussed for the various layers of the quantum full-stack accelerator design. We also highlight the limitations of current classical simulation and available quantum hardware systems. The implementation is open-source and can be found at https://github.com/prince-ph0en1x/QuASeR. Comment: 24 page
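The first of the four steps named in this abstract casts de novo assembly as a travelling-salesman-style problem over reads. The following is a small classical sketch of that idea under stated assumptions, not code from the QuASeR repository. It assumes error-free reads, scores an ordering by the total suffix-prefix overlap between consecutive reads, and searches all orderings by brute force (an open path rather than a closed tour). The QUBO, Hamiltonian and QAOA steps are not shown.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads):
    """Brute-force the read ordering with maximal total overlap.

    Exponential in the number of reads, so only usable for toy inputs.
    Returns the merged contig and the total overlap score.
    """
    best_order, best_score = None, -1
    for order in permutations(reads):
        score = sum(overlap(x, y) for x, y in zip(order, order[1:]))
        if score > best_score:
            best_order, best_score = order, score
    # Merge reads along the best path, skipping the overlapping prefix.
    contig = best_order[0]
    for prev, nxt in zip(best_order, best_order[1:]):
        contig += nxt[overlap(prev, nxt):]
    return contig, best_score


# Toy reads, invented purely for illustration.
reads = ["GCGTGCA", "TGCAATG", "ATGGCGT"]
contig, score = assemble(reads)
print(contig, score)  # ATGGCGTGCAATG 8
```

In a quantum formulation the same pairwise overlaps would become edge weights of the TSP instance; the brute-force loop here only serves to make the objective concrete.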

    Computational Methods for Sequencing and Analysis of Heterogeneous RNA Populations

    Next-generation sequencing (NGS) and mass spectrometry technologies bring unprecedented throughput, scalability and speed, facilitating the study of biological systems. These technologies make it possible to sequence and analyze heterogeneous RNA populations rather than single sequences. In particular, they provide the opportunity to implement massive viral surveillance and transcriptome quantification. However, in order to fully exploit the capabilities of NGS technology we need to develop computational methods able to analyze billions of reads for assembly and characterization of sampled RNA populations. In this work we present novel computational methods for cost- and time-effective analysis of sequencing data from viral and RNA samples. In particular, we describe: i) computational methods for transcriptome reconstruction and quantification; ii) a method for mass spectrometry data analysis; iii) a combinatorial pooling method; iv) computational methods for analysis of intra-host viral populations.