722 research outputs found

Computational pan-genomics: status, promises and challenges

Authors: Abeel Thomas; Alkan Can; Baaijens Jasmijn; Bakker Paul; Boeva Valentina; Bonnal Raoul; Chiaromonte Francesca; Chikhi Rayan; Ciccarelli Francesca; Cijvat Robin; Datema Erwin; Dijkstra Louis; Duijn Cornelia; Dutilh Bas; Eichler Evan; El-Kebir Mohammed; Ernst Corinna; Eskin Eleazar; Garrison Erik; Ghaffaari Ali; Guryev Victor; Kersey Paul; Klau Gunnar; Kloosterman Wigard; Korbel Jan; Lameijer Eric-Wubbo; Langmead Benjamin; Marschall Tobias; Martin Marcel; Marz Manja; Medvedev Paul; Mu John; Mäkinen Veli; Neerincx Pieter; Novak Adam; Ouwens Klaasjan; Paten Benedict; Peterlongo Pierre; Pisanti Nadia; Porubsky David; Rahmann Sven; Raphael Benjamin; Reinert Knut; Ridder Dick; Ridder Jeroen; Rivals Eric; Sanders Ashley; Schlesner Matthias; Schulz-Trieglaff Ole; Schönhuth Alexander; Sheikhizadeh Siavash; Shneider Carl; Smit Sandra; The Computational Pan-Genomics Consortium; Valenzuela Daniel; Vandin Fabio; Wang Jiayin; Wessels Lodewyk; Ye Kai; Zhang Ying
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

EUR Research Repository

HAL-MINES ParisTech

Archivio della ricerca della Scuola Superiore Sant'Anna

Radboud Repository

HAL-Rennes 1

Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies

Publication venue: BioMed Central
Publication date: 22/09/2015

Springer - Publisher Connector

Computational Methods for Gene Expression and Genomic Sequence Analysis

Author: Vo Nam Sy
Publication venue: University of Memphis Digital Commons
Publication date: 19/07/2016

Advances in technology now produce ever more cost-effective, high-throughput, and large-scale biological data. As a result, there is an urgent need to develop efficient computational methods for analyzing these massive data sets. In this dissertation, we introduce methods to address several important issues in gene expression and genomic sequence analysis, two of the most important areas in bioinformatics. Firstly, we introduce a novel approach to predicting patterns of gene response to multiple treatments when the sample size is small. Researchers are increasingly interested in experiments with many treatments, such as chemical compounds or drug doses. However, due to cost, many experiments do not have large enough samples, making it difficult for conventional methods to predict patterns of gene response. Here we introduce an approach that exploits dependencies among the outcomes of pairwise comparisons, together with resampling techniques, to predict true patterns of gene response when samples are insufficient. This approach deduced more and better functionally enriched gene clusters than conventional methods. Our approach is therefore useful for multiple-treatment studies that have small sample sizes or contain genes with highly variable expression. Secondly, we introduce a novel method for aligning short reads, which are DNA fragments extracted across genomes of individuals, to reference genomes. Results from short read alignment can be used for many studies, such as measuring gene expression or detecting genetic variants. Here we introduce a method that employs an iterated randomized algorithm based on the FM-index, an efficient data structure for full-text indexing, to align reads to the reference. This method improved alignment performance across a wide range of read lengths and error rates compared to several popular methods, making it a good choice for the community to perform short read alignment. Finally, we introduce a novel approach to detecting genetic variants such as SNPs (single nucleotide polymorphisms) or INDELs (insertions/deletions). This study has great significance in a wide range of areas, from bioinformatics and genetic research to the medical field. For example, one can predict how genomic changes are related to phenotype in an organism of interest, or associate genetic changes with disease risk or the efficacy of medical treatment. Here we introduce a method that leverages known genetic variants from well-established databases to improve the accuracy of variant detection. This method had higher accuracy than several state-of-the-art methods in many cases, especially for detecting INDELs. Our method therefore has the potential to be useful in research and clinical applications that rely on identifying genetic variants accurately.

University of Memphis Digital Commons

Computational pan-genomics: status, promises and challenges

Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

University of Groningen

Computational pan-genomics: status, promises and challenges

Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

ARTS repository - University of Groningen

Enhanced mitochondrial genome analysis: bioinformatic and long-read sequencing advances and their diagnostic implications

Author: Bugiardini E
Chandler N
Chitty LS
Falabella M
Hanna MG
Labrum R
Macken WL
Pitceathly RDS
Pizzamiglio C
Polke JM
Scotchman E
Vandrovcova J
Woodward CE
Publication venue: 'Informa UK Limited'
Publication date: 29/08/2023

Introduction: Primary mitochondrial diseases (PMDs) comprise a large and heterogeneous group of genetic diseases that result from pathogenic variants in either nuclear DNA (nDNA) or mitochondrial DNA (mtDNA). Widespread adoption of next-generation sequencing (NGS) has improved the efficiency and accuracy of mtDNA diagnoses; however, several challenges remain. Areas covered: In this review, we briefly summarize the current state of the art in molecular diagnostics for mtDNA and consider the implications of improved whole genome sequencing (WGS), bioinformatic techniques, and the adoption of long-read sequencing for PMD diagnostics. Expert opinion: We anticipate that the application of PCR-free WGS from blood DNA will increase in diagnostic laboratories, while for adults with myopathic presentations, WGS from muscle DNA may become more widespread. Improved bioinformatic strategies will enhance WGS data interrogation, with more accurate delineation of mtDNA and NUMTs (nuclear mitochondrial DNA segments) in WGS data, superior coverage uniformity, indirect measurement of mtDNA copy number, and more accurate interpretation of heteroplasmic large-scale rearrangements (LSRs). Separately, the adoption of diagnostic long-read sequencing could offer greater resolution of complex LSRs and the opportunity to phase heteroplasmic variants.

UCL Discovery