Search CORE

2,005 research outputs found

A Combinatorial Approach for Single-cell Variant Detection via Phylogenetic Inference

Author: Edrisi Mohammadamin
Nakhleh Luay
Zafar Hamim
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019
Field of study

Single-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges including elevated error rates, allelic dropout and non-uniform coverage. A recently introduced single-cell-specific mutation detection algorithm leverages the evolutionary relationship between cells for denoising the data. However, due to its probabilistic nature, this method does not scale well with the number of cells. Here, we develop a novel combinatorial approach for utilizing the genealogical relationship of cells in detecting mutations from noisy single-cell sequencing data. Our method, called scVILP, jointly detects mutations in individual cells and reconstructs a perfect phylogeny among these cells. We employ a novel Integer Linear Program algorithm for deterministically and efficiently solving the joint inference problem. We show that scVILP achieves similar or better accuracy but significantly better runtime over existing methods on simulated data. We also applied scVILP to an empirical human cancer dataset from a high grade serous ovarian cancer patient

Dagstuhl Research Online Publication Server

Computational Methods for Assessment and Prediction of Viral Evolutionary and Epidemiological Dynamics

Author: Mohebbi Fatemeh
Publication venue: ScholarWorks @ Georgia State University
Publication date: 31/08/2023
Field of study

The ability to comprehend the dynamics of viruses’ transmission and their evolution, even to a limited extent, can significantly enhance our capacity to predict and control the spread of infectious diseases. An example of such significance is COVID-19 caused by the severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2). In this dissertation, I am proposing computational models that present more precise and comprehensive approaches in viral outbreak investigations and epidemiology, providing invaluable insights into the transmission dynamics, and potential inter- ventions of infectious diseases by facilitating the timely detection of viral variants. The first model is a mathematical framework based on population dynamics for the calculation of a numerical measure of the fitness of SARS-CoV-2 subtypes. The second model I propose here is a transmissibility estimation method based on a Bayesian approach to calculate the most likely fitness landscape for SARS-CoV-2 using a generalized logistic sub-epidemic model. Using the proposed model I estimate the epistatic interaction networks of spike protein in SARS-CoV-2. Based on the community structure of these epistatic networks, I propose a computational framework that predicts emerging haplotypes of SARS-CoV-2 with altered transmissibility. The last method proposed in this dissertation is a maximum likelihood framework that integrates phylogenetic and random graph models to accurately infer transmission networks without requiring case-specific data

ScholarWorks @ Georgia State University

Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

Author: Alessandro Tanca (494538)
Antonio Palomba (494539)
Cristina Fraumene (374294)
Edoardo Fiorillo (518797)
Francesco Cucca (145742)
Marcello Abbondio (3706183)
Sergio Uzzau (186221)
Valeria Manghina (3498188)
Publication venue
Publication date: 22/03/2019
Field of study

Background. A large number of algorithms is being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, rarely the same method can support both data types. Results. We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions. We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses

arXiv.org e-Print Archive

FigShare

IST Austria Technical Report

Author: Bozic Ivana
Chatterjee Krishnendu
Gerold Jeffrey
Iacobuzio-Donahue Christine
Makohon-Moore Alvin
Nowak Martin
Reiter Johannes
Vogelstein Bert
Publication venue: IST Austria
Publication date: 01/01/2015
Field of study

A comprehensive understanding of the clonal evolution of cancer is critical for understanding neoplasia. Genome-wide sequencing data enables evolutionary studies at unprecedented depth. However, classical phylogenetic methods often struggle with noisy sequencing data of impure DNA samples and fail to detect subclones that have different evolutionary trajectories. We have developed a tool, called Treeomics, that allows us to reconstruct the phylogeny of a cancer with commonly available sequencing technologies. Using Bayesian inference and Integer Linear Programming, robust phylogenies consistent with the biological processes underlying cancer evolution were obtained for pancreatic, ovarian, and prostate cancers. Furthermore, Treeomics correctly identified sequencing artifacts such as those resulting from low statistical power; nearly 7% of variants were misclassified by conventional statistical methods. These artifacts can skew phylogenies by creating illusory tumor heterogeneity among distinct samples. Importantly, we show that the evolutionary trees generated with Treeomics are mathematically optimal

IST Austria: PubRep (Institute of Science and Technology)

Uncoupled evolution of the Polycomb system and deep origin of non-canonical PRC1

Author: de Potter Bastiaan
Raas Maximilian W.D.
Seidl Michael F.
Snel Berend
Verrijzer C. Peter
Publication venue
Publication date: 10/11/2023
Field of study

Polycomb group proteins, as part of the Polycomb repressive complexes, are essential in gene repression through chromatin compaction by canonical PRC1, mono-ubiquitylation of histone H2A by non-canonical PRC1 and tri-methylation of histone H3K27 by PRC2. Despite prevalent models emphasizing tight functional coupling between PRC1 and PRC2, it remains unclear whether this paradigm indeed reflects the evolution and functioning of these complexes. Here, we conduct a comprehensive analysis of the presence or absence of cPRC1, nPRC1 and PRC2 across the entire eukaryotic tree of life, and find that both complexes were present in the Last Eukaryotic Common Ancestor (LECA). Strikingly, ~42% of organisms contain only PRC1 or PRC2, showing that their evolution since LECA is largely uncoupled. The identification of ncPRC1-defining subunits in unicellular relatives of animals and fungi suggests ncPRC1 originated before cPRC1, and we propose a scenario for the evolution of cPRC1 from ncPRC1. Together, our results suggest that crosstalk between these complexes is a secondary development in evolution.</p

EUR Research Repository

Uncoupled evolution of the Polycomb system and deep origin of non-canonical PRC1

Author: de Potter Bastiaan
Raas Maximilian W D
Seidl Michael F
Snel Berend
Verrijzer C Peter
Publication venue
Publication date: 10/11/2023
Field of study

Utrecht University Repository

Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data

Author: Chowdary Sunkara B V
Edrisi Mohammadamin
Nakhleh Luay
Ogilvie Huw A
Posada González David
Robledo Sergio
Valecha Monica Vishnu
Zafar Hamim
Publication venue: Xenómica e Biomedicina
Publication date: 27/06/2022
Field of study

Motivation: Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCI and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data. Results: Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci. Through benchmarking on simulated datasets under different settings, we show that, Phylovar outperforms SCI in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer data consisting of 32 cells and 3375 loci as well as an scWGS data of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer data, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by bulk sequencing dataset and for the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects some of which were associated with neurodegenerative diseases. Availability and implementation: Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.National Science Foundation | Ref. IIS-1812822National Science Foundation | Ref. IIS-210683

Investigo

PubMed Central

An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs.

Author: Kim Joseph
Lee Christopher
Roy Meenakshi
Wu Ying Nian
Xing Yi
Yu Tianwei
Publication venue: eScholarship, University of California
Publication date: 01/01/2006
Field of study

Reconstructing full-length transcript isoforms from sequence fragments (such as ESTs) is a major interest and challenge for bioinformatic analysis of pre-mRNA alternative splicing. This problem has been formulated as finding traversals across the splice graph, which is a directed acyclic graph (DAG) representation of gene structure and alternative splicing. In this manuscript we introduce a probabilistic formulation of the isoform reconstruction problem, and provide an expectation-maximization (EM) algorithm for its maximum likelihood solution. Using a series of simulated data and expressed sequences from real human genes, we demonstrate that our EM algorithm can correctly handle various situations of fragmentation and coupling in the input data. Our work establishes a general probabilistic framework for splice graph-based reconstructions of full-length isoforms

PubMed Central

eScholarship - University of California

Computational pan-genomics: status, promises and challenges

Author: Abeel Thomas
Alkan Can
Baaijens Jasmijn
Bakker Paul
Boeva Valentina
Bonnal Raoul
Chiaromonte Francesca
Chikhi Rayan
Ciccarelli Francesca
Cijvat Robin
Datema Erwin
Dijkstra Louis
Duijn Cornelia
Dutilh Bas
Eichler Evan
El-Kebir Mohammed
Ernst Corinna
Eskin Eleazar
Garrison Erik
Ghaffaari Ali
Guryev Victor
Kersey Paul
Klau Gunnar
Kloosterman Wigard
Korbel Jan
Lameijer Eric-Wubbo
Langmead Benjamin
Marschall Tobias
Martin Marcel
Marz Manja
Medvedev Paul
Mu John
Mäkinen Veli
Neerincx Pieter
Novak Adam
Ouwens Klaasjan
Paten Benedict
Peterlongo Pierre
Pisanti Nadia
Porubsky David
Rahmann Sven
Raphael Benjamin
Reinert Knut
Ridder Dick
Ridder Jeroen
Rivals Eric
Sanders Ashley
Schlesner Matthias
Schulz-Trieglaff Ole
Schönhuth Alexander
Sheikhizadeh Siavash
Shneider Carl
Smit Sandra
The Computational Pan-Genomics Consortium
Valenzuela Daniel
Vandin Fabio
Wang Jiayin
Wessels Lodewyk
Ye Kai
Zhang Ying
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018
Field of study

International audienceMany disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

EUR Research Repository

HAL-MINES ParisTech

Archivio della ricerca della Scuola Superiore Sant'Anna