142 research outputs found

Recent advances in inferring viral diversity from high-throughput sequencing data

Author: Beerenwinkel Niko
Posada-Cespedes Susana
Seifert David
Publication venue: 'Elsevier BV'
Publication date: 01/07/2017

Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments. ISSN:0168-170

Repository for Publications and Research Data

Elsevier - Publisher Connector

Analysis of Next-generation Sequencing Data in Virology - Opportunities and Challenges

Author: Kale Mohan M.
Kasibhatla Sunitha M.
Kulkarni-Kale Urmila
Waman Vaishali P.
Publication venue: 'IntechOpen'
Publication date: 14/01/2016

Viruses are the most abundant and the smallest organisms, which are relatively simple to sequence. Genome sequence data of viruses for individual species to populations outnumber that of other species. Although this offers an opportunity to study viral diversity at varying levels of taxonomic hierarchy, it also poses challenges for systematic and structured organization of data and its downstream processing. Extensive computational analyses using a number of algorithms and programs have opened exciting opportunities for virus discovery and diagnostics, apart from augmenting our understanding of the intriguing world of viruses. Unravelling evolutionary dynamics of viruses permits improved understanding of phenomena such as quasispecies diversity, role of mutations in host switching and drug resistance, which enables the tangible measurements of genotype and phenotype of viruses. Improved understanding of geno-/serotype diversity in correlation with antigenic diversity will facilitate rational design and development of efficacious vaccines against emerging and re-emerging viruses. Mathematical models developed using the genomic data could be used to predict the spread of viruses due to vector switching and the (re)emergence due to host switching and, thereby, contribute towards designing public health policies for disease management and control

IntechOpen

Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness

Author: Chang BC
Halgamuge Saman
Jayasundara Duleepa
Saeed Isaam
Tang Sen-Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/11/2018

Background: Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation rely on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations where the strains are highly genetically related. The lack of knowledge on the number of different strains in a quasispecies population is observed to hinder the precision of existing Viral Quasispecies Spectrum Reconstruction (QSR) methods due to the uncontrolled reconstruction of a large number of in silico false positives. In this work, we formulated a novel probabilistic method for strain richness estimation specifically targeting viral quasispecies. By using this approach we improved our recently proposed spectrum reconstruction pipeline ViQuaS to achieve higher levels of precision in reconstructed quasispecies spectra without compromising the recall rates. We also discuss how one other existing popular QSR method named ShoRAH can be improved using this new approach. Results: On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5% respectively. Conclusions: The proposed probabilistic estimation method can be used to estimate the richness of viral populations with a quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors

The Australian National University

Accurate Viral Population Assembly From Ultra-Deep Sequencing Data

Author: Eskin Eleazar
Mancuso Nicholas
Mangul Serghei
Sun Ren
Wu Nicholas C.
Zelikovskiy Alexander
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/06/2014

Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors. Results: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation–maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads

ScholarWorks @ Georgia State University

PubMed Central

eScholarship - University of California

Algorithms for analysis of next-generation viral sequencing data

Author: Melnyk Andrii
Publication venue: ScholarWorks @ Georgia State University
Publication date: 04/05/2021

RNA viruses mutate at extremely high rates, forming an intra-host viral population of closely related variants, which allows them to evade the host’s immune system and makes them particularly dangerous. Viral outbreaks pose a significant threat to public health. Progress in sequencing technologies has made it possible to identify and sample intra-host viral populations at great depth. Consequently, the contribution of sequencing technologies to molecular surveillance of viral outbreaks becomes more and more substantial. Genome sequencing of viral populations reveals similarities between samples, allows viral genetic distance to be measured, and facilitates outbreak identification and isolation. Computational methods can be used to infer transmission characteristics from sequencing data. However, due to the specifics of next-generation sequencing (NGS) approaches, and the limited availability of viral data, existing methods lack accuracy and efficiency. In this dissertation, I present novel, flexible methods that allow tackling crucial epidemiological problems, such as identification of transmission clusters, sources of infection, and transmission direction

ScholarWorks @ Georgia State University

Algorithms for Viral Population Analysis

Author: Mancuso Nicholas
Publication venue: ScholarWorks @ Georgia State University
Publication date: 12/08/2014

The genetic structure of an intra-host viral population has an effect on many clinically important phenotypic traits such as escape from vaccine induced immunity, virulence, and response to antiviral therapies. Next-generation sequencing provides read-coverage sufficient for genomic reconstruction of a heterogeneous, yet highly similar, viral population; and more specifically, for the detection of rare variants. Admittedly, while depth is less of an issue for modern sequencers, the short length of generated reads complicates viral population assembly. This task is worsened by the presence of both random and systematic sequencing errors in huge amounts of data. In this dissertation I present completed work for reconstructing a viral population given next-generation sequencing data. Several algorithms are described for solving this problem under the error-free amplicon (or sliding-window) model. In order for these methods to handle actual real-world data, an error-correction method is proposed. A formal derivation of its likelihood model along with optimization steps for an EM algorithm are presented. Although these methods perform well, they cannot take into account paired-end sequencing data. In order to address this, a new method is detailed that works under the error-free paired-end case along with maximum a-posteriori estimation of the model parameters

ScholarWorks @ Georgia State University

Analysis of NGS Data from Immune Response and Viral Samples

Author: Gerasimov Ekaterina
Publication venue: ScholarWorks @ Georgia State University
Publication date: 08/08/2017

This thesis is devoted to designing and applying advanced algorithmic and statistical tools for analysis of NGS data related to cancer and infectious diseases. NGS data under investigation are obtained either from host samples or viral variants. Recently, random peptide phage display libraries (RPPDL) were applied to studies of the host's antibody response to different diseases. We study human antibody response to breast cancer and mouse antibody response to Lyme disease by sequencing of the whole antibody repertoire profiles which are represented by RPPDL. Alternatively, instead of sequencing the immune response, NGS can be applied directly to a viral population within an infected host. Specifically, we analyze the following RNA viruses: the human immunodeficiency virus (HIV) and the infectious bronchitis virus (IBV). Sequencing of RNA viruses is challenging because there are many variants inside a population due to the high mutation rate. Our results show that NGS helps to understand RNA viruses and explore their interaction with infected hosts. NGS also helps to analyze the immune response to different diseases and trace changes in the immune response at different disease stages

ScholarWorks @ Georgia State University

SAMFIRE: multi-locus variant calling for time-resolved sequence data

Author: Illingworth Christopher
Publication venue: Bioinformatics
Publication date: 01/01/2013

An increasingly common method for studying evolution is the collection of time-resolved short-read sequence data. Such datasets allow for the direct observation of rapid evolutionary processes, as might occur in natural microbial populations and in evolutionary experiments. In many circumstances, evolutionary pressure acting upon single variants can cause genomic changes at multiple nearby loci. SAMFIRE is an open-access software package for processing and analysing sequence reads from time-resolved data, calling important single- and multi-locus variants over time, identifying alleles potentially affected by selection, calculating linkage disequilibrium statistics, performing haplotype reconstruction, and exploiting time-resolved information to estimate the extent of uncertainty in reported genomic data. CI was supported by a Sir Henry Dale Fellowship, jointly funded by the Wellcome Trust and the Royal Society (Grant Number 101239/Z/13/Z). This is the author accepted manuscript. The final version is available from Oxford University Press via http://dx.doi.org/10.1093/bioinformatics/btw20

Crossref

Publikationer från Umeå universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Apollo (Cambridge)

Full-length de novo viral quasispecies assembly through variation graph construction

Author: Baaijens J.A. (Jasmijn)
Köster J. (Johannes)
Roest B. (Bastiaan) van der
Schönhuth A. (Alexander)
Stougie L. (Leen)
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/12/2019

CWI's Institutional Repository