Viral population estimation using pyrosequencing

Author: A Dempster
A Rambaut
AMN Tsibris
B Gaschen
Baback Gharizadeh
C Wang
Chunlin Wang
D O'Meara
DC Douek
E Domingo
E Halperin
EH Simpson
ES Lander
Glenn Tesler
GS Gottlieb
GW Tyson
H Fakhrai-Rad
I Malet
IM Rouzine
J Kececioglu
JE Hopcroft
JF Simons
K Chen
KJ Metzner
L Bacheler
L Doukhan
L Excoffier
Lior Pachter
LR Ford
M Breitbart
M Eigen
M Margulies
M Stephens
MA Nowak
MJ Gonzales
ML Collins
ML Sogin
Mostafa Ronaghi
MT Tammi
N Beerenwinkel
Nicholas Eriksson
Niko Beerenwinkel
P Jenkins
PA Pevzner
R Schmid
R Shankarappa
Robert W. Shafer
RP Dilworth
S Huse
S-Y Rhee
Soo-Yon Rhee
VA Johnson
Yumi Mitsuya
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2008

The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used to quantify this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate, through extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations, that pyrosequencing reads allow for effective population reconstruction. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies. Comment: 23 pages, 13 figures
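The frequency-estimation step described in this abstract can be illustrated with a minimal EM sketch. It assumes, for illustration only, that reads have already been error-corrected, that reads and candidate haplotypes are aligned strings over the same window with `-` marking uncovered positions, and that a read is "compatible" with a haplotype when the two agree at every covered position. The function names `compatible` and `em_frequencies` are hypothetical, and this is not the authors' implementation.

```python
"""Minimal EM sketch for haplotype frequency estimation (hypothetical names).

Assumptions: error-corrected reads; reads and haplotypes are aligned strings
of equal length; '-' in a read marks a position the read does not cover.
"""
from typing import Dict, List


def compatible(read: str, hap: str) -> bool:
    # A read is explained by a haplotype if every covered base agrees.
    return all(r == "-" or r == h for r, h in zip(read, hap))


def em_frequencies(reads: List[str], haps: List[str],
                   iters: int = 200, tol: float = 1e-10) -> Dict[str, float]:
    k = len(haps)
    # For each read, the indices of haplotypes that could have produced it.
    comp = [[j for j, h in enumerate(haps) if compatible(r, h)] for r in reads]
    if any(not c for c in comp):
        raise ValueError("some read is not explained by any haplotype")
    p = [1.0 / k] * k  # start from uniform frequencies
    for _ in range(iters):
        counts = [0.0] * k
        for c in comp:
            # E-step: split each read among its compatible haplotypes
            # in proportion to their current frequencies.
            z = sum(p[j] for j in c)
            for j in c:
                counts[j] += p[j] / z
        # M-step: new frequency = expected share of all reads.
        new_p = [x / len(reads) for x in counts]
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return dict(zip(haps, p))
```

For example, `em_frequencies(["AC", "AC", "AG", "TG"], ["AC", "AG", "TG"])` assigns frequencies of 0.5, 0.25 and 0.25, because each read is explained by exactly one haplotype; reads compatible with several haplotypes are shared out in proportion to the current estimates.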

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Repository for Publications and Research Data

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

Should We Learn Probabilistic Models for Model Checking? A New Approach and An Empirical Study

Author: A Bauer
A Bianco
A Itai
A Mizera
C Baier
C de la Higuera
C Kermorvant
C Rohr
D Angluin
D Ron
D Tabakov
EM Clarke
F He
G Norman
HL Younes
HLS Younes
I Shmulevich
JH Holland
K Havelund
K Sen
L Helmink
M Kwiatkowska
MK Reiter
RC Carrasco
T Brázdil
T Herman
Y Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Many automated system analysis techniques (e.g., model checking, model-based testing) rely on first obtaining a model of the system under analysis. System modeling is often done manually, which many consider a hindrance to adopting model-based system analysis and development techniques. To overcome this problem, researchers have proposed to automatically "learn" models based on sample system executions and have shown that the learned models can sometimes be useful. However, many questions remain to be answered. For instance, how much should we generalize from the observed samples, and how fast would learning converge? Or, would the analysis result based on the learned model be more accurate than the estimate we could have obtained by sampling many system executions within the same amount of time? In this work, we investigate existing algorithms for learning probabilistic models for model checking, propose an evolution-based approach for better controlling the degree of generalization, and conduct an empirical study to answer these questions. One of our findings is that the effectiveness of learning may sometimes be limited. Comment: 15 pages, plus 2 reference pages, accepted by FASE 2017 in ETAPS
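To make the comparison raised in this abstract concrete, the sketch below learns a discrete-time Markov chain (DTMC) from sampled executions by simple frequency counting and then computes the probability of eventually reaching a target state by value iteration. This is plain maximum-likelihood estimation, assumed here for illustration; it is not the evolution-based approach the paper proposes, and the traces, state names and function names are hypothetical.

```python
"""Sketch: learn a DTMC from traces by frequency counting, then compute the
probability of eventually reaching a target state.

Assumptions (hypothetical): each trace is a list of state labels starting in
the initial state; transition probabilities are maximum-likelihood estimates
(counts normalised per source state). Not the paper's evolution-based method.
"""
from collections import Counter, defaultdict
from typing import Dict, Hashable, List

Model = Dict[Hashable, Dict[Hashable, float]]


def learn_dtmc(traces: List[List[Hashable]]) -> Model:
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for trace in traces:
        # Count every observed transition src -> dst.
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    # Normalise counts into per-state transition probabilities.
    return {s: {d: n / sum(c.values()) for d, n in c.items()}
            for s, c in counts.items()}


def reach_probability(model: Model, start: Hashable, target: Hashable,
                      iters: int = 1000) -> float:
    # Value iteration for P(eventually reach target). States with no observed
    # outgoing transitions are treated as absorbing (probability 0 unless target).
    states = set(model) | {d for row in model.values() for d in row} | {start}
    x = {s: 1.0 if s == target else 0.0 for s in states}
    for _ in range(iters):
        x = {s: 1.0 if s == target else
             sum(p * x[d] for d, p in model.get(s, {}).items())
             for s in states}
    return x[start]
```

With the four traces `s0, s1, ok`; `s0, s1, fail`; `s0, ok`; and `s0, s1, ok`, the learned model gives P(s0 to s1) = 0.75 and P(s1 to ok) = 2/3, so the probability of reaching `ok` is 0.25 + 0.75 × 2/3 = 0.75, which in this tiny example coincides with the direct sample estimate of 3 out of 4 traces.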

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

Open Repository and Bibliography - Luxembourg

Recent advances in inferring viral diversity from high-throughput sequencing data

Author: Beerenwinkel Niko
Posada-Cespedes Susana
Seifert David
Publication venue: 'Elsevier BV'
Publication date: 01/07/2017

Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. To account for the uncertainties arising from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. The challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue to be complemented by technological developments. ISSN:0168-170
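Local diversity estimation, one of the approaches reviewed here, is often summarized per alignment position. A minimal sketch, assuming reads are already aligned to a common window and given as equal-length strings (with `-` for uncovered positions), computes the Shannon entropy of the base distribution at each position. It deliberately ignores sequencing errors, which the reviewed methods are designed to handle, and the function names are hypothetical.

```python
"""Sketch of a per-position (local) diversity estimate via Shannon entropy.

Assumptions (hypothetical): reads are pre-aligned, equal-length strings with
'-' for uncovered positions; no error correction is performed.
"""
import math
from collections import Counter
from typing import List


def column_entropy(reads: List[str], pos: int) -> float:
    # Collect the bases observed at this position, skipping gaps.
    bases = [r[pos] for r in reads if r[pos] != "-"]
    if not bases:
        return 0.0
    n = len(bases)
    # Shannon entropy in bits of the empirical base distribution.
    return -sum((c / n) * math.log2(c / n) for c in Counter(bases).values())


def entropy_profile(reads: List[str]) -> List[float]:
    return [column_entropy(reads, i) for i in range(len(reads[0]))]
```

A monomorphic position scores 0 bits, while a position split evenly between two bases scores 1 bit; the maximum for four bases is 2 bits.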

Repository for Publications and Research Data

Elsevier - Publisher Connector

Applications and Challenges of Real-time Mobile DNA Analysis

Author: Ko Steven Y.
Sassoubre Lauren
Zola Jaroslaw
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/11/2017

DNA sequencing is the process of identifying the exact order of nucleotides within a given DNA molecule. New portable and relatively inexpensive DNA sequencers, such as the Oxford Nanopore MinION, have the potential to move DNA sequencing outside the laboratory, leading to faster and more accessible DNA-based diagnostics. However, portable DNA sequencing and analysis are challenging for mobile systems, owing to high data throughput and computationally intensive processing performed in environments with unreliable connectivity and power. In this paper, we analyze the challenges that mobile systems and mobile computing must address to maximize the potential of portable DNA sequencing and in situ DNA analysis. We explain the DNA sequencing process and highlight the main differences between traditional and portable DNA sequencing in the context of actual and envisioned applications. We examine the identified challenges from the perspective of both algorithm and system design, showing the need for careful co-design.

arXiv.org e-Print Archive

Crossref

Automatic generation of test sequences from EFSM models using evolutionary algorithms

Author: Hierons RM
Kalaji AS
Swift S
Publication date: 01/01/2008

Automated test data generation through evolutionary testing (ET) is a topic of interest to the software engineering community. While there are many ET-based techniques for automatically generating test data from code, the problem of generating test data from an extended finite state machine (EFSM) is more complex and has received little attention. In this paper, we introduce a novel approach that employs an ET-based technique to generate input test sequences that trigger given feasible paths in an EFSM model. The proposed approach expresses the problem as a search for input parameters to be applied to a set of functions called sequentially. To apply an ET-based technique, we introduce a new fitness function that copes with the case in which a test target involves calling a set of transitions sequentially. We evaluate our approach empirically using five sets of randomly generated paths through two EFSM case studies: the INRES protocol and the class 2 transport protocol. In the experiments, we apply two search techniques: a random search and an ET-based search that uses our new fitness function. Experimental results show that the proposed approach produces input test sequences that trigger all the feasible paths used, with a success rate of 100%, whereas the random technique failed in most cases, with a success rate of 20.8%.
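The idea of searching for inputs that trigger a given EFSM path can be sketched as follows. The transition model, the example guards and the function names are hypothetical. The fitness used here is the common "approach level plus normalized branch distance" scheme from evolutionary structural testing, swapped in for illustration; it is not the new fitness function proposed in the paper. A simple (1+1) hill-climbing search stands in for a full evolutionary algorithm.

```python
"""Sketch: fitness for triggering a given EFSM path, plus a (1+1) search.

Assumptions (hypothetical): each transition on the path consumes one integer
input; its guard returns a branch distance (0 means the guard holds); its
update mutates the context variables. Fitness = approach level + normalised
branch distance, not the paper's own fitness function.
"""
import random
from typing import Callable, Dict, List, Tuple

Vars = Dict[str, int]
Transition = Tuple[Callable[[Vars, int], float], Callable[[Vars, int], None]]


def fitness(path: List[Transition], inputs: List[int]) -> float:
    ctx: Vars = {}
    for i, (guard, update) in enumerate(path):
        d = guard(ctx, inputs[i])
        if d > 0:
            # Approach level: transitions left untriggered after this one.
            approach = len(path) - i - 1
            return approach + d / (d + 1)  # normalise distance into (0, 1)
        update(ctx, inputs[i])
    return 0.0  # every guard held: the whole path is triggered


def search(path: List[Transition], lo: int = -100, hi: int = 100,
           budget: int = 20000, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    best = [rng.randint(lo, hi) for _ in path]
    best_f = fitness(path, best)
    for _ in range(budget):
        if best_f == 0.0:
            break
        # Mutate one input by a small random step, clamped to [lo, hi].
        cand = list(best)
        j = rng.randrange(len(cand))
        cand[j] = min(hi, max(lo, cand[j] + rng.randint(-10, 10)))
        f = fitness(path, cand)
        if f <= best_f:  # accept equal or better candidates
            best, best_f = cand, f
    return best, best_f


# Hypothetical two-transition path: first x >= 50 (stored as a), then x == a + 5.
def _guard1(ctx: Vars, x: int) -> float:
    return max(0, 50 - x)


def _update1(ctx: Vars, x: int) -> None:
    ctx["a"] = x


def _guard2(ctx: Vars, x: int) -> float:
    return abs(x - (ctx["a"] + 5))


def _update2(ctx: Vars, x: int) -> None:
    pass


EXAMPLE_PATH: List[Transition] = [(_guard1, _update1), (_guard2, _update2)]
```

The approach level ranks candidates that fail earlier on the path as worse than those that fail later, so the search first satisfies the first guard and then tunes the second input. Calling `search(EXAMPLE_PATH)` is expected to return inputs with fitness 0.0.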

Brunel University Research Archive

A heuristic-based approach to code-smell detection

Author: Kirk D.
Roper M.
Wood M.
Publication venue: Nova Science Publishers, Inc.
Publication date: 01/01/2007

Encapsulation and data hiding are central tenets of the object-oriented paradigm. Deciding what data and behaviour to form into a class, and where to draw the line between its public and private details, can make the difference between a class that is an understandable, flexible and reusable abstraction and one that is not. This decision is a difficult one and may easily result in poor encapsulation, which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late), and attempting to perform such analysis manually can also be tedious and error-prone. Two common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together: data classes lack functionality that has been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool, developed as a plug-in for the Eclipse IDE, that automatically detects data and god classes. The technique has been evaluated in a controlled study on two large open source systems, comparing the tool results to similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approaches.
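For contrast with the heuristic tool described here, a minimal threshold-based check for data and god classes might look like the following. The `ClassMetrics` record, the metric names and every threshold value are hypothetical placeholders; they are neither the heuristics of the paper's Eclipse plug-in nor Marinescu's published detection strategies, and the metrics are assumed to come from a separate parser that is not shown.

```python
"""Sketch of simple metric thresholds for flagging data and god classes.

Assumptions (hypothetical): per-class metrics come from some parser (not
shown); all thresholds are illustrative placeholders.
"""
from dataclasses import dataclass
from typing import Dict, List

DATA_EXPOSURE_MIN = 5    # exposed fields + accessors suggesting a data holder
BEHAVIOUR_MAX = 2        # at most this many non-accessor methods
COMPLEXITY_MIN = 50      # summed method complexity suggesting a god class
FOREIGN_ACCESS_MIN = 5   # uses of other classes' data suggesting a god class


@dataclass
class ClassMetrics:
    name: str
    methods: int            # total number of methods
    accessors: int          # getters/setters among those methods
    public_fields: int
    foreign_data_uses: int  # accesses to attributes of other classes
    complexity: int         # sum of method cyclomatic complexities


def is_data_class(m: ClassMetrics) -> bool:
    # Mostly exposes data and has little behaviour of its own.
    behaviour = m.methods - m.accessors
    return (m.public_fields + m.accessors) >= DATA_EXPOSURE_MIN and behaviour <= BEHAVIOUR_MAX


def is_god_class(m: ClassMetrics) -> bool:
    # Complex and heavily reliant on other classes' data.
    return m.complexity >= COMPLEXITY_MIN and m.foreign_data_uses >= FOREIGN_ACCESS_MIN


def classify(classes: List[ClassMetrics]) -> Dict[str, List[str]]:
    return {
        "data": [c.name for c in classes if is_data_class(c)],
        "god": [c.name for c in classes if is_god_class(c)],
    }
```

Fixed thresholds like these are easy to apply but brittle across code bases, which is one reason a comparison between heuristic and metrics-based detection is informative.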

University of Strathclyde Institutional Repository

Bioinformatics tools for analysing viral genomic data

Author: Davison A.
Gu Q.
Hughes J.
Maabar M.
Modha S.
Orton R.J.
Vattipally Sreenu
Wilkie G.S.
Publication venue: 'O.I.E (World Organisation for Animal Health)'
Publication date: 01/04/2016

The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing.
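The first step named in this abstract, quality control of raw reads, can be sketched as trimming low-quality bases from the 3' end of a read. The sketch assumes Phred+33-encoded FASTQ quality strings, where a score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance of a wrong base call. The threshold, the minimum length and the function names are illustrative assumptions, not settings from the paper or from any particular tool.

```python
"""Sketch of a basic read quality-control step: 3' quality trimming.

Assumptions (hypothetical defaults): Phred+33 quality encoding; trim trailing
bases below Q20; discard reads shorter than min_len after trimming.
"""
from typing import List, Optional, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    # Decode each ASCII quality character into an integer Phred score.
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq: str, qual: str, threshold: int = 20,
                min_len: int = 30) -> Optional[Tuple[str, str]]:
    scores = phred_scores(qual)
    end = len(scores)
    # Drop trailing bases whose quality is below the threshold.
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_len:
        return None  # discard reads that become too short
    return seq[:end], qual[:end]
```

For example, `trim_3prime("ACGTACGTAC", "IIIIIII###", min_len=5)` returns `("ACGTACG", "IIIIIII")`, since `I` encodes Q40 and `#` encodes Q2.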

Enlighten