Viral population estimation using pyrosequencing
The diversity of virus populations within single infected hosts presents a
major difficulty for the natural immune response as well as for vaccine design
and antiviral drug therapy. Recently developed pyrophosphate-based sequencing
technologies (pyrosequencing) can be used for quantifying this diversity by
ultra-deep sequencing of virus samples. We present computational methods for
the analysis of such sequence data and apply these techniques to pyrosequencing
data obtained from HIV populations within patients harboring drug-resistant
virus strains. Our main result is the estimation of the population structure of
the sample from the pyrosequencing reads. This inference is based on a
statistical approach to error correction, followed by a combinatorial algorithm
for constructing a minimal set of haplotypes that explain the data. Using this
set of explaining haplotypes, we apply a statistical model to infer the
frequencies of the haplotypes in the population via an EM algorithm. We
demonstrate, through extensive simulations and by comparison with 165 sequences
obtained directly from clonal sequencing of four independent, diverse HIV
populations, that pyrosequencing reads allow effective population
reconstruction. Thus, pyrosequencing can be used for cost-effective estimation of
the structure of virus populations, promising new insights into viral
evolutionary dynamics and disease control strategies. Comment: 23 pages, 13 figures
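The frequency-estimation step can be illustrated with a minimal EM sketch. It assumes error-corrected reads with known start positions, a fixed candidate haplotype set (the output of the combinatorial step), and a simplified read model in which a read can come from any haplotype it matches exactly, with probability proportional to that haplotype's frequency. Sequencing errors and read-length effects are ignored, so this is a simplification rather than the authors' statistical model, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical read representation: (0-based start on haplotype coordinates, sequence).
Read = Tuple[int, str]


def compatible(read: Read, haplotype: str) -> bool:
    """Exact match over the positions the read covers (a deliberate simplification)."""
    start, seq = read
    return haplotype[start:start + len(seq)] == seq


def em_haplotype_frequencies(
    reads: Sequence[Read],
    haplotypes: Sequence[str],
    max_iter: int = 1000,
    tol: float = 1e-12,
) -> List[float]:
    """Maximum-likelihood haplotype frequencies via expectation-maximization."""
    k = len(haplotypes)
    # For each read, the indices of candidate haplotypes that could have produced it.
    compat = [[j for j, h in enumerate(haplotypes) if compatible(r, h)] for r in reads]
    compat = [c for c in compat if c]  # reads explained by no candidate are dropped
    n = len(compat)
    freqs = [1.0 / k] * k  # uniform starting point
    for _ in range(max_iter):
        expected = [0.0] * k
        for c in compat:
            # E-step: share each read among its compatible haplotypes
            # in proportion to the current frequency estimates.
            total = sum(freqs[j] for j in c)
            for j in c:
                expected[j] += freqs[j] / total
        # M-step: new frequency = expected fraction of reads assigned to each haplotype.
        new = [e / n for e in expected]
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


haps = ["ACGTAC", "ACTTAC"]
reads = [(1, "CG")] * 3 + [(1, "CT")] + [(3, "TAC")] * 4
print(em_haplotype_frequencies(reads, haps))
```

With three reads supporting only the first candidate, one supporting only the second, and four ambiguous reads, the estimate converges to 0.75 and 0.25: the fixed point satisfies p1 = (3 + 4·p1) / 8, so the ambiguous reads are split in the same proportion as the unambiguous evidence.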
Should We Learn Probabilistic Models for Model Checking? A New Approach and An Empirical Study
Many automated system analysis techniques (e.g., model checking, model-based
testing) rely on first obtaining a model of the system under analysis. System
modeling is frequently done manually, which is often considered a hindrance to
adopting model-based system analysis and development techniques. To overcome this
problem, researchers have proposed automatically "learning" models from
sample system executions and have shown that the learned models can sometimes be
useful. However, many questions remain open. For instance, how
much should we generalize from the observed samples, and how fast does learning
converge? Would the analysis result based on the learned model be more
accurate than the estimation we could have obtained by sampling many system
executions within the same amount of time? In this work, we investigate
existing algorithms for learning probabilistic models for model checking,
propose an evolution-based approach for better controlling the degree of
generalization, and conduct an empirical study to answer these questions.
One of our findings is that the effectiveness of learning may sometimes be
limited. Comment: 15 pages, plus 2 reference pages, accepted by FASE 2017 in ETAPS
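To make the questions concrete: the simplest way to learn a probabilistic model from sample executions is to treat each observed label as a state and estimate transition probabilities by frequency counting. The hypothetical sketch below is not one of the algorithms the paper evaluates, nor its evolution-based approach; it assumes fully observed state labels, and it contrasts a reachability probability computed on the learned chain with the direct sampling estimate the abstract mentions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

# Learned discrete-time Markov chain: state -> {successor: probability}.
DTMC = Dict[str, Dict[str, float]]


def learn_dtmc(traces: List[Sequence[str]]) -> DTMC:
    """Estimate transition probabilities by counting observed state-to-state steps."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    model: DTMC = {}
    for src, succ in counts.items():
        total = sum(succ.values())
        model[src] = {dst: c / total for dst, c in succ.items()}
    return model


def reach_probability(model: DTMC, start: str, target: str, sweeps: int = 1000) -> float:
    """P(eventually reach target | start), approximated by iterating the fixed-point equations."""
    states = set(model) | {d for succ in model.values() for d in succ} | {start, target}
    prob = {s: 1.0 if s == target else 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            if s != target and s in model:  # states with no observed successors stay at 0
                prob[s] = sum(p * prob[d] for d, p in model[s].items())
    return prob[start]


def sampled_probability(traces: List[Sequence[str]], target: str) -> float:
    """Statistical baseline: fraction of observed executions that visit target."""
    return sum(target in trace for trace in traces) / len(traces)


traces = [["init", "try", "fail"], ["init", "try", "ok"], ["init", "try", "try", "ok"]]
model = learn_dtmc(traces)
print(reach_probability(model, "init", "ok"), sampled_probability(traces, "ok"))
```

On this toy data both estimates give 2/3. In general they can differ, because the Markov assumption lets the learned model assign probability to paths that never appear in the samples; how much of that generalization is helpful is the kind of question the study investigates.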
Recent advances in inferring viral diversity from high-throughput sequencing data
Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. To account for the uncertainties arising from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. The challenges posed by read alignment and the impact of reference bias on diversity estimates are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue to be complemented by technological developments. ISSN: 0168-170
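Local diversity estimation summarizes variation within a window of aligned reads. As one simple, illustrative statistic (an assumption here, not a method the review singles out), the per-position Shannon entropy of the nucleotide distribution can be computed as below. The reads are assumed to be already aligned to the same window, with '-' marking gaps, and no error correction is applied, so sequencing errors inflate the estimate; correcting for that is exactly what the reviewed methods are for.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(bases: Sequence[str]) -> float:
    """Shannon entropy (bits) of the nucleotide distribution in one alignment column."""
    counts = Counter(b for b in bases if b in "ACGT")  # ignore gaps and ambiguous calls
    n = sum(counts.values())
    if n == 0:
        return 0.0
    # H = sum_i p_i * log2(1 / p_i), with p_i = c_i / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def window_entropies(aligned_reads: List[str]) -> List[float]:
    """Per-position entropy over equal-length reads aligned to the same window."""
    length = len(aligned_reads[0])
    return [column_entropy([r[i] for r in aligned_reads]) for i in range(length)]


print(window_entropies(["ACGT", "ACGA", "ACGT", "ACGA"]))
```

Conserved positions score 0 bits, and a position split evenly between two bases scores 1 bit; the maximum for four nucleotides is 2 bits.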
Applications and Challenges of Real-time Mobile DNA Analysis
DNA sequencing is the process of identifying the exact order of
nucleotides within a given DNA molecule. New portable and relatively
inexpensive DNA sequencers, such as the Oxford Nanopore MinION, have the potential
to move DNA sequencing outside the laboratory, leading to faster and more
accessible DNA-based diagnostics. However, portable DNA sequencing and analysis
are challenging for mobile systems, owing to high data throughputs and
computationally intensive processing performed in environments with unreliable
connectivity and power.
In this paper, we provide an analysis of the challenges that mobile systems
and mobile computing must address to maximize the potential of portable DNA
sequencing and in situ DNA analysis. We explain the DNA sequencing process and
highlight the main differences between traditional and portable DNA sequencing
in the context of current and envisioned applications. We look at the
identified challenges from the perspective of both algorithms and systems
design, showing the need for careful co-design.
Automatic generation of test sequences from EFSM models using evolutionary algorithms
Automated test data generation through evolutionary testing (ET) is a topic of interest to the software engineering community. While there are many ET-based techniques for automatically generating test data from code, the problem of generating test data from an extended finite state machine (EFSM) is more complex and has received little attention. In this paper, we introduce a novel approach that uses an ET-based technique to generate input test sequences that trigger given feasible paths in an EFSM model. The proposed approach expresses the problem as a search for input parameters to be applied to a set of functions called sequentially. To apply an ET-based technique, we introduce a new fitness function that copes with test targets involving calls to a sequence of transitions. We evaluate our approach empirically using five sets of randomly generated paths through two EFSM case studies: the INRES protocol and the class 2 transport protocol. In the experiments, we apply two search techniques: random search and an ET-based search that uses our new fitness function. Experimental results show that the proposed approach produces input test sequences that trigger all the feasible paths used, with a success rate of 100%, whereas the random technique failed in most cases, with a success rate of 20.8%.
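The abstract does not spell out the new fitness function. A common baseline in evolutionary testing, used here as an assumed stand-in, combines the approach level (how many transitions of the target path remain unreached) with a normalized branch distance at the guard that failed. The sketch applies it to a made-up three-transition path whose guards read two integer inputs and minimizes it with a simple (1+1) evolutionary algorithm. The path, guards, input ranges and function names are all hypothetical, and real EFSM paths also update context variables, which this ignores.

```python
import random
from typing import Callable, List, Sequence, Tuple

Inputs = Tuple[int, int]  # hypothetical input parameters (x, y) for the whole path
Guard = Callable[[Inputs], int]  # returns a branch distance: 0 when the guard holds

# Hypothetical path of three transitions, each protected by a guard.
PATH: List[Guard] = [
    lambda v: max(0, 11 - v[0]),            # t1 guard: x > 10
    lambda v: max(0, 2 * v[0] - v[1] + 1),  # t2 guard: y > 2 * x
    lambda v: max(0, v[0] + v[1] - 99),     # t3 guard: x + y < 100
]


def fitness(path: Sequence[Guard], v: Inputs) -> float:
    """Approach level plus normalized branch distance; 0 means every transition fires."""
    for i, guard in enumerate(path):
        d = guard(v)
        if d > 0:
            approach_level = len(path) - i - 1  # transitions left unreached after this one
            return approach_level + d / (d + 1)  # d / (d + 1) lies in [0, 1)
    return 0.0


def mutate(v: Inputs, rng: random.Random, lo: int = 0, hi: int = 100) -> Inputs:
    """Change one input by a small non-zero step, clamped to [lo, hi]."""
    step = rng.choice([-3, -2, -1, 1, 2, 3])
    vals = list(v)
    i = rng.randrange(2)
    vals[i] = min(hi, max(lo, vals[i] + step))
    return (vals[0], vals[1])


def evolve(path: Sequence[Guard], budget: int = 20000, seed: int = 0) -> Tuple[Inputs, float]:
    """(1+1) EA: keep one candidate and accept a mutant unless it is worse."""
    rng = random.Random(seed)
    best = (rng.randint(0, 100), rng.randint(0, 100))
    best_fit = fitness(path, best)
    for _ in range(budget):
        if best_fit == 0.0:
            break
        cand = mutate(best, rng)
        cand_fit = fitness(path, cand)
        if cand_fit <= best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit


print(evolve(PATH))
```

Because the approach level is an integer and the normalized distance stays below 1, getting one transition further along the path always outranks getting closer to satisfying an earlier guard, which gives the search a gradient toward the end of the path.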
A heuristic-based approach to code-smell detection
Encapsulation and data hiding are central tenets of the object-oriented paradigm. Deciding what data and behaviour to group into a class, and where to draw the line between its public and private details, can make the difference between a class that is an understandable, flexible and reusable abstraction and one that is not. This decision is difficult and can easily result in poor encapsulation, which in turn can have serious implications for a number of system qualities. Such encapsulation problems are often hard to identify within large software systems until they cause a maintenance problem (which is usually too late), and attempting to perform such analysis manually can also be tedious and error-prone. Two common encapsulation problems that can arise from this decomposition process are data classes and god classes. These two problems typically occur together: data classes lack functionality that has been absorbed into an over-complicated and domineering god class. This paper describes the architecture of a tool, developed as a plug-in for the Eclipse IDE, that automatically detects data and god classes. The technique has been evaluated in a controlled study on two large open-source systems, comparing the tool's results with similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approaches.
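For comparison with the metrics-based approach the study uses as its baseline, detection rules in that style can be sketched as threshold checks over per-class metrics. This is not the paper's heuristic tool. The metric names follow common usage (WMC, TCC, ATFD, WOC), but the thresholds and the exact rule combinations below are illustrative assumptions rather than Marinescu's published strategies, and computing the metrics from source code is out of scope.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative thresholds (assumptions, not taken from the paper).
FEW = 5
HIGH_WMC = 31
VERY_HIGH_WMC = 47
ONE_THIRD = 1 / 3


@dataclass
class ClassMetrics:
    name: str
    wmc: int          # weighted methods per class (sum of method complexities)
    tcc: float        # tight class cohesion, in [0, 1]
    atfd: int         # access to foreign data: other classes' attributes used directly
    woc: float        # weight of class: share of public members providing functionality
    public_data: int  # public attributes plus accessor methods


def is_god_class(m: ClassMetrics) -> bool:
    # Complex, non-cohesive, and reaching into other classes' data.
    return m.atfd > FEW and m.wmc >= VERY_HIGH_WMC and m.tcc < ONE_THIRD


def is_data_class(m: ClassMetrics) -> bool:
    # Exposes mostly data, offers little behaviour, and is not complex itself.
    return m.woc < ONE_THIRD and m.public_data > FEW and m.wmc < HIGH_WMC


def detect(classes: List[ClassMetrics]) -> Dict[str, List[str]]:
    return {
        "god": [m.name for m in classes if is_god_class(m)],
        "data": [m.name for m in classes if is_data_class(m)],
    }


classes = [
    ClassMetrics("OrderManager", wmc=80, tcc=0.1, atfd=12, woc=0.9, public_data=2),
    ClassMetrics("OrderRecord", wmc=8, tcc=0.5, atfd=0, woc=0.1, public_data=10),
    ClassMetrics("Invoice", wmc=20, tcc=0.6, atfd=1, woc=0.7, public_data=3),
]
print(detect(classes))
```

The hypothetical OrderManager and OrderRecord pair mirrors the co-occurrence described above: the god class reaches into the data class's fields, which is what drives its ATFD up while the data class's WOC stays low.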
Bioinformatics tools for analysing viral genomic data
The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing.
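Quality control of raw reads is the first step in the pipeline summarised above. As a minimal illustration, assuming four-line FASTQ records with Phred+33 quality encoding, the sketch below trims low-quality bases from the 3' end and drops reads that become too short. The threshold, minimum length and function names are arbitrary, hypothetical choices; dedicated QC tools offer more careful strategies such as sliding-window trimming and adapter removal.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (identifier, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ text (no line-wrapped sequences)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], seq, qual


def phred33(qual: str) -> List[int]:
    """Decode quality characters using the ASCII offset 33 (Sanger / Illumina 1.8+)."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Cut bases from the 3' end until the last remaining base has quality >= min_q."""
    scores = phred33(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def qc(lines: List[str], min_q: int = 20, min_len: int = 5) -> List[Record]:
    """Trim every read and keep only those still at least min_len bases long."""
    kept = []
    for name, seq, qual in parse_fastq(lines):
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((name, seq, qual))
    return kept


lines = ["@r1", "ACGTACGT", "+", "IIIIII##", "@r2", "ACGT", "+", "####"]
print(qc(lines))
```

Here 'I' decodes to Q40 and '#' to Q2, so r1 loses its two trailing bases and r2 is trimmed away entirely.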