82 research outputs found
Statistical physics methods in computational biology
The interest of statistical physics in combinatorial optimization is not new: it suffices to think of a well-known tool such as simulated annealing. More recently, statistical physics has also drawn on statistical inference to address some "hard" optimization problems, developing a new class of message-passing algorithms. Three applications to computational biology are presented in this thesis, namely:
1) Boolean networks, a model for gene regulatory networks;
2) haplotype inference, to study the genetic information present in a population;
3) clustering, a general machine learning tool.
Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies
Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease of the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.
ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data
Background: With next-generation sequencing technologies, experiments that were considered prohibitive only a few years ago are now possible. However, while these technologies have the ability to produce enormous volumes of data, the sequence reads are prone to error. This poses fundamental hurdles when genetic diversity is investigated. Results: We developed ShoRAH, a computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. The software was run on simulated data and on real data obtained in wet lab experiments to assess its reliability. Conclusions: ShoRAH is implemented in C++, Python, and Perl and has been tested under Linux and Mac OS X. Source code is available under the GNU General Public License at http://www.cbg.ethz.ch/software/shorah.
Finite size corrections to random Boolean networks
Since their introduction, Boolean networks have been traditionally studied in
view of their rich dynamical behavior under different update protocols and for
their qualitative analogy with cell regulatory networks. More recently, tools
borrowed from statistical physics of disordered systems and from computer
science have provided a more complete characterization of their equilibrium
behavior. However, most of the results have been obtained in the
thermodynamic limit, which is often far from being reached when dealing with
realistic instances of the problem. The numerical analysis presented here aims
at comparing - for a specific family of models - the outcomes given by the
heuristic belief propagation algorithm with those given by exhaustive
enumeration. In the second part of the paper some analytical considerations on
the validity of the annealed approximation are discussed.
Prevalence and Predictors for Homo- and Heterosubtypic Antibodies Against Influenza A Virus
Heterosubtypic antibodies to influenza A virus will be crucial for the development of a pan-influenza vaccine. Here we show that most individuals already possess heterosubtypic antibodies and that their generation is favored both by vaccination and ag
Optimization and validation of sample preparation for metagenomic sequencing of viruses in clinical samples
Demultiplexed raw sequencing data files (fastq.gz
ozagordi/MinVar: Support for non-overlapping amplicons
MinVar now also works with non-overlapping amplicons.
With this update, the program takes the first and the last positions covered by at least 20 reads. In the previous version, problems were observed on HIV samples sequenced with two non-overlapping amplicons.
For a similar reason, a drop in coverage also caused variant calling to end prematurely. This was observed, for example, on HCV samples.
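The rule described in this release note can be illustrated with a minimal sketch. It is not MinVar's actual code: the function name covered_span, the constant MIN_COVERAGE and the per-position depth list are hypothetical, and only the threshold of 20 reads comes from the note. The sketch returns the first and last positions whose depth reaches the threshold, so an uncovered gap between two amplicons, or a dip in coverage, does not shorten the reported span.

```python
from typing import Optional, Sequence, Tuple

MIN_COVERAGE = 20  # threshold quoted in the release note


def covered_span(
    coverage: Sequence[int], min_cov: int = MIN_COVERAGE
) -> Optional[Tuple[int, int]]:
    """Return (first, last) 0-based positions with depth >= min_cov, or None.

    Hypothetical helper for illustration only, not part of MinVar.
    """
    covered = [pos for pos, depth in enumerate(coverage) if depth >= min_cov]
    if not covered:
        return None
    # The span runs from the first to the last well-covered position,
    # regardless of any low-coverage stretch in between.
    return covered[0], covered[-1]


# Made-up depths: two non-overlapping amplicons separated by an uncovered gap.
depths = [0, 5, 30, 45, 40, 0, 0, 3, 25, 60, 22, 4]
print(covered_span(depths))  # (2, 10)
```

A scan that stopped at the first position below the threshold would report only positions 2 to 4 for these depths, which matches the premature end of variant calling described in the note.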