82 research outputs found

    Statistical physics methods in computational biology

    The interest of statistical physics in combinatorial optimization is not new: it suffices to think of a well-known tool such as simulated annealing. Recently, statistical physics has also turned to statistical inference to address some "hard" optimization problems, developing a new class of message-passing algorithms. Three applications to computational biology are presented in this thesis: 1) Boolean networks, a model for gene regulatory networks; 2) haplotype inference, to study the genetic information present in a population; 3) clustering, a general machine learning tool.

    Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies

    Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease in the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.

    ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data

    Background: With next-generation sequencing technologies, experiments that were considered prohibitive only a few years ago are now possible. However, while these technologies have the ability to produce enormous volumes of data, the sequence reads are prone to error. This poses fundamental hurdles when genetic diversity is investigated. Results: We developed ShoRAH, a computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. The software was run on simulated data and on real data obtained in wet lab experiments to assess its reliability. Conclusions: ShoRAH is implemented in C++, Python, and Perl and has been tested under Linux and Mac OS X. Source code is available under the GNU General Public License at http://www.cbg.ethz.ch/software/shorah.

    Finite size corrections to random Boolean networks

    Since their introduction, Boolean networks have traditionally been studied for their rich dynamical behavior under different update protocols and for their qualitative analogy with cell regulatory networks. More recently, tools borrowed from the statistical physics of disordered systems and from computer science have provided a more complete characterization of their equilibrium behavior. However, most of the results have been obtained in the thermodynamic limit, which is often far from being reached when dealing with realistic instances of the problem. The numerical analysis presented here aims at comparing, for a specific family of models, the outcomes given by the heuristic belief propagation algorithm with those given by exhaustive enumeration. The second part of the paper discusses some analytical considerations on the validity of the annealed approximation.
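
    As a rough illustration of what exhaustive enumeration means for a small Boolean network, the sketch below lists every fixed point by checking all 2^n states. It is not taken from the paper: the choice of fixed points as the quantity of interest, the random truth tables, and the names random_network, step and fixed_points are all assumptions made for the example.

```python
import itertools
import random
from typing import Dict, List, Tuple

State = Tuple[int, ...]
Table = Dict[Tuple[int, ...], int]


def random_network(n: int, k: int, seed: int = 0) -> Tuple[List[Tuple[int, ...]], List[Table]]:
    """Build a hypothetical random network: each node reads k distinct nodes
    through a randomly filled truth table."""
    rng = random.Random(seed)
    inputs = [tuple(rng.sample(range(n), k)) for _ in range(n)]
    tables = [
        {bits: rng.randint(0, 1) for bits in itertools.product((0, 1), repeat=k)}
        for _ in range(n)
    ]
    return inputs, tables


def step(state: State, inputs: List[Tuple[int, ...]], tables: List[Table]) -> State:
    """Apply one synchronous update to every node."""
    return tuple(
        tables[i][tuple(state[j] for j in inputs[i])] for i in range(len(state))
    )


def fixed_points(inputs: List[Tuple[int, ...]], tables: List[Table]) -> List[State]:
    """Enumerate all 2^n states and keep those mapped onto themselves."""
    n = len(inputs)
    return [
        s for s in itertools.product((0, 1), repeat=n)
        if step(s, inputs, tables) == s
    ]


# Small instance: 2^10 = 1024 states, cheap to enumerate.
inputs, tables = random_network(n=10, k=2, seed=1)
print(len(fixed_points(inputs, tables)), "fixed points")
```

    The cost grows as 2^n, which is why enumeration of this kind is only practical for small networks, where it can serve as a reference for a heuristic estimate.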

    Prevalence and Predictors for Homo- and Heterosubtypic Antibodies Against Influenza A Virus

    Heterosubtypic antibodies to influenza A virus will be crucial for the development of a pan-influenza vaccine. Here we show that most individuals already possess heterosubtypic antibodies and that their generation is favored both by vaccination and ag

    ozagordi/MinVar: Support for non-overlapping amplicons

    MinVar now also works with non-overlapping amplicons. With this update, the program takes the first and the last positions that are covered by at least 20 reads. In the previous version, problems were observed on HIV samples sequenced with two non-overlapping amplicons. For a similar reason, a drop in coverage also caused variant calling to end prematurely. This was observed, for example, on HCV samples.
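
    The rule described in this note can be sketched as follows. This is not MinVar's actual code: the function name covered_window, the 0-based positions and the example depths are assumptions, and the sketch also assumes that the two positions bound the region that is analysed.

```python
from typing import Optional, Sequence, Tuple

MIN_COVERAGE = 20  # threshold stated in the release note


def covered_window(
    coverage: Sequence[int], min_cov: int = MIN_COVERAGE
) -> Optional[Tuple[int, int]]:
    """Return the first and last 0-based positions with depth >= min_cov,
    or None if no position reaches the threshold."""
    covered = [i for i, depth in enumerate(coverage) if depth >= min_cov]
    if not covered:
        return None
    return covered[0], covered[-1]


# Hypothetical per-position depths: two amplicons separated by an uncovered gap.
depths = [0, 5, 25, 30, 22, 0, 0, 0, 21, 40, 19, 3]
print(covered_window(depths))  # (2, 9)
```

    Because only the outermost covered positions are used, a gap between amplicons or a local drop in coverage inside the window does not cut the window short.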