31 research outputs found

    Assessing Reproducibility of Inherited Variants Detected With Short-Read Whole Genome Sequencing

    Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
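    The abstract above does not say how reproducibility is scored. As a minimal sketch, assuming a Jaccard-style concordance between replicate call sets (an assumption made here for illustration, not necessarily the authors' metric), the idea of comparing triplicates can be written as follows. The variant tuples and function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# This keying scheme is an illustrative assumption.
Variant = Tuple[str, int, str, str]


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared calls divided by all calls seen in either replicate."""
    union = a | b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(a & b) / len(union)


def mean_pairwise_concordance(replicates: List[Set[Variant]]) -> float:
    """Average Jaccard concordance over every pair of replicates."""
    pairs = list(combinations(replicates, 2))
    if not pairs:
        raise ValueError("need at least two replicates")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical triplicate: replicate 2 disagrees on one call.
rep1 = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")}
rep2 = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 90, "T", "C")}
rep3 = set(rep1)

score = mean_pairwise_concordance([rep1, rep2, rep3])
print(round(score, 3))
```

    Computing the same score separately for SNVs and for indels of different lengths would be one way to express the comparison the abstract describes.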

    This work addresses two problems: sequence alignment and gene prediction. Although distinct, the two problems are closely related. One contribution of this work is therefore a technique, based on consistency transformations, that integrates the pair HMM (used for pairwise sequence alignment) with the GHMM (used for gene prediction), so that the gene predictions for a set of genomic sequences are consistent with the alignment of those sequences. In addition, we present two new algorithms related to gene prediction using GHMMs.

    Structure and Dynamics of Solvent Landscapes in Charge-Transfer Reactions

    Specific and nonspecific collapse in protein folding funnels

    Experiments with fast folding proteins are beginning to address the relationship between collapse and folding. We investigate how different scenarios for folding can arise depending on whether the folding and collapse transitions are concurrent or whether a nonspecific collapse precedes folding. Many earlier studies have focused on the limit in which collapse is fast compared to the folding time; in this work we focus on the opposite limit where, at the folding temperature, collapse and folding occur simultaneously. Real proteins exist in both of these limits. The folding mechanism varies substantially in these two regimes. In the regime of concurrent folding and collapse, nonspecific collapse now occurs at a temperature below the folding temperature (but slightly above the glass transition temperature).

    Modeling Chikungunya control strategies and Mayaro potential outbreak in the city of Rio de Janeiro.

    Mosquito-borne diseases have become a significant health issue in many regions of the world. In tropical countries, diseases such as Dengue, Zika, and Chikungunya have become epidemic in recent decades. Health surveillance reports during this period were crucial in providing science-based information to guide decision-making and resource allocation for outbreak control. In this work, we analyze data from the recent Chikungunya epidemics in the city of Rio de Janeiro by applying a compartmental mathematical model. Sensitivity analyses were performed to describe the contribution of each parameter to outbreak incidence. We estimate the "basic reproduction number" for those outbreaks and predict the potential epidemic outbreak of the Mayaro virus. We also simulate several scenarios with different public interventions to decrease the number of infected people. Such scenarios should provide insights into possible strategies to control future outbreaks.

    ToPS: A Framework to Manipulate Probabilistic Models of Sequence Data

    Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to implement different applications in bioinformatics analysis by combining eight kinds of models: (i) independent and identically distributed process; (ii) variable-length Markov chain; (iii) inhomogeneous Markov chain; (iv) hidden Markov model; (v) profile hidden Markov model; (vi) pair hidden Markov model; (vii) generalized hidden Markov model; and (viii) similarity-based sequence weighting. The framework includes functionality for training, simulation, and decoding of the models. Additionally, it provides two methods to help with parameter setting: the Akaike and Bayesian information criteria (AIC and BIC). The models can be used stand-alone, combined in Bayesian classifiers, or included in more complex, multi-model, probabilistic architectures using GHMMs. In particular, the framework provides a novel, flexible implementation of decoding in GHMMs that detects when the architecture can be traversed efficiently.
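    The AIC and BIC mentioned above are standard model-selection criteria, and a small sketch may make their role in parameter setting concrete. The code below is not the ToPS interface; the function names, the candidate models, and the log-likelihood values are all hypothetical and chosen only for illustration. It uses the textbook definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters, n the number of observations, and ln L the maximized log-likelihood.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood: float, num_params: int, num_observations: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return num_params * math.log(num_observations) - 2 * log_likelihood


def select_by_criteria(
    candidates: Dict[str, Tuple[float, int]], num_observations: int
) -> Tuple[str, str]:
    """Return the labels of the models minimizing AIC and BIC, respectively.

    candidates maps a label to (maximized log-likelihood, free parameter count).
    """
    best_aic = min(candidates, key=lambda name: aic(*candidates[name]))
    best_bic = min(
        candidates, key=lambda name: bic(*candidates[name], num_observations)
    )
    return best_aic, best_bic


# Hypothetical fits of fixed-order Markov chains over a 4-letter DNA alphabet.
# An order-m chain has 3 * 4**m free parameters; the log-likelihoods are made up.
fits = {
    "order-0": (-1400.0, 3),
    "order-1": (-1350.0, 12),
    "order-2": (-1330.0, 48),
}

chosen = select_by_criteria(fits, num_observations=1000)
print(chosen)
```

    With these made-up numbers the order-2 chain fits best in raw likelihood, but both criteria penalize its 48 parameters and prefer the order-1 chain.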

    GHMM architecture for eukaryotic protein-coding gene prediction.

     is a state for representing an initial exon that ends at phase . is a state for representing an internal exon that begins at phase and ends at phase . is a state for representing a terminal exon that begins at phase . is a state for representing an intron at phase . is a state for representing intergenic regions. is a state for representing the start codon signal. is a state for representing the stop codon signal. is a state for representing acceptor splice site signal at phase . is a state for representing the donor splice site signal at phase . To model the reverse strand, we used the states that begin with the prefix ‘r-’. Squares with a self-transition represent states with geometric duration distribution. Squares without a self-transition represent states with a non-geometric duration distribution. Ellipses represent states with fixed-length durations.
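    The link between a self-transition and a geometric duration, noted in the caption above, follows from a short calculation. Writing p for the self-transition probability of a state (a symbol introduced here for illustration, not taken from the figure), the state is occupied for exactly d steps when it loops d − 1 times and then leaves:

```latex
\[
  P(D = d) = p^{\,d-1}\,(1 - p), \qquad d = 1, 2, 3, \dots
\]
\[
  \mathbb{E}[D] = \sum_{d \ge 1} d\, p^{\,d-1}(1 - p) = \frac{1}{1 - p}.
\]
```

    States drawn without a self-transition need an explicit duration distribution instead, which is what distinguishes a GHMM from an ordinary HMM.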