Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly
Motivation: Eugene Myers in his string graph paper (Myers, 2005) suggested
that in a string graph or equivalently a unitig graph, any path spells a valid
assembly. As a string/unitig graph also encodes every valid assembly of reads,
such a graph, provided that it can be constructed correctly, is in fact a
lossless representation of reads. In principle, every analysis based on
whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion
(INDEL) calling, can also be achieved with unitigs.
Results: To explore the feasibility of using de novo assembly in the context
of resequencing, we developed a de novo assembler, fermi, that assembles
Illumina short reads into unitigs while preserving most of the information in the
input reads. SNPs and INDELs can be called by mapping the unitigs against a
reference genome. By applying the method to 35-fold human resequencing data, we
showed that, in comparison to the standard pipeline, our approach yields similar
accuracy for SNP calling and better results for INDEL calling. It has higher
sensitivity than other de novo assembly based methods for variant calling. Our
work suggests that variant calling with de novo assembly can be a beneficial
complement to the standard variant calling pipeline for whole-genome
resequencing. On the methodological side, we proposed the FMD-index for
forward-backward extension of DNA sequences, a fast algorithm for finding all
super-maximal exact matches, and one-pass construction of unitigs from an
FMD-index.
Availability: http://github.com/lh3/fermi
Contact: [email protected] Comment: Rev2: submitted version with minor improvements; 7 pages
Rotational Correction on the Morse Potential Through the Pekeris Approximation and Nikiforov-Uvarov Method
The Nikiforov-Uvarov method is employed to solve the Schrödinger
equation with a rotating Morse potential. The bound-state energy eigenvalues
and the corresponding eigenfunctions are obtained. These calculations
present an effective and clear method, under the Pekeris approximation, for solving the
rotating Morse model. The results obtained here are in good agreement
with earlier ones. Comment: 11 pages, no figure, submitted to Chemical Physics Letters (2005)
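For orientation, the setup named in the abstract above can be written out as follows. This is a sketch in one commonly used notation, not a quotation from the paper: the symbols $D_e$ (well depth), $r_e$ (equilibrium distance), $a$ (width parameter), $\mu$ (reduced mass), and the shorthand $x$, $\alpha$, $c_0$, $c_1$, $c_2$ are assumptions introduced here. The coefficients are fixed by matching the expansion of $1/r^2$ about $r = r_e$ through second order in $x$.

```latex
% Morse potential plus centrifugal (rotational) term, assumed notation:
%   x = (r - r_e)/r_e,  \alpha = a r_e
\[
V_{\mathrm{eff}}(r) = D_e\left(e^{-2\alpha x} - 2e^{-\alpha x}\right)
  + \frac{\hbar^2\,\ell(\ell+1)}{2\mu r^2}
\]
% Pekeris approximation: replace the centrifugal factor by exponentials
% of the same form as the Morse terms.
\[
\frac{1}{r^2} = \frac{1}{r_e^2 (1+x)^2}
  \approx \frac{1}{r_e^2}\left(c_0 + c_1 e^{-\alpha x} + c_2 e^{-2\alpha x}\right)
\]
% Matching 1/(1+x)^2 = 1 - 2x + 3x^2 + O(x^3) term by term gives
\[
c_0 = 1 - \frac{3}{\alpha} + \frac{3}{\alpha^2}, \qquad
c_1 = \frac{4}{\alpha} - \frac{6}{\alpha^2}, \qquad
c_2 = -\frac{1}{\alpha} + \frac{3}{\alpha^2}
\]
```

With this substitution the radial equation contains only constants and the two exponentials, which is the form that methods such as Nikiforov-Uvarov can treat in closed form.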
Performance analysis of a parallel, multi-node pipeline for DNA sequencing
Post-sequencing DNA analysis typically consists of read mapping followed by variant calling and is very time-consuming, even on a multi-core machine. Recently, we proposed Halvade, a parallel, multi-node implementation of a DNA sequencing pipeline according to the GATK Best Practices recommendations. The MapReduce programming model is used to distribute the workload among different workers. In this paper, we study the impact of different hardware configurations on the performance of Halvade. Benchmarks indicate that the lack of good multithreading capabilities in the existing tools (BWA, SAMtools, Picard, GATK) in particular causes suboptimal scaling behavior. We demonstrate that it is possible to circumvent this bottleneck by using multiprocessing on high-memory machines rather than multithreading. Using a 15-node cluster with 360 CPU cores in total, this results in a runtime of 1 h 31 min. Compared to a single-threaded runtime of approximately 12 days, this corresponds to an overall parallel efficiency of 53%.
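The efficiency figure quoted in the Halvade abstract above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract and assumes the usual definition of parallel efficiency (speedup divided by core count); the function and constant names are illustrative, not taken from the paper.

```python
# Figures quoted in the abstract; "12 days" is approximate in the source.
SERIAL_RUNTIME_HOURS = 12 * 24          # single-threaded runtime, about 12 days
PARALLEL_RUNTIME_HOURS = 1 + 31 / 60    # 1 h 31 min on the 15-node cluster
CPU_CORES = 360                         # total cores across the cluster


def parallel_efficiency(serial_hours: float, parallel_hours: float, cores: int) -> float:
    """Return speedup per core, assuming efficiency = (T_serial / T_parallel) / cores."""
    speedup = serial_hours / parallel_hours
    return speedup / cores


speedup = SERIAL_RUNTIME_HOURS / PARALLEL_RUNTIME_HOURS
efficiency = parallel_efficiency(SERIAL_RUNTIME_HOURS, PARALLEL_RUNTIME_HOURS, CPU_CORES)

print(f"speedup: {speedup:.1f}x")
print(f"parallel efficiency: {efficiency:.0%}")
```

Under these assumptions the speedup comes out at roughly 190x, which on 360 cores rounds to the 53% reported.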
Soluble oligomerization provides a beneficial fitness effect on destabilizing mutations
Mutations create the genetic diversity on which selective pressures can act,
yet also create structural instability in proteins. How, then, is it possible
for organisms to ameliorate mutation-induced perturbations of protein stability
while maintaining biological fitness and gaining a selective advantage? Here we
used a new technique of site-specific chromosomal mutagenesis to introduce a
selected set of mostly destabilizing mutations into folA - an essential
chromosomal gene of E. coli encoding dihydrofolate reductase (DHFR) - to
determine how changes in protein stability, activity and abundance affect
fitness. In total, 27 E. coli strains carrying mutant DHFR were created. We
found no significant correlation between protein stability and its catalytic
activity nor between catalytic activity and fitness in a limited range of
variation of catalytic activity observed in mutants. The stability of these
mutants is strongly correlated with their intracellular abundance, suggesting
that protein homeostatic machinery plays an active role in maintaining
intracellular concentrations of proteins. Fitness also shows a significant
correlation with intracellular abundance of soluble DHFR in cells growing at
30 °C. At 42 °C, on the other hand, the picture was mixed, yet remarkable: a few
strains carrying mutant DHFR proteins aggregated, rendering them nonviable, but,
intriguingly, the majority exhibited fitness higher than wild type. We found
that mutational destabilization of DHFR proteins in E. coli is counterbalanced
at 42 °C by their soluble oligomerization, thereby restoring structural
stability and protecting against aggregation.
Fluconazole Monotherapy Is a Suboptimal Option for Initial Treatment of Cryptococcal Meningitis Because of Emergence of Resistance.
Cryptococcal meningitis is a lethal disease with few therapeutic options. Induction therapy with fluconazole has been consistently demonstrated to be associated with suboptimal microbiological and clinical outcomes. Exposure to fluconazole causes dynamic changes in antifungal susceptibility, which are associated with the development of aneuploidy. The implications of this phenomenon for pharmacodynamics of fluconazole for cryptococcal meningitis are poorly understood. The pharmacodynamics of fluconazole were studied using a hollow-fiber infection model (HFIM) and a well-characterized murine model of cryptococcal meningoencephalitis. The relationship between drug exposure and both antifungal killing and the emergence of resistance was quantified. The same relationships were further evaluated in a recently described group of patients with cryptococcal meningitis undergoing induction therapy with fluconazole at 800 to 1,200 mg/day. The pattern of emergence of fluconazole resistance followed an "inverted U." Resistance amplification was maximal and suppressed at ratios of the area under the concentration-time curve for the free, unbound fraction of the drug to the MIC (fAUC:MIC) of 34.5 to 138 and 305.6, respectively. Emergence of resistance was observed in vivo with an fAUC:MIC of 231.4. Aneuploidy with duplication of chromosome 1 was demonstrated to be the underlying mechanism in both experimental models. The pharmacokinetic (PK)-pharmacodynamic model accurately described the PK, antifungal killing, and emergence of resistance. Monte Carlo simulations from the clinical pharmacokinetic-pharmacodynamic model showed that only 12.8% of simulated patients receiving fluconazole at 1,200 mg/day achieved sterilization of the cerebrospinal fluid (CSF) after 2 weeks and that 83.4% had a persistent subpopulation that was resistant to fluconazole. Fluconazole is primarily ineffective due to the emergence of resistance. 
Treatment with 1,200 mg/day leads to the killing of a susceptible subpopulation but is compromised by the emergence of resistance.
IMPORTANCE: Cryptococcal meningitis is a lethal disease with few treatment options. The incidence remains high and intricately linked with the HIV/AIDS epidemic. In many parts of the world, fluconazole is the only agent that is available for the initial treatment of cryptococcal meningitis despite considerable evidence that it is associated with suboptimal microbiological and clinical outcomes. Fluconazole has a fungistatic mode of action: it predominantly inhibits growth rather than causing fungal killing. Our work shows that the pattern of fluconazole activity is caused by the emergence of resistance in Cryptococcus not detected by standard susceptibility tests, with chromosomal duplication/aneuploidy as the main mechanism. Resistance emergence is related to drug exposure and occurs with the use of clinically relevant regimens. Hence, fluconazole (and potentially other agents that target 14-alpha-demethylase) is compromised by an intrinsic property that limits its effectiveness. However, this resistance may potentially be overcome by dosage escalation or the use of combination therapy.
Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation
High-coverage whole-genome sequencing provides near-complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower-coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF <5%), when low-coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome-wide association studies and supports development of improved methods for genotype calling.
Workshop—Predicting the Structure of Biological Molecules
This April, in Cambridge (UK), principal investigators from the Mathematical
Biology Group of the Medical Research Council's National Institute of Medical
Research organized a workshop in structural bioinformatics at the Centre for
Mathematical Sciences. Bioinformatics researchers of several nationalities from
labs around the country presented and discussed their computational work in
biomolecular structure prediction and analysis, and in protein evolution. The meeting
was intensive and lively and gave attendees an overview of the healthy state of protein
bioinformatics in the UK.
Illuminating Choices for Library Prep: A Comparison of Library Preparation Methods for Whole Genome Sequencing of Cryptococcus neoformans Using Illumina HiSeq.
The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungus Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.
Epistasis not needed to explain low dN/dS
An important question in molecular evolution is whether an amino acid that
occurs at a given position makes an independent contribution to fitness, or
whether its effect depends on the state of other loci in the organism's genome,
a phenomenon known as epistasis. In a recent letter to Nature, Breen et al.
(2012) argued that epistasis must be "pervasive throughout protein evolution"
because the observed ratio between the per-site rates of non-synonymous and
synonymous substitutions (dN/dS) is much lower than would be expected in the
absence of epistasis. However, when calculating the expected dN/dS ratio in the
absence of epistasis, Breen et al. assumed that all amino acids observed in a
protein alignment at any particular position have equal fitness. Here, we relax
this unrealistic assumption and show that any dN/dS value can in principle be
achieved at a site, without epistasis. Furthermore, for all nuclear and
chloroplast genes in the Breen et al. dataset, we show that the observed dN/dS
values and the observed patterns of amino acid diversity at each site are
jointly consistent with a non-epistatic model of protein evolution. Comment: This manuscript is in response to "Epistasis as the primary factor in
molecular evolution" by Breen et al. Nature 490, 535-538 (2012)