42,339 research outputs found
Of bits and bugs
Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruitfly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key for success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins
HiTRACE: High-throughput robust analysis for capillary electrophoresis
Motivation: Capillary electrophoresis (CE) of nucleic acids is a workhorse
technology underlying high-throughput genome analysis and large-scale chemical
mapping for nucleic acid structural inference. Despite the wide availability of
CE-based instruments, there remain challenges in leveraging their full power
for quantitative analysis of RNA and DNA structure, thermodynamics, and
kinetics. In particular, the slow rate and poor automation of available
analysis tools have bottlenecked a new generation of studies involving hundreds
of CE profiles per experiment.
Results: We propose a computational method called high-throughput robust
analysis for capillary electrophoresis (HiTRACE) to automate the key tasks in
large-scale nucleic acid CE analysis, including the profile alignment that has
heretofore been a rate-limiting step in the highest throughput experiments. We
illustrate the application of HiTRACE on thirteen data sets representing 4
different RNAs, three chemical modification strategies, and up to 480 single
mutant variants; the largest data sets each include 87,360 bands. By applying a
series of robust dynamic programming algorithms, HiTRACE outperforms prior
tools in terms of alignment and fitting quality, as assessed by measures
including the correlation between quantified band intensities between replicate
data sets. Furthermore, while the smallest of these data sets required 7 to 10
hours of manual intervention using prior approaches, HiTRACE quantitation of
even the largest data sets herein was achieved in 3 to 12 minutes. The HiTRACE
method therefore resolves a critical barrier to the efficient and accurate
analysis of nucleic acid structure in experiments involving tens of thousands
of electrophoretic bands.Comment: Revised to include Supplement. Availability: HiTRACE is freely
available for download at http://hitrace.stanford.ed
Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment
Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique
in bioinformatics used to infer related residues among biological sequences.
Thus alignment accuracy is crucial to a vast range of analyses, often in ways
difficult to assess in those analyses. To compare the performance of different
aligners and help detect systematic errors in alignments, a number of
benchmarking strategies have been pursued. Here we present an overview of the
main strategies--based on simulation, consistency, protein structure, and
phylogeny--and discuss their different advantages and associated risks. We
outline a set of desirable characteristics for effective benchmarking, and
evaluate each strategy in light of them. We conclude that there is currently no
universally applicable means of benchmarking MSA, and that developers and users
of alignment tools should base their choice of benchmark depending on the
context of application--with a keen awareness of the assumptions underlying
each benchmarking strategy.Comment: Revie
Essential guidelines for computational method benchmarking
In computational biology and other sciences, researchers are frequently faced
with a choice between several computational methods for performing data
analyses. Benchmarking studies aim to rigorously compare the performance of
different methods using well-characterized benchmark datasets, to determine the
strengths of each method or to provide recommendations regarding suitable
choices of methods for an analysis. However, benchmarking studies must be
carefully designed and implemented to provide accurate, unbiased, and
informative results. Here, we summarize key practical guidelines and
recommendations for performing high-quality benchmarking analyses, based on our
experiences in computational biology.Comment: Minor update
Detecting and comparing non-coding RNAs in the high-throughput era.
In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and more generally any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino-acids. Moreover for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation impeding comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies and especially dealing with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data
Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao
In order to increase the efficiency of cacao tree resistance to witches¿ broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease. (Résumé d'auteur
- …