278 research outputs found
Biases in the Experimental Annotations of Protein Function and their Effect on Our Understanding of Protein Function Space
The ongoing functional annotation of proteins relies upon the work of
curators to capture experimental findings from scientific literature and apply
them to protein sequence and structure data. However, with the increasing use
of high-throughput experimental assays, a small number of experimental studies
dominate the functional protein annotations collected in databases. Here we
investigate just how prevalent the "few articles -- many proteins"
phenomenon is. We examine the experimentally validated annotation of proteins
provided by several groups in the GO Consortium, and show that the distribution
of proteins per published study is exponential, with 0.14% of articles
providing the source of annotations for 25% of the proteins in the UniProt-GOA
compilation. Since each of the dominant articles describes the use of an assay
that can find only one function or a small group of functions, this leads to
substantial biases in what we know about the function of many proteins.
Mass-spectrometry, microscopy and RNAi experiments dominate high throughput
experiments. Consequently, the functional information derived from these
experiments is mostly of the subcellular location of proteins, and of the
participation of proteins in embryonic developmental pathways. For some
organisms, the information provided by different studies overlaps to a large
extent. We also show that the information provided by high throughput
experiments is less specific than that provided by low throughput experiments.
Given the experimental techniques available, certain biases in protein function
annotation due to high-throughput experiments are unavoidable. Knowing that
these biases exist and understanding their characteristics and extent is
important for database curators, developers of function annotation programs,
and anyone who uses protein function annotation data to plan experiments.
Comment: Accepted to PLoS Computational Biology. Press embargo applies. v4:
text corrected for style and supplementary material inserted
Fine-scale population structure and asymmetrical dispersal in an obligate salt-marsh passerine, the Saltmarsh Sparrow (Ammodramus caudacutus)
Understanding the spatial scale of gene flow can yield valuable insight into the ecology of an organism and guide conservation strategies. Fine-scale genetic structure is uncommon in migratory passerines because of their high vagility and presumed high dispersal abilities. Aspects of the behavior and ecology of some migratory species, however, may promote structure on a finer scale in comparison to their mobility. We investigated population genetic structure in the Saltmarsh Sparrow (Ammodramus caudacutus), a migratory passerine that breeds along the northeastern coast of the United States, where it is restricted exclusively to a narrow strip of patchily distributed tidal marsh habitat. Using genotyping with 10 microsatellite loci, we detected weak but significant population structure among Saltmarsh Sparrows from nine marshes on the breeding grounds between Scarborough, Maine, and Oceanside, New York. Genetic variation among marshes was largely consistent with a pattern of isolation by distance, with some exceptions. One inland marsh was genetically divergent despite its proximity to other sampled marshes, which suggests that mechanisms besides geographic distance influence population genetic structure. Bayesian clustering, multivariate analyses, and assignment tests supported a population structure consisting of five groups. Estimates of migration rates indicated variation in gene flow among marshes, which suggests asymmetrical dispersal and possible source-sink population dynamics. The genetic structure that we found in Saltmarsh Sparrows may result from natal philopatry and breeding-site fidelity, combined with restricted dispersal due to obligate dependence on a patchy habitat. Our findings suggest that fine-scale population structure may be important in some migratory passerines. Received 12 July 2011, accepted 1 February 2012
A Novel Interdisciplinary Course in Gerontechnology for Disseminating Computational Thinking
While specialized knowledge and skills are the hallmark of modern society, the size and complexity of contemporary problems often require cooperative effort to analyze and solve. Therefore, experiences with skills, methodologies, and tools for effective interdisciplinary collaboration and structured problem solving are vital for preparing students for future academic and professional success. Meanwhile, computational systems have permeated much of modern professional and personal life, making computational thinking an essential skill for members of modern society. However, formal training in these techniques is primarily limited to students within computer science, mathematics, management of information systems, and engineering. At Iowa State University, we have designed and offered an experimental course to develop undergraduate students’ abilities for interdisciplinary teamwork and to disseminate computational thinking skills to a broader range of students. This novel course was jointly designed and instructed by faculty from the Computer Science Department, Gerontology Program, and Graphic Design Program to incorporate diverse faculty expertise and pedagogical approaches. Students were required to interview real users to identify real-life problems, gather requirements, and assess candidate solutions, which necessitated communication both within the group and with technologically-disinclined users. In-class presentations and wiki-based project websites provided regular practice at disseminating domain expertise to larger interdisciplinary audiences. Workshops, group-based mentoring, peer learning, and guided discovery allowed non-CS majors to learn much more about computer programs and tools, and grading criteria held students individually accountable within their disciplines but also emphasized group collaboration
Multi-Magnon Scattering in the Ferromagnetic XXX-Model with Inhomogeneities
We determine the transition amplitude for multi-magnon scattering induced
through an inhomogeneous distribution of the coupling constant in the
ferromagnetic XXX-model. The two- and three-particle amplitudes are explicitly
calculated at small momenta. This also suggests a rather plausible conjecture
for a formula of the general n-particle amplitude.
Comment: 21 pages, latex, no figure
Evolutionarily Conserved Substrate Substructures for Automated Annotation of Enzyme Superfamilies
The evolution of enzymes affects how well a species can adapt to new environmental conditions. During enzyme evolution, certain aspects of molecular function are conserved while other aspects can vary. Aspects of function that are more difficult to change or that need to be reused in multiple contexts are often conserved, while those that vary may indicate functions that are more easily changed or that are no longer required. In analogy to the study of conservation patterns in enzyme sequences and structures, we have examined the patterns of conservation and variation in enzyme function by analyzing graph isomorphisms among enzyme substrates of a large number of enzyme superfamilies. This systematic analysis of substrate substructures establishes the conservation patterns that typify individual superfamilies. Specifically, we determined the chemical substructures that are conserved among all known substrates of a superfamily and the substructures that are reacting in these substrates and then examined the relationship between the two. Across the 42 superfamilies that were analyzed, substantial variation was found in how much of the conserved substructure is reacting, suggesting that superfamilies may not be easily grouped into discrete and separable categories. Instead, our results suggest that many superfamilies may need to be treated individually for analyses of evolution, function prediction, and guiding enzyme engineering strategies. Annotating superfamilies with these conserved and reacting substructure patterns provides information that is orthogonal to information provided by studies of conservation in superfamily sequences and structures, thereby improving the precision with which we can predict the functions of enzymes of unknown function and direct studies in enzyme engineering. 
Because the method is automated, it is suitable for large-scale characterization and comparison of fundamental functional capabilities of both characterized and uncharacterized enzyme superfamilies
A Conditional Yeast E1 Mutant Blocks the Ubiquitin–Proteasome Pathway and Reveals a Role for Ubiquitin Conjugates in Targeting Rad23 to the Proteasome
E1 ubiquitin activating enzyme catalyzes the initial step in all ubiquitin-dependent processes. We report the isolation of uba1-204, a temperature-sensitive allele of the essential Saccharomyces cerevisiae E1 gene, UBA1. Uba1-204 cells exhibit dramatic inhibition of the ubiquitin–proteasome system, resulting in rapid depletion of cellular ubiquitin conjugates and stabilization of multiple substrates. We have employed the tight phenotype of this mutant to investigate the role ubiquitin conjugates play in the dynamic interaction of the UbL/UBA adaptor proteins Rad23 and Dsk2 with the proteasome. Although proteasomes purified from mutant cells are intact and proteolytically active, they are depleted of ubiquitin conjugates, Rad23, and Dsk2. Binding of Rad23 to these proteasomes in vitro is enhanced by addition of either free or substrate-linked ubiquitin chains. Moreover, association of Rad23 with proteasomes in mutant and wild-type cells is improved upon stabilizing ubiquitin conjugates with proteasome inhibitor. We propose that recognition of polyubiquitin chains by Rad23 promotes its shuttling to the proteasome in vivo
Retrieving sequences of enzymes experimentally characterized but erroneously annotated: the case of the putrescine carbamoyltransferase
BACKGROUND: Annotating genomes remains a hazardous task. Mistakes or gaps in such a complex process may occur when relevant knowledge is ignored, whether lost, forgotten or overlooked. This paper exemplifies an approach which could help to resuscitate such meaningful data. RESULTS: We show that a set of closely related sequences which have been annotated as ornithine carbamoyltransferases are actually putrescine carbamoyltransferases. This demonstration is based on the following points: (i) use of enzymatic data which had been overlooked, (ii) rediscovery of a short NH2-terminal sequence that allows a wrongly annotated ornithine carbamoyltransferase to be reannotated as a putrescine carbamoyltransferase, (iii) identification of conserved motifs that distinguish unambiguously between the two kinds of carbamoyltransferases, and (iv) comparative study of the gene context of these different sequences. CONCLUSIONS: We explain why this specific case of misannotation had not yet been described and draw attention to the fact that analogous instances must be rather frequent. We urge particular caution when high sequence similarity is coupled with an apparent lack of biochemical information. Moreover, from the point of view of genome annotation, proteins which have been studied experimentally but are not correlated with sequence data in current databases qualify as "orphans", just as unassigned genomic open reading frames do. The strategy we used in this paper to bridge such gaps in knowledge could work whenever it is possible to collect a body of facts about experimental data, homology, unnoticed sequence data, and accurate information about gene context
InterPro in 2017-beyond protein family and domain annotations
InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences
Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies
The dramatic increase in heterogeneous types of biological data—in particular, the abundance of new protein sequences—requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity—GPCRs and kinases from humans, and the crotonase superfamily of enzymes—we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships