19 research outputs found

Biases in the Experimental Annotations of Protein Function and their Effect on Our Understanding of Protein Function Space

Author: Babbitt Patricia C.
Friedberg Iddo
Ream David C.
Schnoes Alexandra M.
Thorman Alexander W.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 03/04/2013

The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here we investigate just how prevalent is the "few articles -- many proteins" phenomenon. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass-spectrometry, microscopy and RNAi experiments dominate high-throughput experiments. Consequently, the functional information derived from these experiments is mostly of the subcellular location of proteins, and of the participation of proteins in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps by a large amount. We also show that the information provided by high-throughput experiments is less specific than that provided by low-throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.

Comment: Accepted to PLoS Computational Biology. Press embargo applies. v4: text corrected for style and supplementary material inserted.

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

Internship Experiences Contribute to Confident Career Decision Making for Doctoral Students in the Life Sciences.

Author: Caliendo Anne
Dillinger Teresa
Gibeling Jeffery C
Lindstaedt Bill
McGee Richard
Morand Janice
Moses Bruce
Naffziger-Hirsch Michelle
O'Brien Theresa C
Schnoes Alexandra M
Yamamoto Keith R
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

The Graduate Student Internships for Career Exploration (GSICE) program at the University of California, San Francisco (UCSF), offers structured training and hands-on experience through internships for a broad range of PhD-level careers. The GSICE program model was successfully replicated at the University of California, Davis (UC Davis). Here, we present outcome data for a total of 217 PhD students participating in the UCSF and UC Davis programs from 2010 to 2015 and 2014 to 2015, respectively. The internship programs at the two sites demonstrated comparable participation, internship completion rates, and overall outcomes. Using survey, focus group, and individual interview data, we find that the programs provide students with career development skills, while increasing students' confidence in career exploration and decision making. Internships, in particular, were perceived by students to increase their ability to discern a career area of choice and to increase confidence in pursuing that career. We present data showing that program participation does not change median time to degree and may help some trainees avoid "default postdocs." Our findings suggest important strategies for institutions developing internship programs for PhD students, namely: including a structured training component, allowing postgraduation internships, and providing a central organization point for internship programs.

eScholarship - University of California

The Structure-Function Linkage Database

Author: Akiva Eyal
Almonacid Daniel E.
Babbitt Patricia C.
Barber Alan E., 2nd
Brown Shoshana
Custer Ashley F.
Ferrin Thomas E.
Hicks Michael A.
Holliday Gemma L.
Huang Conrad C.
Lauck Florian
Mashiyama Susan T.
Meng Elaine C.
Mischel David
Morris John H.
Ojha Sunil
Schnoes Alexandra M.
Stryke Doug
Yunes Jeffrey M.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/10/2013

The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.

DSpace@MIT

PubMed Central

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%–63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with “overprediction” of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Internship Experiences Contribute to Confident Career Decision Making for Doctoral Students in the Life Sciences.

Author: Schnoes Alexandra M
Publication date: 19/05/2020

Ezid

Bridging gaps in traditional research training with iBiology Courses.

Author: Alexandra M Schnoes
Noah H Green
Ronald D Vale
Sarah S Goodwin
Shannon L Behrman
Thi A Nguyen
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2024

iBiology Courses provide trainees with just-in-time learning resources to become effective researchers. These courses can help scientists build core research skills, plan their research projects and careers, and learn from scientists with diverse backgrounds.

Directory of Open Access Journals

Elements of iBiology Courses that contribute to student learning and meaningful interactions with mentors.

Author: Alexandra M. Schnoes (17775914)
Noah H. Green (6678647)
Ronald D. Vale (7851278)
Sarah S. Goodwin (17775917)
Shannon L. Behrman (17775920)
Thi A. Nguyen (1491040)
Publication date: 11/01/2024

(01) The course lessons contain different modalities, such as videos and interactive prompts, designed to engage diverse learners and deepen their skills and knowledge. (02) By answering a series of reflective prompts throughout the course, participants create tangible plans that outline their goals, approaches, and anticipated outcomes relevant to the skills they want to develop in the lab. In each course, participants are directed to share their plans with mentors to receive feedback and guidance. (03) The courses include several ways for participants to personalize their own learning. (04) The most helpful learning components as identified by participants in surveys. BCLS, Business Concepts for Life Scientists; LE, Let’s Experiment; PYSJ, Planning Your Scientific Journey; SYR, Share Your Research.

FigShare

Benefits for participants who have taken iBiology Courses.

Author: Alexandra M. Schnoes (17775914)
Noah H. Green (6678647)
Ronald D. Vale (7851278)
Sarah S. Goodwin (17775917)
Shannon L. Behrman (17775920)
Thi A. Nguyen (1491040)
Publication date: 11/01/2024

Benefits for participants who have taken iBiology Courses.

FigShare
