Search CORE

Apollo (Cambridge)

Automated data integration for developmental biological research

Author: Sternberg Paul W.
Zhong Weiwei
Publication venue: 'The Company of Biologists'
Publication date: 15/09/2007
Field of study

In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research

Caltech Authors

The BioGRID Interaction Database: 2011 update

Author: A. Chatr-aryamontri
A. Winter
B.-J. Breitkreutz
Behrends
Bork
Breitkreutz
Breitkreutz
C. Stark
Cline
Costanzo
Drabkin
Hertz-Fowler
Howe
J. M. Rust
J. Nixon
K. Dolinski
K. Van Auken
Kerrien
L. Boucher
Leitner
M. S. Livstone
M. Tyers
Mering
M ller
R. Oughtred
Razick
T. Reguly
Wiederkehr
X. Shi
X. Wang
Yu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2011
Field of study

The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions

CiteSeerX

Edinburgh Research Explorer

Caltech Authors

Recommended from our members

The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases.

Author: Alliance of Genome Resources Consortium
Publication venue: eScholarship, University of California
Publication date: 01/12/2019
Field of study

Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified "look and feel," the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient "knowledge commons" for model organisms using shared, modular infrastructure

eScholarship - University of California

Reconstruction and Validation of RefRec: A Global Model for the Yeast Molecular Interaction Network

Molecular interaction networks establish all cell biological processes. The networks are under intensive research that is facilitated by new high-throughput measurement techniques for the detection, quantification, and characterization of molecules and their physical interactions. For the common model organism yeast Saccharomyces cerevisiae, public databases store a significant part of the accumulated information and, on the way to better understanding of the cellular processes, there is a need to integrate this information into a consistent reconstruction of the molecular interaction network. This work presents and validates RefRec, the most comprehensive molecular interaction network reconstruction currently available for yeast. The reconstruction integrates protein synthesis pathways, a metabolic network, and a protein-protein interaction network from major biological databases. The core of the reconstruction is based on a reference object approach in which genes, transcripts, and proteins are identified using their primary sequences. This enables their unambiguous identification and non-redundant integration. The obtained total number of different molecular species and their connecting interactions is ∼67,000. In order to demonstrate the capacity of RefRec for functional predictions, it was used for simulating the gene knockout damage propagation in the molecular interaction network in ∼590,000 experimentally validated mutant strains. Based on the simulation results, a statistical classifier was subsequently able to correctly predict the viability of most of the strains. The results also showed that the usage of different types of molecular species in the reconstruction is important for accurate phenotype prediction. In general, the findings demonstrate the benefits of global reconstructions of molecular interaction networks. With all the molecular species and their physical interactions explicitly modeled, our reconstruction is able to serve as a valuable resource in additional analyses involving objects from multiple molecular -omes. For that purpose, RefRec is freely available in the Systems Biology Markup Language format

Public Library of Science (PLOS)

Directory of Open Access Journals

Aaltodoc Publication Archive

Recommended from our members

The functional network in predictive biology : predicting phenotype from genotype and predicting human disease from fungal phenotype

Author: McGary Kriston Lyle
Publication venue
Publication date: 01/12/2008
Field of study

textThe ability to predict is one of the hallmarks of successful theories. Historically, the predictive power of biology has lagged behind disciplines like physics because the biological world is complex, challenging to quantify, and full of exceptions. However, in recent years the amount of available data has expanded exponentially and biological predictions based on this data become a possibility. The functional gene network is a quantitative way to integrate this data and a useful framework for making biological predictions. This study demonstrates that functional networks capture real biological insight and uses the network to predict both subcellular protein localization and the phenotypic outcome of gene knockouts. Furthermore, I use the functional network to evaluate genetic modules shared between diverse organisms that lead to orthologous phenotypes, many that are non-obvious. I show that the successful predictions of the functional network have broad applicability and implications that range from the design of large-scale biological experiments to the discovery of genes with potential roles in human disease.Institute for Cellular and Molecular Biolog

Texas ScholarWorks

A Genomewide Functional Network for the Laboratory Mouse

Author: A Abeliovich
AI Su
Andrey Rzhetsky
AS Siddiqui
B Fahrenkrog
B Lehner
B Snel
BJ Breitkreutz
C Alfarano
C Oka
Carol J. Bult
Chad L. Myers
CL Myers
CL Myers
DJ Watts
DP Hill
DR Rhodes
EE Schadt
EI Boyle
F Tong
H Jeong
I Chambers
I Lee
I Lee
Ihor R. Lemischka
JM Stuart
JT Eppig
JT Eppig
K Mitsui
K Xia
KE Bernstein
KI Goh
KP O'Brien
KR Brown
L Franke
L Peña-Castillo
L Salwinski
LA Boyer
M Ashburner
M Kanehisa
N Ivanova
N Novershtern
OG Troyanskaya
Olga G. Troyanskaya
P Tsaparas
R Jansen
R Setsuie
R Sharan
RA Fisher
Rong Lu
S Bandyopadhyay
S Coulomb
S Durinck
T Harata
T Jiang
TK Gandhi
U Ala
W Zhang
W Zhong
Y Chen
Y Qi
YH Loh
Yuanfang Guan
Publication venue: Public Library of Science
Publication date: 01/01/2008
Field of study

Establishing a functional network is invaluable to our understanding of gene function, pathways, and systems-level properties of an organism and can be a powerful resource in directing targeted experiments. In this study, we present a functional network for the laboratory mouse based on a Bayesian integration of diverse genetic and functional genomic data. The resulting network includes probabilistic functional linkages among 20,581 protein-coding genes. We show that this network can accurately predict novel functional assignments and network components and present experimental evidence for predictions related to Nanog homeobox (Nanog), a critical gene in mouse embryonic stem cell pluripotency. An analysis of the global topology of the mouse functional network reveals multiple biologically relevant systems-level features of the mouse proteome. Specifically, we identify the clustering coefficient as a critical characteristic of central modulators that affect diverse pathways as well as genes associated with different phenotype traits and diseases. In addition, a cross-species comparison of functional interactomes on a genomic scale revealed distinct functional characteristics of conserved neighborhoods as compared to subnetworks specific to higher organisms. Thus, our global functional network for the laboratory mouse provides the community with a key resource for discovering protein functions and novel pathway components as well as a tool for exploring systems-level topological and evolutionary features of cellular interactomes. To facilitate exploration of this network by the biomedical research community, we illustrate its application in function and disease gene discovery through an interactive, Web-based, publicly available interface at http://mouseNET.princeton.edu

Public Library of Science (PLOS)

The Jackson Laboratory: The Mouseion at the JAXlibrary

Directory of Open Access Journals

The University of Manchester - Institutional Repository

A MOD(ern) perspective on literature curation

Author: Berardini Tanya Z.
Drabkin Harold J.
Hirschman Jodi
Howe Doug
Publication venue: Springer-Verlag
Publication date: 01/01/2010
Field of study

Curation of biological data is a multi-faceted task whose goal is to create a structured, comprehensive, integrated, and accurate resource of current biological knowledge. These structured data facilitate the work of the scientific community by providing knowledge about genes or genomes and by generating validated connections between the data that yield new information and stimulate new research approaches. For the model organism databases (MODs), an important source of data is research publications. Every published paper containing experimental information about a particular model organism is a candidate for curation. All such papers are examined carefully by curators for relevant information. Here, four curators from different MODs describe the literature curation process and highlight approaches taken by the four MODs to address: (1) the decision process by which papers are selected, and (2) the identification and prioritization of the data contained in the paper. We will highlight some of the challenges that MOD biocurators face, and point to ways in which researchers and publishers can support the work of biocurators and the value of such support

The Jackson Laboratory: The Mouseion at the JAXlibrary

An en masse phenotype and function prediction system for Mus musculus

Author: Blake Judith A
Gibbons Francis D
Hill David P
Roth Frederick P
Taşan Murat
Tian Weidong
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Background: Individual researchers are struggling to keep up with the accelerating emergence of high-throughput biological data, and to extract information that relates to their specific questions. Integration of accumulated evidence should permit researchers to form fewer - and more accurate - hypotheses for further study through experimentation. Results: Here a method previously used to predict Gene Ontology (GO) terms for Saccharomyces cerevisiae (Tian et al.: Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function. Genome Biol 2008, 9(Suppl 1):S7) is applied to predict GO terms and phenotypes for 21,603 Mus musculus genes, using a diverse collection of integrated data sources (including expression, interaction, and sequence-based data). This combined 'guilt-by-profiling' and 'guilt-by-association' approach optimizes the combination of two inference methodologies. Predictions at all levels of confidence are evaluated by examining genes not used in training, and top predictions are examined manually using available literature and knowledge base resources. Conclusion: We assigned a confidence score to each gene/term combination. The results provided high prediction performance, with nearly every GO term achieving greater than 40% precision at 1% recall. Among the 36 novel predictions for GO terms and 40 for phenotypes that were studied manually, >80% and >40%, respectively, were identified as accurate. We also illustrate that a combination of 'guilt-by-profiling' and 'guilt-by-association' outperforms either approach alone in their application to M. musculus.Molecular and Cellular Biolog

CiteSeerX

The Jackson Laboratory: The Mouseion at the JAXlibrary

Harvard University - DASH

Novel genes exhibit distinct patterns of function acquisition and network integration

Author: Capra John A
Pollard Katherine S
Singh Mona
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

BackgroundGenes are created by a variety of evolutionary processes, some of which generate duplicate copies of an entire gene, while others rearrange pre-existing genetic elements or co-opt previously non-coding sequence to create genes with 'novel' sequences. These novel genes are thought to contribute to distinct phenotypes that distinguish organisms. The creation, evolution, and function of duplicated genes are well-studied; however, the genesis and early evolution of novel genes are not well-characterized. We developed a computational approach to investigate these issues by integrating genome-wide comparative phylogenetic analysis with functional and interaction data derived from small-scale and high-throughput experiments.ResultsWe examine the function and evolution of new genes in the yeast Saccharomyces cerevisiae. We observed significant differences in the functional attributes and interactions of genes created at different times and by different mechanisms. Novel genes are initially less integrated into cellular networks than duplicate genes, but they appear to gain functions and interactions more quickly than duplicates. Recently created duplicated genes show evidence of adapting existing functions to environmental changes, while young novel genes do not exhibit enrichment for any particular functions. Finally, we found a significant preference for genes to interact with other genes of similar age and origin.ConclusionsOur results suggest a strong relationship between how and when genes are created and the roles they play in the cell. Overall, genes tend to become more integrated into the functional networks of the cell with time, but the dynamics of this process differ significantly between duplicate and novel genes