190 research outputs found
The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases
Background: Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs.
Results: We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface.
Conclusion: We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.
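The idea of cross-referencing by 100% sequence identity can be illustrated with a small sketch. This is not PICR's implementation or API: the database names, identifiers, sequences and function names below are all hypothetical, and a SHA-256 digest is used here only as a convenient exact-match key.

```python
import hashlib
from collections import defaultdict

# Hypothetical entries: (source database, identifier, sequence).
ENTRIES = [
    ("DB_A", "A001", "MKTAYIAK"),
    ("DB_B", "B9", "mktayiak"),   # same sequence, different letter case
    ("DB_C", "C42", "MKTAYIAR"),  # last residue differs, so not identical
]

def sequence_key(seq):
    """Return a digest of the normalised sequence to use as an exact-match key."""
    return hashlib.sha256(seq.strip().upper().encode("ascii")).hexdigest()

def build_index(entries):
    """Group (database, identifier) pairs by identical sequence."""
    index = defaultdict(list)
    for db, ident, seq in entries:
        index[sequence_key(seq)].append((db, ident))
    return index

def cross_references(index, seq, databases=None):
    """List identifiers whose sequence is identical to seq, optionally limited by database."""
    hits = index.get(sequence_key(seq), [])
    return [(db, ident) for db, ident in hits if databases is None or db in databases]

index = build_index(ENTRIES)
all_hits = cross_references(index, "MKTAYIAK")
limited_hits = cross_references(index, "MKTAYIAK", databases={"DB_B"})
print(all_hits)
print(limited_hits)
```

The optional `databases` argument mirrors the abstract's statement that mappings can be limited by source database. Filtering by taxonomic ID or activity status would need extra fields that this sketch omits.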
MoKCa database - mutations of kinases in cancer
Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The MoKCa database (Mutations of Kinases in Cancer, http://strubiol.icr.ac.uk/extra/mokca) has been developed to structurally and functionally annotate, and where possible predict, the phenotypic consequences of mutations in protein kinases implicated in cancer. Somatic mutation data from tumours and tumour cell lines have been mapped onto the crystal structures of the affected protein domains. Positions of the mutated amino acids are highlighted on a sequence-based domain pictogram, on a 3D image of the protein structure, and in an integrated molecular graphics package for interactive viewing. The data associated with each mutation are presented in the Web interface, along with expert annotation of the detailed molecular functional implications of the mutation. Proteins are linked to functional annotation resources and are annotated with structural and functional features such as domains and phosphorylation sites. MoKCa aims to provide assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and to present these together in a consistent and coherent fashion to facilitate authoritative annotation by cancer biologists and structural biologists directly involved in the generation and analysis of new mutational data.
Complementary Sources of Protein Functional Information: The Far Side of GO.
The Gene Ontology (GO) captures many aspects of functional annotation, but there are other, complementary sources of protein function information. For example, enzyme functional annotations are described in a range of resources, from the Enzyme Commission (E.C.) hierarchical classification to the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Catalytic Site Atlas, amongst many others. This chapter describes some of the main resources available and how they can be used in conjunction with GO.
Network Archaeology: Uncovering Ancient Networks from Present-day Interactions
Often questions arise about old or extinct networks. What proteins interacted
in a long-extinct ancestor species of yeast? Who were the central players in
the Last.fm social network 3 years ago? Our ability to answer such questions
has been limited by the unavailability of past versions of networks. To
overcome these limitations, we propose several algorithms for reconstructing a
network's history of growth given only the network as it exists today and a
generative model by which the network is believed to have evolved. Our
likelihood-based method finds a probable previous state of the network by
reversing the forward growth model. This approach retains node identities so
that the history of individual nodes can be tracked. We apply these algorithms
to uncover older, non-extant biological and social networks believed to have
grown via several models, including duplication-mutation with complementarity,
forest fire, and preferential attachment. Through experiments on both synthetic
and real-world data, we find that our algorithms can estimate node arrival
times, identify anchor nodes from which new nodes copy links, and can reveal
significant features of networks that have long since disappeared.
Comment: 16 pages, 10 figures
iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database
Background: The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment.
Results: The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL.
The user may choose between different methods for resolving redundancies in interaction data and how n-ary data is represented. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user choice of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis.
Conclusions: The package allows the user to control how these issues are dealt with and communicate them via an R-script written using the iRefR package - this will facilitate communication of methods, reproducibility of network analyses and further modification and comparison of methods by researchers.
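The conversion between a tabular record format, an edge list and a graph, together with one simple way of resolving redundancy, can be sketched as follows. This is an illustration in Python rather than iRefR's own R functions. The four-column rows are a hypothetical simplification of MITAB, and collapsing reversed pairs into one undirected edge is an assumed redundancy rule, not necessarily the one iRefR applies.

```python
from collections import defaultdict

# Hypothetical, simplified MITAB-like rows: interactor A, interactor B, method, publication.
# Real MITAB files have many more columns; these four are an assumption for illustration.
MITAB_ROWS = [
    "P1\tP2\ttwo hybrid\tpubmed:1",
    "P2\tP1\tpull down\tpubmed:2",   # same pair, reversed order
    "P2\tP3\ttwo hybrid\tpubmed:1",
    "P3\tP3\ttwo hybrid\tpubmed:3",  # self-interaction
]

def mitab_to_edge_list(rows):
    """Collapse rows into a sorted list of unique undirected edges."""
    edges = set()
    for row in rows:
        a, b, _method, _pub = row.split("\t")
        edges.add(tuple(sorted((a, b))))
    return sorted(edges)

def edge_list_to_graph(edges):
    """Build an adjacency mapping (node -> set of neighbours) from an edge list."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

edge_list = mitab_to_edge_list(MITAB_ROWS)
graph = edge_list_to_graph(edge_list)
print(edge_list)
print(graph)
```

Choosing a different redundancy rule, for example keeping one edge per publication, changes the resulting graph, which is the kind of effect on graphical analysis that the abstract reports.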
Reactome: a database of reactions, pathways and biological processes
Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.
PINOT: an intuitive resource for integrating protein-protein interactions
The past decade has seen the rise of omics data for the understanding of biological systems in health and disease. This wealth of data includes protein-protein interaction (PPI) data derived from both low- and high-throughput assays, which are curated into multiple databases that capture the extent of available information from the peer-reviewed literature. Although these curation efforts are extremely useful, reliably downloading and integrating PPI data from the variety of available repositories is challenging and time-consuming.
Here we present a novel user-friendly web resource called PINOT (Protein Interaction Network Online Tool; available at http://www.reading.ac.uk/bioinf/PINOT/PINOT_form.html) to optimise the collection and processing of PPI data from the IMEx consortium associated repositories (members and observers) and from WormBase, for constructing human and C. elegans PPI networks, respectively.
Users submit a query containing a list of proteins of interest for which PINOT will mine PPIs. PPI data is downloaded, merged, quality checked, and confidence scored based on the number of distinct methods and publications in which each interaction has been reported. Examples of PINOT applications are provided to highlight the performance, the ease of use and the potential applications of this tool.
PINOT is a tool that allows users to survey the literature, extracting PPI data for a list of proteins of interest. The comparison with analogous tools showed that PINOT was able to extract similar numbers of PPIs while incorporating a set of innovative features. PINOT processes both small and large queries, downloads PPIs live through PSICQUIC and applies quality control filters to the downloaded PPI annotations, removing the need for manual inspection by the user. PINOT provides the user with information on detection methods and publication history for each downloaded interaction data entry, and provides results in a table format that can easily be further customised and/or directly uploaded into network visualization software.
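The scoring idea described in this abstract, which counts the distinct methods and publications reporting each interaction, can be sketched as follows. The record layout, the placeholder identifiers and the choice to sum the two counts are assumptions made for illustration; the abstract does not give PINOT's exact formula.

```python
from collections import defaultdict

# Hypothetical PPI annotations: (protein A, protein B, detection method, publication id).
RECORDS = [
    ("P1", "P2", "two hybrid", "pmid:1"),
    ("P2", "P1", "pull down", "pmid:2"),   # same pair, reversed order
    ("P1", "P2", "pull down", "pmid:2"),   # repeats evidence already seen
    ("P1", "P3", "two hybrid", "pmid:3"),
]

def score_interactions(records):
    """Count distinct methods and publications per unordered protein pair."""
    methods = defaultdict(set)
    publications = defaultdict(set)
    for a, b, method, pub in records:
        pair = tuple(sorted((a, b)))
        methods[pair].add(method)
        publications[pair].add(pub)
    scores = {}
    for pair in methods:
        n_methods = len(methods[pair])
        n_pubs = len(publications[pair])
        # Assumed scoring rule: simple sum of the two distinct counts.
        scores[pair] = {"methods": n_methods, "publications": n_pubs,
                        "score": n_methods + n_pubs}
    return scores

scores = score_interactions(RECORDS)
for pair, detail in sorted(scores.items()):
    print(pair, detail)
```

Using sets means that repeated evidence, such as the third record, does not inflate the score.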
DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila
DroID (http://droidb.org/), the Drosophila Interactions Database, is a comprehensive public resource for Drosophila gene and protein interactions. DroID contains genetic interactions and experimentally detected protein–protein interactions curated from the literature and from external databases, and predicted protein interactions based on experiments in other species. Protein interactions are annotated with experimental details and periodically updated confidence scores. Data in DroID is accessible through user-friendly, intuitive interfaces that allow simple or advanced searches and graphical visualization of interaction networks. DroID has been expanded to include interaction types that enable more complete analyses of the genetic networks that underlie biological processes. In addition to protein–protein and genetic interactions, the database now includes transcription factor–gene and regulatory RNA–gene interactions. DroID also now has more gene expression data that can be used to search and filter interaction networks. Orthologous gene mappings of Drosophila genes to other organisms are also available to facilitate finding interactions based on gene names and identifiers for a number of common model organisms and humans. Improvements have been made to the web and graphical interfaces to help biologists gain a comprehensive view of the interaction networks relevant to the genes and systems that they study.
Ontologies in Quantitative Biology: A Basis for Comparison, Integration, and Discovery
As biology becomes a data-driven discipline, ontologies are increasingly important for systematically capturing existing knowledge. This essay discusses current trends and how ontologies can also be used for discovery.
Identification of functional hubs and modules by converting interactome networks into hierarchical ordering of proteins