11 research outputs found

    iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex

    Background: The iRefIndex consolidates protein interaction data from ten databases in a rigorous manner using sequence-based hash keys. Working with consolidated interaction data comes with distinct challenges: data are redundant, overlapping, highly interconnected and may be collected and represented using different curation practices. These phenomena were quantified in our previous studies. Results: The iRefScape plug-in for the Cytoscape graphical viewer addresses these challenges. We show how these factors impact on data-mining tasks and how our solutions resolve them in a simple and efficient manner. A uniform accession space is used to limit redundancy and support search expansion and searching on multiple accession types. Multiple node and edge features support data filtering and mining. Node colours and features supply information about search result provenance. Overlapping evidence is presented using a multi-graph and a bi-partite representation is used to distinguish binary and n-ary source data. Searching for interactions between sets of proteins is supported and specifically includes searches on disease-related genes found in OMIM. Finally, a synchronized adjacency-matrix view facilitates visualization of relationships between sets of user defined groups. Conclusions: The iRefScape plug-in will be of interest to advanced users of interaction data. The plug-in provides access to a consolidated data set in a uniform accession space while remaining faithful to the underlying source data. Tools are provided to facilitate a range of tasks from a simple search to knowledge discovery. The plug-in uses a number of strategies that will be of interest to other plug-in developers.

    iRefIndex: A consolidated protein interaction database with provenance

    Background: Interaction data for a given protein may be spread across multiple databases. We set out to create a unifying index that would facilitate searching for these data and that would group together redundant interaction data while recording the methods used to perform this grouping. Results: We present a method to generate a key for a protein interaction record and a key for each participant protein. These keys may be generated by anyone using only the primary sequence of the proteins, their taxonomy identifiers and the Secure Hash Algorithm. Two interaction records will have identical keys if they refer to the same set of identical protein sequences and taxonomy identifiers. We define records with identical keys as a redundant group. Our method required that we map protein database references found in interaction records to current protein sequence records. Operations performed during this mapping are described by a mapping score that may provide valuable feedback to source interaction databases on problematic references that are malformed, deprecated, ambiguous or unfound. Keys for protein participants allow for retrieval of interaction information independent of the protein references used in the original records. Conclusion: We have applied our method to protein interaction records from BIND, BioGrid, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. The resulting interaction reference index is provided in PSI-MITAB 2.5 format at http://irefindex.uio.no. This index may form the basis of alternative redundant groupings based on gene identifiers or near sequence identity groupings.
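
    The key scheme described in this abstract can be illustrated with a minimal Python sketch. The abstract only states that keys are built from the primary sequence, the taxonomy identifier and the Secure Hash Algorithm; the choice of SHA-1, the base64 text encoding, the normalisation step and the function names `participant_key` and `interaction_key` are assumptions made for illustration and are not taken from the paper.

```python
import base64
import hashlib


def participant_key(sequence: str, taxid: int) -> str:
    """Hypothetical key for one protein: hash of its sequence plus its taxonomy ID."""
    # Assumption: strip whitespace and upper-case so equal sequences hash equally.
    normalised = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalised.encode("ascii")).digest()
    # Assumption: base64 text without padding, followed by the taxonomy identifier.
    return base64.b64encode(digest).decode("ascii").rstrip("=") + str(taxid)


def interaction_key(participant_keys: list[str]) -> str:
    """Hypothetical key for an interaction record, built from its participant keys."""
    # Sorting makes the key independent of the order in which participants are listed.
    joined = "".join(sorted(participant_keys))
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Two records that list the same proteins in a different order share one key,
# so they would fall into the same redundant group.
a = participant_key("MKTAYIAKQR", 9606)
b = participant_key("MSDNELQ", 9606)
record_1 = interaction_key([a, b])
record_2 = interaction_key([b, a])
print(record_1 == record_2)  # True
```

    Because the keys depend only on sequence and taxonomy identifier, two databases that refer to the same protein by different accessions would still produce the same participant key under this sketch.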

    iRefIndex : Interaction Reference Index : International Conference on Intelligent Systems for Molecular Biology (ISMB), Stockholm, Sweden, June 27-July 02, 2009 : Poster E29

    iRefIndex provides an index of protein interactions available in a number of primary interaction databases. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein. The system uses the Redundant Object Group (ROG) and Redundant Interaction Group (RIG) identifiers to group proteins and interactions into redundant groups. This method allows users to integrate their own data with iRefIndex in a way that ensures proteins with the exact same sequence and NCBI taxonomy identifier are represented only once. iRefIndex has consolidated 10 different interaction databases using this method and provides several ways to access these data: download as a PSI-MI compliant tab-delimited file, the iRefScape plugin for Cytoscape (http://www.cytoscape.org/), PSICQUIC web services and iRefWeb (http://wodaklab.org/iRefWeb/). More details about iRefIndex and links are available at: http://irefindex.uio.no/wiki/iRefInde

    Design and prototype of a system to integrate and visualize biological interaction data : International Conference on Intelligent Systems for Molecular Biology (ISMB), Vienna, Austria: (July 21-25, 2007) : Poster

    Biomolecular interaction data are an increasingly important bioinformatics dataset used to examine biological systems. However, these data are spread across multiple databases and expressed in disparate data structures and formats. A prerequisite for working with these data would be consolidation into a single non-redundant updated repository. An initial design and prototype for consolidating and visualizing interaction data will be presented in this poster, with particular emphasis on providing scalable and reliable web services as part of the solution. The central point of operation is a data warehouse with numerous parsers retrieving updated information from existing data sources. We have designed parsers for PSI-MI 1.0, PSI-MI 2.5 XML files and tab-delimited text files. The parsers for XML files are built using an event-driven pull-parsing API, which gives them the ability to handle very large files. A local application interface provides access to this data warehouse for local programmers, Java servlets and web services. The range of operating systems and programming languages used by the intended clients was considered when designing the system; therefore, platform-independent protocols were used. Moreover, multiple implementations of hosting the services were considered with respect to their ability to handle large data sets reliably in a stateful manner. Using these web services, a tool was developed to retrieve and present interaction data visually. The approach taken was to construct modules as plugins to existing molecular visualization software.
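
    The point about event-driven parsing of very large XML files can be sketched as follows. The poster does not say which API the authors used; this example uses Python's standard `xml.etree.ElementTree.iterparse`, which offers a comparable incremental, event-by-event style. The element and attribute names (`interaction`, `participant`, `id`, `ref`) are a simplified, hypothetical layout and not the real PSI-MI schema.

```python
import io
import xml.etree.ElementTree as ET


def iter_interactions(source):
    """Yield (interaction id, participant refs) one record at a time."""
    # iterparse reads the input incrementally instead of loading the whole file first.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "interaction":
            refs = [p.get("ref") for p in elem.iter("participant")]
            yield elem.get("id"), refs
            # Drop the finished record's contents so they do not accumulate in memory.
            elem.clear()


# Tiny stand-in for a large file (hypothetical, simplified element names).
SAMPLE = b"""<entry>
  <interaction id="i1"><participant ref="P1"/><participant ref="P2"/></interaction>
  <interaction id="i2"><participant ref="P2"/><participant ref="P3"/><participant ref="P4"/></interaction>
</entry>"""

records = list(iter_interactions(io.BytesIO(SAMPLE)))
print(records)
```

    Because each record is handed to the caller as soon as its closing tag is seen, the same loop works for a file path or an open stream of any size.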

    The eGenVar data management system-cataloguing and sharing sensitive data and metadata for the life sciences

    Systematic data management and controlled data sharing aim at increasing reproducibility, reducing redundancy in work, and providing a way to efficiently locate complementing or contradicting information. One method of achieving this is collecting data in a central repository or in a location that is part of a federated system and providing interfaces to the data. However, certain data, such as data from biobanks or clinical studies, may, for legal and privacy reasons, often not be stored in public repositories. Instead, we describe a metadata cataloguing system and a software suite for reporting the presence of data from the life sciences domain. The system stores three types of metadata: file information, file provenance and data lineage, and content descriptions. Our software suite includes both graphical and command line interfaces that allow users to report and tag files with these different metadata types. Importantly, the files remain in their original locations with their existing access-control mechanisms in place, while our system provides descriptions of their contents and relationships. Our system and software suite thereby provide a common framework for cataloguing and sharing both public and private data