iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex
Background
The iRefIndex consolidates protein interaction data from ten databases in a rigorous manner using sequence-based hash keys. Working with consolidated interaction data comes with distinct challenges: data are redundant, overlapping, highly interconnected and may be collected and represented using different curation practices. These phenomena were quantified in our previous studies.
Results
The iRefScape plug-in for the Cytoscape graphical viewer addresses these challenges. We show how these factors impact on data-mining tasks and how our solutions resolve them in a simple and efficient manner. A uniform accession space is used to limit redundancy and support search expansion and searching on multiple accession types. Multiple node and edge features support data filtering and mining. Node colours and features supply information about search result provenance. Overlapping evidence is presented using a multi-graph and a bi-partite representation is used to distinguish binary and n-ary source data. Searching for interactions between sets of proteins is supported and specifically includes searches on disease-related genes found in OMIM. Finally, a synchronized adjacency-matrix view facilitates visualization of relationships between sets of user defined groups.
Conclusions
The iRefScape plug-in will be of interest to advanced users of interaction data. The plug-in provides access to a consolidated data set in a uniform accession space while remaining faithful to the underlying source data. Tools are provided to facilitate a range of tasks from a simple search to knowledge discovery. The plug-in uses a number of strategies that will be of interest to other plug-in developers.
Fine Scale Natural Hazard Risk and Vulnerability Identification Informed by Climate
Although many natural disasters have hydro-meteorological antecedents, little advantage has been taken of the availability of weather and climate data, advanced diagnostics and seasonal predictions for disaster risk management. This study presents methodologies for using hydro-meteorological data in hazard risk assessment, laying the groundwork for future dynamic hazard predictions. A high-resolution assessment of natural hazards, vulnerability to hazards and multi-hazard disaster risk has been carried out for Sri Lanka. Drought, flood, cyclone and landslide hazards, and vulnerability, were identified using data from Sri Lankan government agencies. Drought- and flood-prone areas were mapped using rainfall data gridded at a resolution of 10 km. Cyclone and landslide hazardousness were mapped based on long-term historical incidence data. Indices for regional industrial development, infrastructure development and agricultural production were estimated based on proxies. An assessment of regional food insecurity from the World Food Programme was used in the analysis. Records of emergency relief were used in estimating a spatial proxy for disaster risk. A multi-hazardousness map was developed for Sri Lanka. The hazardousness estimates for droughts, floods, cyclones and landslides were weighted for their associated disaster risk, using proxies for economic losses, to provide a risk map or hotspots map. Our principal findings are summarized below. Useful hazard and vulnerability analysis can be carried out with the type of data that is available in-country. The hazardousness estimates for droughts, floods, cyclones and landslides show marked spatial variability. Vulnerability shows marked spatial variability as well. Thus, the resolution of analysis needs to match the resolution of spatial variations in relief, climate and other features. Higher-resolution information is needed in planning and action for disaster management.
Multi-hazard analysis brought out regions of high risk in Sri Lanka such as the Kegalle and Ratnapura Districts in the South West and Ampara, Batticaloa, Trincomalee, Mullaitivu and Killinochchi districts in the North-East and the districts of Nuwara Eliya, Badulla, Ampara and Matale that contain some of the sharpest hill slopes of the central mountain massifs. There is a distinct seasonality to risks posed by drought, floods, landslides and cyclones. Whereas the Eastern slopes regions have hotspots during the boreal fall and early winter, the Western slopes regions are risk-prone in the summer and the early fall. Thus attention is warranted not only on "Hot-Spots" but also on "Hot-Seasons." Climate data was useful in estimating hazardousness in the case of droughts, floods and cyclones and for estimating flood and landslide risk. The methodologies presented here for hazard analysis of floods and droughts present an explicit link between climate and hazard. The results from this study, coupled with the high-resolution seasonal climate prediction techniques developed in a related study, point the way to using historical, current and predictive climate information to inform disaster management policy and early warning systems. Climate, environmental and social changes such as deforestation, urbanization and war affect hazardousness and vulnerability. It is more difficult to quantify such changes than the baseline conditions. Our analysis was carried out for a period since 1960 that included a period of civil war after 1983. This war affected the North-East of the island in particular. To put things in context, while natural disasters accounted for 1,483 fatalities in this period, the civil wars accounted for over 65,000. War and conflict pose complications for hazard and vulnerability analysis. Yet, the vulnerabilities created by the war make such efforts to reduce disaster risks all the more important.
Technical details of our work have been included in a case study published by the World Bank and in journals listed in the outputs
iRefIndex: A consolidated protein interaction database with provenance
Background
Interaction data for a given protein may be spread across multiple databases. We set out to create a unifying index that would facilitate searching for these data and that would group together redundant interaction data while recording the methods used to perform this grouping.
Results
We present a method to generate a key for a protein interaction record and a key for each participant protein. These keys may be generated by anyone using only the primary sequence of the proteins, their taxonomy identifiers and the Secure Hash Algorithm. Two interaction records will have identical keys if they refer to the same set of identical protein sequences and taxonomy identifiers. We define records with identical keys as a redundant group. Our method required that we map protein database references found in interaction records to current protein sequence records. Operations performed during this mapping are described by a mapping score that may provide valuable feedback to source interaction databases on problematic references that are malformed, deprecated, ambiguous or unfound. Keys for protein participants allow for retrieval of interaction information independent of the protein references used in the original records.
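The key scheme described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes SHA-1 as the Secure Hash Algorithm variant, hex-encoded digests, and plain concatenation of sequence and taxonomy identifier. The function names `protein_key` and `interaction_key` are hypothetical, and the exact encoding used by iRefIndex may differ.

```python
import hashlib
from typing import Iterable, Tuple


def protein_key(sequence: str, taxon_id: int) -> str:
    """Hash a primary sequence together with its taxonomy identifier."""
    # Assumption: strip whitespace and upper-case the sequence so that
    # formatting differences between source records do not change the key.
    normalised = "".join(sequence.split()).upper()
    payload = f"{normalised}{taxon_id}".encode("ascii")
    return hashlib.sha1(payload).hexdigest()


def interaction_key(participants: Iterable[Tuple[str, int]]) -> str:
    """Hash the participant keys of one interaction record."""
    # Sorting makes the key independent of the order in which a source
    # database happens to list the participants.
    keys = sorted(protein_key(seq, tax) for seq, tax in participants)
    return hashlib.sha1("".join(keys).encode("ascii")).hexdigest()
```

Under these assumptions, two records that list the same sequences and taxonomy identifiers in a different order produce the same interaction key and so fall into the same redundant group.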
Conclusion
We have applied our method to protein interaction records from BIND, BioGrid, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. The resulting interaction reference index is provided in PSI-MITAB 2.5 format at http://irefindex.uio.no. This index may form the basis of alternative redundant groupings based on gene identifiers or near sequence identity groupings.
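For readers unfamiliar with the format, a PSI-MITAB 2.5 file is tab-delimited with 15 standard columns per interaction. The sketch below shows one way to read such a file in Python. The short column labels and the function name `read_mitab25` are my own, not part of the specification, and any additional columns a given file may carry are simply ignored here.

```python
import csv

# Short labels (hypothetical) for the 15 standard MITAB 2.5 columns, in order.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def read_mitab25(handle):
    """Yield one dict per interaction line from an open text handle."""
    # QUOTE_NONE keeps the double quotes that MITAB uses inside fields.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and '#' header or comment lines.
        if not row or row[0].startswith("#"):
            continue
        yield dict(zip(MITAB25_COLUMNS, row[:15]))
```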
iRefIndex : Interaction Reference Index : International Conference on Intelligent Systems for Molecular Biology (ISMB), Stockholm, Sweden, June 27-July 02, 2009 : Poster E29
iRefIndex provides an index of protein interactions available in a number of primary interaction databases. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein.
This system uses the Redundant Object Group (ROG) and Redundant Interaction Group (RIG) identifier to group proteins and interactions into redundant groups. This method allows users to integrate their own data with the iRefIndex in a way that ensures proteins with the exact same sequence and NCBI taxonomy identifier will be represented only once.
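As a rough illustration of how redundant groups yield a non-redundant interactor list, the sketch below collapses records from several sources by their protein group identifiers. The record layout `(source_db, rog_a, rog_b)` and the function name are assumptions made for this example only; they are not the iRefIndex schema.

```python
from collections import defaultdict


def non_redundant_interactors(records, query_rog):
    """Map each distinct partner of query_rog to the sources reporting it.

    records: iterable of (source_db, rog_a, rog_b) tuples (hypothetical layout).
    """
    sources = defaultdict(set)
    for source_db, rog_a, rog_b in records:
        # An interaction is undirected here, so check both positions.
        if rog_a == query_rog:
            sources[rog_b].add(source_db)
        elif rog_b == query_rog:
            sources[rog_a].add(source_db)
    # Each partner appears once, with the supporting sources kept as provenance.
    return {partner: sorted(dbs) for partner, dbs in sources.items()}
```

Keeping the list of supporting sources alongside each partner means the redundancy is collapsed for display without discarding where each record came from.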
iRefIndex has consolidated 10 different interaction databases using this method and provides several ways to access these data: download as a PSI-MI compliant tab-delimited file, the iRefScape plugin for Cytoscape (http://www.cytoscape.org/), PSICQUIC web services, and iRefWeb (http://wodaklab.org/iRefWeb/).
More details about iRefIndex and links available at: http://irefindex.uio.no/wiki/iRefInde
Design and prototype of a system to integrate and visualize biological interaction data : International Conference on Intelligent Systems for Molecular Biology (ISMB), Vienna, Austria: (July 21-25, 2007) : Poster
Biomolecular interaction data is an increasingly important bioinformatics dataset used to examine biological systems. However, these data are spread across multiple databases and expressed in disparate data structures and formats. A prerequisite for working with these data would be consolidation into a single non-redundant updated repository. An initial design and prototype for consolidating and visualizing interaction data will be presented in this poster with an especial emphasis on providing scalable and reliable web-services as part of the solution.
The central point of operation is a data warehouse with numerous parsers retrieving updated information from existing data sources. We have designed parsers for PSI-MI 1.0, PSI-MI 2.5 XML files and tab delimited text files. The parsers for XML files are built using an event-driven pull-parsing API which gives them the ability to handle very large files.
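The benefit of event-driven parsing for very large files can be illustrated with Python's standard `xml.etree.ElementTree.iterparse`, which is a stand-in here and not the API the authors used. The sketch counts elements while discarding each one after it is processed, so memory use stays small regardless of file size. The element name `interaction` and the helper names are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def count_elements(source, wanted: str = "interaction") -> int:
    """Stream through an XML source and count elements named `wanted`."""
    count = 0
    # 'end' events fire once an element and all its children are complete.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == wanted:
            count += 1
            # Release the element's children and text; only an empty
            # shell remains attached to its parent.
            elem.clear()
    return count


# Usage with a tiny in-memory document (made-up content).
sample = io.BytesIO(
    b"<entrySet><entry><interactionList>"
    b"<interaction id='1'/><interaction id='2'/>"
    b"</interactionList></entry></entrySet>"
)
print(count_elements(sample))
```

A real parser would extract participants and evidence inside the loop before calling `clear()`, but the memory-bounding pattern is the same.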
A local application interface provides access to this data warehouse for local programmers, Java servlets and web services. The range of operating systems and programming languages used by the intended clients was considered when designing the system. Therefore, platform-independent protocols were used. Moreover, multiple implementations of hosting the services were considered with respect to their ability to handle large data sets reliably in a stateful manner. Using these web services, a tool was developed to retrieve and present interaction data visually. The approach taken was to construct modules as plugins to existing molecular visualization software.
The eGenVar data management system-cataloguing and sharing sensitive data and metadata for the life sciences
Systematic data management and controlled data sharing aim at increasing reproducibility, reducing redundancy in work, and providing a way to efficiently locate complementing or contradicting information. One method of achieving this is collecting data in a central repository or in a location that is part of a federated system and providing interfaces to the data. However, certain data, such as data from biobanks or clinical studies, may, for legal and privacy reasons, often not be stored in public repositories. Instead, we describe a metadata cataloguing system and a software suite for reporting the presence of data from the life sciences domain. The system stores three types of metadata: file information, file provenance and data lineage, and content descriptions. Our software suite includes both graphical and command line interfaces that allow users to report and tag files with these different metadata types. Importantly, the files remain in their original locations with their existing access-control mechanisms in place, while our system provides descriptions of their contents and relationships. Our system and software suite thereby provide a common framework for cataloguing and sharing both public and private data