
    Analysis and visualisation of RDF resources in Ondex

    An increasing number of biomedical resources provide their information on the Semantic Web and this creates the basis for a distributed knowledge base which has the potential to advance biomedical research [1]. This potential, however, cannot be realized until researchers from the life sciences can interact with information in the Semantic Web. In particular, there is a need for tools that provide data reduction, visualization and interactive analysis capabilities.
Ondex is a data integration and visualization platform developed to support Systems Biology research [2]. At its core is a data model based on two main principles: first, all information can be represented as a graph and, second, all elements of the graph can be annotated with ontologies. This data model conforms to the Semantic Web framework, in particular to RDF, and therefore Ondex is ideally positioned as a platform that can exploit the Semantic Web.
The Ondex system offers a range of features and analysis methods of potential value to Semantic Web users, including:
- An interactive graph visualization interface (the Ondex user client), which provides data reduction and representation methods that leverage the ontological annotation.
- A suite of importers from a variety of data sources into Ondex (http://ondex.org/formats.html).
- A collection of plug-ins which implement graph analysis, graph transformation and graph-matching functions.
- An integration toolkit (Ondex Integrator), which allows users to compose workflows from these modular components.
- In addition, all importers and plug-ins are available as web services which can be integrated into other tools, such as Taverna [3].
The developments that will be presented in this demo have made this functionality interoperable with the Semantic Web framework. In particular, we have developed an interactive, SPARQL-based importer that allows the query-driven construction of datasets, bringing together information from different RDF resources into Ondex.
These datasets can then be further refined, analysed and annotated, both interactively using the Ondex user client and via user-defined workflows. The results of these analyses can be exported in RDF, which can be used to enrich existing knowledge bases or to provide application-specific views of the data. Both the importer and the exporter focus only on the subset of the Ondex and RDF data models that is shared between the two representations [4].
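As an illustration of the query-driven construction step, the following sketch (in Python, using the SPARQLWrapper library) pulls relations from a remote SPARQL endpoint into a simple in-memory edge list; the endpoint URL, prefix and predicate are hypothetical placeholders, and this is not the Ondex importer itself.

# Hedged sketch of query-driven RDF import; the endpoint and predicate are
# illustrative placeholders, not a real resource or the Ondex API.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/sparql")  # hypothetical endpoint
endpoint.setQuery("""
    PREFIX ex: <http://example.org/terms/>
    SELECT ?source ?target
    WHERE { ?source ex:interactsWith ?target . }
    LIMIT 100
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()

# Collect the result bindings as edges; a real importer would map these onto
# Ondex concepts and relations, carrying over the ontological annotation.
edges = [(b["source"]["value"], b["target"]["value"])
         for b in results["results"]["bindings"]]
print(f"retrieved {len(edges)} candidate relations")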
In this demo we will show how Ondex can be used to query, analyse and visualize Semantic Web knowledge bases. In particular, we will present real use cases focused on, but not limited to, resources relevant to plant biology.
We believe that Ondex can make a valuable contribution to the adoption of the Semantic Web in Systems Biology research and, more generally, in biomedical investigation. We welcome feedback on our current import/export prototype and suggestions for the advancement of Ondex for the Semantic Web.

References

1.	Ruttenberg, A. et al.: Advancing translational research with the Semantic Web. BMC Bioinformatics 8 (Suppl. 3): S2 (2007).
2.	Köhler, J., Baumbach, J., Taubert, J., Specht, M., Skusa, A., Ruegg, A., Rawlings, C., Verrier, P., Philippi, S.: Graph-based analysis and visualization of experimental results with Ondex. Bioinformatics 22 (11): 1383-1390 (2006).
3.	Rawlings, C.: Semantic Data Integration for Systems Biology Research. Technology Track at ISMB’09, http://www.iscb.org/uploaded/css/36/11846.pdf (2009).
4.	Splendiani, A. et al.: Ondex semantic definition (web document), http://ondex.svn.sourceforge.net/viewvc/ondex/trunk/doc/semantics/ (2009).

    Lost in translation: data integration tools meet the Semantic Web (experiences from the Ondex project)

    More information is now being published in machine-processable form on the web and, as de facto distributed knowledge bases materialize, partly encouraged by the vision of the Semantic Web, the focus is shifting from the publication of this information to its consumption. Platforms for data integration, visualization and analysis that are based on a graph representation of information appear to be first candidates to consume web-based information that is readily expressible as graphs. The question is whether applying these platforms to information available on the Semantic Web requires some adaptation of their data structures and semantics. Ondex is a network-based data integration, analysis and visualization platform which has been developed in a Life Sciences context. A number of features, including semantic annotation via ontologies and an attention to provenance and evidence, make it an ideal candidate to consume Semantic Web information, as well as a prototype for the application of network analysis tools in this context. By analyzing the Ondex data structure and its usage, we have found a set of discrepancies and errors arising from the semantic mismatch between a procedural approach to network analysis and the implications of a web-based representation of information. In the paper we report on the simple methodology that we have adopted to conduct this analysis, and on issues that we have found which may be relevant for a range of similar platforms. (Comment: presented at DEIT, Data Engineering and Internet Technology, 2011. IEEE: CFP1113L-CD.)

    Systems approaches to drug repositioning

    PhD thesis. Drug discovery has, overall, become less fruitful and more costly, despite vastly increased biomedical knowledge and evolving approaches to Research and Development (R&D). One complementary approach to drug discovery is drug repositioning, which focusses on identifying novel uses for existing drugs. By focussing on drugs that have already reached the market, drug repositioning has the potential to reduce both the timeframe and the cost of getting a disease treatment to those that need it. Many marketed examples of repositioned drugs have been found via serendipitous or rational observations, highlighting the need for more systematic methodologies. Systems approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but they require an integrative approach to biological data. Integrated networks can facilitate systems-level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually, with a skilled person identifying portions of the graph that are indicative of relationships between drugs and that highlight possible repositioning opportunities. However, this approach is not scalable. Automated procedures are required to mine integrated networks systematically for these subgraphs and bring them to the attention of the user. The aim of this project was the development of novel computational methods to identify new therapeutic uses for existing drugs (with a particular focus on active small molecules) using data integration. A framework for integrating disparate data relevant to drug repositioning, the Drug Repositioning Network Integration Framework (DReNInF), was developed as part of this work. This framework includes a high-level ontology, the Drug Repositioning Network Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite of parsers; and a generic semantic graph integration platform. It enables the production of integrated networks that maintain strict semantics, which are important in, but not exclusive to, drug repositioning. DReNInF is then used to create Drug Repositioning Network Integration (DReNIn), a semantically-rich Resource Description Framework (RDF) dataset. A web-based front end was developed, which includes a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying this dataset. To automate the mining of drug repositioning datasets, a formal framework for the definition of semantic subgraphs was established and a method for Drug Repositioning Semantic Mining (DReSMin) was developed. DReSMin is an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. It allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated: 9,643,061 putative D-T interactions were identified and ranked, and a strong correlation was observed between highly scored associations and those supported by the literature. The 20 top-ranked associations were analysed in more detail, with 14 found to be novel and six supported by the literature. It was also shown that this approach prioritises known D-T interactions better than other state-of-the-art methodologies.
The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also investigated. As target-based approaches are used heavily in the field of drug discovery, a systematic method to rank Gene-Disease (G-D) associations is needed. Although methods already exist to collect, integrate and score these associations, the scores are often not a reliable reflection of expert knowledge. Therefore, an integrated, data-driven approach to drug repositioning was developed using Bayesian statistics and applied to rank 309,885 G-D associations using existing knowledge. The ranked associations were then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network, it was shown that diseases of the central nervous system (CNS) provide an area of interest. The network was then systematically mined for semantic subgraphs that capture novel Dr-D relations; 275,934 Dr-D associations were identified and ranked, with those more likely to be side-effects filtered out. The work presented here includes novel tools and algorithms to enable research within the field of drug repositioning. DReNIn, for example, includes data that previous comparable drug repositioning datasets have neglected, such as clinical trial data and drug indications. Furthermore, the dataset may be easily extended using DReNInF to include future data as and when it becomes available, such as G-D association directionality (i.e. whether a mutation is loss-of-function or gain-of-function). Unlike other algorithms and approaches developed for drug repositioning, DReSMin can be used to infer any type of association captured in the target semantic network. Moreover, the approaches presented here should be more generically applicable to other fields that require algorithms for the integration and mining of semantically rich networks. Engineering and Physical Sciences Research Council (EPSRC) and GS
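The abstract does not give DReSMin's algorithmic details, but the general idea of mining a typed network for a semantic subgraph can be sketched as follows; the graph, node classes and relation names below are invented for illustration, and the VF2-based matcher from networkx stands in for whatever matching strategy DReSMin actually uses.

# Illustrative sketch of semantic subgraph matching (not the DReSMin code):
# find occurrences of a small typed pattern, e.g. Drug -> Protein <- Disease,
# in a typed network, using networkx's VF2 subgraph isomorphism matcher.
import networkx as nx
from networkx.algorithms import isomorphism

# Integrated network: nodes carry a semantic class, edges a relation type.
G = nx.DiGraph()
G.add_node("aspirin", cls="Drug")
G.add_node("PTGS2", cls="Protein")
G.add_node("inflammation", cls="Disease")
G.add_edge("aspirin", "PTGS2", rel="targets")
G.add_edge("inflammation", "PTGS2", rel="associated_with")

# Semantic subgraph to mine for: any Drug targeting a Protein that is
# associated with a Disease (a putative repositioning signature).
pattern = nx.DiGraph()
pattern.add_node("d", cls="Drug")
pattern.add_node("p", cls="Protein")
pattern.add_node("x", cls="Disease")
pattern.add_edge("d", "p", rel="targets")
pattern.add_edge("x", "p", rel="associated_with")

matcher = isomorphism.DiGraphMatcher(
    G, pattern,
    node_match=lambda n1, n2: n1["cls"] == n2["cls"],
    edge_match=lambda e1, e2: e1["rel"] == e2["rel"],
)
for mapping in matcher.subgraph_isomorphisms_iter():
    print(mapping)  # maps network nodes onto the pattern roles d, p, x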

    Customizable views on semantically integrated networks for systems biology

    Motivation: The rise of high-throughput technologies in the post-genomic era has led to the production of large amounts of biological data. Many of these datasets are freely available on the Internet. Making optimal use of these data is a significant challenge for bioinformaticians. Various strategies for integrating data have been proposed to address this challenge. One of the most promising approaches is the development of semantically rich integrated datasets. Although well suited to computational manipulation, such integrated datasets are typically too large and complex for easy visualization and interactive exploration

    Bayesian integration of networks without gold standards

    Motivation: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality ‘gold standard’ reference networks, but such reference networks are not always available
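The abstract above is truncated, so the gold-standard-free method itself is not shown here; purely as a hedged illustration of combining overlapping evidence sources, the snippet below applies a naive Bayesian update in which each dataset contributes a likelihood ratio (the prior and ratios are made-up numbers, not values from the paper).

# Hedged illustration only: combine independent evidence sources for a
# candidate interaction by naive Bayes, turning a prior and per-dataset
# likelihood ratios into a posterior probability. The paper's actual method
# estimates reliabilities without gold standards; these numbers are invented.
def posterior_probability(prior, likelihood_ratios):
    """Posterior P(true | evidence) from prior P(true) and per-dataset LRs."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# Example: a sparse prior and three datasets that each report the interaction.
print(posterior_probability(0.01, [5.0, 12.0, 3.0]))  # ≈ 0.645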

    Data integration strategies for informing computational design in synthetic biology

    PhD thesis. The potential design space for biological systems is complex, vast and multidimensional. Therefore, effective large-scale synthetic biology requires computational design and simulation. By constraining this design space, the time- and cost-efficient design of biological systems can be facilitated. One way in which a tractable design space can be achieved is to use the extensive and growing amount of biological data available to inform the design process. By using existing knowledge, design efforts can be focused on biologically plausible areas of design space. However, biological data is large, incomplete, heterogeneous, and noisy. Data must be integrated in a systematic fashion in order to maximise its benefit. To date, data integration has not been widely applied to design in synthetic biology. The aim of this project is to apply data integration techniques to facilitate the efficient design of novel biological systems. The specific focus is on the development and application of integration techniques for the design of genetic regulatory networks in the model bacterium Bacillus subtilis. A dataset was constructed by integrating data from a range of sources in order to capture existing knowledge about B. subtilis 168. The dataset is represented as a computationally-accessible, semantically-rich network which includes information concerning biological entities and their relationships. Also included are sequence-based features mined from the B. subtilis genome, which are a useful source of parts for synthetic biology. In addition, information about the interactions of these parts has been captured, in order to facilitate the construction of circuits with desired behaviours. This dataset was also modelled in the form of an ontology, providing a formal specification of parts and their interactions. The ontology is a major step towards the unification of the data required for modelling with a range of part catalogues specifically designed for synthetic biology. The data from the ontology is available to existing reasoners for implicit knowledge extraction. The ontology was applied to the automated identification of promoters, operators and coding sequences. Information from the ontology was also used to generate dynamic models of parts. The work described here contributed to the development of a formalism called Standard Virtual Parts (SVPs), which aims to represent models of biological parts in a standardised manner. SVPs comprise a mapping between biological parts and modular computational models. A genetic circuit designed at a part-level abstraction can be investigated in detail by analysing a circuit model composed of SVPs. The ontology was used to construct SVPs in the form of standard Systems Biology Markup Language (SBML) models. These models are publicly available from a computationally-accessible repository, and include metadata which facilitates the computational composition of SVPs in order to create models of larger biological systems. To test a genetic circuit in vitro or in vivo, the genetic elements necessary to encode the entities in the in silico model, and their associated behaviour, must be derived. Ultimately, this process results in the specification of a synthesisable DNA sequence. For large models, particularly those that are produced computationally, the transformation process is challenging. To automate this process, a model-to-sequence conversion algorithm was developed. The algorithm was implemented as a Java application called MoSeC.
Using MoSeC, both CellML and SBML models built with SVPs can be converted into DNA sequences ready for synthesis. Selection of the host bacterial cell for a synthetic genetic circuit is very important. In order not to interfere with the existing cellular machinery, orthogonal parts from other species are used, since these parts are less likely to have undesired interactions with the host. In order to find orthogonal transcription factors (OTFs) and their target binding sequences, a subset of the data from the integrated B. subtilis dataset was used. B. subtilis gene regulatory networks were used to reconstruct regulatory networks in closely related Bacillus species. The system, called BacillusRegNet, stores both experimental data for B. subtilis and homology predictions in other species. BacillusRegNet was mined to extract OTFs and their binding sequences, in order to facilitate the engineering of novel regulatory networks in other Bacillus species. Although the techniques presented here were demonstrated using B. subtilis, they can be applied to any other organism. The approaches and tools developed as part of this project demonstrate the utility of this novel integrated approach to synthetic biology. EPSRC; NSF; The Newcastle University School of Computing Science.
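MoSeC itself is a Java application and its internals are not described above; the following Python sketch only illustrates the final model-to-sequence idea, mapping an ordered list of part identifiers taken from a circuit model onto catalogue sequences and concatenating them (the part names and sequences are placeholders, not real B. subtilis parts).

# Minimal sketch of the model-to-sequence idea (not the actual MoSeC tool):
# given an ordered list of part identifiers from a circuit model, look each
# part up in a catalogue of DNA sequences and concatenate them.
PART_CATALOGUE = {
    "promoter_Pveg":  "TTGACA" + "N" * 17 + "TATAAT",  # placeholder sequence
    "rbs_strong":     "AGGAGG",
    "cds_reporter":   "ATG" + "GCT" * 20 + "TAA",
    "terminator_T1":  "AAAAAAGGCCGC",
}

def circuit_to_sequence(part_order):
    """Concatenate catalogue sequences in the order given by the circuit model."""
    missing = [p for p in part_order if p not in PART_CATALOGUE]
    if missing:
        raise KeyError(f"parts without sequences: {missing}")
    return "".join(PART_CATALOGUE[p] for p in part_order)

design = ["promoter_Pveg", "rbs_strong", "cds_reporter", "terminator_T1"]
print(circuit_to_sequence(design))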

    Developing integrated crop knowledge networks to advance candidate gene discovery

    The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpin traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer to having the basic information, at the gene level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species wheat and barley. We present the global characteristics of such knowledge networks and, with an example linking a seed-size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources we have created (http://knetminer.rothamsted.ac.uk) are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.
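As a toy illustration of the kind of query such a knowledge network supports, the sketch below builds a tiny typed graph loosely following the seed-size/WRKY/TTG2 example and lists the evidence path from a trait node to each candidate gene; the node and relation names are simplified placeholders, and this is not the Ondex or KnetMiner code.

# Illustrative sketch of mining an integrated crop knowledge network for
# candidate genes: find evidence paths from a trait node to gene nodes.
# The graph below is a toy stand-in for the wheat/barley networks described.
import networkx as nx

kn = nx.Graph()
kn.add_edge("seed_size_QTL", "barley_WRKY_gene", rel="colocalised_with")
kn.add_edge("barley_WRKY_gene", "AtTTG2", rel="orthologue_of")
kn.add_edge("AtTTG2", "seed_development", rel="involved_in")

# Rank candidate genes by the length of their shortest evidence path
# from the trait of interest (shorter paths = more direct evidence).
trait = "seed_size_QTL"
for gene in ["barley_WRKY_gene", "AtTTG2"]:
    path = nx.shortest_path(kn, trait, gene)
    print(gene, "->", " / ".join(path))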