69 research outputs found
Analysis and visualisation of RDF resources in Ondex
An increasing number of biomedical resources provide their information on the Semantic Web and this creates the basis for a distributed knowledge base which has the potential to advance biomedical research [1]. This potential, however, cannot be realized until researchers from the life sciences can interact with information in the Semantic Web. In particular, there is a need for tools that provide data reduction, visualization and interactive analysis capabilities.
Ondex is a data integration and visualization platform developed to support Systems Biology research [2]. At its core is a data model based on two main principles: first, all information can be represented as a graph and, second, all elements of the graph can be annotated with ontologies. This data model conforms to the Semantic Web framework, in particular to RDF, and therefore Ondex is ideally positioned as a platform that can exploit the Semantic Web.
The Ondex system offers a range of features and analysis methods of potential value to Semantic Web users, including:
-	An interactive graph visualization interface (Ondex user client), which provides data reduction and representation methods that leverage the ontological annotation.
-	A suite of importers from a variety of data sources to Ondex (http://ondex.org/formats.html)
-	A collection of plug-ins which implement graph analysis, graph transformation and graph-matching functions.
-	An integration toolkit (Ondex Integrator) which allows users to compose workflows from these modular components
-	In addition, all importers and plug-ins are available as web services which can be integrated in other tools, such as Taverna [3].
The developments that will be presented in this demo have made this functionality interoperable with the Semantic Web framework. In particular, we have developed an interactive importer based on SPARQL that allows the query-driven construction of datasets which bring together information from different RDF data resources into Ondex.
These datasets can then be further refined, analysed and annotated both interactively using the Ondex user client and via user-defined workflows. The results of these analyses can be exported in RDF, which can be used to enrich existing knowledge bases, or to provide application-specific views of the data. Both importer and exporter focus only on the subset of the Ondex and RDF data models that is shared between these two data representations [4].
In this demo we will show how Ondex can be used to query, analyse and visualize Semantic Web knowledge bases. In particular, we will present real use cases focused on, but not limited to, resources relevant to plant biology.
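The graph-based data model and RDF export described above can be illustrated with a minimal Python sketch. It is not Ondex's actual API or export format: the `Concept` and `Relation` classes, the `to_ntriples` function and the `http://example.org/ondex/` namespace are all hypothetical names chosen for illustration. It only shows how a graph whose nodes and edges carry ontology-style types could be written out as N-Triples.

```python
from dataclasses import dataclass

# Hypothetical namespace for illustration only; not an Ondex URI.
BASE = "http://example.org/ondex/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Concept:
    """A graph node annotated with an ontology class (assumed structure)."""
    id: str
    concept_class: str


@dataclass(frozen=True)
class Relation:
    """A typed, directed edge between two concepts (assumed structure)."""
    source: str
    target: str
    relation_type: str


def to_ntriples(concepts, relations):
    """Serialise typed nodes and edges as a list of N-Triples lines."""
    lines = []
    for c in concepts:
        # Each node becomes an rdf:type statement.
        lines.append(f"<{BASE}{c.id}> <{RDF_TYPE}> <{BASE}{c.concept_class}> .")
    for r in relations:
        # Each edge becomes a subject-predicate-object statement.
        lines.append(f"<{BASE}{r.source}> <{BASE}{r.relation_type}> <{BASE}{r.target}> .")
    return lines


if __name__ == "__main__":
    concepts = [Concept("g1", "Gene"), Concept("p1", "Protein")]
    relations = [Relation("g1", "p1", "encodes")]
    for line in to_ntriples(concepts, relations):
        print(line)
```

A real exporter would also need to handle attributes, literals and the parts of each data model that have no counterpart in the other, which this sketch deliberately leaves out.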
We believe that Ondex can be a valid contribution to the adoption of the Semantic Web in Systems Biology research and in biomedical investigation more generally. We welcome feedback on our current import/export prototype and suggestions for the advancement of Ondex for the Semantic Web.

References

1.	Ruttenberg, A. et al.: Advancing translational research with the Semantic Web, BMC Bioinformatics, 8 (Suppl. 3): S2 (2007).
2.	Köhler, J., Baumbach, J., Taubert, J., Specht, M., Skusa, A., Ruegg, A., Rawlings, C., Verrier, P., Philippi, S.: Graph-based analysis and visualization of experimental results with Ondex. Bioinformatics 22 (11):1383-1390 (2006).
3.	Rawlings, C.: Semantic Data Integration for Systems Biology Research, Technology Track at ISMB’09, http://www.iscb.org/uploaded/css/36/11846.pdf (2009).
4.	Splendiani, A. et al.: Ondex semantic definition, (Web document) http://ondex.svn.sourceforge.net/viewvc/ondex/trunk/doc/semantics/ (2009).

Integration strategies and data analysis methods for plant systems biology
Understanding how function relates to multiple layers of interactions between biological entities is one of the key goals of bioinformatics research, particularly in areas such as systems biology. However, the realisation of this objective is hampered by the sheer volume and multi-level heterogeneity of potentially relevant information. This work addressed this issue by developing a set of integration pipelines and analysis methods as part of the Ondex data integration framework. The integration process incorporated both relevant data from a set of publicly available databases and information derived from predictive approaches, which were also implemented as part of this work.
These methods were used to assemble integrated datasets that were relevant to the study of the model plant species Arabidopsis thaliana and suitable for network-driven analysis. Particular attention was paid to the evaluation and comparison of the different sources of these data. Approaches were implemented for the identification and characterisation of functional modules in integrated networks and used to study and compare networks constructed from different types of data. The benefits of data integration were also demonstrated in three different bioinformatics research scenarios. The analysis of the constructed datasets has also resulted in a better understanding of the functional role of genes identified in a study of a nitrogen uptake mutant and allowed candidate genes to be selected for further exploration.
THE CRITERIA’S SET WITH INVARIANT DESIGN BUILDING ELEMENTS ON THE BASE OF THREE IMPUTATIONS: “CONVENIENCE”, “SAFETY” AND “ENERGY-EFFICIENCY”
The paper deals with the formalization of the criteria for constructing building management systems. We consider three criteria: “convenience”, “safety” and “energy-efficiency”. For each criterion, a method of calculation is proposed.
An integrative machine learning approach for prediction of toxicity-related drug safety
Recent trends in drug development have been marked by diminishing returns caused by escalating costs and falling rates of new drug approval. Unacceptable drug toxicity is a substantial cause of drug failure during clinical trials and the leading cause of drug withdrawals after release to the market. Computational methods capable of predicting these failures can reduce the waste of resources and time devoted to the investigation of compounds that ultimately fail. We propose an original machine learning method that leverages the identity of drug targets and off-targets, a functional impact score computed from Gene Ontology annotations, and biological network data to predict drug toxicity. We demonstrate that our method (TargeTox) can distinguish potentially idiosyncratically toxic drugs from safe drugs and is also suitable for speculative evaluation of different target sets to support the design of optimal low-toxicity combinations.
HseSUMO: Sumoylation site prediction using half-sphere exposures of amino acids residues
Background
Post-translational modifications are viewed as an important mechanism for controlling protein function and are believed to be involved in multiple important diseases. However, their profiling using laboratory-based techniques remains challenging. Therefore, the development of accurate computational methods to predict post-translational modifications is particularly important for making progress in this area of research.
Results
This work explores the use of four half-sphere exposure-based features for computational prediction of sumoylation sites. Unlike most of the previously proposed approaches, which focused on patterns of amino acid co-occurrence, we were able to demonstrate that protein structure-based features could be sufficiently informative to achieve good predictive performance. The evaluation of our method has demonstrated high sensitivity (0.9), accuracy (0.89) and Matthews correlation coefficient (0.78–0.79). We have compared these results to the recently released pSumo-CD method and were able to demonstrate better performance of our method on the same evaluation dataset.
Conclusions
The proposed predictor HseSUMO uses half-sphere exposures of amino acids to predict sumoylation sites. It has shown promising results on a benchmark dataset when compared with the state-of-the-art method.
Discovering study-specific gene regulatory networks
This article has been made available through the Brunel Open Access Publishing Fund. Microarrays are commonly used in biology because of their ability to simultaneously measure thousands of genes under different conditions. Because microarray datasets typically contain a large number of variables but far fewer samples, scalable network analysis techniques are often employed. In particular, consensus approaches that combine multiple microarray studies have recently been used to find networks that are more robust. The purpose of this paper, however, is to combine multiple microarray studies to automatically identify subnetworks that are distinctive to specific experimental conditions rather than common to them all. To better understand key regulatory mechanisms and how they change under different conditions, we derive unique networks from multiple independent networks built using glasso, which goes beyond standard correlations. This involves calculating cluster prediction accuracies to detect the most predictive genes for a specific set of conditions. We differentiate between accuracies calculated using cross-validation within a selected cluster of studies (the intra prediction accuracy) and those calculated on a set of independent studies belonging to different study clusters (the inter prediction accuracy). Finally, we compare our method's results to related state-of-the-art techniques. We explore how the proposed pipeline performs on both synthetic data and real data (wheat and Fusarium). Our results show that subnetworks specific to subsets of studies can be identified reliably and that these networks reflect key mechanisms that are fundamental to the experimental conditions in each of those subsets.
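The contrast between consensus networks and study-specific subnetworks can be illustrated with a deliberately simplified Python sketch. It does not reproduce the paper's pipeline (no glasso estimation and no prediction accuracies); it is a plain set-based stand-in that keeps the edges shared by a chosen group of studies and absent from every other study. The `edge` and `study_specific_edges` helpers and the toy networks are hypothetical.

```python
def edge(u, v):
    """Represent an undirected edge as an order-independent pair."""
    return frozenset((u, v))


def study_specific_edges(networks, target_studies):
    """Edges found in every target study and in none of the other studies.

    networks: dict mapping study name -> set of edges (assumed input format).
    target_studies: set of study names forming the group of interest.
    """
    targets = [networks[name] for name in target_studies]
    others = [edges for name, edges in networks.items() if name not in target_studies]
    # Edges common to the whole target group.
    shared = set.intersection(*targets) if targets else set()
    # Edges seen anywhere outside the target group.
    elsewhere = set.union(*others) if others else set()
    return shared - elsewhere


if __name__ == "__main__":
    networks = {
        "study_a": {edge("g1", "g2"), edge("g2", "g3")},
        "study_b": {edge("g1", "g2"), edge("g3", "g4")},
        "study_c": {edge("g3", "g4"), edge("g4", "g5")},
    }
    specific = study_specific_edges(networks, {"study_a", "study_b"})
    print(sorted(sorted(e) for e in specific))
```

A consensus approach would instead keep edges that recur across all studies; here the selection is inverted so that only edges distinctive to the chosen group remain.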
Genetical and comparative genomics of Brassica under altered Ca supply identifies Arabidopsis Ca-transporter orthologs
Although Ca transport in plants is highly complex, the overexpression of vacuolar Ca2+ transporters in crops is a promising new technology to improve dietary Ca supplies through biofortification. Here, we sought to identify novel targets for increasing plant Ca accumulation using genetical and comparative genomics. Expression quantitative trait loci (eQTLs) mapping to 1895 cis- and 8015 trans-loci were identified in shoots of an inbred mapping population of Brassica rapa (IMB211 x R500); 23 cis- and 948 trans-eQTLs responded specifically to altered Ca supply. eQTLs were screened for functional significance using a large database of shoot Ca concentration phenotypes of Arabidopsis thaliana. From 31 Arabidopsis gene identifiers tagged to robust shoot Ca concentration phenotypes, 21 mapped to 27 B. rapa eQTLs, including orthologs of the Ca2+ transporters At-CAX1 and At-ACA8. Two of three independent missense mutants of BraA.cax1a, isolated previously by targeting induced local lesions in genomes, have allele-specific shoot Ca concentration phenotypes compared with their segregating wild types. BraA.CAX1a is a promising target for altering the Ca composition of Brassica, consistent with prior knowledge from Arabidopsis. We conclude that multiple-environment eQTL analysis of complex crop genomes combined with comparative genomics is a powerful technique for novel gene identification/prioritization.
Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis
Background
Combining multiple evidence-types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer function of as yet unannotated proteins, to discover previously unknown roles of proteins in diseases as well as for better understanding of the regulation and interrelationship between different elements of complex biological systems.
Results
We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we have constructed five relationship networks, four based on single types of data: PPI, co-expression, co-occurrence of protein names in scientific literature abstracts and sequence similarity and a fifth one combining these four evidence types. The ability of these networks to suggest biologically meaningful grouping of proteins was explored by applying Markov clustering and then by measuring the functional coherence of the clusters.
Conclusions
Relationship networks integrating multiple evidence-types are biologically informative and allow more proteins to be assigned to a putative functional module. Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integration of more data sources improves the ability to uncover functional association between proteins, both by allowing more proteins to be linked and producing a network where modular structure more closely reflects the hierarchy in the gene ontology.
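As background to the Markov clustering step mentioned in this abstract, the sketch below gives a minimal pure-Python version of the textbook MCL procedure: alternate expansion (matrix squaring) and inflation (element-wise powering followed by column normalisation) until the matrix stops changing, then read clusters from the non-zero rows. The function names, the inflation value of 2.0, the iteration cap and the thresholds are assumptions for illustration, not the settings used in the study.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster an undirected graph given as a square 0/1 adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then convert to a column-stochastic matrix.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        expanded = matmul(m, m)  # expansion: random-walk flow spreads out
        inflated = normalize_columns(  # inflation: strong flows are boosted
            [[value ** inflation for value in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-zero row lists the members of one cluster; duplicates are merged.
    clusters = set()
    for row in m:
        members = frozenset(j for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Two disconnected triangles: nodes 0-2 and nodes 3-5.
    graph = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(markov_cluster(graph))
```

On real relationship networks the edges would normally be weighted, and a higher inflation value tends to produce a larger number of smaller clusters.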
- …