1,191 research outputs found

    Algorithms for effective querying of compound graph-based pathway databases

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to programmatically integrate cellular networks and to investigate them using the well-understood concepts of graph theory in order to predict their structural and dynamic properties. An extension of this graph representation, namely hierarchically structured or compound graphs, in which a member of a biological network may recursively contain a sub-network of a somehow logically similar group of biological objects, provides many additional benefits for analysis of biological pathways, including reduction of complexity by decomposition into distinct components or modules. In this regard, it is essential to effectively query such integrated large compound networks to extract the sub-networks of interest with the help of efficient algorithms and software tools.</p> <p>Results</p> <p>Towards this goal, we developed a querying framework, along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, that is applicable to all sorts of graph-based pathway databases, from PPIs (protein-protein interactions) to metabolic and signaling pathways. The framework is unique in that it can account for compound or nested structures and ubiquitous entities present in the pathway data. In addition, the queries may be related to each other through "AND" and "OR" operators, and can be recursively organized into a tree, in which the result of one query might be a source and/or target for another, to form more complex queries. The algorithms were implemented within the querying component of a new version of the software tool P<smcaps>ATIKA</smcaps><it>web </it>(Pathway Analysis Tool for Integration and Knowledge Acquisition) and have proven useful for answering a number of biologically significant questions for large graph-based pathway databases.</p> <p>Conclusion</p> <p>The P<smcaps>ATIKA</smcaps> Project Web site is <url>http://www.patika.org</url>. P<smcaps>ATIKA</smcaps><it>web </it>version 2.1 is available at <url>http://web.patika.org</url>.</p

    Discovering Complex Relationships between Drugs and Diseases

    Get PDF
    Finding the complex semantic relations between existing drugs and new diseases will help in the drug development in a new way. Most of the drugs which have found new uses have been discovered due to serendipity. Hence, the prediction of the uses of drugs for more than one disease should be done in a systematic way by studying the semantic relations between the drugs and diseases and also the other entities involved in the relations. Hence, in order to study the complex semantic relations between drugs and diseases an application was developed that integrates the heterogeneous data in different formats from different public databases which are available online. A high level ontology was also developed to integrate the data and only the fields required for the current study were used. The data was collected from different data sources such as DrugBank, UniProt/SwissProt, GeneCards and OMIM. Most of these data sources are the standard data sources and have been used by National Committee of Biotechnology Information of Nation Institute of Health. The data was parsed and integrated and complex semantic relations were discovered. This is a simple and novel effort which may find uses in development of new drug targets and polypharmacology

    Pathway Commons, a web resource for biological pathway data

    Get PDF
    Pathway Commons (http://www.pathwaycommons.org) is a collection of publicly available pathway data from multiple organisms. Pathway Commons provides a web-based interface that enables biologists to browse and search a comprehensive collection of pathways from multiple sources represented in a common language, a download site that provides integrated bulk sets of pathway information in standard or convenient formats and a web service that software developers can use to conveniently query and access all data. Database providers can share their pathway data via a common repository. Pathways include biochemical reactions, complex assembly, transport and catalysis events and physical interactions involving proteins, DNA, RNA, small molecules and complexes. Pathway Commons aims to collect and integrate all public pathway data available in standard formats. Pathway Commons currently contains data from nine databases with over 1400 pathways and 687 000 interactions and will be continually expanded and updated

    Enabling Web-scale data integration in biomedicine through Linked Open Data

    Get PDF
    The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems

    Systems approaches to drug repositioning

    Get PDF
    PhD ThesisDrug discovery has overall become less fruitful and more costly, despite vastly increased biomedical knowledge and evolving approaches to Research and Development (R&D). One complementary approach to drug discovery is that of drug repositioning which focusses on identifying novel uses for existing drugs. By focussing on existing drugs that have already reached the market, drug repositioning has the potential to both reduce the timeframe and cost of getting a disease treatment to those that need it. Many marketed examples of repositioned drugs have been found via serendipitous or rational observations, highlighting the need for more systematic methodologies. Systems approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but require an integrative approach to biological data. Integrated networks can facilitate systems-level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually where a skilled person can identify portions of the graph that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated procedures are required to mine integrated networks systematically for these subgraphs and bring them to the attention of the user. The aim of this project was the development of novel computational methods to identify new therapeutic uses for existing drugs (with particular focus on active small molecules) using data integration. A framework for integrating disparate data relevant to drug repositioning, Drug Repositioning Network Integration Framework (DReNInF) was developed as part of this work. This framework includes a high-level ontology, Drug Repositioning Network Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite of parsers; and a generic semantic graph integration platform. This framework enables the production of integrated networks maintaining strict semantics that are important in, but not exclusive to, drug repositioning. The DReNInF is then used to create Drug Repositioning Network Integration (DReNIn), a semantically-rich Resource Description Framework (RDF) dataset. A Web-based front end was developed, which includes a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying this dataset. To automate the mining of drug repositioning datasets, a formal framework for the definition of semantic subgraphs was established and a method for Drug Repositioning Semantic Mining (DReSMin) was developed. DReSMin is an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated. 9,643,061 putative D-T interactions were identified and ranked, with a strong correlation between highly scored associations and those supported by literature observed. The 20 top ranked associations were analysed in more detail with 14 found to be novel and six found to be supported by the literature. It was also shown that this approach better prioritises known D-T interactions, than other state-of-the-art methodologies. The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also investigated. As target-based approaches are utilised heavily in the field of drug discovery, it is necessary to have a systematic method to rank Gene-Disease (G-D) associations. Although methods already exist to collect, integrate and score these associations, these scores are often not a reliable re flection of expert knowledge. Therefore, an integrated data-driven approach to drug repositioning was developed using a Bayesian statistics approach and applied to rank 309,885 G-D associations using existing knowledge. Ranked associations were then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network it was shown that diseases of the central nervous system (CNS) provide an area of interest. The network was then systematically mined for semantic subgraphs that capture novel Dr-D relations. 275,934 Dr-D associations were identified and ranked, with those more likely to be side-effects filtered. Work presented here includes novel tools and algorithms to enable research within the field of drug repositioning. DReNIn, for example, includes data that previous comparable datasets relevant to drug repositioning have neglected, such as clinical trial data and drug indications. Furthermore, the dataset may be easily extended using DReNInF to include future data as and when it becomes available, such as G-D association directionality (i.e. is the mutation a loss-of-function or gain-of-function). Unlike other algorithms and approaches developed for drug repositioning, DReSMin can be used to infer any types of associations captured in the target semantic network. Moreover, the approaches presented here should be more generically applicable to other fields that require algorithms for the integration and mining of semantically rich networks.European and Physical Sciences Research Council (EPSRC) and GS

    Teak: A Novel Computational And Gui Software Pipeline For Reconstructing Biological Networks, Detecting Activated Biological Subnetworks, And Querying Biological Networks.

    Get PDF
    As high-throughput gene expression data becomes cheaper and cheaper, researchers are faced with a deluge of data from which biological insights need to be extracted and mined since the rate of data accumulation far exceeds the rate of data analysis. There is a need for computational frameworks to bridge the gap and assist researchers in their tasks. The Topology Enrichment Analysis frameworK (TEAK) is an open source GUI and software pipeline that seeks to be one of many tools that fills in this gap and consists of three major modules. The first module, the Gene Set Cultural Algorithm, de novo infers biological networks from gene sets using the KEGG pathways as prior knowledge. The second and third modules query against the KEGG pathways using molecular profiling data and query graphs, respectively. In particular, the second module, also called TEAK, is a network partitioning module that partitions the KEGG pathways into both linear and nonlinear subpathways. In conjunction with molecular profiling data, the subpathways are ranked and displayed to the user within the TEAK GUI. Using a public microarray yeast data set, previously unreported fitness defects for dpl1 delta and lag1 delta mutants under conditions of nitrogen limitation were found using TEAK. Finally, the third module, the Query Structure Enrichment Analysis framework, is a network query module that allows researchers to query their biological hypotheses in the form of Directed Acyclic Graphs against the KEGG pathways

    Semantic inference using chemogenomics data for drug discovery

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Semantic Web Technology (SWT) makes it possible to integrate and search the large volume of life science datasets in the public domain, as demonstrated by well-known linked data projects such as LODD, Bio2RDF, and Chem2Bio2RDF. Integration of these sets creates large networks of information. We have previously described a tool called WENDI for aggregating information pertaining to new chemical compounds, effectively creating evidence paths relating the compounds to genes, diseases and so on. In this paper we examine the utility of automatically inferring new compound-disease associations (and thus new links in the network) based on semantically marked-up versions of these evidence paths, rule-sets and inference engines.</p> <p>Results</p> <p>Through the implementation of a semantic inference algorithm, rule set, Semantic Web methods (RDF, OWL and SPARQL) and new interfaces, we have created a new tool called Chemogenomic Explorer that uses networks of ontologically annotated RDF statements along with deductive reasoning tools to infer new associations between the query structure and genes and diseases from WENDI results. The tool then permits interactive clustering and filtering of these evidence paths.</p> <p>Conclusions</p> <p>We present a new aggregate approach to inferring links between chemical compounds and diseases using semantic inference. This approach allows multiple evidence paths between compounds and diseases to be identified using a rule-set and semantically annotated data, and for these evidence paths to be clustered to show overall evidence linking the compound to a disease. We believe this is a powerful approach, because it allows compound-disease relationships to be ranked by the amount of evidence supporting them.</p

    Causality analysis in biological networks

    Get PDF
    Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2010.Thesis (Ph.D.) -- Bilkent University, 2010.Includes bibliographical references leaves 69-78.Systems biology is a rapidly emerging field, shaped in the last two decades or so, which promises understanding and curing several complex diseases such as cancer. In order to get an insight about the system – specifically the molecular network in the cell – we need to work on following four fundamental aspects: experimental and computational methods to gather knowledge about the system, mathematical models for representing the knowledge, analysis methods for answering questions on the model, and software tools for working on these. In this thesis, we propose new approaches related to all these aspects. In this thesis, we define new terms and concepts that helps us to analyze cellular processes, such as positive and negative paths, upstream and downstream relations, and distance in process graphs. We propose algorithms that will search for functional relations between molecules and will answer several biologically interesting questions related to the network, such as neighborhoods, paths of interest, and common targets or regulators of molecules. In addition, we introduce ChiBE, a pathway editor for visualizing and analyzing BioPAX networks. The tool converts BioPAX graphs to drawable process diagrams and provides the mentioned novel analysis algorithms. Users can query pathways in Pathway Commons database and create sub-networks that focus on specific relations of interest. We also describe a microarray data analysis component, PATIKAmad, built into ChiBE and PATIKAweb, which integrates expression experiment data with networks. PATIKAmad helps those tools to represent experiment values on network elements and to search for causal relations in the network that potentially explain dependent expressions. Causative path search depends on the presence of transcriptional relations in the model, which however is underrepresented in most of the databases. This is mainly due to insufficient knowledge in the literature. We finally propose a method for identifying and classifying modulators of transcription factors, to help complete the missing transcriptional relations in the pathway databases. The method works with large amount of expression data, and looks for evidence of modulation for triplets of genes, i.e. modulator - factor - target. Modulator candidates are chosen among the interacting proteins of transcription factors. We expect to observe that expression of the target gene depends on the interaction between factor and modulator. According to the observed dependency type, we further classify the modulation. When tested, our method finds modulators of Androgen Receptor; our top-scoring result modulators are supported by other evidence in the literature. We also observe that the modulation event and modulation type highly depend on the specific target gene. This finding contradicts with expectations of molecular biology community who often assume a modulator has one type of effect regardless of the target gene.Babur, ÖzgünPh.D

    Algorithms for effective querying of graph-based pathway databases

    Get PDF
    Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent Univ., 2007.Thesis (Master's) -- Bilkent University, 2007.Includes bibliographical references leaves 81-83.As the scientific curiosity shifts toward system-level investigation of genomicscale information, data produced about cellular processes at molecular level has been accumulating with an accelerating rate. Graph-based pathway ontologies and databases have been in wide use for such data. This representation has made it possible to programmatically integrate cellular networks as well as investigating them using the well-understood concepts of graph theory to predict their structural and dynamic properties. In this regard, it is essential to effectively query such integrated large networks to extract the sub-networks of interest with the help of efficient algorithms and software tools. Towards this goal, we have developed a querying framework along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, applicable to all sorts of graph-based pathway databases from PPIs to metabolic pathways to signaling pathways. These algorithms can also account for compound or nested structures present in the pathway data, and have been implemented within the querying components of Patika (Pathway Analysis Tools for Integration and Knowledge Acquisition) tools and have proven to be useful for answering a number of biologically significant queries for a large graph-based pathway database.Çetintaş, AhmetM.S
    corecore