Algorithms for effective querying of compound graph-based pathway databases

Author: A Funahashi
AH Bild
Ahmet Cetintas
AL Barabasi
BioPAX
D Croes
DJ Wong
E Demir
EM Reingold
Emek Demir
G Bader
HPJ Bonarius
JA Bondy
JA Engelman
K Fukuda
K Wang
KY Yip
L Matthews
M Baitaluk
N Yeung
O Babur
Ozgun Babur
Pathway Commons
R Caspi
R Gting
R Hofestädt
R Sharan
S Brohe
S Okuda
SBGN
T Aittokallio
T Shlomi
TH Cormen
The Cancer Genome Atlas Research Network
TS Keshava Prasad
U Dogrusoz
U Leser
Ugur Dogrusoz
V Danos
VN Reddy
Y Tian
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to programmatically integrate cellular networks and to investigate them using the well-understood concepts of graph theory in order to predict their structural and dynamic properties. An extension of this graph representation, namely hierarchically structured or compound graphs, in which a member of a biological network may recursively contain a sub-network of a somehow logically similar group of biological objects, provides many additional benefits for analysis of biological pathways, including reduction of complexity by decomposition into distinct components or modules. In this regard, it is essential to effectively query such integrated large compound networks to extract the sub-networks of interest with the help of efficient algorithms and software tools. Results: Towards this goal, we developed a querying framework, along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, that is applicable to all sorts of graph-based pathway databases, from PPIs (protein-protein interactions) to metabolic and signaling pathways. The framework is unique in that it can account for compound or nested structures and ubiquitous entities present in the pathway data. In addition, the queries may be related to each other through "AND" and "OR" operators, and can be recursively organized into a tree, in which the result of one query might be a source and/or target for another, to form more complex queries. The algorithms were implemented within the querying component of a new version of the software tool PATIKAweb (Pathway Analysis Tool for Integration and Knowledge Acquisition) and have proven useful for answering a number of biologically significant questions for large graph-based pathway databases.
Conclusion: The PATIKA Project Web site is http://www.patika.org. PATIKAweb version 2.1 is available at http://web.patika.org.
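
The query composition described in this abstract (neighborhood queries combined through "AND" and "OR", with one result feeding another query) can be illustrated with a minimal sketch. It assumes a plain undirected graph stored as an adjacency dictionary; the function name `neighborhood`, the toy graph `ADJ`, and the node labels are hypothetical, and the sketch does not model compound (nested) structures or ubiquitous entities, nor does it reflect how PATIKAweb is actually implemented.

```python
from collections import deque


def neighborhood(adj, sources, k):
    """Return all nodes within k hops of any source node (multi-source BFS)."""
    # Distance map doubles as the visited set; every source starts at 0.
    dist = {s: 0 for s in sources if s in adj}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


# Hypothetical toy pathway: a simple chain A - B - C - D - E.
ADJ = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

near_a = neighborhood(ADJ, {"A"}, 1)
near_c = neighborhood(ADJ, {"C"}, 1)

and_result = near_a & near_c   # "AND": nodes returned by both queries
or_result = near_a | near_c    # "OR": nodes returned by either query

# Nested query: the result of one query becomes the source set of another.
nested = neighborhood(ADJ, and_result, 1)
```

Treating "AND" as set intersection and "OR" as set union is one simple reading of the operators; the abstract does not specify their exact semantics.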

Crossref

Bilkent University Institutional Repository

Directory of Open Access Journals

PubMed Central

Teak: A Novel Computational And Gui Software Pipeline For Reconstructing Biological Networks, Detecting Activated Biological Subnetworks, And Querying Biological Networks.

Author: Judeh Thair
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2014

As high-throughput gene expression data becomes cheaper and cheaper, researchers are faced with a deluge of data from which biological insights need to be extracted and mined since the rate of data accumulation far exceeds the rate of data analysis. There is a need for computational frameworks to bridge the gap and assist researchers in their tasks. The Topology Enrichment Analysis frameworK (TEAK) is an open source GUI and software pipeline that seeks to be one of many tools that fills in this gap and consists of three major modules. The first module, the Gene Set Cultural Algorithm, de novo infers biological networks from gene sets using the KEGG pathways as prior knowledge. The second and third modules query against the KEGG pathways using molecular profiling data and query graphs, respectively. In particular, the second module, also called TEAK, is a network partitioning module that partitions the KEGG pathways into both linear and nonlinear subpathways. In conjunction with molecular profiling data, the subpathways are ranked and displayed to the user within the TEAK GUI. Using a public microarray yeast data set, previously unreported fitness defects for dpl1 delta and lag1 delta mutants under conditions of nitrogen limitation were found using TEAK. Finally, the third module, the Query Structure Enrichment Analysis framework, is a network query module that allows researchers to query their biological hypotheses in the form of Directed Acyclic Graphs against the KEGG pathways

Digital Commons@Wayne State University

APPAGATO: an APproximate PArallel and stochastic GrAph querying TOol for biological networks

Author: Bombieri Nicola
Bonnici Vincenzo
Busato Federico
Giugno Rosalba
Micale Giovanni
Pulvirenti Alfredo
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016

Motivation: Biological network querying is a problem requiring a considerable computational effort to be solved. Given a target and a query network, it aims to find occurrences of the query in the target by considering topological and node similarities (i.e. mismatches between nodes, edges, or node labels). Querying tools that deal with similarities are crucial in biological network analysis since they provide meaningful results also in case of noisy data. In addition, since the size of available networks increases steadily, existing algorithms and tools are becoming unsuitable. This is raising new challenges for the design of more efficient and accurate solutions. Results: This paper presents APPAGATO, a stochastic and parallel algorithm to find approximate occurrences of a query network in biological networks. APPAGATO handles node, edge, and node label mismatches. Thanks to its randomized and parallel nature, it applies to large networks and, compared to existing tools, it provides higher performance as well as statistically significantly more accurate results. Tests have been performed on protein-protein interaction networks annotated with synthetic and real gene ontology terms. Case studies have been done by querying protein complexes among different species and tissue

Catalogo dei prodotti della ricerca

Metabolic Network Alignments and their Applications

Author: Cheng Qiong
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/12/2009

The accumulation of high-throughput genomic and proteomic data allows for the reconstruction of increasingly large and complex metabolic networks. In order to analyze the accumulated data and reconstructed networks, it is critical to identify network patterns and evolutionary relations between metabolic networks. But even finding similar networks becomes computationally challenging. The dissertation addresses these challenges with discrete optimization and the corresponding algorithmic techniques. Based on the properties of gene duplication and function sharing in biological networks, we have formulated the network alignment problem, which asks for the optimal vertex-to-vertex mapping allowing path contraction, vertex deletion, and vertex insertion. We have proposed the first polynomial-time algorithm for aligning an acyclic metabolic pattern pathway with an arbitrary metabolic network. We also have proposed a polynomial-time algorithm for patterns with small treewidth and implemented it for series-parallel patterns, which are commonly found among metabolic networks. We have developed the metabolic network alignment tool for free public use. We have performed pairwise mapping of all pathways among five organisms and found a set of statistically significant pathway similarities. We also have applied the network alignment to identifying inconsistency, inferring missing enzymes, and finding potential candidates

ScholarWorks @ Georgia State University

Computational methods in cancer gene networking

Author: Edwin Wang
Publication date: 30/12/2008

In the past few years, many high-throughput techniques have been developed and applied to biological studies. These techniques, such as “next generation” genome sequencing, ChIP-on-chip, microarrays and so on, can be used to measure gene expression and gene regulatory elements on a genome-wide scale. Moreover, as these technologies become more affordable and accessible, they have become a driving force in modern biology. As a result, a huge amount of biological data has been produced, with the expectation that an increasing number of such datasets will be generated in the future. High-throughput data are more comprehensive and unbiased, but ‘real signals’ or biological insights, molecular mechanisms and biological principles are buried in the flood of data. In current biological studies, the bottleneck is no longer a lack of data, but the lack of ingenuity and computational means to extract biological insights and principles by integrating knowledge and high-throughput data.

Here I am reviewing the concepts and principles of network biology and the computational methods which can be applied to cancer research. Furthermore, I am providing a practical guide for computational analysis of cancer gene networks

NRC Publications Archive

Nature Precedings

Algorithms for effective querying of graph-based pathway databases

Author: Çetintaş Ahmet
Publication venue: Bilkent University
Publication date: 01/01/2007

Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent Univ., 2007. Thesis (Master's) -- Bilkent University, 2007. Includes bibliographical references leaves 81-83. As the scientific curiosity shifts toward system-level investigation of genomic-scale information, data produced about cellular processes at molecular level has been accumulating with an accelerating rate. Graph-based pathway ontologies and databases have been in wide use for such data. This representation has made it possible to programmatically integrate cellular networks as well as investigating them using the well-understood concepts of graph theory to predict their structural and dynamic properties. In this regard, it is essential to effectively query such integrated large networks to extract the sub-networks of interest with the help of efficient algorithms and software tools. Towards this goal, we have developed a querying framework along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, applicable to all sorts of graph-based pathway databases from PPIs to metabolic pathways to signaling pathways. These algorithms can also account for compound or nested structures present in the pathway data, and have been implemented within the querying components of Patika (Pathway Analysis Tools for Integration and Knowledge Acquisition) tools and have proven to be useful for answering a number of biologically significant queries for a large graph-based pathway database. Çetintaş, Ahmet. M.S

CiteSeerX

Bilkent University Institutional Repository

Causality analysis in biological networks

Author: Babur Özgün
Publication venue: Bilkent University
Publication date: 01/01/2010

Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2010. Thesis (Ph.D.) -- Bilkent University, 2010. Includes bibliographical references leaves 69-78. Systems biology is a rapidly emerging field, shaped in the last two decades or so, which promises understanding and curing several complex diseases such as cancer. In order to gain insight into the system, specifically the molecular network in the cell, we need to work on the following four fundamental aspects: experimental and computational methods to gather knowledge about the system, mathematical models for representing the knowledge, analysis methods for answering questions on the model, and software tools for working on these. In this thesis, we propose new approaches related to all these aspects. We define new terms and concepts that help us to analyze cellular processes, such as positive and negative paths, upstream and downstream relations, and distance in process graphs. We propose algorithms that will search for functional relations between molecules and will answer several biologically interesting questions related to the network, such as neighborhoods, paths of interest, and common targets or regulators of molecules. In addition, we introduce ChiBE, a pathway editor for visualizing and analyzing BioPAX networks. The tool converts BioPAX graphs to drawable process diagrams and provides the mentioned novel analysis algorithms. Users can query pathways in the Pathway Commons database and create sub-networks that focus on specific relations of interest. We also describe a microarray data analysis component, PATIKAmad, built into ChiBE and PATIKAweb, which integrates expression experiment data with networks. PATIKAmad helps those tools to represent experiment values on network elements and to search for causal relations in the network that potentially explain dependent expressions.
Causative path search depends on the presence of transcriptional relations in the model, which, however, are underrepresented in most of the databases. This is mainly due to insufficient knowledge in the literature. We finally propose a method for identifying and classifying modulators of transcription factors, to help complete the missing transcriptional relations in the pathway databases. The method works with a large amount of expression data, and looks for evidence of modulation for triplets of genes, i.e. modulator - factor - target. Modulator candidates are chosen among the interacting proteins of transcription factors. We expect to observe that expression of the target gene depends on the interaction between factor and modulator. According to the observed dependency type, we further classify the modulation. When tested, our method finds modulators of Androgen Receptor; our top-scoring result modulators are supported by other evidence in the literature. We also observe that the modulation event and modulation type highly depend on the specific target gene. This finding contradicts the expectations of the molecular biology community, who often assume a modulator has one type of effect regardless of the target gene. Babur, Özgün. Ph.D

Bilkent University Institutional Repository

A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem

Author: Tsourakakis Charalampos E.
Publication date: 20/05/2014

Many graph mining applications rely on detecting subgraphs which are near-cliques. There exists a dichotomy between the results in the existing work related to this problem: on the one hand, the densest subgraph problem (DSP), which maximizes the average degree over all subgraphs, is solvable in polynomial time but for many networks fails to find subgraphs which are near-cliques. On the other hand, formulations that are geared towards finding near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem. In this work, we propose a formulation which combines the best of both worlds: it is solvable in polynomial time and finds near-cliques when the DSP fails. Surprisingly, our formulation is a simple variation of the DSP. Specifically, we define the triangle densest subgraph problem (TDSP): given $G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*)=\max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced by the set $S$. We provide various exact and approximation algorithms which solve the TDSP efficiently. Furthermore, we show how our algorithms adapt to the more general problem of maximizing the $k$-clique average density. Finally, we provide empirical evidence that the TDSP should be used whenever the DSP fails to output a near-clique. Comment: 42 page
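
To make the TDSP objective concrete, the following minimal sketch evaluates the triangle density t(S)/|S| for a vertex subset and finds the maximizer by exhaustive search. This is only an illustration of the definition: the exhaustive search is exponential and is not one of the exact or approximation algorithms the abstract refers to. The function names and the toy edge list `EDGES` are hypothetical.

```python
from itertools import combinations


def triangle_count(edges, subset):
    """Count triangles whose three vertices all lie in `subset`."""
    nodes = set(subset)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        # Keep only edges induced by the subset; ignore self-loops.
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return sum(
        1
        for a, b, c in combinations(sorted(nodes), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def triangle_density(edges, subset):
    """Return t(S) / |S| for a non-empty vertex subset S."""
    subset = set(subset)
    if not subset:
        raise ValueError("subset must be non-empty")
    return triangle_count(edges, subset) / len(subset)


def brute_force_tdsp(edges):
    """Maximize t(S)/|S| over all non-empty subsets (tiny graphs only)."""
    vertices = sorted({v for edge in edges for v in edge})
    best_subset, best_density = None, -1.0
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            density = triangle_density(edges, candidate)
            if density > best_density:
                best_subset, best_density = set(candidate), density
    return best_subset, best_density


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} with a path 3 - 4 - 5 attached.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

best_subset, best_density = brute_force_tdsp(EDGES)
```

On this toy graph the 4-clique contains four triangles on four vertices, so its triangle density is 1.0, and adding either path vertex only lowers the ratio.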

arXiv.org e-Print Archive

CiteSeerX

What Google Maps can do for biomedical data dissemination: examples and a design study

Author: A Sinha
A Skupin
B Gretarsson
BB Bederson
D Johnson
David H Laidlaw
E Demir
F Paulovich
F van Ham
G Aravindhan
GW Furnas
H Kuehn
J Heer
J Seo
K Arakawa
M Bostock
M Eisen
M Hegarty
M Meyer
N Elmqvist
N Henr
P Eades
P Shannon
R Jianu
Radu Jianu
S Berger
S Jul
S Saalfeld
Skupin A
T Munzner
T Yates
Z Hu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

BACKGROUND: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may be deemed as too time consuming in itself, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data. RESULTS: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations which have both low-overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers. CONCLUSIONS: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. 
We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations

City Research Online

Crossref

Springer - Publisher Connector

PubMed Central

DigitalCommons@Florida International University