487 research outputs found
Algorithms for effective querying of compound graph-based pathway databases
BACKGROUND: Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to programmatically integrate cellular networks and to investigate them using the well-understood concepts of graph theory in order to predict their structural and dynamic properties. An extension of this graph representation, namely hierarchically structured or compound graphs, in which a member of a biological network may recursively contain a sub-network of a somehow logically similar group of biological objects, provides many additional benefits for analysis of biological pathways, including reduction of complexity by decomposition into distinct components or modules. In this regard, it is essential to effectively query such integrated large compound networks to extract the sub-networks of interest with the help of efficient algorithms and software tools.
RESULTS: Towards this goal, we developed a querying framework, along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, that is applicable to all sorts of graph-based pathway databases, from PPIs (protein-protein interactions) to metabolic and signaling pathways. The framework is unique in that it can account for compound or nested structures and ubiquitous entities present in the pathway data. In addition, the queries may be related to each other through "AND" and "OR" operators, and can be recursively organized into a tree, in which the result of one query might be a source and/or target for another, to form more complex queries. The algorithms were implemented within the querying component of a new version of the software tool PATIKAweb (Pathway Analysis Tool for Integration and Knowledge Acquisition) and have proven useful for answering a number of biologically significant questions for large graph-based pathway databases.
CONCLUSION: The PATIKA Project Web site is http://www.patika.org. PATIKAweb version 2.1 is available at http://web.patika.org.
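As a rough illustration of the simplest query type named in this abstract, the sketch below runs a k-neighborhood query on a toy compound graph. It is not the PATIKA implementation: the function name `neighborhood`, the `adj` adjacency dictionary, and the `parent` map (child node to enclosing compound node) are all hypothetical, and the choice to add enclosing compounds to the result is an assumption about how a nested result might be kept consistent.

```python
from collections import deque

def neighborhood(adj, parent, sources, k):
    """Nodes within k edges of any source, plus their enclosing compound nodes.

    adj:    dict mapping a node to an iterable of neighbors (hypothetical layout)
    parent: dict mapping a node to the compound node that contains it, if any
    """
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if dist[node] == k:          # do not expand beyond the distance limit
            continue
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    result = set(dist)
    # Assumption: include every ancestor compound so the nesting stays intact.
    for node in dist:
        p = parent.get(node)
        while p is not None and p not in result:
            result.add(p)
            p = parent.get(p)
    return result

# Toy data: B and C sit inside "complex1", which sits inside "nucleus".
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
parent = {"B": "complex1", "C": "complex1", "complex1": "nucleus"}
print(sorted(neighborhood(adj, parent, ["A"], 1)))
# ['A', 'B', 'complex1', 'nucleus']
```

Because the function returns a plain set, results of several such queries could be combined with set intersection and union, which is one simple way to picture the "AND" and "OR" composition the abstract mentions.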
Teak: A Novel Computational And Gui Software Pipeline For Reconstructing Biological Networks, Detecting Activated Biological Subnetworks, And Querying Biological Networks.
As high-throughput gene expression data becomes cheaper and cheaper, researchers are faced with a deluge of data from which biological insights need to be extracted and mined, since the rate of data accumulation far exceeds the rate of data analysis. There is a need for computational frameworks to bridge the gap and assist researchers in their tasks. The Topology Enrichment Analysis frameworK (TEAK) is an open source GUI and software pipeline that seeks to be one of many tools that fill this gap, and it consists of three major modules. The first module, the Gene Set Cultural Algorithm, infers biological networks de novo from gene sets using the KEGG pathways as prior knowledge. The second and third modules query against the KEGG pathways using molecular profiling data and query graphs, respectively. In particular, the second module, also called TEAK, is a network partitioning module that partitions the KEGG pathways into both linear and nonlinear subpathways. In conjunction with molecular profiling data, the subpathways are ranked and displayed to the user within the TEAK GUI. Using a public microarray yeast data set, previously unreported fitness defects for dpl1 delta and lag1 delta mutants under conditions of nitrogen limitation were found using TEAK. Finally, the third module, the Query Structure Enrichment Analysis framework, is a network query module that allows researchers to query their biological hypotheses in the form of Directed Acyclic Graphs against the KEGG pathways
APPAGATO: an APproximate PArallel and stochastic GrAph querying TOol for biological networks
Motivation: Biological network querying is a problem requiring a considerable computational effort to be solved. Given a target and a query network, it aims to find occurrences of the query in the target by considering topological and node similarities (i.e. mismatches between nodes, edges, or node labels). Querying tools that deal with similarities are crucial in biological network analysis since they provide meaningful results also in case of noisy data. In addition, since the size of available networks increases steadily, existing algorithms and tools are becoming unsuitable. This is raising new challenges for the design of more efficient and accurate solutions. Results: This paper presents APPAGATO, a stochastic and parallel algorithm to find approximate occurrences of a query network in biological networks. APPAGATO handles node, edge, and node label mismatches. Thanks to its randomized and parallel nature, it applies to large networks and, compared to existing tools, it provides higher performance as well as statistically significantly more accurate results. Tests have been performed on protein-protein interaction networks annotated with synthetic and real gene ontology terms. Case studies have been done by querying protein complexes among different species and tissue
Metabolic Network Alignments and their Applications
The accumulation of high-throughput genomic and proteomic data allows for the reconstruction of increasingly large and complex metabolic networks. In order to analyze the accumulated data and reconstructed networks, it is critical to identify network patterns and evolutionary relations between metabolic networks. But even finding similar networks becomes computationally challenging. The dissertation addresses these challenges with discrete optimization and the corresponding algorithmic techniques. Based on the property of gene duplication and function sharing in biological networks, we have formulated the network alignment problem, which asks for the optimal vertex-to-vertex mapping allowing path contractions, vertex deletions, and vertex insertions. We have proposed the first polynomial-time algorithm for aligning an acyclic metabolic pattern pathway with an arbitrary metabolic network. We have also proposed a polynomial-time algorithm for patterns with small treewidth and implemented it for series-parallel patterns, which are commonly found among metabolic networks. We have developed the metabolic network alignment tool for free public use. We have performed pairwise mapping of all pathways among five organisms and found a set of statistically significant pathway similarities. We have also applied the network alignment to identifying inconsistency, inferring missing enzymes, and finding potential candidates
Computational methods in cancer gene networking
In the past few years, many high-throughput techniques have been developed and applied to biological studies. These techniques, such as “next generation” genome sequencing, ChIP-on-chip, and microarrays, can be used to measure gene expression and gene regulatory elements on a genome-wide scale. Moreover, as these technologies become more affordable and accessible, they have become a driving force in modern biology. As a result, a huge amount of biological data has been produced, and an increasing number of such datasets is expected to be generated in the future. High-throughput data are more comprehensive and unbiased, but ‘real signals’ or biological insights, molecular mechanisms and biological principles are buried in the flood of data. In current biological studies, the bottleneck is no longer a lack of data, but the lack of ingenuity and computational means to extract biological insights and principles by integrating knowledge and high-throughput data.

Here I review the concepts and principles of network biology and the computational methods that can be applied to cancer research. Furthermore, I provide a practical guide for computational analysis of cancer gene networks
Algorithms for effective querying of graph-based pathway databases
Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent Univ., 2007. Thesis (Master's) -- Bilkent University, 2007. Includes bibliographical references leaves 81-83.
As the scientific curiosity shifts toward system-level investigation of genomic-scale
information, data produced about cellular processes at the molecular level has
been accumulating at an accelerating rate. Graph-based pathway ontologies
and databases have been in wide use for such data. This representation has made
it possible to programmatically integrate cellular networks and to investigate
them using the well-understood concepts of graph theory to predict their structural
and dynamic properties. In this regard, it is essential to effectively query
such integrated large networks to extract the sub-networks of interest with the
help of efficient algorithms and software tools.
Towards this goal, we have developed a querying framework along with a number
of graph-theoretic algorithms from simple neighborhood queries to shortest
paths to feedback loops, applicable to all sorts of graph-based pathway databases
from PPIs to metabolic pathways to signaling pathways. These algorithms can
also account for compound or nested structures present in the pathway data, and
have been implemented within the querying components of Patika (Pathway
Analysis Tools for Integration and Knowledge Acquisition) tools and have proven
to be useful for answering a number of biologically significant queries for a large
graph-based pathway database.
Çetintaş, Ahmet. M.S.
Causality analysis in biological networks
Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2010. Thesis (Ph.D.) -- Bilkent University, 2010. Includes bibliographical references leaves 69-78.
Systems biology is a rapidly emerging field, shaped in the last two decades
or so, which promises to help understand and cure several complex diseases such as
cancer. In order to gain insight into the system, specifically the molecular
network in the cell, we need to work on the following four fundamental aspects:
experimental and computational methods to gather knowledge about the system,
mathematical models for representing the knowledge, analysis methods for answering
questions on the model, and software tools for working on these. In this
thesis, we propose new approaches related to all these aspects.
In this thesis, we define new terms and concepts that help us to analyze
cellular processes, such as positive and negative paths, upstream and downstream
relations, and distance in process graphs. We propose algorithms that will search
for functional relations between molecules and will answer several biologically
interesting questions related to the network, such as neighborhoods, paths of
interest, and common targets or regulators of molecules.
In addition, we introduce ChiBE, a pathway editor for visualizing and analyzing
BioPAX networks. The tool converts BioPAX graphs to drawable process
diagrams and provides the mentioned novel analysis algorithms. Users can query
pathways in Pathway Commons database and create sub-networks that focus on
specific relations of interest.
We also describe a microarray data analysis component, PATIKAmad, built
into ChiBE and PATIKAweb, which integrates expression experiment data with
networks. PATIKAmad helps those tools to represent experiment values on network
elements and to search for causal relations in the network that potentially
explain dependent expressions. Causative path search depends on the presence of
transcriptional relations in the model, which, however, are underrepresented in most
databases. This is mainly due to insufficient knowledge in the literature.
We finally propose a method for identifying and classifying modulators of
transcription factors, to help complete the missing transcriptional relations in
the pathway databases. The method works with large amounts of expression
data, and looks for evidence of modulation for triplets of genes, i.e. modulator -
factor - target. Modulator candidates are chosen among the interacting proteins
of transcription factors. We expect to observe that expression of the target gene
depends on the interaction between factor and modulator. According to the observed
dependency type, we further classify the modulation. When tested, our
method finds modulators of Androgen Receptor; our top-scoring result modulators
are supported by other evidence in the literature. We also observe that the
modulation event and modulation type highly depend on the specific target gene.
This finding contradicts the expectations of the molecular biology community, which
often assumes a modulator has one type of effect regardless of the target gene.
Babur, Özgün. Ph.D.
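The abstract mentions positive and negative paths without defining them here. A minimal sketch, assuming the common convention that each directed edge carries a sign (+1 for activation, -1 for inhibition) and that the sign of a path is the product of its edge signs, might look as follows. The function name `signed_paths`, the edge-tuple layout, and the toy network are hypothetical and are not taken from ChiBE or the thesis.

```python
def signed_paths(edges, source, target, max_len):
    """Yield (path, sign) for simple directed paths with at most max_len edges.

    edges: iterable of (u, v, sign) with sign in {+1, -1} (assumed convention)
    """
    adj = {}
    for u, v, sign in edges:
        adj.setdefault(u, []).append((v, sign))

    def walk(node, path, sign):
        if node == target and len(path) > 1:
            yield list(path), sign      # copy, since `path` is mutated below
            return
        if len(path) - 1 == max_len:    # edge budget exhausted
            return
        for nxt, s in adj.get(node, ()):
            if nxt not in path:         # keep paths simple (no repeated nodes)
                path.append(nxt)
                yield from walk(nxt, path, sign * s)
                path.pop()

    yield from walk(source, [source], 1)

# Toy network: TF activates X, X inhibits G; TF inhibits Y, Y inhibits G.
edges = [("TF", "X", +1), ("X", "G", -1), ("TF", "Y", -1), ("Y", "G", -1)]
paths = list(signed_paths(edges, "TF", "G", 3))
for path, sign in paths:
    print(" -> ".join(path), "positive" if sign > 0 else "negative")
```

In this toy case the two routes from TF to G have opposite signs, which illustrates why a path-level notion of sign is useful when asking whether an upstream molecule is expected to raise or lower a target.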
A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem
Many graph mining applications rely on detecting subgraphs which are
near-cliques. There exists a dichotomy between the results in the existing work
related to this problem: on the one hand the densest subgraph problem (DSP)
which maximizes the average degree over all subgraphs is solvable in polynomial
time but for many networks fails to find subgraphs which are near-cliques. On
the other hand, formulations that are geared towards finding near-cliques are
NP-hard and frequently inapproximable due to connections with the Maximum
Clique problem.
In this work, we propose a formulation which combines the best of both
worlds: it is solvable in polynomial time and finds near-cliques when the DSP
fails. Surprisingly, our formulation is a simple variation of the DSP.
Specifically, we define the triangle densest subgraph problem (TDSP): given
$G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*) = \max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced
by the set $S$. We provide various exact and approximation algorithms which
solve the TDSP efficiently. Furthermore, we show how our algorithms adapt to
the more general problem of maximizing the $k$-clique average density. Finally,
we provide empirical evidence that the TDSP should be used whenever the output
of the DSP fails to output a near-clique.
Comment: 42 pages
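To make the objective concrete, the sketch below computes the triangle density of a vertex set (number of induced triangles divided by number of vertices) and applies a simple greedy peeling heuristic that repeatedly drops the vertex lying in the fewest triangles. Peeling is a well-known technique for the ordinary densest subgraph problem; whether this exact procedure matches any of the algorithms in the paper is not stated in the abstract, so it should be read as an assumption. The names `triangles`, `triangle_density`, `greedy_peel`, and the toy graph are hypothetical.

```python
from itertools import combinations

def triangles(adj, nodes):
    """Count triangles in the subgraph induced by `nodes` (brute force)."""
    nodes = set(nodes)
    return sum(1 for a, b, c in combinations(sorted(nodes), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def triangle_density(adj, nodes):
    """Triangles induced by `nodes` divided by the number of nodes."""
    return triangles(adj, nodes) / len(nodes) if nodes else 0.0

def greedy_peel(adj):
    """Remove the vertex in the fewest triangles until one vertex remains;
    return the densest vertex set seen along the way and its density."""
    current = set(adj)
    best, best_density = set(current), triangle_density(adj, current)
    while len(current) > 1:
        def tri_count(v):
            nbrs = adj[v] & current
            return sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        victim = min(sorted(current), key=tri_count)  # ties: smallest label
        current.remove(victim)
        density = triangle_density(adj, current)
        if density > best_density:
            best, best_density = set(current), density
    return best, best_density

# Toy graph: a 4-clique {1, 2, 3, 4} with a pendant path 4 - 5 - 6.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
       4: {1, 2, 3, 5}, 5: {4, 6}, 6: {5}}
best, density = greedy_peel(adj)
print(sorted(best), density)
# [1, 2, 3, 4] 1.0
```

The brute-force triangle counting keeps the example short and is only suitable for tiny graphs; a practical version would maintain per-vertex triangle counts incrementally.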
What Google Maps can do for biomedical data dissemination: examples and a design study
BACKGROUND: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may be deemed as too time consuming in itself, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data.
RESULTS: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations which have both low-overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers.
CONCLUSIONS: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations