8 research outputs found

    Metabolic Network Alignments and their Applications

    Get PDF
    The accumulation of high-throughput genomic and proteomic data allows for the reconstruction of the increasingly large and complex metabolic networks. In order to analyze the accumulated data and reconstructed networks, it is critical to identify network patterns and evolutionary relations between metabolic networks. But even finding similar networks becomes computationally challenging. The dissertation addresses these challenges with discrete optimization and the corresponding algorithmic techniques. Based on the property of the gene duplication and function sharing in biological network,we have formulated the network alignment problem which asks the optimal vertex-to-vertex mapping allowing path contraction, vertex deletion, and vertex insertions. We have proposed the first polynomial time algorithm for aligning an acyclic metabolic pattern pathway with an arbitrary metabolic network. We also have proposed a polynomial-time algorithm for patterns with small treewidth and implemented it for series-parallel patterns which are commonly found among metabolic networks. We have developed the metabolic network alignment tool for free public use. We have performed pairwise mapping of all pathways among five organisms and found a set of statistically significant pathway similarities. We also have applied the network alignment to identifying inconsistency, inferring missing enzymes, and finding potential candidates

    SIMPLE AND FAST ALIGNMENT OF METABOLIC PATHWAYS BY EXPLOITING LOCAL DIVERSITY

    Full text link

    Seventh Biennial Report : June 2003 - March 2005

    No full text

    Sixth Biennial Report : August 2001 - May 2003

    No full text

    Latent Representation and Sampling in Network: Application in Text Mining and Biology.

    Get PDF
    In classical machine learning, hand-designed features are used for learning a mapping from raw data. However, human involvement in feature design makes the process expensive. Representation learning aims to learn abstract features directly from data without direct human involvement. Raw data can be of various forms. Network is one form of data that encodes relational structure in many real-world domains. Therefore, learning abstract features for network units is an important task. In this dissertation, we propose models for incorporating temporal information given as a collection of networks from subsequent time-stamps. The primary objective of our models is to learn a better abstract feature representation of nodes and edges in an evolving network. We show that the temporal information in the abstract feature improves the performance of link prediction task substantially. Besides applying to the network data, we also employ our models to incorporate extra-sentential information in the text domain for learning better representation of sentences. We build a context network of sentences to capture extra-sentential information. This information in abstract feature representation of sentences improves various text-mining tasks substantially over a set of baseline methods. A problem with the abstract features that we learn is that they lack interpretability. In real-life applications on network data, for some tasks, it is crucial to learn interpretable features in the form of graphical structures. For this we need to mine important graphical structures along with their frequency statistics from the input dataset. However, exact algorithms for these tasks are computationally expensive, so scalable algorithms are of urgent need. To overcome this challenge, we provide efficient sampling algorithms for mining higher-order structures from network(s). We show that our sampling-based algorithms are scalable. They are also superior to a set of baseline algorithms in terms of retrieving important graphical sub-structures, and collecting their frequency statistics. Finally, we show that we can use these frequent subgraph statistics and structures as features in various real-life applications. We show one application in biology and another in security. In both cases, we show that the structures and their statistics significantly improve the performance of knowledge discovery tasks in these domains

    Randomized and Deterministic Parameterized Algorithms and Their Applications in Bioinformatics

    Get PDF
    Parameterized NP-hard problems are NP-hard problems that are associated with special variables called parameters. One example of the problem is to find simple paths of length k in a graph, where the integer k is the parameter. We call this problem the p-path problem. The p-path problem is the parameterized version of the well-known NP-complete problem - the longest simple path problem. There are two main reasons why we study parameterized NP-hard problems. First, many application problems are naturally associated with certain parameters. Hence we need to solve these parameterized NP-hard problems. Second, if parameters take only small values, we can take advantage of these parameters to design very effective algorithms. If a parameterized NP-hard problem can be solved by an algorithm of running time in form of f(k)nO(1), where k is the parameter, f(k) is independent of n, and n is the input size of the problem instance, we say that this parameterized NP-hard problem is fixed parameter tractable (FPT). If a problem is FPT and the parameter takes only small values, the problem can be solved efficiently (it can be solved almost in polynomial time). In this dissertation, first, we introduce several techniques that can be used to design efficient algorithms for parameterized NP-hard problems. These techniques include branch and bound, divide and conquer, color coding and dynamic programming, iterative compression, iterative expansion and kernelization. Then we present our results about how to use these techniques to solve parameterized NP-hard problems, such as the p-path problem and the pd-feedback vertex set problem. Especially, we designed the first algorithm of running time in form of f(k)nO(1) for the pd-feedback vertex set problem. Thus solved an outstanding open problem, i.e. if the pd-feedback vertex set problem is FPT. Finally, we will introduce how to use parameterized algorithm techniques to solve the signaling pathway problem and the motif finding problem from bioinformatics
    corecore