9 research outputs found

    An algorithm for reporting maximal c-cliques

    Given two graphs, a fundamental task faced by matching algorithms consists of computing either the (Connected) Maximal Common Induced Subgraphs ((C)MCIS) or the (Connected) Maximal Common Edge Subgraphs ((C)MCES). In particular, computing the CMCIS or CMCES reduces to reporting so-called connected cliques in product graphs, a problem for which an algorithm was presented in a recent paper [I. Koch, TCS 250 (1-2), 2001]. This algorithm suffers from two problems, which are corrected in this note

    Evolutionary Centrality and Maximal Cliques in Mobile Social Networks

    This paper introduces an evolutionary approach to enhance the process of finding central nodes in mobile networks. This can provide essential information and important applications in mobile and social networks. The evolutionary approach considers the dynamics of the network and takes into account the central nodes from previous time slots. We also study the applicability of maximal clique algorithms in mobile social networks and how they can be used to find the central nodes based on the discovered maximal cliques. The experimental results are promising and show a significant enhancement in finding the central nodes

    Probing a Continuum of Macro-molecular Assembly Models with Graph Templates of Complexes

    Reconstruction by data integration is an emerging trend to reconstruct large protein assemblies, but uncertainties on the input data yield average models whose quantitative interpretation is challenging. This paper presents methods to probe fuzzy models of large assemblies against atomic resolution models of sub-systems. More precisely, consider a Toleranced Model (TOM) of a macro-molecular assembly, namely a continuum of nested shapes representing the assembly at multiple scales. Also consider a template, namely an atomic resolution 3D model of a sub-system (a complex) of this assembly. We present graph-based algorithms performing a multi-scale assessment of the complexes of the TOM, by comparing the pairwise contacts which appear in the TOM against those of the template. We apply this machinery to recent average models of the Nuclear Pore Complex, and compare our observations with the latest experimental work

    Network science algorithms for mobile networks.

    Network Science is an important and emerging field in computer science and engineering that focuses on the study and analysis of different types of networks. The goal of this dissertation is to design and develop network science algorithms that can be used to study and analyze mobile networks. This can provide essential information and knowledge that can help mobile network service providers enhance the quality of their mobile services. We focus in this dissertation on the design and analysis of different network science techniques that can be used to analyze the dynamics of mobile networks. These techniques include evolutionary clustering, classification, discovery of maximal cliques, and evolutionary centrality algorithms. We propose evolutionary clustering and evolutionary centrality algorithms that can be used to dynamically discover clusters and central nodes in mobile networks. Overall, the experimental results show that the proposed evolutionary algorithms are robust to short-term variations but reflect long-term trends, and can be used effectively to analyze the dynamics of mobile networks

    Assessing the Reconstruction of Macro-molecular Assemblies: the Example of the Nuclear Pore Complex

    The reconstruction of large protein assemblies is a major challenge due to their plasticity and due to the flexibility of the proteins involved. An emerging trend to cope with these uncertainties consists of performing the reconstruction by integrating experimental data from several sources, a strategy recently used to propose qualitative reconstructions of the Nuclear Pore Complex (NPC). Yet, the absence of clearly identified canonical reconstructions and the lack of quantitative assessment with respect to the experimental data are detrimental to the mechanistic exploitation of the results. To leverage such reconstructions, this work proposes a modeling framework inherently accommodating uncertainties, and allowing a precise assessment of the reconstructed models. We make three contributions. First, we introduce toleranced models to accommodate the positional and conformational uncertainties of protein instances within large assemblies. A toleranced model is a continuum of geometries whose distinct topologies can be enumerated, and mining stable complexes amidst this finite set hints at important structures in the assembly. Second, we present a panoply of tools to perform a multi-scale topological, geometric, and biochemical assessment of the complexes associated with a toleranced model, at the assembly and local levels. At the assembly level, we assess the prominence of contacts and the quality of the reconstruction, in particular w.r.t. symmetries. At the local level, the complexes encountered in the toleranced model are used to confirm, question, or suggest protein contacts within a 3D template known at atomic resolution. Third, we apply our machinery to the NPC, for which we (i) report prominent contacts uncovering sub-complexes of the NPC, (ii) explain the closure of the two rings involving 16 copies of the Y-complex, and (iii) develop a new 3D template for the T-complex. 
These contributions should prove instrumental in enhancing the reconstruction of assemblies, and in selecting the models which best comply with experimental data

    Tree comparison: enumeration and application to cheminformatics

    Graphs are a well-known data structure used in many application domains that rely on relationships between individual entities. Examples are social networks, where the users may be in friendship with each other, road networks, where one-way or bidirectional roads connect crossings, and work package assignments, where workers are assigned to tasks. In chem- and bioinformatics, molecules are often represented as molecular graphs, where vertices represent atoms, and bonds between them are represented by edges connecting the vertices. Since there is an ever-increasing amount of data that can be treated as graphs, fast algorithms are needed to compare such graphs. A well-researched concept to compare two graphs is the maximum common subgraph. On the one hand, this allows finding substructures that are common to both input graphs. On the other hand, we can derive a similarity score from the maximum common subgraph. A practical application is rational drug design which involves molecular similarity searches. In this thesis, we study the maximum common subgraph problem, which entails finding a largest graph, which is isomorphic to subgraphs of two input graphs. We focus on restrictions that allow polynomial-time algorithms with a low exponent. An example is the maximum common subtree of two input trees. We succeed in improving the previously best-known time bound. Additionally, we provide a lower time bound under certain assumptions. We study a generalization of the maximum common subtree problem, the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. This problem is motivated by the application to cheminformatics. First, the vast majority of drugs modeled as molecular graphs is outerplanar, and second, the blocks correspond to the ring structures and the bridges to atom chains or linkers. If we allow disconnected common subgraphs, the problem becomes NP-hard even for trees as input. 
We propose a second generalization of the maximum common subtree problem, which allows skipping vertices in the input trees while maintaining polynomial running time. Since a maximum common subgraph is not unique in general, we investigate the problem of enumerating all maximum solutions. We do this for both the maximum common subtree problem and the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. An arising subproblem which we analyze is the enumeration of maximum weight matchings in bipartite graphs. We support a weight function between the vertices and edges for all proposed common subgraph methods in this thesis. Thus the objective is to compute a common subgraph of maximum weight. The weights may be integral or real-valued, including negative values. A special case of using such a weight function is computing common subgraph isomorphisms between labeled graphs, where labels between mapped vertices and edges must be equal. An experimental study evaluates the practical running times and the usefulness of our block-and-bridge preserving maximum common induced subgraph algorithm against state-of-the-art algorithms

    A Graph Theoretic Approach To Food Combination Problems

    Graph theory provides a useful representation of, and mathematical toolkit for, analyzing how things are connected together. This collection of research investigates the use of graph theory as a representation of how foods are connected together. The first two studies validate the subject questioning procedure used to create a graph model out of responses, and the final study introduces a new approach to using this methodology to optimize field ration menus for the United States Army. In the first study, we began by asking subjects whether or not pairs of ingredients would be appropriate to combine on a salad. Next, using graph theoretic methods, we predicted which combinations of 3-8 components should go together. Subjects were then asked whether or not particular combinations were appropriate to combine on a salad. A paired Wilcoxon test between the predicted and non-predicted combinations was significant for all combination sizes. The second study tested the principle of supercombinatorality, i.e. that food combinations (of more than two items) that are fully compatible on a pairwise basis are more compatible than combinations that are not fully compatible pairwise. This study extended the previous findings to group data. Purchase intent responses to pairs of different pizza toppings were collected and used to predict pizzas (with one to six toppings) that would appeal to the entire group. Results showed purchase interest to be higher for the predicted pizzas than for non-predicted pizzas, supporting the supercombinatorality principle. The final study extends the graph theory representation to military rations known as Meal-Ready-to-Eat or MREs. MRE menus are composed of 11 different food categories (entrée, side, snack, etc.) and there are multiple items available in each category. From these items there are over 22 billion potential menus. 
Categories and items were screened to create a list of the most important ones, and we asked soldiers whether or not pairwise combinations of components were appropriate to combine in a meal. Using graph theoretic tools, predictions were made of optimal MRE menus, and rankings were attached to the predictions in order to assist the product developers in screening old and new menu concepts

    Computational methods for small molecules

    Metabolism is the system of chemical reactions sustaining life in the cells of living organisms. It is responsible for cellular processes that break down nutrients for energy and produce building blocks for necessary molecules. The study of metabolism is vital to many disciplines in medicine and pharmacy. Chemical reactions operate on small molecules called metabolites, which form the core of metabolism. In this thesis we propose efficient computational methods for small molecules in metabolic applications. We discuss four distinct studies covering two major themes: the atom-level description of biochemical reactions, and the analysis of tandem mass spectrometric measurements of metabolites. In the first part we study atom-level descriptions of organic reactions. We begin by proposing an optimal algorithm for determining the atom-to-atom correspondences between the reactant and product metabolites of organic reactions. In addition, we introduce a graph edit distance based cost as the mathematical formalism to determine optimality of atom mappings. We continue by proposing a compact single-graph representation of reactions using the atom mappings. We investigate the utility of the new representation in a reaction function classification task, where a descriptive category of the reaction's function is predicted. To facilitate the prediction, we introduce the first feasible path-based graph kernel, which describes the reactions as path sequences and achieves high classification accuracy. In the second part we turn our focus to analysing tandem mass spectrometric measurements of metabolites. In a tandem mass spectrometer, an input molecule structure is fragmented into substructures or fragments, whose masses are observed. We begin by studying the fragment identification problem. A combinatorial algorithm is presented to enumerate candidate substructures based on the given masses. 
We also demonstrate the usefulness of utilising approximated bond energies as a cost function to rank the candidate structures according to their chemical feasibility. We propose fragmentation tree models to describe the dependencies between fragments for higher identification accuracy. We continue by studying a closely related problem where an unknown metabolite is elucidated based on its tandem mass spectrometric fragment signals. This metabolite identification task is an important problem in metabolomics, underpinning the subsequent modelling and analysis efforts. We propose an automatic machine learning framework to predict a set of structural properties of the unknown metabolite. The properties are turned into candidate structures by a novel statistical model. We introduce the first mass spectral kernels and explore three feature classes to facilitate the prediction. The kernels introduce support for high-accuracy mass spectrometric measurements for enhanced predictive accuracy.
This thesis presents efficient computational methods for small molecules in metabolic applications. Metabolism is the system of chemical reactions that sustains life at the cellular level. Metabolic processes break down nutrients into energy and into building blocks for producing the molecules that cells need. The small molecules transformed by chemical reactions are called metabolites. This thesis contains four independent studies, divided thematically into the atom-level description of biochemical reactions and the analysis of mass spectrometric measurements of metabolites. The first part of the thesis deals with atom-level descriptions of biochemical reactions. The thesis introduces an optimal algorithm for determining atom mappings between the reactants and products of reactions. Optimality is defined by a cost function based on graph edit distance. An optimal atom mapping makes it possible to describe a reaction unambiguously with a single graph. The new reaction representation is used in a reaction function prediction task, which aims to determine automatically a category that describes the reaction in words. The thesis presents a path-based graph kernel that describes reactions as path sequences of atoms, in contrast to the earlier walk sequences, and achieves better prediction accuracy. The second part of the thesis analyses tandem mass spectrometric measurements of metabolites. A tandem mass spectrometer breaks the input molecule under analysis into fragments and measures their mass-to-charge ratios. The thesis presents a thorough combinatorial algorithm for identifying fragments. The method's cost function is based on comparing the bond energies of the fragments. Finally, the thesis presents fragmentation trees, which can be used to model the relationships between fragments and achieve better identification accuracy. Alongside fragment identification, the metabolites under analysis can also be identified. This problem is significant and a prerequisite for analyses of metabolism. The thesis presents a machine learning method that predicts features describing the structure of an unknown metabolite and statistically forms structure predictions from them. The method introduces the first kernel functions specifically suited to mass spectrometry data and achieves good prediction accuracy

    Comparing graphs

    Graphs are a well-studied mathematical concept, which has become ubiquitous for representing structured data in many application domains such as computer vision, social network analysis, and chem- and bioinformatics. The ever-increasing amount of data in these domains requires efficiently organizing and extracting information from large graph data sets. In this context, techniques for comparing graphs are fundamental, e.g., in order to obtain meaningful similarity measures between graphs. These are a prerequisite for applying a variety of data mining algorithms to the domain of graphs. Hence, various approaches to graph comparison have evolved and are widespread in practice. This thesis is dedicated to two different strategies for comparing graphs: maximum common subgraph problems and graph kernels. We study maximum common subgraph problems, which are based on classical graph-theoretical concepts for graph comparison and are NP-hard in the general case. We consider variants of the maximum common subgraph problem in restricted graph classes, which are highly relevant for applications in cheminformatics. We develop a polynomial-time algorithm that computes a maximum common subgraph under block and bridge preserving isomorphism in series-parallel graphs. This generalizes the problem of computing maximum common biconnected subgraphs in series-parallel graphs. We show that previous approaches to this problem, which are based on the separators represented by standard graph decompositions, fail. We introduce the concept of potential separators to overcome this issue and use them algorithmically to solve the problem in series-parallel graphs. We present algorithms with improved bounds on running time for the subclass of outerplanar graphs. Finally, we establish a sufficient condition for maximum common subgraph variants to allow derivation of graph distance metrics. This leads to polynomial-time computable graph distance metrics in restricted graph classes. 
This progress constitutes a step towards solving practically relevant maximum common subgraph problems in polynomial time. The second contribution of this thesis is to graph kernels, which have their origin in specific data mining algorithms. A key property of graph kernels is that they can take into account a large (possibly infinite) number of features and can support graphs with arbitrary annotation, while being efficiently computable. The main contributions of this part of the thesis are (i) the development of novel graph kernels, which are especially designed for attributed graphs with arbitrary annotations, and (ii) the systematic study of implicit and explicit mapping into a feature space for the computation of graph kernels w.r.t. its impact on the running time and the ability to consider arbitrary annotations. We propose graph kernels based on bijections between subgraphs and walks of fixed length. In an experimental study we show that these approaches provide a viable alternative to known techniques, in particular for graphs with complex annotations