11 research outputs found

    Quantifying Social Network Dynamics

    Full text link
    The dynamic character of most social networks requires to model evolution of networks in order to enable complex analysis of theirs dynamics. The following paper focuses on the definition of differences between network snapshots by means of Graph Differential Tuple. These differences enable to calculate the diverse distance measures as well as to investigate the speed of changes. Four separate measures are suggested in the paper with experimental study on real social network data.Comment: In proceedings of the 4th International Conference on Computational Aspects of Social Networks, CASoN 201

    Contains and Inside relationships within combinatorial Pyramids

    Full text link
    Irregular pyramids are made of a stack of successively reduced graphs embedded in the plane. Such pyramids are used within the segmentation framework to encode a hierarchy of partitions. The different graph models used within the irregular pyramid framework encode different types of relationships between regions. This paper compares different graph models used within the irregular pyramid framework according to a set of relationships between regions. We also define a new algorithm based on a pyramid of combinatorial maps which allows to determine if one region contains the other using only local calculus.Comment: 35 page

    Maximum common subgraph isomorphism algorithms for the matching of chemical structures

    Get PDF
    The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks

    On parameterized complexity of the Multi-MCS problem

    Get PDF
    AbstractWe introduce the maximum common subgraph problem for multiple graphs (Multi-MCS) inspired by various biological applications such as multiple alignments of gene sequences, protein structures, metabolic pathways, or protein–protein interaction networks. Multi-MCS is a generalization of the two-graph Maximum Common Subgraph problem (MCS). On the basis of the framework of parameterized complexity theory, we derive the parameterized complexity of Multi-MCS for various parameters for different classes of graphs. For example, for directed graphs with labeled vertices, we prove that the parameterized m-Multi-MCS problem is W[2]-hard, while the parameterized k-Multi-MCS problem is W[t]-hard (∀t≥1), where m and k are the size of the maximum common subgraph and the number of multiple graphs, respectively. We show similar results for other parameterized versions of the Multi-MCS problem for directed graphs with vertex labels and undirected graphs with vertex and edge labels by giving linear FPT reductions of the problems from parameterized versions of the longest common subsequence problem. Likewise, for unlabeled undirected graphs, we show that a parameterized version of the Multi-MCS problem with a fixed number of input graphs is W[1]-complete by showing a linear FPT reduction to and from a parameterized version of the maximum clique problem

    RASCAL: calculation of graph similarity using maximum common edge subgraphs

    Get PDF
    A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process to determine whether it is possible for the measure of similarity between the two graphs to exceed the minimum threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm to compute the exact degree and composition of similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It presents new approaches to both lower and upper bounding as well as vertex selection

    Improving the Unification of Software Clones using Tree and Graph Matching Algorithms

    Get PDF
    Code duplication is common in all kind of software systems and is one of the most troublesome hurdles in software maintenance and evolution activities. Even though these code clones are created for the reuse of some functionality, they usually go through several modifications after their initial introduction. This has a serious negative impact on the maintainability, comprehensibility, and evolution of software systems. Existing code duplication can be eliminated by extracting the common functionality into a single module. In the past, several techniques have been developed for the detection and management of software clones. However, the unification and refactoring of software clones is still a challenging problem, since the existing tools are mostly focused on clone detection and there is no tool to find particularly refactoring-oriented clones. The programmers need to manually understand the clones returned by the clone detection tools, decide whether they should be refactored, and finally perform their refactoring. This obvious gap between the clone detection tools and the clone analysis tools, makes the refactoring tedious and the programmers reluctant towards refactoring duplicate codes. In this thesis, an approach for the unification and refactoring of software clones that overcomes the limitations of previous approaches is presented. More specifically, the proposed technique is able to detect and parameterize non-trivial differences between the clones. Moreover, it can find a mapping between the statements of the clones that minimizes the number of differences. We have also defined preconditions in order to determine whether the duplicated code can be safely refactored to preserve the behavior of the existing code. We compared the proposed technique with a competitive clone refactoring tool and concluded that our approach is able to find a significantly larger number of refactorable clones

    Text detection and recognition in images and video sequences

    Get PDF
    Text characters embedded in images and video sequences represents a rich source of information for content-based indexing and retrieval applications. However, these text characters are difficult to be detected and recognized due to their various sizes, grayscale values and complex backgrounds. This thesis investigates methods for building an efficient application system for detecting and recognizing text of any grayscale values embedded in images and video sequences. Both empirical image processing methods and statistical machine learning and modeling approaches are studied in two sub-problems: text detection and text recognition. Applying machine learning methods for text detection encounters difficulties due to character size, grayscale variations and heavy computation cost. To overcome these problems, we propose a two-step localization/verification approach. The first step aims at quickly localizing candidate text lines, enabling the normalization of characters into a unique size. In the verification step, a trained support vector machine or multi-layer perceptrons is applied on background independent features to remove the false alarms. Text recognition, even from the detected text lines, remains a challenging problem due to the variety of fonts, colors, the presence of complex backgrounds and the short length of the text strings. Two schemes are investigated addressing the text recognition problem: bi-modal enhancement scheme and multi-modal segmentation scheme. In the bi-modal scheme, we propose a set of filters to enhance the contrast of black and white characters and produce a better binarization before recognition. For more general cases, the text recognition is addressed by a text segmentation step followed by a traditional optical character recognition (OCR) algorithm within a multi-hypotheses framework. In the segmentation step, we model the distribution of grayscale values of pixels using a Gaussian mixture model or a Markov Random Field. The resulting multiple segmentation hypotheses are post-processed by a connected component analysis and a grayscale consistency constraint algorithm. Finally, they are processed by an OCR software. A selection algorithm based on language modeling and OCR statistics chooses the text result from all the produced text strings. Additionally, methods for using temporal information of video text are investigated. A Monte Carlo video text segmentation method is proposed for adapting the segmentation parameters along temporal text frames. Furthermore, a ROVER (Recognizer Output Voting Error Reduction) algorithm is studied for improving the final recognition text string by voting the characters through temporal frames

    Polyhedral study of the maximum common induced subgraph problem

    Get PDF
    Orientador: Cid Carvalho de SouzaDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: O problema do Máximo Subgrafo Induzido Comum (MSIC) pertence a classe NP-difícil e possui aplicações em diversas áreas. Apesar de sua complexidade, ainda é importante conhecer soluções exatas para instâncias deste problema. Os algoritmos exatos encontrados na literatura buscam resolvê-lo através de técnicas de backtracking ou através de sua redução para o problema da Clique Máxima. Neste trabalho procuramos dar uma solução exata para o MSIC, tratando-o diretamente através da utilização de modelos de Programação Linear Inteira (PLI) e técnicas de combinatória poliédrica. Assim, realizamos um estudo teórico do poliedro do MSIC e fomos capazes de encontrar algumas desigualdades válidas fortes, inclusive com provas de que algumas delas representam facetas daquele poliedro. Adicionalmente, provamos que existe uma equivalâencia entre o modelo PLI aqui apresentado para o MSIC e uma formulação bem conhecida para o problema da Clique Máxima. Posteriormente, foram implementados algoritmos de Branch-and-Bound (B&B) e Branch-and-Cut (B&C) utilizando as desigualdades encontradas e algumas técnicas para tentar tornar os algoritmos mais eficientes. Experimentos foram executados com os algoritmos implementados neste trabalho e, também, com um algoritmo já existente para resolver o problema da Clique, chamado Cliquer. Os resultados foram comparados e, dentre os algoritmos de PLI, constatamos que o mais eficiente foi aquele que utilizou uma formulação para o MSIC que chamamos de Clique-IS, utilizando B&B e técnicas mais básicas que outros algoritmos. Este algoritmo mostrou-se mais eficiente, inclusive, que um algoritmo PLI com um modelo baseado no problema da Clique Máaxima. Este fato sugere que para uma abordagem baseada em PLI, vale a pena utilizar uma formulação do MSIC diretamente, ao invés de uma que se apóie na redução deste para o problema da Clique Máxima. Ja a comparaçao do melhor algoritmo desenvolvido neste trabalho com o Cliquer, mostrou que este último é mais eficiente. Para que um algoritmo baseado em PLI (utilizando uma formulação com as mesmas variáveis usadas por nós) tivesse alguma chance de vencer um algoritmo combinatório como o Cliquer, seria necessário conhecer mais desigualdades que estivessem ativas na solução ótima do problemaAbstract: The Maximum Common Subgraph problem (MSIC) is in MV-hard and has applications in several fields. Despite its complexity, it is still important to know exact solutions for instances of this problem. The exact algorithms found in literature try to solve it through backtracking techniques or through its reduction to the Maximum Clique problem. In this work we try to give an exact solution to MSIC by addressing it directly, using Linear Integer Programming (PLI) and polyhedral combinatorics techniques. So, we performed a study of the MSIC polyhedron and we were able to find some strong valid inequalities, including some that were proven to define facets of that polyhedron. Additionally, we proved that an equivalence between the PLI model presented here for MSIC and a well known formulation for the Maximum Clique problem exists. Later, Branch-and-Bound (B&B) and Branch-and-Cut (B&C) algorithms were implemented using the inequalities found and some techniques to try to render the algorithms more efficient. Experiments were performed with the algorithms implemented in this work and, also, with an already existing algorithm to solve the Maximum Clique problem, called Cliquer. The results were compared and, among the PLI algorithms, we found that the most efficient was the one that used the formulation which we called Clique-IS, using B&B and more basic techniques than other algorithms. This algorithm was even more efficient than a PLI algorithm with a Clique-based model. This fact suggests that for a PLI approach it is worth to use a formulation based on the MSIC polyhedron instead of one based on its reduction to the Maximum Clique problem. The comparison of the best algorithm developed in this work with Cliquer, though, showed that the latest is more efficient. In order to some PLI-based algorithm (using a formulation with the same variables used by us) to have any chance of outperforming a combinatorial algorithm like Cliquer, it would be necessary to know more inequalities that are active in the problem's optimal solutionMestradoOtimização CombinatoriaMestre em Ciência da Computaçã

    Polyhedral study of the maximum common induced subgraph problem

    Get PDF
    O problema do Máximo Subgrafo Induzido Comum (MSIC) pertence a classe NP-difícil e possui aplicações em diversas áreas. Apesar de sua complexidade, ainda é importante conhecer soluções exatas para instâncias deste problema. Os algoritmos exatos encontrados na literatura buscam resolvê-lo através de técnicas de backtracking ou através de sua redução para o problema da Clique Máxima. Neste trabalho procuramos dar uma solução exata para o MSIC, tratando-o diretamente através da utilização de modelos de Programação Linear Inteira (PLI) e técnicas de combinatória poliédrica. Assim, realizamos um estudo teórico do poliedro do MSIC e fomos capazes de encontrar algumas desigualdades válidas fortes, inclusive com provas de que algumas delas representam facetas daquele poliedro. Adicionalmente, provamos que existe uma equivalâencia entre o modelo PLI aqui apresentado para o MSIC e uma formulação bem conhecida para o problema da Clique Máxima. Posteriormente, foram implementados algoritmos de Branch-and-Bound (B&B) e Branch-and-Cut (B&C) utilizando as desigualdades encontradas e algumas técnicas para tentar tornar os algoritmos mais eficientes. Experimentos foram executados com os algoritmos implementados neste trabalho e, também, com um algoritmo já existente para resolver o problema da Clique, chamado Cliquer. Os resultados foram comparados e, dentre os algoritmos de PLI, constatamos que o mais eficiente foi aquele que utilizou uma formulação para o MSIC que chamamos de Clique-IS, utilizando B&B e técnicas mais básicas que outros algoritmos. Este algoritmo mostrou-se mais eficiente, inclusive, que um algoritmo PLI com um modelo baseado no problema da Clique Máaxima. Este fato sugere que para uma abordagem baseada em PLI, vale a pena utilizar uma formulação do MSIC diretamente, ao invés de uma que se apóie na redução deste para o problema da Clique Máxima. Ja a comparaçao do melhor algoritmo desenvolvido neste trabalho com o Cliquer, mostrou que este último é mais eficiente. Para que um algoritmo baseado em PLI (utilizando uma formulação com as mesmas variáveis usadas por nós) tivesse alguma chance de vencer um algoritmo combinatório como o Cliquer, seria necessário conhecer mais desigualdades que estivessem ativas na solução ótima do problema._________________________________________________________________________________________ ABSTRACT: The Maximum Common Subgraph problem (MSIC) is in MV-hard and has applications in several fields. Despite its complexity, it is still important to know exact solutions for instances of this problem. The exact algorithms found in literature try to solve it through backtracking techniques or through its reduction to the Maximum Clique problem. In this work we try to give an exact solution to MSIC by addressing it directly, using Linear Integer Programming (PLI) and polyhedral combinatorics techniques. So, we performed a study of the MSIC polyhedron and we were able to find some strong valid inequalities, including some that were proven to define facets of that polyhedron. Additionally, we proved that an equivalence between the PLI model presented here for MSIC and a well known formulation for the Maximum Clique problem exists. Later, Branch-and-Bound (B&B) and Branch-and-Cut (B&C) algorithms were implemented using the inequalities found and some techniques to try to render the algorithms more efficient. Experiments were performed with the algorithms implemented in this work and, also, with an already existing algorithm to solve the Maximum Clique problem, called Cliquer. The results were compared and, among the PLI algorithms, we found that the most efficient was the one that used the formulation which we called Clique-IS, using B&B and more basic techniques than other algorithms. This algorithm was even more efficient than a PLI algorithm with a Clique-based model. This fact suggests that for a PLI approach it is worth to use a formulation based on the MSIC polyhedron instead of one based on its reduction to the Maximum Clique problem. The comparison of the best algorithm developed in this work with Cliquer, though, showed that the latest is more efficient. In order to some PLI-based algorithm (using a formulation with the same variables used by us) to have any chance of outperforming a combinatorial algorithm like Cliquer, it would be necessary to know more inequalities that are active in the problem's optimal solution

    Graph based pattern discovery in protein structures

    Get PDF
    The rapidly growing body of 3D protein structure data provides new opportunities to study the relation between protein structure and protein function. Local structure pattern of proteins has been the focus of recent efforts to link structural features found in proteins to protein function. In addition, structure patterns have demonstrated values in applications such as predicting protein-protein interaction, engineering proteins, and designing novel medicines. My thesis introduces graph-based representations of protein structure and new subgraph mining algorithms to identify recurring structure patterns common to a set of proteins. These techniques enable families of proteins exhibiting similar function to be analyzed for structural similarity. Previous approaches to protein local structure pattern discovery operate in a pairwise fashion and have prohibitive computational cost when scaled to families of proteins. The graph mining strategy is robust in the face of errors in the structure, and errors in the set of proteins thought to share a function. Two collaborations with domain experts at the UNC School of Pharmacy and the UNC Medical School demonstrate the utility of these techniques. The first is to predict the function of several newly characterized protein structures. The second is to identify conserved structural features in evolutionarily related proteins
    corecore