248 research outputs found

    A survey of frequent subgraph mining algorithms

    GRAPES-DD: exploiting decision diagrams for index-driven search in biological graph databases

    BACKGROUND: Graphs are mathematical structures widely used to express relationships among elements when representing biomedical and biological information. Several analyses are performed on top of these representations. A common task is the search for one substructure within one graph, called the target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited when searching in a collection of target graphs, referred to as the one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches quickly prune target graphs, or parts of them, that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a granularity level sufficient for a powerful filtering step. Features are stored in data structures that allow efficient access. Index size, querying time and filtering power are key points in the development of efficient subgraph searching solutions.
    RESULTS: An existing approach, GRAPES, has been shown to perform well in terms of speed-up in both the one-to-one and one-to-many cases. However, it suffers from the size of the built index. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation, showing that GRAPES-DD substantially reduces memory utilization compared to GRAPES without worsening the searching time.
    CONCLUSION: The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising, thanks to their ability to encode sets compactly by exploiting their structure and regularity, and to manipulate entire sets of elements at once instead of exploring each single element explicitly. Search strategies based on Decision Diagrams make indexing more affordable, for biochemical graphs and beyond, potentially allowing us to deal with huge and ever-growing collections of biochemical and biological structures.
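The filter-and-verification scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the GRAPES or GRAPES-DD index: as a stand-in feature it assumes simple vertex-label counts, and the verification step is a brute-force search. The names `make_graph`, `label_counts`, `may_contain`, and `contains` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations

def make_graph(labels, edges):
    """Undirected labelled graph: vertex -> label, plus a set of 2-vertex edges."""
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}

def label_counts(graph):
    """Indexed feature (an assumption of this sketch): vertices per label."""
    return Counter(graph["labels"].values())

def may_contain(target_features, query_features):
    """Filter step: a target lacking a label the query needs cannot match."""
    return all(target_features[lab] >= n for lab, n in query_features.items())

def contains(target, query):
    """Verification step: brute-force (non-induced) subgraph isomorphism."""
    q_nodes = list(query["labels"])
    for image in permutations(target["labels"], len(q_nodes)):
        m = dict(zip(q_nodes, image))
        labels_ok = all(query["labels"][q] == target["labels"][m[q]] for q in q_nodes)
        edges_ok = all(frozenset(m[x] for x in e) in target["edges"]
                       for e in query["edges"])
        if labels_ok and edges_ok:
            return True
    return False

# One-to-many search: build the index once, filter cheaply, verify survivors.
targets = {
    "t1": make_graph({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3)]),
    "t2": make_graph({1: "C", 2: "C"}, [(1, 2)]),
    "t3": make_graph({1: "C", 2: "N", 3: "O"}, [(1, 2), (2, 3)]),
}
query = make_graph({"a": "C", "b": "O"}, [("a", "b")])

index = {name: label_counts(g) for name, g in targets.items()}
q_features = label_counts(query)
candidates = [name for name in targets if may_contain(index[name], q_features)]
matches = [name for name in candidates if contains(targets[name], query)]
```

Here `t2` is pruned by the filter because it has no vertex labelled "O", while `t3` passes the filter but fails verification because its "C" and "O" vertices are not adjacent. Richer features prune more targets at the cost of a larger index, which is the trade-off the abstract refers to.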

    Hierarchical stochastic graphlet embedding for graph-based pattern recognition

    This is the final version. Available on open access from Springer via the DOI in this record.
    Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable with many machine learning tools. This is because most mathematical operations are incompatible with the graph domain. Graph embedding has been proposed as a way to tackle these difficulties: it maps graphs to a vector space and makes standard machine learning techniques applicable to them. However, it is well known that graph embedding techniques usually suffer from a loss of structural information. In this paper, given a graph, we consider its hierarchical structure for mapping it into a vector space. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node in the upper hierarchical level. Once this hierarchical structure is constructed, we consider various configurations of its parts and use stochastic graphlet embedding (SGE) to map them into a vector space. Broadly speaking, SGE produces a distribution of uniformly sampled low- to high-order graphlets as a way to embed graphs into the vector space. The coarse-to-fine structure of a graph hierarchy and the statistics obtained from the distribution of low- to high-order stochastic graphlets complement each other and include important structural information with varied contexts. Altogether, these two techniques substantially mitigate the usual information loss involved in graph embedding, and we obtain a more robust vector space embedding of graphs. This has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods.
    European Union Horizon 2020; Ministerio de Educación, Cultura y Deporte, Spain; Generalitat de Cataluny

    Algebraic graph theory in the analysis of frequency assignment problems

    Frequency Assignment Problems (FAPs) arise when transmitters need to be allocated frequencies with the aim of minimizing interference whilst maintaining an efficient use of the radio spectrum. In this thesis FAPs are seen as generalised graph colouring problems, where transmitters are represented by vertices and their interactions by weighted edges. Solving FAPs often relies on known structural properties to facilitate algorithms. When no structural information is available explicitly, obtaining it from numerical data is difficult. This lack of structural information is a key underlying motivation for the research work in this thesis. If there are N transmitters to be assigned, we assume as given an N x N "influence matrix" W with entries W_ij representing the influence between transmitters i and j. From this matrix we derive the Laplacian matrix L = D - W, where D is a diagonal matrix whose entries d_ii are the sum of all influences acting on transmitter i. The focus of this thesis is the study of mathematical properties of the matrix L. We generalise certain properties of the Laplacian eigenvalues and eigenvectors that hold for simple graphs. We also observe and discuss changes in the shape of the Laplacian eigenvalue spectrum due to modifications of a FAP. We include a number of computational experiments and generate simulated examples of FAPs for which we explicitly calculate eigenvalues and eigenvectors in order to test the developed theoretical results. We find that the Laplacians prove useful in identifying certain types of problems, providing a structured approach to reducing the original FAP to smaller subproblems and hence assisting existing heuristic algorithms for solving frequency assignments. In that sense we conclude that analysis of the Laplacians is a useful tool for better understanding FAPs.
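As a small illustration of the construction L = D - W, the following Python sketch builds the Laplacian from an influence matrix. The 3 x 3 matrix is invented for illustration and is not data from the thesis; the function name `laplacian` is likewise an assumption, and W is assumed symmetric with a zero diagonal.

```python
def laplacian(W):
    """Return L = D - W, where D is diagonal with d_ii = sum_j W[i][j]."""
    n = len(W)
    # Start from -W, then add the degree d_ii on the diagonal.
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i]) - W[i][i]
    return L

# Hypothetical influence matrix for three transmitters.
W = [
    [0, 2, 1],
    [2, 0, 3],
    [1, 3, 0],
]
L = laplacian(W)
row_sums = [sum(row) for row in L]
```

Every row of L sums to zero by construction, so the all-ones vector is an eigenvector with eigenvalue 0. This is one of the simple-graph properties that carries over directly to the weighted case.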

    Frequent subgraph mining algorithms on weighted graphs

    This thesis describes research work undertaken in the field of graph-based knowledge discovery (or graph mining). The objective of the research is to investigate the benefits that weighted frequent subgraph mining can offer in the context of graph-model-based classification. Weighted subgraphs are graphs in which some of the vertices or edges are considered to be more significant than others. How to discover frequent sub-structures with different strengths is the main issue to be resolved in this thesis. The main approach to addressing this issue is to integrate weight constraints into the frequent subgraph mining process. It is suggested that weighted frequent subgraph mining generates more discriminative and significant subgraphs, which will have application in, for example, the classification and clustering of graph data.