34 research outputs found

    Designing labeled graph classifiers by exploiting the Rényi entropy of the dissimilarity representation

    Full text link
    Representing patterns as labeled graphs is becoming increasingly common in the broad field of computational intelligence. Accordingly, a wide repertoire of pattern recognition tools, such as classifiers and knowledge discovery procedures, is nowadays available and tested on various datasets of labeled graphs. However, the design of effective learning procedures operating in the space of labeled graphs is still a challenging problem, especially from the computational complexity viewpoint. In this paper, we present a major improvement of a general-purpose classifier for graphs, conceived as an interplay between dissimilarity representation, clustering, information-theoretic techniques, and evolutionary optimization algorithms. The improvement focuses on a key subroutine devised to compress the input data. We prove several theorems that are fundamental to setting the parameters controlling this compression operation. We demonstrate the effectiveness of the resulting classifier by benchmarking the developed variants on well-known datasets of labeled graphs, considering as distinct performance indicators the classification accuracy, computing time, and parsimony in terms of the structural complexity of the synthesized classification models. The results show state-of-the-art test set accuracy and a considerable speed-up in computing time
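The dissimilarity-representation idea above can be sketched in a few lines: a graph becomes a vector of its distances to a handful of prototype graphs, and any vector-space classifier then applies. This is a minimal illustration, not the paper's actual algorithm; the edge-set dissimilarity and all names here are assumptions for the sketch.

```python
# Sketch of a dissimilarity-space classifier (illustration only, not the
# paper's method). Graphs are modeled as sets of labeled edges; the
# "dissimilarity representation" maps a graph to its distances from a
# small set of prototype graphs.

def edge_set_dissimilarity(g1, g2):
    """Toy graph dissimilarity: normalized symmetric difference of edge sets."""
    union = g1 | g2
    if not union:
        return 0.0
    return len(g1 ^ g2) / len(union)

def dissimilarity_vector(graph, prototypes):
    """Embed a graph as its vector of distances to the prototype graphs."""
    return [edge_set_dissimilarity(graph, p) for p in prototypes]

def nearest_prototype_class(graph, prototypes, labels):
    """1-NN in dissimilarity space: predict the label of the closest prototype."""
    vec = dissimilarity_vector(graph, prototypes)
    return labels[min(range(len(vec)), key=vec.__getitem__)]

# Tiny example: two prototype graphs and a query graph.
proto_a = {("a", "b"), ("b", "c")}
proto_b = {("x", "y")}
query = {("a", "b"), ("b", "c"), ("c", "d")}
print(nearest_prototype_class(query, [proto_a, proto_b], ["class-A", "class-B"]))
# -> class-A
```

The compression subroutine the paper improves would act on the prototype set; here it is left implicit.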

    Structural Data Recognition with Graph Model Boosting

    Get PDF
    This paper presents a novel method for structural data recognition using a large number of graph models. In general, prevalent methods for structural data recognition have two shortcomings: 1) only a single model is used to capture structural variation, and 2) naive recognition methods, such as the nearest neighbor method, are used. In this paper, we propose strengthening both the recognition performance of these models and their ability to capture structural variation. The proposed method constructs a large number of graph models and trains decision trees using them. This paper makes two main contributions. The first is a novel graph model that can perform calculations quickly, which allows us to construct many models in a feasible amount of time. The second is a novel approach to structural data recognition: graph model boosting. Comprehensive structural variations can be captured by a large number of graph models constructed in a boosting framework, and a sophisticated classifier can be formed by aggregating the decision trees. Consequently, we can carry out structural data recognition with powerful recognition capability in the face of comprehensive structural variation. The experiments show that the proposed method achieves impressive results and outperforms existing methods on datasets from the IAM graph database repository
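The aggregation step described above can be illustrated with a toy ensemble: features derived from several graph "models" feed weak tree learners whose votes are combined. This is a hedged sketch of the boosting-style aggregation, not the paper's graph model or training procedure; the overlap feature and stump thresholds are assumptions.

```python
# Illustrative sketch (not the paper's method): one feature per graph model,
# classified by a majority vote over depth-1 decision trees (stumps),
# echoing the idea of aggregating many weak tree learners.

def model_features(graph, models):
    """One feature per graph model: edge overlap with the model's edge set."""
    return [len(graph & m) / max(len(m), 1) for m in models]

def stump(feature_index, threshold):
    """A depth-1 decision tree: predict 1 if the feature exceeds the threshold."""
    return lambda feats: 1 if feats[feature_index] > threshold else 0

def ensemble_predict(graph, models, stumps):
    """Majority vote over the stumps (ties go to class 0)."""
    feats = model_features(graph, models)
    votes = sum(s(feats) for s in stumps)
    return 1 if votes * 2 > len(stumps) else 0

models = [{("a", "b"), ("b", "c")}, {("x", "y")}]
stumps = [stump(0, 0.5), stump(1, 0.5), stump(0, 0.9)]
print(ensemble_predict({("a", "b"), ("b", "c")}, models, stumps))  # -> 1
```

A real boosting loop would reweight training graphs between rounds; that loop is omitted here for brevity.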

    Information Theoretic Graph Kernels

    Get PDF
    This thesis addresses the problems that arise in state-of-the-art structural learning methods for (hyper)graph classification or clustering, particularly focusing on developing novel information theoretic kernels for graphs. To this end, we commence in Chapter 3 by defining a family of Jensen-Shannon diffusion kernels, i.e., information theoretic kernels, for (un)attributed graphs. We show that our kernels overcome the shortcomings of inefficiency (for the unattributed diffusion kernel) and discarding un-isomorphic substructures (for the attributed diffusion kernel) that arise in the R-convolution kernels. In Chapter 4, we present a novel framework for computing depth-based complexity traces rooted at the centroid vertices of graphs, which can be computed efficiently even for graphs of large size. We show that our methods can characterize a graph in a higher dimensional complexity feature space than state-of-the-art complexity measures. In Chapter 5, we develop a novel unattributed graph kernel by matching the depth-based substructures in graphs, building on the contribution of Chapter 4. Unlike most existing graph kernels in the literature, which merely enumerate similar substructure pairs of limited size, our method incorporates explicit local substructure correspondence into the process of kernelization. The new kernel thus overcomes the shortcoming of neglecting structural correspondence that arises in most state-of-the-art graph kernels. The methods developed in Chapters 3, 4, and 5 are restricted to graphs. However, real-world data often needs to be represented by higher order relationships (i.e., hypergraphs). To overcome this limitation, in Chapter 6 we present a new hypergraph kernel using substructure isomorphism tests. We show that our kernel limits the tottering that arises in existing walk and subtree based (hyper)graph kernels. In Chapter 7, we summarize the contributions of this thesis and analyze the proposed methods. Finally, we give some suggestions for future work
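The Jensen-Shannon construction underlying these kernels can be sketched concretely: compare two graphs via the Jensen-Shannon divergence of simple probability distributions derived from them. This is an illustrative stand-in, not the thesis's diffusion kernel; using degree distributions and the similarity `ln 2 - JSD` are assumptions for the sketch.

```python
import math

# Hedged sketch of a Jensen-Shannon kernel between graphs (not the thesis's
# exact construction): compare normalized degree distributions with the
# Jensen-Shannon divergence, which is bounded by ln 2, and turn it into a
# similarity score.

def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def js_divergence(p, q):
    """JSD(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def degree_distribution(adj, max_degree):
    """Normalized histogram of vertex degrees, padded to max_degree."""
    counts = [0] * (max_degree + 1)
    for nbrs in adj.values():
        counts[len(nbrs)] += 1
    n = len(adj)
    return [c / n for c in counts]

def js_kernel(adj1, adj2):
    """Similarity in [0, ln 2]: higher means more similar degree structure."""
    md = max(len(n) for n in list(adj1.values()) + list(adj2.values()))
    p = degree_distribution(adj1, md)
    q = degree_distribution(adj2, md)
    return math.log(2) - js_divergence(p, q)

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(round(js_kernel(path, path), 6))  # -> 0.693147 (i.e., ln 2)
```

The thesis's kernels use richer structural distributions than plain degrees; the divergence machinery is the shared core.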

    Promise Theory and the Alignment of Context, Processes, Types, and Transforms

    Get PDF
    Promise Theory concerns the 'alignment', i.e. the degree of functional compatibility and the 'scaling' properties of process outcomes in agent-based models, with causality and intentional semantics. It serves as an umbrella for other theories of interaction, from physics to socio-economics, integrating dynamical and semantic concerns into a single framework. It derives its measures from sets, and can therefore incorporate a wide range of descriptive techniques, giving additional structure with predictive constraints. We review some structural details of Promise Theory, applied to Promises of the First Kind, to assist in the comparison of Promise Theory with other forms of physical and mathematical modelling, including Category Theory and Dynamical Systems. We explain how Promise Theory is distinct from other kinds of model, but has a natural structural similarity to statistical mechanics and quantum theory, albeit with different goals; it respects and clarifies the bounds of locality, while incorporating non-local communication. We derive the relationship between promises and morphisms to the extent that this would be a useful comparison

    Graph Embedding Using Frequency Filtering

    Get PDF
    The goal of graph embedding is to embed graphs in a vector space such that the embedded feature vectors preserve the differences and similarities of the source graphs. In this paper, a novel method named Frequency Filtering Embedding (FFE) is proposed, which uses the graph Fourier transform and frequency filtering as a graph Fourier domain operator for graph feature extraction. Frequency filtering amplifies or attenuates selected frequencies using appropriate filter functions. Here, heat, anti-heat, part-sine, and identity filter sets are proposed as the filter functions. A generalized version of FFE, named GeFFE, is also proposed by defining pseudo-Fourier operators. This method can be considered a general framework for formulating some invariants previously defined in other works, by choosing a suitable filter bank and defining suitable pseudo-Fourier operators. This flexibility empowers GeFFE to adapt itself to the properties of each graph dataset, unlike previous spectral embedding methods, and leads to superior classification accuracy relative to the others. Utilizing the proposed part-sine filter set, whose members filter different parts of the spectrum in turn, further improves the classification accuracy of the GeFFE method. Additionally, GeFFE resolves the cospectrality problem entirely on the tested datasets
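The filtering idea can be sketched on a known Laplacian spectrum: each filter function reweights the eigenvalues, and one scalar feature per filter summarizes the filtered spectrum. The filter names come from the abstract, but the exact formulas and the sum-of-filtered-eigenvalues feature are illustrative assumptions, not the paper's definitions.

```python
import math

# Sketch of frequency filtering on a graph spectrum (filter formulas are
# assumptions). Each filter g reweights the Laplacian eigenvalues; one
# embedding coordinate per filter is the total filtered energy.

def heat_filter(lam, t=1.0):
    return math.exp(-t * lam)        # attenuates high frequencies

def anti_heat_filter(lam, t=1.0):
    return 1.0 - math.exp(-t * lam)  # attenuates low frequencies

def identity_filter(lam):
    return 1.0                       # keeps the spectrum unchanged

def filtered_features(spectrum, filters):
    """One embedding coordinate per filter: sum of filtered eigenvalues."""
    return [sum(g(lam) * lam for lam in spectrum) for g in filters]

# Laplacian spectrum of the path graph on 3 vertices: {0, 1, 3}.
spectrum = [0.0, 1.0, 3.0]
feats = filtered_features(spectrum, [heat_filter, anti_heat_filter, identity_filter])
print([round(f, 4) for f in feats])
```

By construction, the heat and anti-heat features sum to the identity feature, since the two filters partition each eigenvalue's weight.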

    Mining subjectively interesting patterns in rich data

    Get PDF

    Diffusion Wavelet Embedding: a Multi-resolution Approach for Graph Embedding in Vector Space

    Get PDF
    In this article, we propose a multiscale method for embedding a graph into a vector space using diffusion wavelets. At each scale, we extract a detail subspace and a corresponding lower-scale approximation subspace to represent the graph. Representative features are then extracted at each scale to provide a scale-space description of the graph. The lower scale is constructed using a super-node merging strategy based on nearest neighbor or maximum participation, and the new adjacency matrix is generated by vertex identification. This approach allows the comparison of graphs whose important structural differences may be present at varying scales. Additionally, this method can improve the discriminating power of the embedded vectors, a property that substantially reduces the possibility of the cospectrality typical of spectral methods. The experimental results show that augmenting the graph features with features from the coarser levels increases graph classification accuracy across different datasets
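The coarsening step above can be illustrated with a toy version of super-node merging: pair up neighboring vertices, identify each pair into one super-node, and rebuild the adjacency. This is a stand-in sketch; the greedy pairing rule here is an assumption in place of the paper's nearest-neighbor and maximum-participation strategies.

```python
# Toy sketch of super-node merging (assumptions: unweighted graphs as
# adjacency sets; a greedy pairing rule stands in for the paper's merging
# strategies). Matched vertex pairs are identified into super-nodes and
# the adjacency is rebuilt between them (vertex identification).

def coarsen(adj):
    """Greedily merge each unmatched vertex with an unmatched neighbor."""
    merged_into = {}
    matched = set()
    for v in sorted(adj):
        if v in matched:
            continue
        partner = next((u for u in sorted(adj[v]) if u not in matched and u != v), None)
        matched.add(v)
        if partner is not None:
            matched.add(partner)
            merged_into[partner] = v
        merged_into.setdefault(v, v)
    # Rebuild adjacency between super-nodes, dropping internal edges.
    coarse = {s: set() for s in set(merged_into.values())}
    for v, nbrs in adj.items():
        for u in nbrs:
            a, b = merged_into[v], merged_into[u]
            if a != b:
                coarse[a].add(b)
                coarse[b].add(a)
    return coarse

square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(coarsen(square))  # the 4-cycle collapses to two connected super-nodes
```

Repeating `coarsen` yields the scale hierarchy from which features are drawn at each level.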

    A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.

    Get PDF
    The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, the entities comprising a class, and the intra- and inter-class relationships existing between them are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and for applying graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code.
A classification so generated additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes, and search "templates" provided by those incomplete, iteratively constructed and refined classes associated with current and on-going development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development, tools and techniques that can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic
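The class-as-graph comparison described above can be sketched in miniature: a class becomes a small attributed graph of its members and their relationships, and a two-phase measure combines attribute overlap with relationship overlap. The representation, the Jaccard measure, and the weighting are illustrative assumptions, not the thesis's exact model.

```python
# Minimal sketch of comparing classes as attributed relational graphs
# (representation and weighting are assumptions, not the thesis's model).
# Vertices are class members tagged with a kind attribute; edges are
# intra-class relationships between members.

def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0

def class_similarity(c1, c2, w_nodes=0.5):
    """Phase 1 compares member attributes, phase 2 compares relationships."""
    node_sim = jaccard(set(c1["members"]), set(c2["members"]))
    edge_sim = jaccard(set(c1["relations"]), set(c2["relations"]))
    return w_nodes * node_sim + (1 - w_nodes) * edge_sim

stack = {"members": [("push", "method"), ("pop", "method"), ("items", "field")],
         "relations": [("push", "items"), ("pop", "items")]}
queue = {"members": [("enqueue", "method"), ("dequeue", "method"), ("items", "field")],
         "relations": [("enqueue", "items"), ("dequeue", "items")]}
print(class_similarity(stack, stack))                 # identical classes -> 1.0
print(round(class_similarity(stack, queue), 2))       # partial structural overlap
```

A real matcher would align members by structure rather than by name, which is where the thesis's graph-matching techniques come in.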

    Learning Shape-Classes Using a Mixture of Tree-Unions

    Get PDF
    This paper poses the problem of tree-clustering as that of fitting a mixture of tree unions to a set of sample trees. The tree-unions are structures from which the individual data samples belonging to a cluster can be obtained by edit operations. The distribution of observed tree nodes in each cluster sample is assumed to be governed by a Bernoulli distribution. The clustering method is designed to operate when the correspondences between nodes are unknown and must be inferred as part of the learning process. We adopt a minimum description length approach to the problem of fitting the mixture model to data. We make maximum-likelihood estimates of the Bernoulli parameters. The tree-unions and the mixing proportions are sought so as to minimize the description length criterion. This is the sum of the negative logarithm of the Bernoulli distribution, and a message-length criterion that encodes both the complexity of the union-trees and the number of mixture components. We locate node correspondences by minimizing the edit distance with the current tree-unions, and show that the edit distance is linked to the description length criterion. The method can be applied to both unweighted and weighted trees. We illustrate the utility of the resulting algorithm on the problem of classifying 2D shapes using a shock graph representation
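The description-length criterion above can be made concrete with toy numbers: the cost of a tree sample under a tree-union is the negative log Bernoulli likelihood of which union nodes it contains, plus a complexity charge for the union itself. The per-node encoding constant is an assumption; the structure mirrors the criterion described in the abstract.

```python
import math

# Worked sketch of a description-length criterion for tree-unions (toy
# numbers; the encoding constant is an assumption). A tree-union is a node
# set with a Bernoulli presence probability theta per node; a sample tree
# is the subset of union nodes it contains.

def sample_cost(union_thetas, sample_nodes):
    """-log P(sample | union): each node is present w.p. theta, absent w.p. 1 - theta."""
    cost = 0.0
    for node, theta in union_thetas.items():
        p = theta if node in sample_nodes else 1.0 - theta
        cost -= math.log(p)
    return cost

def description_length(union_thetas, samples, cost_per_node=1.0):
    """Data cost (negative log-likelihood) plus a structural complexity charge."""
    data_cost = sum(sample_cost(union_thetas, s) for s in samples)
    model_cost = cost_per_node * len(union_thetas)
    return data_cost + model_cost

thetas = {"root": 1.0, "left": 0.5, "right": 0.5}
samples = [{"root", "left"}, {"root", "right"}]
print(round(description_length(thetas, samples), 4))  # 4*ln(2) + 3 ≈ 5.7726
```

Minimizing this quantity over the union structure and mixing proportions is the fitting problem the paper solves; node correspondences enter through the edit-distance link mentioned above.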

    Random Graph Modeling and Discovery

    Get PDF
    In the first part of this thesis, we present a general class of models for random graphs that is applicable to a broad range of problems, including those in which graphs have complicated edge structures. These models need not be conditioned on a fixed number of vertices, as is often the case in the literature, and can be used for problems in which graphs have attributes associated with their vertices and edges. To allow structure in these models, a framework analogous to graphical models is developed for random graphs. In the second part of this thesis, we consider the situation in which there is an unknown graph that one wants to determine. This is a common occurrence since, in general, entities in the world are not directly observable, but must be inferred from some signal. We consider a general framework for uncovering these unknown graphs by a sequence of ‘tests’ or ‘questions’. We refer to this framework as graph discovery. In the third part of this thesis, we apply graph discovery to a problem in computer vision. To evaluate how well vision systems perform, their interpretations of imagery must be compared to the true ones. Often, image interpretations can be expressed as graphs; for example, vertices can represent objects and edges can represent relationships between objects. Thus, an image, before it is interpreted, corresponds to an unknown graph, and the interpretation of an image corresponds to graph discovery. In this work, we are interested in the evaluation of vision systems when these representation graphs are complex. We propose a visual Turing test for this purpose
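The graph-discovery framework sketched above (recovering an unknown graph through a sequence of "tests" or "questions") can be illustrated with the simplest possible question type. The edge-membership oracle here is an assumption for the sketch; the thesis's framework is far more general.

```python
from itertools import combinations

# Illustrative sketch of graph discovery by a sequence of tests (the query
# interface is an assumption): each "question" asks whether a vertex pair
# is an edge of the hidden graph, and the answers reconstruct it exactly.

def discover_graph(vertices, edge_oracle):
    """Recover the hidden edge set by querying every vertex pair once."""
    questions_asked = 0
    edges = set()
    for pair in combinations(sorted(vertices), 2):
        questions_asked += 1
        if edge_oracle(pair):
            edges.add(pair)
    return edges, questions_asked

hidden = {("a", "b"), ("b", "c")}
recovered, n_queries = discover_graph({"a", "b", "c"}, lambda p: p in hidden)
print(recovered == hidden, n_queries)  # -> True 3
```

In the visual Turing test setting, the "oracle" is the vision system being evaluated, and the questions probe objects and relationships rather than raw edges.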