67 research outputs found

    Minimising Entropy Changes in Dynamic Network Evolution

    Get PDF

    Adaptive feature selection based on the most informative graph-based features

    Get PDF
    In this paper, we propose a novel method to adaptively select the most informative and least redundant feature subset, which has strong discriminating power with respect to the target label. Unlike most traditional methods using vectorial features, our proposed approach is based on graph-based features and thus incorporates the relationships between feature samples into the feature selection process. To efficiently encapsulate the main characteristics of the graph-based features, we probe each graph structure using the steady state random walk and compute a probability distribution of the walk visiting the vertices. Furthermore, we propose a new information theoretic criterion to measure the joint relevance of different pairwise feature combinations with respect to the target feature, through the Jensen-Shannon divergence measure between the probability distributions from the random walk on different graphs. By solving a quadratic programming problem, we use the new measure to automatically locate the subset of the most informative features, that have both low redundancy and strong discriminating power. Unlike most existing state-of-the-art feature selection methods, the proposed information theoretic feature selection method can accommodate both continuous and discrete target features. Experiments on the problem of P2P lending platforms in China demonstrate the effectiveness of the proposed method

    Designing labeled graph classifiers by exploiting the R\'enyi entropy of the dissimilarity representation

    Full text link
    Representing patterns as labeled graphs is becoming increasingly common in the broad field of computational intelligence. Accordingly, a wide repertoire of pattern recognition tools, such as classifiers and knowledge discovery procedures, are nowadays available and tested for various datasets of labeled graphs. However, the design of effective learning procedures operating in the space of labeled graphs is still a challenging problem, especially from the computational complexity viewpoint. In this paper, we present a major improvement of a general-purpose classifier for graphs, which is conceived on an interplay between dissimilarity representation, clustering, information-theoretic techniques, and evolutionary optimization algorithms. The improvement focuses on a specific key subroutine devised to compress the input data. We prove different theorems which are fundamental to the setting of the parameters controlling such a compression operation. We demonstrate the effectiveness of the resulting classifier by benchmarking the developed variants on well-known datasets of labeled graphs, considering as distinct performance indicators the classification accuracy, computing time, and parsimony in terms of structural complexity of the synthesized classification models. The results show state-of-the-art standards in terms of test set accuracy and a considerable speed-up for what concerns the computing time.Comment: Revised versio

    Information Theoretic Graph Kernels

    Get PDF
    This thesis addresses the problems that arise in state-of-the-art structural learning methods for (hyper)graph classification or clustering, particularly focusing on developing novel information theoretic kernels for graphs. To this end, we commence in Chapter 3 by defining a family of Jensen-Shannon diffusion kernels, i.e., the information theoretic kernels, for (un)attributed graphs. We show that our kernels overcome the shortcomings of inefficiency (for the unattributed diffusion kernel) and discarding un-isomorphic substructures (for the attributed diffusion kernel) that arise in the R-convolution kernels. In Chapter 4, we present a novel framework of computing depth-based complexity traces rooted at the centroid vertices for graphs, which can be efficiently computed for graphs with large sizes. We show that our methods can characterize a graph in a higher dimensional complexity feature space than state-of-the-art complexity measures. In Chapter 5, we develop a novel unattributed graph kernel by matching the depth-based substructures in graphs, based on the contribution in Chapter 4. Unlike most existing graph kernels in the literature which merely enumerate similar substructure pairs of limited sizes, our method incorporates explicit local substructure correspondence into the process of kernelization. The new kernel thus overcomes the shortcoming of neglecting structural correspondence that arises in most state-of-the-art graph kernels. The novel methods developed in Chapters 3, 4, and 5 are only restricted to graphs. However, real-world data usually tends to be represented by higher order relationships (i.e., hypergraphs). To overcome the shortcoming, in Chapter 6 we present a new hypergraph kernel using substructure isomorphism tests. We show that our kernel limits tottering that arises in the existing walk and subtree based (hyper)graph kernels. In Chapter 7, we summarize the contributions of this thesis. Furthermore, we analyze the proposed methods. Finally, we give some suggestions for the future work

    Structural Data Recognition with Graph Model Boosting

    Get PDF
    This paper presents a novel method for structural data recognition using a large number of graph models. In general, prevalent methods for structural data recognition have two shortcomings: 1) Only a single model is used to capture structural variation. 2) Naive recognition methods are used, such as the nearest neighbor method. In this paper, we propose strengthening the recognition performance of these models as well as their ability to capture structural variation. The proposed method constructs a large number of graph models and trains decision trees using the models. This paper makes two main contributions. The first is a novel graph model that can quickly perform calculations, which allows us to construct several models in a feasible amount of time. The second contribution is a novel approach to structural data recognition: graph model boosting. Comprehensive structural variations can be captured with a large number of graph models constructed in a boosting framework, and a sophisticated classifier can be formed by aggregating the decision trees. Consequently, we can carry out structural data recognition with powerful recognition capability in the face of comprehensive structural variation. The experiments shows that the proposed method achieves impressive results and outperforms existing methods on datasets of IAM graph database repository.Comment: 8 page

    Identifying the most informative features using a structurally interacting elastic net

    Get PDF
    Feature selection can efficiently identify the most informative features with respect to the target feature used in training. However, state-of-the-art vector-based methods are unable to encapsulate the relationships between feature samples into the feature selection process, thus leading to significant information loss. To address this problem, we propose a new graph-based structurally interacting elastic net method for feature selection. Specifically, we commence by constructing feature graphs that can incorporate pairwise relationship between samples. With the feature graphs to hand, we propose a new information theoretic criterion to measure the joint relevance of different pairwise feature combinations with respect to the target feature graph representation. This measure is used to obtain a structural interaction matrix where the elements represent the proposed information theoretic measure between feature pairs. We then formulate a new optimization model through the combination of the structural interaction matrix and an elastic net regression model for the feature subset selection problem. This allows us to (a) preserve the information of the original vectorial space, (b) remedy the information loss of the original feature space caused by using graph representation, and (c) promote a sparse solution and also encourage correlated features to be selected. Because the proposed optimization problem is non-convex, we develop an efficient alternating direction multiplier method (ADMM) to locate the optimal solutions. Extensive experiments on various datasets demonstrate the effectiveness of the proposed method

    Graph Embedding Using Frequency Filtering

    Get PDF
    The target of graph embedding is to embed graphs in vector space such that the embedded feature vectors follow the differences and similarities of the source graphs. In this paper, a novel method named Frequency Filtering Embedding (FFE) is proposed which uses graph Fourier transform and Frequency filtering as a graph Fourier domain operator for graph feature extraction. Frequency filtering amplifies or attenuates selected frequencies using appropriate filter functions. Here, heat, anti-heat, part-sine and identity filter sets are proposed as the filter functions. A generalized version of FFE named GeFFE is also proposed by defining pseudo-Fourier operators. This method can be considered as a general framework for formulating some previously defined invariants in other works by choosing a suitable filter bank and defining suitable pseudo-Fourier operators. This flexibility empowers GeFFE to adapt itself to the properties of each graph dataset unlike the previous spectral embedding methods and leads to superior classification accuracy relative to the others. Utilizing the proposed part-sine filter set which its members filter different parts of the spectrum in turn improves the classification accuracy of GeFFE method. Additionally, GeFFE resolves the cospectrality problem entirely in tested datasets

    A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.

    Get PDF
    The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph-theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. A classification so generated additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes, and search "templates" provided by those incomplete, iteratively constructed and refined classes associated with current and on-going development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development, tools and techniques that can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic

    Mining subjectively interesting patterns in rich data

    Get PDF

    Diffusion Wavelet Embedding: a Multi-resolution Approach for Graph Embedding in Vector Space

    Get PDF
    In this article, we propose a multiscale method of embedding a graph into a vector space using diffusion wavelets. At each scale, we extract a detail subspace and a corresponding lower-scale approximation subspace to represent the graph. Representative features are then extracted at each scale to provide a scale-space description of the graph. The lower-scale is constructed using a super-node merging strategy based on nearest neighbor or maximum participation and the new adjacency matrix is generated using vertex identification. This approach allows the comparison of graphs where the important structural differences may be present at varying scales. Additionally, this method can improve the differentiating power of the embedded vectors and this property reduces the possibility of cospectrality typical in spectral methods, substantially. The experimental results show that augmenting the features of abstract levels to the graph features increases the graph classification accuracies in different datasets
    corecore