
    Leveraging graph dimensions in online graph search

    Graphs are widely used because of their expressive power to model complicated relationships. However, given a graph database DG = {g1, g2, ..., gn}, it is challenging to process graph queries, since a basic graph query usually involves costly graph operations such as maximum common subgraph and graph edit distance computation, which are NP-hard. In this paper, we study a novel DS-preserved mapping, which maps the graphs in a graph database DG onto a multidimensional space MG under a structural dimension M using a mapping function φ(). The DS-preserved mapping preserves two things: distance and structure. Distance-preserving means that any two graphs gi and gj in DG must map to two data objects φ(gi) and φ(gj) in MG such that the distance d(φ(gi), φ(gj)) between φ(gi) and φ(gj) in MG approximates the graph dissimilarity δ(gi, gj) in DG. Structure-preserving further means that, for a given unseen query graph q, the distance between q and any graph gi in DG must be preserved such that δ(q, gi) ≈ d(φ(q), φ(gi)). We discuss the rationale for using the graph dimension M for online graph processing, and show how to identify a small set of subgraphs to form M efficiently. We propose an iterative algorithm, DSPM, to compute the graph dimension, and discuss its optimization techniques. We also give an approximate algorithm, DSPMap, to handle a large graph database. We conduct extensive performance studies on both real and synthetic datasets to evaluate the top-k similarity query, which finds the top-k graphs in DG most similar to a query graph, and show the effectiveness and efficiency of our approaches. © 2014 VLDB
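
The idea of answering a top-k similarity query in a vector space instead of on the graphs themselves can be illustrated with a minimal sketch. This is not the DSPM or DSPMap algorithm from the abstract. It assumes, purely for illustration, that each graph is a set of labeled edges, that the dimension M is a hand-picked list of small edge sets, that "contains the subgraph" can be approximated by a set-inclusion test (real subgraph matching is harder), and that d is the Euclidean distance. The names `embed`, `distance`, `top_k`, and the toy data are hypothetical.

```python
from math import sqrt


def embed(graph_edges, dimension):
    """Hypothetical phi(): one coordinate per feature in the dimension.

    A coordinate is 1.0 when every labeled edge of the feature occurs in the
    graph (a simplification of subgraph containment), otherwise 0.0.
    """
    return [1.0 if feature <= graph_edges else 0.0 for feature in dimension]


def distance(u, v):
    """Euclidean distance between two embedded graphs (an assumed choice of d)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_k(query_edges, database, dimension, k):
    """Return the names of the k database graphs closest to the query in the
    embedded space; ties are broken by graph name."""
    q = embed(query_edges, dimension)
    scored = [(distance(q, embed(g, dimension)), name) for name, g in database.items()]
    return [name for _, name in sorted(scored)[:k]]


# Toy dimension M: four small "subgraphs", each a set of labeled edges.
dimension = [
    frozenset({("C", "C")}),
    frozenset({("C", "O")}),
    frozenset({("C", "N")}),
    frozenset({("C", "C"), ("C", "O")}),
]

# Toy graph database DG.
database = {
    "g1": frozenset({("C", "C"), ("C", "O")}),
    "g2": frozenset({("C", "N")}),
    "g3": frozenset({("C", "C"), ("C", "N")}),
}

# An unseen query graph q is embedded with the same mapping as the database.
query = frozenset({("C", "C"), ("C", "O"), ("O", "H")})
print(top_k(query, database, dimension, k=2))
```

Because the query is mapped with the same function as the database graphs, each comparison costs one vector distance rather than an NP-hard graph operation. How well the vector distance tracks the true graph dissimilarity depends entirely on the choice of M, which is the problem the paper addresses.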

    Feature selection for graph kernels

    Graph classification is important for many scientific applications; it can be applied to various problems in bioinformatics and cheminformatics. There is an increasing need to classify small molecules, represented as graphs, in order to predict properties such as activity, toxicity, or mutagenicity. Using subtrees as the feature set for graph classification in kernel methods has been shown to perform well in classifying small molecules. It is also well known that feature selection can improve the performance of classifiers. However, most graph kernels are not selective in choosing which subtrees to include in the feature set. Instead, they use all subtrees with a certain property as their feature set. We argue that not all of these features are needed for effective classification. In this paper, we investigate the effect of selecting a subset of the subtrees as features for graph kernels, i.e., we try to identify and keep the useful features and eliminate all remaining subtrees. A masking procedure, which boils down to feature selection, is proposed for classifying graphs. We conducted experiments on several molecule classification datasets; the results demonstrate the applicability and effectiveness of the proposed feature selection process.
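
A rough sketch of what masking a subtree feature set might look like follows. The abstract does not specify the masking procedure, so everything here is an assumption: each graph is given as a dictionary of subtree-pattern counts, the kernel is a plain dot product over those counts, and features are ranked by the absolute difference of their mean counts in the two classes. The names `select_mask` and `masked_kernel`, the pattern strings, and the toy data are hypothetical.

```python
def masked_kernel(x, y, mask):
    """Dot-product kernel restricted to the subtree features kept by the mask."""
    return sum(x.get(f, 0) * y.get(f, 0) for f in mask)


def select_mask(samples, labels, n_keep):
    """Keep the n_keep features whose mean count differs most between classes.

    This scoring rule is an illustrative assumption, not the paper's method.
    Both classes are assumed to be non-empty.
    """
    features = sorted({f for s in samples for f in s})
    pos = [s for s, label in zip(samples, labels) if label == 1]
    neg = [s for s, label in zip(samples, labels) if label != 1]

    def mean(group, f):
        return sum(s.get(f, 0) for s in group) / len(group)

    scores = {f: abs(mean(pos, f) - mean(neg, f)) for f in features}
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(features, key=lambda f: (-scores[f], f))
    return set(ranked[:n_keep])


# Toy molecules as subtree-pattern counts (pattern strings are made up).
samples = [
    {"C(C,O)": 2, "C(C)": 1},
    {"C(C,O)": 1, "C(C)": 1},
    {"N(C)": 2, "C(C)": 1},
    {"N(C)": 1, "C(C)": 2},
]
labels = [1, 1, 0, 0]

mask = select_mask(samples, labels, n_keep=2)
print(sorted(mask))
print(masked_kernel(samples[0], samples[1], mask))
```

In this toy data the pattern `C(C)` occurs in every molecule and barely separates the classes, so it is masked out, and the kernel value between the first two molecules is computed from the remaining patterns only.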
