
    Prediction of Atomization Energy Using Graph Kernel and Active Learning

    Data-driven prediction of molecular properties presents unique challenges to the design of machine learning methods concerning data structure/dimensionality, symmetry adaption, and confidence management. In this paper, we present a kernel-based pipeline that can learn and predict the atomization energy of molecules with high accuracy. The framework employs Gaussian process regression to perform predictions based on the similarity between molecules, which is computed using the marginalized graph kernel. To apply the marginalized graph kernel, a spatial adjacency rule is first employed to convert molecules into graphs whose vertices and edges are labeled by elements and interatomic distances, respectively. We then derive formulas for the efficient evaluation of the kernel. Specific functional components for the marginalized graph kernel are proposed, and the effect of the associated hyperparameters on accuracy and predictive confidence is examined. We show that the graph kernel is particularly suitable for predicting extensive properties because its convolutional structure coincides with that of the covariance formula between sums of random variables. Using an active learning procedure, we demonstrate that the proposed method can achieve a mean absolute error of 0.62 ± 0.01 kcal/mol using as few as 2000 training samples on the QM7 data set.
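
As a rough illustration of the pipeline described in this abstract, the sketch below runs Gaussian process regression on a precomputed similarity matrix and then picks the pool sample with the largest predictive variance. Everything here is an assumption made for illustration: `toy_kernel` is a stand-in for the marginalized graph kernel, the function names are hypothetical, and the variance-based selection rule is one common active learning criterion, not necessarily the one used in the paper.

```python
import numpy as np

def gpr_predict(K_train, y_train, K_pool_train, k_pool_diag, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP from precomputed kernel values."""
    n = K_train.shape[0]
    # Cholesky factor of the regularized training kernel matrix
    L = np.linalg.cholesky(K_train + noise * np.eye(n))
    # alpha = (K + noise * I)^{-1} y, obtained by two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_pool_train @ alpha
    v = np.linalg.solve(L, K_pool_train.T)
    var = k_pool_diag - np.sum(v * v, axis=0)
    return mean, var

def toy_kernel(a, b, length=1.0):
    """Stand-in similarity (assumption): an RBF on scalars, not a graph kernel."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Hypothetical data: scalars play the role of molecules, sin(x) the role of energies
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_pool = np.array([1.0, 1.5, 6.0])

mean, var = gpr_predict(
    toy_kernel(x_train, x_train),
    y_train,
    toy_kernel(x_pool, x_train),
    np.ones(len(x_pool)),  # k(x, x) = 1 for this toy kernel
)

# Assumed selection rule: label next the pool sample the model is least sure about
next_index = int(np.argmax(var))
```

With a real graph kernel, only the three kernel arrays passed to `gpr_predict` would change; the regression and selection steps stay the same.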

    A Survey on Graph Kernels

    Graph kernels have become an established and widely used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as a result. Finally, based on our experimental results, we derive a practitioner's guide to kernel-based graph classification.
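
The transformation mentioned in this abstract, a Gaussian RBF kernel applied to the metric induced by a graph kernel, can be sketched as follows. The sketch assumes the standard kernel-induced distance d(G, G')^2 = k(G, G) + k(G', G') - 2 k(G, G'); the feature vectors, function names, and `gamma` value are hypothetical and stand in for whatever base graph kernel is used.

```python
import numpy as np

def kernel_to_squared_distance(K):
    """Squared distances induced by a kernel matrix K: d2[i, j] = K[i, i] + K[j, j] - 2 K[i, j]."""
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    # Clip tiny negative values that can appear through round-off
    return np.maximum(d2, 0.0)

def rbf_from_kernel(K, gamma=1.0):
    """Gaussian RBF kernel built on top of the kernel-induced metric."""
    return np.exp(-gamma * kernel_to_squared_distance(K))

# Hypothetical explicit features (e.g. counts of two substructures in three graphs);
# a linear kernel on them plays the role of the base graph kernel.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
K_base = X @ X.T
K_rbf = rbf_from_kernel(K_base, gamma=0.5)
```

The resulting matrix has ones on its diagonal and can be passed to any learner that accepts a precomputed kernel.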

    Complexity vs. performance in granular embedding spaces for graph classification

    The most distinctive trait of structural pattern recognition in the graph domain is the ability to deal with the organization and relations between the constituent entities of the pattern. Even if this can be convenient and/or necessary in many contexts, most state-of-the-art classification techniques cannot be deployed directly in the graph domain without first embedding graph patterns into a metric space. Granular Computing is a powerful information processing paradigm that can be employed to drive the synthesis of automatic embedding spaces from structured domains. In this paper we investigate several classification techniques starting from Granular Computing-based embedding procedures and provide a thorough overview in terms of model complexity, embedding space complexity and performance on several open-access datasets for graph classification. We observe that certain classification techniques perform poorly in terms of both complexity and learning performance, as in the case of non-linear SVMs, suggesting that the high dimensionality of the synthesized embedding space can negatively affect the effectiveness of these approaches. On the other hand, linear support vector machines, neuro-fuzzy networks and nearest neighbour classifiers have comparable performance in terms of accuracy, with the second being the most competitive in terms of structural complexity and the last being the most competitive in terms of embedding space dimensionality.

    Topology Change Localisation in WSNs

    Massively parallelizing the RRT and the RRT*

    In recent years, the growth of the computational power available in the Central Processing Units (CPUs) of consumer computers has tapered significantly. At the same time, growth in the computational power available in Graphics Processing Units (GPUs) has remained strong. Algorithms that can be implemented on GPUs today are no longer limited to graphics processing, but include scientific computation and beyond. This paper is concerned with massively parallel implementations of incremental sampling-based robot motion planning algorithms, namely the widely used Rapidly-exploring Random Tree (RRT) algorithm and its asymptotically optimal counterpart, RRT*. We demonstrate an example implementation of the RRT and RRT* motion planning algorithms for a high-dimensional robotic manipulator that takes advantage of an NVidia CUDA-enabled GPU. We focus on parallelizing the collision-checking procedure, which is generally recognized as the computationally expensive component of sampling-based motion planning algorithms. Our experimental results indicate significant speedup when compared to CPU implementations, leading to practical algorithms for optimal motion planning in high-dimensional configuration spaces.

    Relaxed Dissimilarity-based Symbolic Histogram Variants for Granular Graph Embedding

    Graph embedding is an established and popular approach when designing graph-based pattern recognition systems. Amongst the several strategies, in the last ten years, Granular Computing emerged as a promising framework for structural pattern recognition. In the late 2000s, symbolic histograms were proposed as the driving force for the graph embedding procedure: they count the number of times each granule of information appears in the graph to be embedded. Similarly to a bag-of-words representation of a text corpus, symbolic histograms were originally conceived as integer-valued vectorial representations of the graphs. In this paper, we propose six 'relaxed' versions of symbolic histograms, in which the actual dissimilarity values between the information granules and the constituent parts of the graph to be embedded are taken into account; this information is discarded in the original symbolic histogram formulation due to the hard-limited nature of the counting procedure. Experimental results on six open-access datasets of fully-labelled graphs show comparable performance in terms of classification accuracy with respect to the original symbolic histograms (average accuracy shift ranging from -7% to +2%), counterbalanced by a great improvement in terms of the number of resulting information granules, and hence the number of features in the embedding space (up to 75% fewer features, on average).
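
To make the contrast in this abstract concrete, the sketch below compares a hard-counting symbolic histogram with one possible relaxation that keeps the dissimilarity values. It is only an illustration under stated assumptions: strings stand in for subgraphs, `dissimilarity` is a toy measure, the threshold is arbitrary, and `relaxed_histogram` is a hypothetical variant that is not claimed to match any of the six versions proposed in the paper.

```python
def dissimilarity(a, b):
    """Toy dissimilarity in [0, 1] (assumption): fraction of positions whose labels differ."""
    length = max(len(a), len(b))
    if length == 0:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return 1.0 - same / length

def symbolic_histogram(substructures, granules, threshold):
    """Hard counting: one integer per granule, the number of matching substructures."""
    return [
        sum(1 for s in substructures if dissimilarity(s, g) <= threshold)
        for g in granules
    ]

def relaxed_histogram(substructures, granules, threshold):
    """Hypothetical relaxation: each match contributes 1 - d instead of 1."""
    hist = []
    for g in granules:
        total = 0.0
        for s in substructures:
            d = dissimilarity(s, g)
            if d <= threshold:
                total += 1.0 - d
        hist.append(total)
    return hist

# Strings of atom labels stand in for subgraphs extracted from one graph
substructures = ["CCO", "CCN", "CNO", "OOO"]
granules = ["CCO", "OOO"]

hard = symbolic_histogram(substructures, granules, threshold=0.4)
soft = relaxed_histogram(substructures, granules, threshold=0.4)
```

In the hard version, an exact match and a near match both add 1 to the count; the relaxed version weights near matches less, so the embedding retains how close each match was.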