    A Survey on Graph Kernels

    Graph kernels have become an established and widely used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as a result. Finally, based on our experimental results, we derive a practitioner's guide to kernel-based graph classification.
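
The RBF transformation mentioned above can be written using the distance that any positive semidefinite kernel induces in its feature space, d(G, G')² = k(G, G) + k(G', G') - 2k(G, G'). The sketch below assumes a precomputed Gram matrix `K`; the function names and the toy label-histogram baseline are illustrative assumptions, not code from the survey, and in practice `gamma` would typically be tuned by cross-validation.

```python
import numpy as np

def rbf_from_graph_kernel(K, gamma=1.0):
    """Gaussian RBF kernel on the metric induced by a PSD graph-kernel Gram matrix K.

    d(G_i, G_j)^2 = k(G_i, G_i) + k(G_j, G_j) - 2 k(G_i, G_j)
    """
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    # Squared induced distances; clip tiny negatives caused by round-off.
    D2 = np.maximum(diag[:, None] + diag[None, :] - 2.0 * K, 0.0)
    return np.exp(-gamma * D2)

def vertex_label_histogram_kernel(label_lists, alphabet):
    """Toy baseline (illustrative): linear kernel on node-label counts."""
    X = np.array([[labels.count(a) for a in alphabet] for labels in label_lists],
                 dtype=float)
    return X @ X.T

# Three toy graphs described only by the multisets of their node labels.
graphs = [["C", "C", "O"], ["C", "O", "O"], ["C", "C", "O"]]
K = vertex_label_histogram_kernel(graphs, alphabet=["C", "O"])
R = rbf_from_graph_kernel(K, gamma=0.5)
```

Graphs that the base kernel cannot distinguish (the first and third here) get RBF similarity 1, and every other pair is mapped into (0, 1) according to its induced distance.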

    Propagation Kernels

    We introduce propagation kernels, a general graph-kernel framework for efficiently measuring the similarity of structured data. Propagation kernels are based on monitoring how information spreads through a set of given graphs. They leverage early-stage distributions from propagation schemes such as random walks to capture structural information encoded in node labels, attributes, and edge information. This has two benefits. First, off-the-shelf propagation schemes can be used to naturally construct kernels for many graph types, including labeled, partially labeled, unlabeled, directed, and attributed graphs. Second, by leveraging existing efficient and informative propagation schemes, propagation kernels can be considerably faster than state-of-the-art approaches without sacrificing predictive performance. We will also show that if the graphs at hand have a regular structure, for instance when modeling image or video data, one can exploit this regularity to scale the kernel computation to large databases of graphs with thousands of nodes. We support our contributions by exhaustive experiments on a number of real-world graphs from a variety of application domains.
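
The core loop of a propagation kernel can be sketched in a few lines. This is a simplified illustration under stated assumptions: label distributions are propagated with the random-walk matrix T = D⁻¹A, graphs have no isolated nodes, and the binning of distributions is approximated by rounding each node's distribution to fixed precision (the original framework uses a hashing step; rounding is only a stand-in). All function names are hypothetical.

```python
import numpy as np
from collections import Counter

def propagate(A, P, steps):
    """Yield node label distributions P_0, ..., P_steps under P_{t+1} = T P_t."""
    A = np.asarray(A, dtype=float)
    T = A / A.sum(axis=1, keepdims=True)  # random-walk transition matrix (no isolated nodes)
    for _ in range(steps + 1):
        yield P
        P = T @ P

def bin_counts(P, decimals):
    """Stand-in for hashing: round each node's distribution and count equal rows."""
    return Counter(tuple(row) for row in np.round(P, decimals).tolist())

def propagation_kernel(graphs, n_labels, steps=2, decimals=2):
    """graphs: list of (adjacency matrix, node label indices) pairs."""
    feats = []
    for A, labels in graphs:
        P0 = np.eye(n_labels)[np.asarray(labels)]  # one-hot initial distributions
        feats.append([bin_counts(P, decimals) for P in propagate(A, P0, steps)])
    n = len(graphs)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Sum over iterations of the dot product of bin-count vectors.
            for ci, cj in zip(feats[i], feats[j]):
                K[i, j] += sum(c * cj[b] for b, c in ci.items())
    return K

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path_reordered = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same path, centre node listed first
graphs = [(path, [0, 1, 0]), (path_reordered, [1, 0, 0]), (path, [0, 0, 1])]
K = propagation_kernel(graphs, n_labels=2)
```

The first two graphs are isomorphic and receive identical feature counts at every iteration, while the third has the same label multiset but a different label placement, which only becomes visible after propagation.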

    Learning with Graphs using Kernels from Propagated Information

    Traditional machine learning approaches are designed to learn from independent vector-valued data points. The assumption that instances are independent, however, does not always hold. On the contrary, in numerous domains data points are cross-linked; in social networks, for example, persons are linked by friendship relations. These relations among data points make traditional machine learning difficult and often insufficient. Furthermore, data points themselves can have complex structure, for example molecules or proteins constructed from various bindings of different atoms. Networked and structured data are naturally represented by graphs, and for learning we aim to exploit their structure to improve upon non-graph-based methods. However, graphs encountered in real-world applications often come with rich additional information. This raises several challenges for representation and learning: node information is likely to be incomplete, leading to partially labeled graphs; information can be aggregated from multiple sources and can therefore be uncertain; or additional information on nodes and edges can be derived from complex sensor measurements and thus be naturally continuous. Although learning with graphs is an active research area, existing work leaves gaps. Learning with structured data, which focuses on modeling structural similarities between graphs, mostly assumes fully labeled graphs of reasonable size with discrete and certain node and edge information. Learning with networked data, which naturally deals with missing information and huge graphs, mostly assumes homophily and neglects structural similarity. To close these gaps, we present a novel paradigm for learning with graphs that exploits the intermediate results of iterative information propagation schemes on graphs.
Originally developed for within-network relational and semi-supervised learning, these propagation schemes have two desirable properties: they capture structural information, and they can naturally adapt to the aforementioned issues of real-world graph data. Additionally, information propagation can be efficiently realized by random walks, leading to fast, flexible, and scalable feature and kernel computations. Further, by considering intermediate random walk distributions, we can model structural similarity for learning with structured and networked data. We develop several approaches based on this paradigm. In particular, we introduce propagation kernels for learning on the graph level, and coinciding walk kernels and Markov logic sets for learning on the node level. Finally, we present two application domains where kernels from propagated information successfully tackle real-world problems.
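
Label propagation with clamping is one classic semi-supervised scheme of the kind described above. The sketch below (hypothetical names; the `-1` marker for a missing label is an assumption) shows how intermediate distributions stay well defined on a partially labeled graph: unlabeled nodes start uniform, each step is one random-walk transition, and observed labels are reset after every step. The resulting sequence P_0, P_1, ... is the kind of intermediate information a kernel could compare.

```python
import numpy as np

def clamped_propagation(A, labels, n_labels, steps):
    """Intermediate label distributions on a partially labeled graph.

    labels[i] is a class index, or -1 if node i is unlabeled (assumed convention).
    Unlabeled nodes start from a uniform distribution; observed labels are
    reset ("clamped") after every propagation step.
    """
    A = np.asarray(A, dtype=float)
    T = A / A.sum(axis=1, keepdims=True)  # random-walk transition matrix
    labels = np.asarray(labels)
    known = labels >= 0
    observed = np.eye(n_labels)[labels[known]]
    P = np.full((len(labels), n_labels), 1.0 / n_labels)
    P[known] = observed
    history = [P.copy()]
    for _ in range(steps):
        P = T @ P
        P[known] = observed  # clamp observed labels
        history.append(P.copy())
    return history

# Path 0-1-2-3 where only node 0 carries a label.
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
history = clamped_propagation(A, [0, -1, -1, -1], n_labels=2, steps=3)
```

After one step, node 1 (adjacent to the labeled node) already leans toward class 0 with distribution [0.75, 0.25], while nodes farther away are still uniform.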

    Graph Deep Learning: Methods and Applications

    The past few years have seen the growing prevalence of deep neural networks in various application domains, including image processing, computer vision, speech recognition, machine translation, self-driving cars, game playing, social networks, bioinformatics, and healthcare. Due to its broad applications and strong performance, deep learning, a subfield of machine learning and artificial intelligence, is changing everyone's life. Graph learning, which learns knowledge from graph-structured data, has been another hot field in the machine learning and data mining communities. Examples of graph learning range from social network analysis, such as community detection and link prediction, to relational machine learning, such as knowledge graph completion and recommender systems, to multi-graph tasks such as graph classification and graph generation. An emerging new field, graph deep learning, aims at applying deep learning to graphs. To deal with graph-structured data, graph neural networks (GNNs) have been invented in recent years; they take graphs directly as input and output graph or node representations. Although GNNs have outperformed traditional methods in tasks such as semi-supervised node classification, there remains a wide range of other important graph learning problems where either the applicability of GNNs has not been explored or their performance is less satisfying. In this dissertation, we dive deeper into the field of graph deep learning. By developing new algorithms, architectures and theories, we push the boundaries of graph neural networks to a much wider range of graph learning problems.
The problems we have explored include: 1) graph classification; 2) medical ontology embedding; 3) link prediction; 4) recommender systems; 5) graph generation; and 6) graph structure optimization. We first focus on two graph representation learning problems: graph classification and medical ontology embedding. For graph classification, we develop a novel deep GNN architecture that aggregates node features through a SortPooling layer, which replaces the simple summing used in previous works. We demonstrate its state-of-the-art graph classification performance on benchmark datasets. For medical ontology embedding, we propose a novel hierarchical attention propagation model, which uses an attention mechanism to learn embeddings of medical concepts from hierarchically structured medical ontologies such as ICD-9 and CCS. We validate the learned embeddings on sequential procedure/diagnosis prediction tasks with real patient data. Then we investigate the potential of GNNs for predicting relations, specifically link prediction and recommender systems. For link prediction, we first develop a theory unifying various traditional link prediction heuristics, and then design a GNN-based framework that automatically learns suitable heuristics from a given network. Our model shows unprecedentedly strong link prediction performance, significantly outperforming all traditional methods. For recommender systems, we propose a novel graph-based matrix completion model, which uses a GNN to learn graph structure features from the bipartite graph formed by user-item interactions. Our model not only outperforms various matrix completion baselines, but also demonstrates excellent transfer learning ability: a model trained on MovieLens can be directly used to predict Douban movie ratings with high performance. Finally, we explore the applicability of GNNs to graph generation and graph structure optimization.
We focus on a specific type of graph that usually carries computations, namely directed acyclic graphs (DAGs). We develop a variational autoencoder (VAE) for DAGs and prove that it can injectively map computations into a latent space. This injectivity allows us to perform optimization in the continuous latent space instead of the original discrete structure space. We then apply our VAE to two types of DAGs, neural network architectures and Bayesian networks. Experiments show that our model not only generates novel and valid DAGs, but also finds high-quality neural architectures and Bayesian networks by performing Bayesian optimization in its latent space.
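
The sorting idea behind a SortPooling readout, as opposed to simple summing, can be sketched as follows. This assumes, as a simplification, that nodes are ranked by their last feature channel (with preceding channels breaking ties) and that graphs with fewer than `k` nodes are zero-padded; names are illustrative. Because the ranking depends only on feature values, the output does not depend on the input node order, and the fixed-size `k × c` array could then be fed to ordinary convolutional or dense layers.

```python
import numpy as np

def sort_pool(H, k):
    """SortPooling-style readout (sketch).

    H: (n_nodes, channels) node features from the last graph-convolution layer.
    Nodes are ranked by the last channel in descending order (preceding
    channels break ties); the top k rows are kept, zero-padded if n_nodes < k.
    """
    H = np.asarray(H, dtype=float)
    # np.lexsort uses the last key as primary, so -H.T sorts by H[:, -1] descending.
    order = np.lexsort(-H.T)
    top = H[order[:k]]
    pad = np.zeros((max(k - top.shape[0], 0), H.shape[1]))
    return np.vstack([top, pad])

H = np.array([[1.0, 0.2], [2.0, 0.9], [3.0, 0.5]])
out = sort_pool(H, 2)                # keeps the rows with last channel 0.9 and 0.5
padded = sort_pool(H, 4)             # 3 nodes + 1 zero row
shuffled = sort_pool(H[[2, 0, 1]], 2)
```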

    Discovery of Self-Assembling π-Conjugated Peptides by Active Learning-Directed Coarse-Grained Molecular Simulation

    Electronically active organic molecules have demonstrated great promise as novel soft materials for energy harvesting and transport. Self-assembled nanoaggregates formed from π-conjugated oligopeptides composed of an aromatic core flanked by oligopeptide wings offer emergent optoelectronic properties within a water-soluble and biocompatible substrate. Nanoaggregate properties can be controlled by tuning core chemistry and peptide composition, but the sequence-structure-function relations remain poorly characterized. In this work, we employ coarse-grained molecular dynamics simulations within an active learning protocol that combines deep representational learning and Bayesian optimization to efficiently identify molecules capable of assembling pseudo-1D nanoaggregates with good stacking of the electronically active π-cores. We consider the DXXX-OPV3-XXXD oligopeptide family, where D is an Asp residue and OPV3 is an oligophenylene vinylene oligomer (1,4-distyrylbenzene), to identify the top-performing XXX tripeptides within all 20³ = 8,000 possible sequences. By direct simulation of only 2.3% of this space, we identify molecules predicted to exhibit superior assembly relative to those reported in prior work. Spectral clustering of the top candidates reveals new design rules governing assembly. This work establishes new understanding of DXXX-OPV3-XXXD assembly, identifies promising new candidates for experimental testing, and presents a computational design platform that can be generically extended to other peptide-based and peptide-like systems.
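
The size of the search space follows from three independent positions, each filled by one of the 20 standard amino acids. The snippet assumes that all 20 residues are admissible at every X position and treats 2.3% as a rounded figure, so the resulting count of directly simulated sequences (about 184) is an estimate, not a number reported in the abstract.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes), each a candidate for every X.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
tripeptides = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]

n_total = len(tripeptides)            # 20**3 candidate XXX sequences
n_simulated = round(0.023 * n_total)  # approximate count behind "2.3% of this space"
```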

    Graph Pattern Mining Techniques to Identify Potential Model Organisms

    Recent advances in high-throughput technologies have led to an increasing amount of rich and diverse biological data and related literature. Model organisms are classically selected as subjects for studying human disease based on their genotypic and phenotypic features. A significant problem with model organism identification is the determination of characteristic features related to biological processes that can provide insights into the mechanisms underlying diseases. These insights could have a positive impact on the diagnosis and management of diseases and the development of therapeutic drugs. The increased availability of biological data presents an opportunity to develop data mining methods that can address these challenges and help scientists formulate and test data-driven hypotheses. In this dissertation, data mining methods were developed to provide a quantitative approach for the identification of potential model organisms based on underlying features that may be correlated with disease manifestation in humans. The work encompassed three major types of contributions that aimed to address challenges related to inferring information from biological data available from a range of sources. First, new statistical models and algorithms for graph pattern mining were developed and tested on diverse genres of data (biological networks, drug chemical compounds, and text documents). Second, data mining techniques were developed and shown to identify characteristic disease patterns (disease fingerprints), predict potentially new genetic pathways, and facilitate the assessment of organisms as potential disease models. Third, a methodology was developed that combined the application of graph-based models with information derived from natural language processing methods to identify statistically significant patterns in biomedical text.
Together, the approaches developed for this dissertation show promise for summarizing information about biological processes and phenomena associated with organisms broadly, and for assessing their potential suitability for studying human diseases.
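
As a minimal illustration of graph pattern mining in general (not of the dissertation's own statistical models), the sketch below computes the support of single labeled edges across a small graph database and keeps those above a threshold. This is the first level of a typical level-wise frequent-subgraph miner; the graph encoding and function name are assumptions.

```python
from collections import Counter

def frequent_edge_patterns(graphs, min_support):
    """graphs: list of (node_labels: dict, edges: list of (u, v)) pairs.

    Returns {pattern: support} for labeled-edge patterns that occur in at
    least min_support graphs. Support counts graphs, not occurrences.
    """
    support = Counter()
    for node_labels, edges in graphs:
        # A pattern is an unordered pair of endpoint labels, counted once per graph.
        patterns = {tuple(sorted((node_labels[u], node_labels[v]))) for u, v in edges}
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}

database = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    ({0: "C", 1: "N"}, [(0, 1)]),
    ({0: "C", 1: "O", 2: "N"}, [(0, 1), (0, 2)]),
]
frequent = frequent_edge_patterns(database, min_support=2)
```

Larger patterns would be grown only from frequent ones, since a subgraph can never be more frequent than any of its parts.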

    Deep Attention Networks for Images and Graphs

    Deep learning has achieved great success in various machine learning areas, such as computer vision, natural language processing, and graph representation learning. While numerous deep neural networks (DNNs) have been proposed, the set of fundamental building blocks of DNNs remains small, including fully connected layers, convolutions and recurrent units. Recently, the attention mechanism has shown promise as a new kind of fundamental building block. Deep attention networks (DANs), i.e., DNNs that use the attention mechanism as a fundamental building block, have revolutionized the area of natural language processing. However, developing DANs for computer vision and graph representation learning applications is still challenging. Due to the intrinsic differences in data and applications, directly migrating DANs from textual data to images and graphs is usually either infeasible or ineffective. In this dissertation, we address this challenge by analyzing the functionality of the attention mechanism and exploring scenarios where DANs can push the limits of current DNNs. We propose several effective DANs for images and graphs. For images, we build DANs for a variety of image-to-image transformation applications by proposing powerful attention-based building blocks. First, we begin by studying a common problem in dilated convolutions, which naturally leads to the use of the attention mechanism. Dilated convolutions, a variant of convolutions, have been widely applied in deep convolutional neural networks (DCNNs) for image segmentation. However, dilated convolutions suffer from gridding artifacts, which hamper performance. We propose two simple yet effective degridding methods by studying a decomposition of dilated convolutions, and generalize them by defining separable and shared (SS) operators.
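
The decomposition that exposes the gridding problem can be checked numerically in one dimension. Assuming a single channel, "valid" padding, and correlation without kernel flipping, a dilated convolution with rate `r` equals splitting the input into `r` interleaved subsequences, applying the ordinary kernel to each, and re-interleaving the results. Each output therefore depends only on inputs from its own subsequence, so neighboring outputs draw on disjoint inputs, which is the source of gridding artifacts. The degridding operators themselves are not shown, and the function names are illustrative.

```python
import numpy as np

def dilated_conv1d(x, w, r):
    """Direct 'valid' dilated correlation: y[i] = sum_k w[k] * x[i + r*k]."""
    K = len(w)
    L = len(x) - r * (K - 1)
    return np.array([sum(w[k] * x[i + r * k] for k in range(K)) for i in range(L)])

def dilated_via_decomposition(x, w, r):
    """Same result: periodic subsampling, undilated correlation, re-interleaving."""
    K = len(w)
    y = np.empty(len(x) - r * (K - 1))
    for p in range(r):
        # Phase p sees only x[p], x[p + r], x[p + 2r], ...
        y[p::r] = np.correlate(x[p::r], w, mode="valid")
    return y

x = np.arange(10, dtype=float)
w = np.array([1.0, -2.0, 3.0])
direct = dilated_conv1d(x, w, r=2)
decomposed = dilated_via_decomposition(x, w, r=2)
```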
Then we connect the SS operators with the attention mechanism and propose the SS output layer, which smooths an entire DCNN by replacing only its output layer and significantly improves performance. Second, the first study reveals an interesting fact: because the attention mechanism allows the SS output layer to have a receptive field of any size, the best performance is achieved with a global receptive field. This fact motivates us to think of the attention mechanism as a global operator, as opposed to local operators like convolutions. With this insight, we propose the non-local U-Nets, which are equipped with flexible attention-based global aggregation blocks, for biomedical image segmentation. In particular, we are the first to apply the attention mechanism to the down-sampling and up-sampling processes. Finally, we go beyond biomedical image segmentation and extend the non-local U-Nets to global voxel transformer networks (GVTNets), which serve as a powerful open-source tool for 3D image-to-image transformation tasks. In addition to leveraging the non-local property of the attention mechanism in the supervised learning setting, we also investigate the generalization ability of the attention mechanism in the transfer learning setting. We perform thorough experiments on a wide range of real-world image-to-image transformation tasks, and the results demonstrate the effectiveness and efficiency of our proposed DANs. For graphs, we develop DANs for both graph and node classification applications. First, we focus on graph pooling, which is necessary for graph neural networks (GNNs) to perform graph classification tasks. In particular, we point out that second-order pooling naturally satisfies the requirement of graph pooling but encounters practical problems. To overcome these problems, we propose attentional second-order pooling.
Specifically, we connect second-order pooling with the attention mechanism and design an attention-based pooling method that can be flexibly used as either global or hierarchical graph pooling. Second, for node classification tasks, we address the problem that most GNNs lack the ability to perform effective non-local aggregation, which greatly limits their performance on disassortative graphs. On some disassortative graphs, this even causes GNNs to perform worse than simple multi-layer perceptrons. To address this problem, we propose a simple yet effective non-local aggregation framework with an efficient attention-guided sorting for GNNs, based on which we develop non-local GNNs. Experimental results on various graph and node classification benchmark datasets show that our DANs improve performance significantly and consistently.
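
Second-order pooling meets the basic requirements of a graph readout: H^T H = Σ_i h_i h_i^T has a fixed size d × d regardless of the number of nodes, and it is unchanged by any reordering of the rows of H. The sketch checks both properties and adds one illustrative attention-style reduction, H^T (H μ) with a hypothetical learnable vector μ, in which each node's embedding is weighted by the score h_i · μ. This reduction is offered only to make the link to attention concrete, not as the dissertation's exact formulation.

```python
import numpy as np

def second_order_pool(H):
    """H: (n_nodes, d) node embeddings -> (d, d) matrix H^T H = sum_i h_i h_i^T."""
    H = np.asarray(H, dtype=float)
    return H.T @ H

def attention_weighted_pool(H, mu):
    """Illustrative reduction H^T (H mu): node i is weighted by its score h_i . mu."""
    H = np.asarray(H, dtype=float)
    scores = H @ mu        # one scalar score per node
    return H.T @ scores    # weighted sum of node embeddings, shape (d,)

rng = np.random.default_rng(0)
H_small = rng.normal(size=(5, 3))   # graph with 5 nodes, 3-dim embeddings
H_large = rng.normal(size=(12, 3))  # graph with 12 nodes
mu = rng.normal(size=3)             # hypothetical learnable vector
perm = rng.permutation(5)
```

Since H^T (H μ) = (H^T H) μ, the attention-style vector is simply the second-order matrix projected onto μ, which keeps the output at d dimensions instead of d².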