Robustness, Heterogeneity and Structure Capturing for Graph Representation Learning and its Application
Graph neural networks (GNNs) are potent methods for graph representation learning (GRL), which extract knowledge from complicated (graph) structured data in various real-world scenarios. However, GRL still faces many challenges. Firstly, GNN-based node classification may deteriorate substantially by overlooking the possibility of noisy data in graph structures, as models wrongly treat the relations among nodes in the input graphs as ground truth. Secondly, nodes and edges have different types in the real world, and it is essential to capture this heterogeneity in graph representation learning. Next, relations among nodes are not restricted to pairwise relations, and it is necessary to capture these complex relations accordingly. Finally, the absence of structural encodings, such as positional information, deteriorates the performance of GNNs. This thesis proposes novel methods to address the aforementioned problems:
1. Bayesian Graph Attention Network (BGAT): Developed for situations with scarce data, this method addresses the influence of spurious edges. Incorporating Bayesian principles into the graph attention mechanism enhances robustness, leading to competitive performance against benchmarks (Chapter 3).
2. Neighbour Contrastive Heterogeneous Graph Attention Network (NC-HGAT): By enhancing a cutting-edge self-supervised heterogeneous graph neural network model (HGAT) with neighbour contrastive learning, this method addresses heterogeneity and uncertainty simultaneously. Extra attention to edge relations in heterogeneous graphs also aids in subsequent classification tasks (Chapter 4).
3. A novel ensemble learning framework is introduced for predicting stock price movements. It adeptly captures both group-level and pairwise relations, leading to notable advancements over the existing state-of-the-art. The integration of hypergraph and graph models, coupled with the utilisation of auxiliary data via GNNs before recurrent neural network (RNN), provides a deeper understanding of long-term dependencies between similar entities in multivariate time series analysis (Chapter 5).
4. A novel framework for graph structure learning is introduced, segmenting graphs into distinct patches. By harnessing the capabilities of transformers and integrating other position encoding techniques, this approach robustly captures intricate structural information within a graph. This results in a more comprehensive understanding of its underlying patterns (Chapter 6).
From Hypergraph Energy Functions to Hypergraph Neural Networks
Hypergraphs are a powerful abstraction for representing higher-order
interactions between entities of interest. To exploit these relationships in
making downstream predictions, a variety of hypergraph neural network
architectures have recently been proposed, in large part building upon
precursors from the more traditional graph neural network (GNN) literature.
Somewhat differently, in this paper we begin by presenting an expressive family
of parameterized, hypergraph-regularized energy functions. We then demonstrate
how minimizers of these energies effectively serve as node embeddings that,
when paired with a parameterized classifier, can be trained end-to-end via a
supervised bilevel optimization process. Later, we draw parallels between the
implicit architecture of the predictive models emerging from the proposed
bilevel hypergraph optimization, and existing GNN architectures in common use.
Empirically, we demonstrate state-of-the-art results on various hypergraph node
classification benchmarks. Code is available at
https://github.com/yxzwang/PhenomNN
Comment: Accepted to ICML 2023
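The embeddings-as-energy-minimizers idea can be made concrete on a toy example. Below is a minimal sketch (an illustrative quadratic, hypergraph-regularized energy, not the paper's actual PhenomNN energies) whose minimizer, computable in closed form or by gradient descent, serves as a node embedding:

```python
import numpy as np

# Toy hypergraph: 5 nodes, 3 hyperedges, incidence matrix H (nodes x edges)
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)

# Clique-expansion Laplacian of the hypergraph
A = H @ H.T
np.fill_diagonal(A, 0.0)          # drop self-connections
L = np.diag(A.sum(axis=1)) - A    # combinatorial Laplacian

# Input node features X and a quadratic, hypergraph-regularized energy:
#   E(Y) = ||Y - X||_F^2 + lam * tr(Y^T L Y)
X = np.random.default_rng(0).normal(size=(5, 2))
lam = 1.0

# The minimizer has the closed form Y* = (I + lam*L)^{-1} X,
# i.e. a smoothing (propagation) step: the "implicit layer" viewpoint.
Y = np.linalg.solve(np.eye(5) + lam * L, X)

# Gradient descent on E converges to the same minimizer
Yg = X.copy()
for _ in range(2000):
    grad = 2 * (Yg - X) + 2 * lam * (L @ Yg)
    Yg -= 0.01 * grad
assert np.allclose(Y, Yg, atol=1e-4)
```

Feeding `Y` into a parameterized classifier and training through the inner minimization is what the bilevel formulation refers to; the paper's energies are more expressive than this quadratic stand-in.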
Chernoff Information in Community Detection
In network inference applications, it is desirable to detect community structure, i.e., cluster vertices into potential blocks. Beyond adjacency matrices, many real-world networks also involve vertex covariates that may carry information about underlying block structure. Since accurate inference on random networks depends on exploiting all available signal, we need scalable algorithms that can incorporate both network connectivity data and additional insight from vertex covariates. In addition, it can be prohibitively expensive to observe the entire graph in many real applications, especially for large graphs. Thus it becomes essential to identify vertices that have the most impact on block structure and only check whether there are edges between them given a limited budget.
To assess the effects of vertex covariates on block recovery, we consider two model-based spectral algorithms. The first algorithm uses only the adjacency matrix, and directly estimates the block assignments. The second algorithm incorporates both the adjacency matrix and the vertex covariates into the estimation of block assignments. We employ Chernoff information to analytically compare the algorithms’ performance and derive the information-theoretic Chernoff ratio for certain models of interest. Analytic results and simulations suggest that the second algorithm is often preferred: one can better estimate the induced block assignments by first estimating the effect of vertex covariates. In addition, real data experiments also indicate that the second algorithm has the advantage of revealing underlying block structure while considering observed vertex heterogeneity in real applications.
Moreover, we propose a dynamic network sampling scheme to optimize block recovery for the stochastic blockmodel in the case where it is prohibitively expensive to observe the entire graph. Theoretically, we provide justification of our proposed Chernoff-optimal dynamic sampling scheme via the Chernoff information. Practically, we evaluate the performance of our method on several real datasets from different domains. Both theoretical and practical results suggest that our method can identify the vertices that have the most impact on block structure, so that one need only check whether there are edges between those vertices, saving significant resources while still recovering the block structure.
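For intuition, the Chernoff information between two discrete distributions, the quantity underlying the comparisons above, can be computed numerically. The sketch below uses a simple grid search and hypothetical Bernoulli edge distributions; it is not the paper's model-specific Chernoff ratio:

```python
import numpy as np

def chernoff_information(p, q, grid=1000):
    """Chernoff information between discrete distributions p and q:
    C(p, q) = -min_{t in [0,1]} log sum_x p(x)^t q(x)^(1-t)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    ts = np.linspace(0.0, 1.0, grid)
    # log of the Chernoff coefficient at each t
    vals = [np.log(np.sum(p**t * q**(1.0 - t))) for t in ts]
    return -min(vals)

# Hypothetical edge distributions over {no edge, edge} for a vertex
# toward two candidate blocks in a toy two-block model:
within = [0.5, 0.5]   # edge appears w.p. 0.5 within-block
across = [0.9, 0.1]   # edge appears w.p. 0.1 across blocks
ci = chernoff_information(within, across)
# Larger Chernoff information means a faster-vanishing error rate
# when distinguishing the two block assignments.
```

Identical distributions give zero Chernoff information, and the value grows as the two edge distributions separate, which is why it serves as a natural yardstick for comparing spectral algorithms.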
Parallel and Flow-Based High Quality Hypergraph Partitioning
Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits.
Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks.
In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge.
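The connectivity objective has a direct formulation: each hyperedge spanning lambda blocks contributes lambda - 1 to the cost. A minimal sketch on a toy hypergraph (hypothetical data):

```python
# Connectivity objective: hyperedge e spanning lambda(e) blocks
# contributes lambda(e) - 1 to the total cost.
def connectivity(hyperedges, block_of):
    cost = 0
    for e in hyperedges:
        spanned = {block_of[v] for v in e}   # blocks touched by e
        cost += len(spanned) - 1
    return cost

# Toy hypergraph on 6 vertices, partitioned into k = 2 blocks
hyperedges = [(0, 1, 2), (2, 3), (3, 4, 5), (0, 5)]
block_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(connectivity(hyperedges, block_of))  # → 2: edges (2,3) and (0,5) are cut
```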
The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases.
In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs.
Once sufficiently small, an initial partition is computed.
Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level.
An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time.
The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem.
Existing algorithms are either sequential and slow but offer high solution quality, or are simple, fast, and easy to parallelize but offer low quality.
While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible.
We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways.
Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines.
In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof.
We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation.
For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements.
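Label propagation refinement, one of the two improvement algorithms mentioned, can be sketched on a plain graph; the dissertation applies it to hypergraphs with parallel gain techniques not shown in this illustrative, sequential version:

```python
# Label propagation refinement: repeatedly move each vertex to the
# adjacent block where it has the most neighbors, subject to a balance
# (maximum block size) constraint. Sequential toy version on a graph.
def label_propagation_refine(adj, block_of, k, max_size, rounds=5):
    for _ in range(rounds):
        moved = False
        for v, neigh in enumerate(adj):
            counts = [0] * k                      # neighbors per block
            for u in neigh:
                counts[block_of[u]] += 1
            best = max(range(k), key=lambda b: counts[b])
            cur = block_of[v]
            sizes = [sum(1 for b in block_of if b == bb) for bb in range(k)]
            if best != cur and counts[best] > counts[cur] and sizes[best] < max_size:
                block_of[v] = best                # strictly improving move
                moved = True
        if not moved:
            break                                  # local optimum reached
    return block_of

# Two triangles joined by one edge, with a deliberately bad start
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
print(label_propagation_refine(adj, [0, 0, 1, 1, 1, 0], 2, 4))
# → [0, 0, 0, 1, 1, 1]: refinement recovers the natural bipartition
```

The parallel versions in the thesis must additionally keep the gains `counts[best] - counts[cur]` accurate while many vertices move concurrently, which is the hard part this sketch omits.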
For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly.
Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework.
It is the fastest partitioner known, achieves medium-high quality that beats all other parallel partitioners, and comes close to the highest-quality sequential partitioner.
Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level.
This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening.
In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio.
This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening.
The last ingredient for high quality is an iterative improvement algorithm based on maximum flows.
In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts.
Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel.
Beyond striving for the highest quality, we present a deterministically parallel partitioning framework.
We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement.
Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small.
All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets.
To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar.
While it seems inevitable that, with ever increasing problem sizes, we must transition to distributed-memory algorithms, the study of shared-memory techniques is not in vain.
With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense.
Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm.
Graph Learning and Its Applications: A Holistic Survey
Graph learning is a prevalent domain that endeavors to learn the intricate
relationships among nodes and the topological structure of graphs. These
relationships endow graphs with uniqueness compared to conventional tabular
data, as nodes rely on non-Euclidean space and encompass rich information to
exploit. Over the years, graph learning has transcended from graph theory to
graph data mining. With the advent of representation learning, it has attained
remarkable performance in diverse scenarios, including text, image, chemistry,
and biology. Owing to its extensive application prospects, graph learning
attracts copious attention from the academic community. Despite the numerous works proposed to tackle different problems in graph learning, there remains a need to survey this valuable prior work. While some researchers have recognized this and produced impressive surveys on graph learning, those surveys have not connected related objectives, methods, and applications in a coherent way, and consequently do not cover the ample scenarios and challenging problems that have emerged with the rapid expansion of the field. Different
from previous surveys on graph learning, we provide a holistic review that
analyzes current works from the perspective of graph structure, and discusses
the latest applications, trends, and challenges in graph learning.
Specifically, we commence by proposing a taxonomy from the perspective of the
composition of graph data and then summarize the methods employed in graph
learning. We then provide a detailed elucidation of mainstream applications.
Finally, based on the current trend of techniques, we propose future
directions.
Comment: 20 pages, 7 figures, 3 tables
MCMC methods: graph samplers, invariance tests and epidemic models
Markov Chain Monte Carlo (MCMC) techniques are used ubiquitously for simulation-based inference. This thesis provides novel contributions to MCMC methods and their application to graph sampling and epidemic modeling. The first topic considered is that of sampling graphs conditional on a set of prescribed statistics, which is a difficult problem arising naturally in many fields: sociology (Holland and Leinhardt, 1981), psychology (Connor and Simberloff, 1979), categorical data analysis (Agresti, 1992) and finance (Squartini et al., 2018, Gandy and Veraart, 2019) being examples. Bespoke MCMC samplers are proposed for this setting. The second major topic addressed is that of modeling the dynamics of infectious diseases, where MCMC is leveraged as the general inference engine.
The first part of this thesis addresses important problems such as the uniform sampling of graphs with given degree sequences, and weighted graphs with given strength sequences. These distributions are frequently used for exact tests on social networks and two-way contingency tables. Another application is quantifying the statistical significance of patterns observed in real networks. This is crucial for understanding whether such patterns indicate the presence of interesting network phenomena, or whether they simply result from less interesting processes, such as nodal heterogeneity. The MCMC samplers developed in the course of this research are complex, and there is great scope for conceptual, analytic, and implementation errors. This motivates a chapter that develops novel tests for detecting errors in MCMC implementations. The tests introduced are unique in being exact, which allows us to keep the false rejection probability arbitrarily low.
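A classic baseline for sampling graphs with a fixed degree sequence is the double edge swap ("switch") chain; the sketch below shows that textbook chain, not the bespoke samplers developed in the thesis:

```python
import random
from collections import Counter

# Degree-preserving double edge swap: pick two edges (a,b) and (c,d) and
# rewire them to (a,d) and (c,b) when this creates no self-loop or
# multi-edge. Iterating the chain walks over simple graphs that all
# share the same degree sequence.
def edge_swap_chain(edges, steps, seed=0):
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edges}
    for _ in range(steps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # fewer than 4 distinct endpoints: skip
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # rewiring would create a multi-edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def degrees(edges):
    c = Counter()
    for a, b in edges:
        c[a] += 1
        c[b] += 1
    return dict(c)

g = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
g2 = edge_swap_chain(g, steps=100)
assert degrees(g2) == degrees(g)  # degree sequence is invariant
```

Conditioning on richer statistics than degrees, as the thesis does, breaks this simple move set, which is what motivates the bespoke samplers.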
Rather than develop bespoke samplers, as in the first part of the thesis, the second part leverages a standard MCMC framework, Stan (Stan Development Team, 2018), as the workhorse for fitting state-of-the-art epidemic models. We present a general framework for semi-mechanistic Bayesian modeling of infectious diseases using renewal processes. The term semi-mechanistic relates to statistical estimation within some constrained mechanism. This research was motivated by the ongoing SARS-CoV-2 pandemic, and variants of the model have been used in specific analyses of Covid-19. We present epidemia, an R package allowing researchers to leverage these epidemic models. A key goal of this work is to demonstrate that MCMC, and in particular Stan’s No-U-Turn sampler (Hoffman and Gelman, 2014), can be routinely employed to fit a large class of epidemic models. A second goal is to make the models accessible to the general research community through epidemia.
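The deterministic core of a renewal-process model can be sketched in a few lines; the full semi-mechanistic framework in epidemia adds observation models, priors, and MCMC inference, none of which appear in this illustrative simulation:

```python
# Renewal-process epidemic sketch: expected new infections follow
#   I_t = R_t * sum_{s=1..t} I_{t-s} * g_s
# where R_t is the (time-varying) reproduction number and g is the
# generation-interval distribution.
def renewal_process(R, g, I0, T):
    I = [float(I0)]
    for t in range(1, T):
        force = sum(I[t - s] * g[s - 1] for s in range(1, min(t, len(g)) + 1))
        I.append(R[t] * force)
    return I

g = [0.25, 0.5, 0.25]    # generation interval over 1..3 days (hypothetical)
R = [1.5] * 10           # constant reproduction number
traj = renewal_process(R, g, I0=10, T=10)
```

In the Bayesian version, R_t is given a prior (often linked to covariates), and observed cases or deaths are modeled as noisy functions of the latent infections `traj`.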
Essays in the Econometric Theory of Panel and Multidimensional Data
This dissertation studies econometric models in the presence of unobserved heterogeneity when data is observed over multiple dimensions. Chapters 2 and 3 study this in the classic panel setting with two dimensions, usually individuals and time. Chapter 2 studies the setting where unobserved heterogeneity may enter non-linearly and non-separably with the observed covariates. Established matrix completion methods and a group fixed-effects type estimator prove to approximate the model well. Chapter 3 studies the setting where unobserved heterogeneity enters linearly and separably, but is modelled as a generic functional transformation of unobserved characteristics. The factor model estimated with many factors approximates this form of unobserved heterogeneity well, and, as in Chapter 2, a group fixed-effects estimator also performs well in theory and in simulations. Chapter 4 studies the setting when three or more dimensions are observed in the data and restricts focus to the linear regression model. This chapter extends the notion of the group fixed-effects estimator to a nonparametric kernel-style transformation that can be applied to any number of dimensions. The results in this chapter show that the current state-of-the-art factor model methods for approximating unobserved heterogeneity do not extend well to the setting with three or more dimensions. The results also show that the novel nonparametric kernel transformation proposed in this chapter controls for unobserved heterogeneity sufficiently well to achieve the parametric rate of consistency under certain conditions.
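As a point of reference for the group fixed-effects estimators studied here, the classic two-way within transformation shows how additive unobserved heterogeneity is removed in the two-dimensional panel case (simulated data; illustrative only, not the dissertation's estimators):

```python
import numpy as np

# Two-way panel model: y_it = beta * x_it + a_i + b_t + e_it.
# Demeaning over both dimensions removes the additive fixed effects,
# after which pooled OLS on the transformed data recovers beta.
rng = np.random.default_rng(1)
N, T = 50, 20
a = rng.normal(size=(N, 1))           # individual effects
b = rng.normal(size=(1, T))           # time effects
x = rng.normal(size=(N, T)) + a + b   # covariate correlated with the effects
beta = 2.0
y = beta * x + a + b + 0.1 * rng.normal(size=(N, T))

def within(z):
    # subtract row means, column means, add back the grand mean
    return z - z.mean(axis=1, keepdims=True) - z.mean(axis=0, keepdims=True) + z.mean()

beta_hat = (within(x) * within(y)).sum() / (within(x) ** 2).sum()
assert abs(beta_hat - beta) < 0.05    # close despite correlated heterogeneity
```

When heterogeneity enters non-additively, or when the data has three or more dimensions, this simple demeaning no longer suffices, which is the gap the dissertation addresses.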
Exact recovery for the non-uniform Hypergraph Stochastic Block Model
Consider the community detection problem in random hypergraphs under the
non-uniform hypergraph stochastic block model (HSBM), where each hyperedge
appears independently with some given probability depending only on the labels
of its vertices. We establish, for the first time in the literature, a sharp
threshold for exact recovery under this non-uniform case, subject to minor
constraints; in particular, we consider the model with k classes as well as
the symmetric binary model (k = 2). One crucial point here is that by
aggregating information from all the uniform layers, we may obtain exact
recovery even in cases when this may appear impossible if each layer were
considered alone. Two efficient algorithms that successfully achieve exact
recovery above the threshold are provided. The theoretical analysis of our
algorithms relies on the concentration and regularization of the adjacency
matrix for non-uniform random hypergraphs, which could be of independent
interest. We also address some open problems regarding parameter knowledge and
estimation.
Comment: 63 pages, 2 tables, 7 figures
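The aggregation idea can be illustrated with a toy spectral scheme (not the paper's algorithms): reduce each uniform layer to an adjacency matrix, sum the layers, and cluster by the second eigenvector of the aggregate. All parameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
labels = np.array([0] * 30 + [1] * 30)   # ground-truth communities

def sample_layer(p_in, p_out):
    """One layer: edges appear independently, w.p. p_in within a
    community and p_out across communities."""
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            A[i, j] = A[j, i] = rng.random() < p
    return A

# Aggregate two layers that share the same block structure
A = sample_layer(0.45, 0.05) + sample_layer(0.45, 0.05)

# Spectral step: sign of the second eigenvector of the aggregate
w, V = np.linalg.eigh(A)
guess = (V[:, -2] > 0).astype(int)
acc = max((guess == labels).mean(), ((1 - guess) == labels).mean())
```

Summing layers adds their signal while the noise only partially accumulates, which is the informal reason aggregation can succeed where each layer alone fails; the paper makes this precise via concentration of the non-uniform adjacency matrix.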
Learning common structures in a collection of networks. An application to food webs
Let a collection of networks represent interactions within several (social or
ecological) systems. We pursue two objectives: identifying similarities in the
topological structures that are held in common between the networks and
clustering the collection into sub-collections of structurally homogeneous
networks. We tackle these two questions with a probabilistic model based
approach. We propose an extension of the Stochastic Block Model (SBM) adapted
to the joint modeling of a collection of networks. The networks in the
collection are assumed to be independent realizations of SBMs. The common
connectivity structure is imposed through the equality of some parameters. The
model parameters are estimated with a variational Expectation-Maximization (EM)
algorithm. We derive an ad-hoc penalized likelihood criterion to select the
number of blocks and to assess the adequacy of the consensus found between the
structures of the different networks. This same criterion can also be used to
cluster networks on the basis of their connectivity structure. It thus provides
a partition of the collection into subsets of structurally homogeneous
networks. The relevance of our proposition is assessed on two collections of
ecological networks. First, an application to three stream food webs reveals
the homogeneity of their structures and the correspondence between groups of
species in different ecosystems playing equivalent ecological roles. Moreover,
the joint analysis allows a finer analysis of the structure of smaller
networks. Second, we cluster 67 food webs according to their connectivity
structures and demonstrate that five mesoscale structures are sufficient to
describe this collection.
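The modeling assumption, independent SBM draws sharing connectivity parameters, can be sketched generatively (hypothetical parameters; estimation in the paper is via variational EM, not shown):

```python
import numpy as np

rng = np.random.default_rng(42)

# Common block-to-block connectivity matrix shared across the collection
Pi = np.array([[0.8, 0.1],
               [0.1, 0.6]])

def sample_sbm(n, Pi):
    """One network in the collection: its own block memberships,
    but edge probabilities taken from the shared matrix Pi."""
    z = rng.integers(0, Pi.shape[0], size=n)   # block memberships
    P = Pi[z][:, z]                            # P[i, j] = Pi[z_i, z_j]
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1)                          # keep upper triangle
    return z, A + A.T                          # simple undirected graph

# A collection of three networks of different sizes, all SBM(Pi) draws
collection = [sample_sbm(n, Pi) for n in (30, 40, 50)]
```

Imposing equality of some entries of `Pi` across networks is how the paper encodes a common mesoscale structure; relaxing those equalities and comparing penalized likelihoods then yields the clustering of networks.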
Neural function approximation on graphs: shape modelling, graph discrimination & compression
Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Subsequently, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism.
We propose a substructure-based dictionary coder, Partition and Code (PnC), with theoretical guarantees, which can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter- and sample-efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces.
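The dictionary-coding idea behind a partition-based compressor can be caricatured in a few lines. The crude patch key below (within-patch degree sequence) stands in for PnC's learned dictionary and proper subgraph handling, and the cost model is purely illustrative:

```python
import math
from collections import Counter

# Partition-and-code intuition: split the graph into patches, build a
# dictionary over recurring patch "types", pay an entropy code per patch
# plus a per-edge cost for the cut edges between patches.
def patch_key(patch, edges):
    """Crude canonical key for a patch: size + sorted internal degrees."""
    deg = Counter()
    for a, b in edges:
        if a in patch and b in patch:
            deg[a] += 1
            deg[b] += 1
    return (len(patch), tuple(sorted(deg[v] for v in patch)))

def description_length(patches, edges):
    keys = [patch_key(set(p), edges) for p in patches]
    freq = Counter(keys)
    total = len(keys)
    # entropy code for the sequence of patch types
    bits = -sum(math.log2(freq[k] / total) for k in keys)
    # cut edges coded separately, log2(#patches + 1) bits each
    cut = sum(1 for a, b in edges
              if next(i for i, p in enumerate(patches) if a in p)
              != next(i for i, p in enumerate(patches) if b in p))
    return bits + cut * math.log2(total + 1)

# Two triangles joined by one edge: both patches share a dictionary entry,
# so the patch types cost zero bits and only the cut edge is paid for.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
patches = [(0, 1, 2), (3, 4, 5)]
dl = description_length(patches, edges)
```

Minimizing such a description length over partitions and dictionaries, with learned rather than hand-crafted components, is the optimization PnC performs; the trade-off with isomorphism testing is what makes the exact version hard.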