A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as a result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification.
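Any positive semi-definite graph kernel k induces a metric d(G, H) = sqrt(k(G, G) + k(H, H) - 2 k(G, H)), and the Gaussian RBF transformation mentioned above maps this to exp(-γ d(G, H)²). The sketch below, assuming a precomputed Gram matrix `K` and a hypothetical bandwidth parameter `gamma`, shows that transformation in NumPy; it is an illustration, not the survey's exact experimental setup.

```python
import numpy as np

def kernel_metric(K: np.ndarray) -> np.ndarray:
    """Pairwise distances induced by a positive semi-definite Gram matrix K.

    d(G_i, G_j)^2 = K[i, i] + K[j, j] - 2 * K[i, j]
    """
    diag = np.diag(K)
    sq = diag[:, None] + diag[None, :] - 2.0 * K
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))

def rbf_on_kernel_metric(K: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Gaussian RBF kernel exp(-gamma * d^2) applied to the kernel metric."""
    D = kernel_metric(K)
    return np.exp(-gamma * D ** 2)

if __name__ == "__main__":
    # Toy Gram matrix from a linear kernel on hypothetical graph feature vectors.
    X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    K = X @ X.T
    print(rbf_on_kernel_metric(K, gamma=0.5))
```

The resulting matrix is again a valid kernel and can be passed to any kernel method that accepts a precomputed Gram matrix; `gamma` would normally be tuned alongside the classifier's other hyperparameters.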
Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection
Large numbers of key performance indicators (KPIs) are monitored as multivariate
time series (MTS) data to ensure the reliability of software applications and
service systems. Accurately detecting anomalies in MTS is critical for
subsequent fault elimination. The scarcity of anomalies and of manual labels
has led to the development of various self-supervised MTS anomaly detection
(AD) methods, which optimize an overall objective/loss encompassing all
metrics' regression objectives/losses. However, our empirical study uncovers
the prevalence of conflicts among metrics' regression objectives, causing MTS
models to grapple with conflicting losses. This critical issue significantly
affects detection performance but has been overlooked by existing approaches.
To address this problem, following the design of multi-gate
mixture-of-experts (MMoE), we introduce CAD, a Conflict-aware multivariate KPI
Anomaly Detection algorithm. CAD offers an exclusive structure for each metric
to mitigate potential conflicts while still letting metrics benefit from one
another. Upon thorough investigation, we find that the poor performance of
vanilla MMoE mainly stems from the input-output misalignment in the MTS
formulation and from convergence issues arising from the large number of
tasks. To address these challenges,
we propose a straightforward yet effective task-oriented metric selection and
p&s (personalized and shared) gating mechanism, which establishes CAD as the
first practicable multi-task learning (MTL) based MTS AD model. Evaluations
show that CAD obtains an average F1-score of 0.943 across three public
datasets, notably outperforming state-of-the-art methods. Our code is
accessible at https://github.com/dawnvince/MTS_CAD.
Comment: 11 pages, ESEC/FSE industry track 202
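To make the MMoE idea concrete: shared experts process the input, and each task (here, one task per metric) has its own softmax gate that mixes the expert outputs before a task-specific tower. The following is a minimal, untrained NumPy forward pass of vanilla MMoE under that assumption; CAD's task-oriented metric selection and p&s gating are not reproduced, and the class name, shapes, and linear experts/towers are hypothetical choices for illustration.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MMoE:
    """Minimal multi-gate mixture-of-experts forward pass (untrained, NumPy).

    One task per metric: each task has its own softmax gate over the shared
    experts and its own linear tower producing one regression output.
    """

    def __init__(self, d_in, d_hidden, n_experts, n_tasks, seed=0):
        rng = np.random.default_rng(seed)
        self.W_exp = rng.normal(0, 0.1, (n_experts, d_in, d_hidden))  # shared experts
        self.W_gate = rng.normal(0, 0.1, (n_tasks, d_in, n_experts))  # one gate per task
        self.W_tower = rng.normal(0, 0.1, (n_tasks, d_hidden))        # one tower per task

    def forward(self, x):
        # Expert outputs: (batch, n_experts, d_hidden), ReLU activation.
        experts = np.maximum(np.einsum("bd,edh->beh", x, self.W_exp), 0.0)
        # Gate weights per task: (batch, n_tasks, n_experts), rows sum to 1.
        gates = softmax(np.einsum("bd,tde->bte", x, self.W_gate), axis=-1)
        # Task-specific mixture of expert outputs: (batch, n_tasks, d_hidden).
        mixed = np.einsum("bte,beh->bth", gates, experts)
        # Per-task tower: (batch, n_tasks), one prediction per metric.
        return np.einsum("bth,th->bt", mixed, self.W_tower), gates

# Hypothetical usage: a window of 8 time steps x 5 metrics, flattened to 40 inputs.
model = MMoE(d_in=40, d_hidden=16, n_experts=4, n_tasks=5)
x = np.random.default_rng(1).normal(size=(2, 40))
y, gates = model.forward(x)
```

In a regression-based AD setup, one plausible (assumed) anomaly score for metric k is the error between `y[:, k]` and the observed value of that metric; the separate gate per task is what lets each metric weight the shared experts differently.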
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have
been an effective strategy to overcome such information overload. The utility
of recommender systems cannot be overstated, given their widespread adoption in
many web applications and their potential to ameliorate many problems related
to over-choice. In recent years, deep learning has garnered
considerable interest in many research fields such as computer vision and
natural language processing, owing not only to its stellar performance but also
to the attractive property of learning feature representations from scratch. The
influence of deep learning is also pervasive, recently demonstrating its
effectiveness when applied to information retrieval and recommender systems
research. Evidently, the field of deep learning in recommender systems is
flourishing. This article aims to provide a comprehensive review of recent
research efforts on deep learning based recommender systems. More concretely,
we devise a taxonomy of deep learning based recommendation models and provide
a comprehensive summary of the state-of-the-art. Finally, we expand on current
trends and provide new perspectives on this exciting new development in the
field.
Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502
Graph Neural Network for spatiotemporal data: methods and applications
In the era of big data, there has been a surge in the availability of data
containing rich spatial and temporal information, offering valuable insights
into dynamic systems and processes for applications such as weather
forecasting, natural disaster management, intelligent transport systems, and
precision agriculture. Graph neural networks (GNNs) have emerged as a powerful
tool for modeling and understanding data whose elements depend on one another,
for example through spatial and temporal dependencies. A large body of existing
work addresses the complex spatial and temporal dependencies in
spatiotemporal data using GNNs. However, the strong interdisciplinary nature of
spatiotemporal data has created numerous GNN variants specifically designed
for distinct application domains. Although the techniques are generally
applicable across various domains, cross-referencing these methods remains
essential yet challenging due to the absence of a comprehensive literature
review on GNNs for spatiotemporal data. This article aims to provide a
systematic and comprehensive overview of the technologies and applications of
GNNs in the spatiotemporal domain. First, the ways of constructing graphs from
spatiotemporal data are summarized to help domain experts understand how to
generate graphs from various types of spatiotemporal data. Then, a systematic
categorization and summary of existing spatiotemporal GNNs are presented to
enable domain experts to identify suitable techniques and to support model
developers in advancing their research. Moreover, a comprehensive overview of
significant applications in the spatiotemporal domain is offered to introduce a
broader range of applications to model developers and domain experts, assisting
them in exploring potential research topics and enhancing the impact of their
work. Finally, open challenges and future directions are discussed.
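As one example of constructing a graph from spatiotemporal data, a common approach for sensor networks (such as traffic detectors) builds a weighted adjacency matrix from pairwise distances with a thresholded Gaussian kernel. The sketch below assumes planar sensor coordinates, a bandwidth `sigma` defaulting to the standard deviation of pairwise distances, and a sparsification threshold `eps`; these are illustrative choices, not ones prescribed by the survey.

```python
import numpy as np

def gaussian_threshold_adjacency(coords, sigma=None, eps=0.1):
    """Build a weighted adjacency matrix from sensor locations.

    W[i, j] = exp(-d_ij^2 / sigma^2) if i != j and the weight is >= eps, else 0.
    coords: (n_sensors, 2) array of planar coordinates (e.g. projected km).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if sigma is None:
        # Heuristic: scale by the standard deviation of pairwise distances.
        sigma = dist[np.triu_indices(len(coords), k=1)].std()
    W = np.exp(-(dist ** 2) / sigma ** 2)
    W[W < eps] = 0.0          # sparsify: drop weak, long-range links
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

# Hypothetical usage: three sensors on a line; the distant one stays isolated.
W = gaussian_threshold_adjacency([[0, 0], [1, 0], [10, 0]], sigma=1.0, eps=0.1)
```

The static matrix `W`, paired with a (time steps × sensors) array of readings, gives the spatiotemporal graph signal that a GNN then consumes; other constructions (k-nearest neighbors, road connectivity, learned adjacency) follow the same pattern with a different rule for `W`.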
Unsupervised Structural Embedding Methods for Efficient Collective Network Mining
How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. Central to all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient.
In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node’s alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd
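The core idea of structural embeddings, describing each node by its neighborhood structure rather than its identity so that nodes in different networks become comparable, and then aligning nodes by embedding similarity, can be sketched in a few lines. The features below (discounted, log-binned degree histograms of the 1- and 2-hop neighborhoods) and the greedy nearest-neighbor matching are a hypothetical simplification for illustration, not the thesis's actual methods, which add sparsification, randomized approximation, and neighbor-consistency refinement.

```python
import numpy as np
from collections import deque

def khop_degree_features(adj, max_hops=2, n_bins=4, discount=0.5):
    """Structural node features: discounted, log2-binned degree histograms of
    the nodes found exactly 1..max_hops hops away (hypothetical simplification).

    adj: dict mapping node -> set of neighbour nodes (undirected graph).
    Returns (nodes, feature matrix of shape (n_nodes, n_bins)).
    """
    nodes = sorted(adj)
    feats = np.zeros((len(nodes), n_bins))
    for row, src in enumerate(nodes):
        # Breadth-first search to get the hop distance of each node within max_hops.
        hop = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if hop[u] == max_hops:
                continue
            for v in adj[u]:
                if v not in hop:
                    hop[v] = hop[u] + 1
                    queue.append(v)
        for v, h in hop.items():
            if h == 0:
                continue
            # Bin by floor(log2(degree)); clamp into the last bin.
            b = min(int(np.log2(max(len(adj[v]), 1))), n_bins - 1)
            feats[row, b] += discount ** (h - 1)
    return nodes, feats

def align(adj1, adj2, **kw):
    """Greedy alignment: map each node of graph 1 to its most similar node in
    graph 2, with similarity exp(-||f_u - f_v||)."""
    n1, f1 = khop_degree_features(adj1, **kw)
    n2, f2 = khop_degree_features(adj2, **kw)
    dist = np.linalg.norm(f1[:, None, :] - f2[None, :, :], axis=-1)
    sim = np.exp(-dist)
    return {n1[i]: n2[int(np.argmax(sim[i]))] for i in range(len(n1))}

# Hypothetical usage: two star graphs with different node labels.
g1 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
g2 = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
mapping = align(g1, g2)  # the hub 0 is matched to the hub "c"
```

Because the features never reference node identities, the two hubs receive identical vectors even though the graphs share no labels; summarizing the rows of `feats` (for example with a histogram) would likewise give a crude graph-level feature vector.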
Geometry-aware Transformer for molecular property prediction
Recently, graph neural networks (GNNs) have achieved remarkable performance
on quantum mechanical problems. However, a graph convolution can only cover a
localized region and cannot capture long-range interactions between atoms. This
locality is at odds with theoretical interatomic potentials and is a
fundamental limitation of spatial-based GNNs. In this work, we propose a
novel attention-based framework for molecular property prediction tasks. We
represent a molecular conformation as a discrete atomic sequence combined with
atom-atom distance attributes, and name the resulting model Geometry-aware
Transformer (GeoT). In particular, we adopt a Transformer architecture, which
has been widely used for sequential data. Our proposed model learns sequential
representations of molecular graphs based on globally constructed attentions,
maintaining all spatial arrangements of atom pairs. Our method avoids
cost-intensive computations such as angle calculations. Experimental results
on several public benchmarks, together with visualization maps, verify that
keeping the long-range interatomic attributes can significantly improve the
model's predictive performance.
Comment: 14 pages, 5 figures
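The abstract does not give GeoT's exact attention formula, but the general idea of global attention over all atom pairs informed by interatomic distances can be illustrated. The sketch below assumes a single attention head and a hypothetical additive distance bias `-gamma * d_ij` on the attention logits; it shows how every atom can attend to every other atom (no locality cutoff) while nearer atoms are favored, and should not be read as GeoT's actual formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distance_aware_attention(h, pos, Wq, Wk, Wv, gamma=1.0):
    """Single-head self-attention over all atom pairs with a distance bias.

    h:   (n_atoms, d_model) atom features (e.g. embedded atom types)
    pos: (n_atoms, 3) Cartesian coordinates of one conformation
    The bias -gamma * d_ij is a hypothetical choice of geometric term.
    """
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    # Pairwise Euclidean distances between atoms: (n_atoms, n_atoms).
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    logits = q @ k.T / np.sqrt(q.shape[-1]) - gamma * dist
    attn = softmax(logits, axis=-1)  # every atom attends to every atom
    return attn @ v, attn

# Hypothetical usage: 3 atoms on a line, 4-dimensional features.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
h = np.ones((3, 4))
Z = np.zeros((4, 4))
out, attn = distance_aware_attention(h, pos, Z, Z, np.eye(4), gamma=1.0)
```

Unlike a graph convolution restricted to bonded or cutoff neighbors, the distant third atom still receives a small positive weight here, which is the long-range behavior the abstract argues for; note that only distances are used, with no angle computations.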
A Comprehensive Survey on Deep Graph Representation Learning
Graph representation learning aims to effectively encode high-dimensional
sparse graph-structured data into low-dimensional dense vectors, which is a
fundamental task that has been widely studied in a range of fields, including
machine learning and data mining. Classic graph embedding methods follow the
basic idea that the embedding vectors of interconnected nodes in the graph can
still maintain a relatively close distance, thereby preserving the structural
information between the nodes in the graph. However, this is sub-optimal
because: (i) traditional methods have limited model capacity, which limits
learning performance; (ii) existing techniques typically rely on unsupervised
learning strategies and fail to couple with the latest learning paradigms; and
(iii) representation learning and downstream tasks depend on each other and
should be jointly enhanced. With the remarkable success of deep learning, deep
graph representation learning has shown great potential and advantages over
shallow (traditional) methods, and a large number of deep graph representation
learning techniques, especially graph neural networks, have been proposed in
the past decade. In this survey, we comprehensively review current deep graph
representation learning algorithms and propose a new taxonomy of the existing
state-of-the-art literature. Specifically, we systematically summarize the
essential components of graph representation learning and categorize existing
approaches by graph neural network architecture and by the most recent
advanced learning paradigms. Moreover, this survey also covers practical and
promising applications of deep graph representation learning. Last but not
least, we offer new perspectives and suggest challenging directions that
deserve further investigation in the future.
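The classic "shallow" idea described above, that interconnected nodes should receive nearby embedding vectors, is captured by Laplacian eigenmaps, which minimize the sum over edges of W_ij ||y_i - y_j||² subject to a normalization constraint. The sketch below is offered as one standard example of such a method (chosen here for illustration, not taken from the survey), assuming a connected, undirected graph given as a dense weighted adjacency matrix.

```python
import numpy as np

def laplacian_eigenmaps(W, dim=2):
    """Embed nodes so that strongly connected nodes stay close.

    Minimises sum_ij W_ij * ||y_i - y_j||^2 subject to Y^T D Y = I by solving
    L y = lambda D y via the symmetric normalised Laplacian.
    W: (n, n) symmetric non-negative adjacency of a connected graph.
    """
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # D^{-1/2} L D^{-1/2}, computed by row/column scaling.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Skip the trivial eigenvector (eigenvalue 0) and map back: y = D^{-1/2} u.
    return d_inv_sqrt[:, None] * vecs[:, 1:dim + 1]

# Hypothetical usage: two triangles joined by a single bridge edge (2-3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
Y = laplacian_eigenmaps(W, dim=1)  # the two triangles get opposite signs
```

Because the embedding comes from a fixed eigendecomposition with no trainable parameters and no task signal, it illustrates limitations (i) and (iii) listed above, which deep methods such as graph neural networks aim to address.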