822 research outputs found
Graph Clustering with Graph Neural Networks
Graph Neural Networks (GNNs) have achieved state-of-the-art results on many
graph analysis tasks such as node classification and link prediction. However,
important unsupervised problems on graphs, such as graph clustering, have
proved more resistant to advances in GNNs. In this paper, we study unsupervised
training of GNN pooling in terms of their clustering capabilities.
We start by drawing a connection between graph clustering and graph pooling:
intuitively, a good graph clustering is what one would expect from a GNN
pooling layer. Counterintuitively, we show that this is not true for
state-of-the-art pooling methods, such as MinCut pooling. To address these
deficiencies, we introduce Deep Modularity Networks (DMoN), an unsupervised
pooling method inspired by the modularity measure of clustering quality, and
show how it tackles recovery of the challenging clustering structure of
real-world graphs. In order to clarify the regimes where existing methods fail,
we carefully design a set of experiments on synthetic data which show that DMoN
is able to jointly leverage the signal from the graph structure and node
attributes. Similarly, on real-world data, we show that DMoN produces high
quality clusters which correlate strongly with ground truth labels, achieving
state-of-the-art results.
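The modularity measure that this abstract names as the inspiration for DMoN can be illustrated with a minimal sketch. The code below computes the standard Newman-Girvan modularity of a hard partition of an undirected, unweighted graph; it is not the DMoN objective itself, which the abstract does not spell out. The function name `modularity` and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity of a hard partition (illustrative sketch).

    edges  : list of (u, v) pairs of an undirected, unweighted graph
    labels : mapping node -> cluster id
    Computes Q = sum_c [ e_c / m - (d_c / (2m))^2 ], where e_c is the number
    of edges inside cluster c, d_c the total degree of cluster c, and m the
    number of edges.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    intra_edges = defaultdict(int)   # e_c
    degree_sum = defaultdict(int)    # d_c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            intra_edges[labels[u]] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_cluster = {node: 0 for node in range(6)}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(q_two, q_one)
```

Splitting the toy graph into its two triangles scores higher than placing every node in one cluster, which is the behaviour a modularity-based objective rewards.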
COMMUNITY DETECTION IN GRAPHS
Thesis (Ph.D.) - Indiana University, Luddy School of Informatics, Computing, and Engineering/University Graduate School, 2020
Community detection has always been one of the fundamental research topics in graph mining. As a type of unsupervised or semi-supervised approach, community detection aims to explore high-order closeness between nodes by leveraging graph topological structure. By grouping similar nodes or edges into the same community while separating dissimilar ones into different communities, graph structure can be revealed at a coarser resolution. This can benefit numerous applications such as shopping recommendation and advertising in e-commerce, protein-protein interaction prediction in bioinformatics, and literature recommendation or scholar collaboration in citation analysis. However, identifying communities is an ill-defined problem. Due to the No Free Lunch theorem [1], there is neither a gold standard that represents a perfect community partition nor a universal method able to detect satisfactory communities for all tasks across various types of graphs. To give a global view of this research topic, I summarize state-of-the-art community detection methods by categorizing them according to graph types, research tasks and methodology frameworks. As academic work on community detection has grown rapidly in recent years, I focus particularly on state-of-the-art works published in the last decade, which may leave out some classic models published decades ago. Meanwhile, three subtle community detection tasks are proposed and assessed in this dissertation as well. First, unlike general models that consider only graph structure, personalized community detection uses user need as auxiliary information to guide community detection. The result is fine-grained communities for the nodes that better match user needs and coarser-resolution communities for the remaining, less relevant nodes. Second, graphs often suffer from sparse connectivity. Applying conventional models directly to such graphs may severely distort the quality of the generated communities. To tackle this problem, cross-graph techniques are used to propagate information from external graphs in support of community detection on the target graph. Third, graph community structure supports a natural language processing (NLP) task that depicts intrinsic node characteristics by generating node summaries with a text generative model. The contribution of this dissertation is threefold. First, a substantial body of research is reviewed and summarized under a well-defined taxonomy. Existing works on methods, evaluation and applications are all addressed in the literature review. Second, three novel community detection tasks are presented, and the associated models are proposed and evaluated against state-of-the-art baselines on various datasets. Third, the limitations of current works are pointed out and promising future research directions are discussed.
Median evidential c-means algorithm and its application to community detection
Median clustering is of great value for partitioning relational data. In this
paper, a new prototype-based clustering method, called Median Evidential
C-Means (MECM), which is an extension of median c-means and median fuzzy
c-means on the theoretical framework of belief functions is proposed. The
median variant relaxes the restriction of a metric space embedding for the
objects but constrains the prototypes to be in the original data set. Due to
these properties, MECM could be applied to graph clustering problems. A
community detection scheme for social networks based on MECM is investigated
and the obtained credal partitions of graphs, which are more refined than crisp
and fuzzy ones, enable us to have a better understanding of the graph
structures. An initial prototype-selection scheme based on evidential
semi-centrality is presented to avoid local premature convergence and an
evidential modularity function is defined to choose the optimal number of
communities. Finally, experiments in synthetic and real data sets illustrate
the performance of MECM and show how it differs from other methods.
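The property this abstract relies on, that a median variant needs only pairwise dissimilarities and keeps its prototypes inside the data set, can be sketched with a crisp median c-means loop. This is a simplified illustration and not MECM: it has no belief-function (credal) memberships, no evidential semi-centrality initialization and no evidential modularity. The function names, the toy data and the initial prototypes are assumptions.

```python
def assign_to_nearest(D, prototypes):
    """Assign each object to the index of its closest prototype."""
    return [
        min(range(len(prototypes)), key=lambda k: D[i][prototypes[k]])
        for i in range(len(D))
    ]


def median_c_means(D, initial_prototypes, max_iter=100):
    """Crisp median c-means on a dissimilarity matrix (illustrative sketch).

    D                  : n x n list of lists of pairwise dissimilarities
    initial_prototypes : indices of the objects used as starting prototypes
    Prototypes are always indices of objects in the data set, so no metric
    space embedding of the objects is needed.
    """
    n = len(D)
    prototypes = list(initial_prototypes)
    for _ in range(max_iter):
        assign = assign_to_nearest(D, prototypes)
        updated = []
        for k in range(len(prototypes)):
            members = [i for i in range(n) if assign[i] == k]
            if not members:
                # Keep the old prototype if its cluster became empty.
                updated.append(prototypes[k])
                continue
            # New prototype: the object with the smallest total
            # dissimilarity to the members of cluster k.
            updated.append(
                min(range(n), key=lambda j: sum(D[i][j] for i in members))
            )
        if updated == prototypes:
            break
        prototypes = updated
    return prototypes, assign_to_nearest(D, prototypes)


# Hypothetical relational data: dissimilarities between six objects that
# form two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in positions] for a in positions]

prototypes, assignment = median_c_means(D, initial_prototypes=[0, 1])
print(prototypes, assignment)
```

For graph clustering, `D` could be filled with any node-to-node dissimilarity, for example shortest-path lengths; that choice is also an assumption, since the abstract does not say which one the authors use.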
The EDAM Project: Mining Atmospheric Aerosol Datasets
Data mining has been a very active area of research in the database, machine learning, and mathematical programming communities in recent years. EDAM (Exploratory Data Analysis and Management) is a joint project between researchers in Atmospheric Chemistry and Computer Science at Carleton College and the University of Wisconsin-Madison that aims to develop data mining techniques for advancing the state of the art in analyzing atmospheric aerosol datasets. There is a great need to better understand the sources, dynamics, and compositions of atmospheric aerosols. The traditional approach for particle measurement, which is the collection of bulk samples of particulates on filters, is not adequate for studying particle dynamics and real-time correlations. This has led to the development of a new generation of real-time instruments that provide continuous or semi-continuous streams of data about certain aerosol properties. However, these instruments have added a significant level of complexity to atmospheric aerosol data, and dramatically increased the amounts of data to be collected, managed, and analyzed. Our ability to integrate the data from all of these new and complex instruments now lags far behind our data-collection capabilities, and severely limits our ability to understand the data and act upon it in a timely manner. In this paper, we present an overview of the EDAM project. The goal of the project, which is in its early stages, is to develop novel data mining algorithms and approaches to managing and monitoring multiple complex data streams. An important objective is data quality assurance, and real-time data mining offers great potential. The approach that we take should also provide good techniques to deal with gas-phase and semi-volatile data. While atmospheric aerosol analysis is an important and challenging domain that motivates us with real problems and serves as a concrete test of our results, our objective is to develop techniques that have broader applicability, and to explore some fundamental challenges in data mining that are not specific to any given application domain.
Graph Learning and Its Applications: A Holistic Survey
Graph learning is a prevalent domain that endeavors to learn the intricate
relationships among nodes and the topological structure of graphs. These
relationships endow graphs with uniqueness compared to conventional tabular
data, as nodes rely on non-Euclidean space and encompass rich information to
exploit. Over the years, graph learning has transcended from graph theory to
graph data mining. With the advent of representation learning, it has attained
remarkable performance in diverse scenarios, including text, image, chemistry,
and biology. Owing to its extensive application prospects, graph learning
attracts copious attention from the academic community. Despite numerous works
proposed to tackle different problems in graph learning, there is a demand to
survey previous valuable works. While some researchers have perceived this
phenomenon and accomplished impressive surveys on graph learning, they failed
to connect related objectives, methods, and applications in a more coherent
way. As a result, they did not encompass current ample scenarios and
challenging problems due to the rapid expansion of graph learning. Different
from previous surveys on graph learning, we provide a holistic review that
analyzes current works from the perspective of graph structure, and discusses
the latest applications, trends, and challenges in graph learning.
Specifically, we commence by proposing a taxonomy from the perspective of the
composition of graph data and then summarize the methods employed in graph
learning. We then provide a detailed elucidation of mainstream applications.
Finally, based on the current trend of techniques, we propose future
directions.
Comment: 20 pages, 7 figures, 3 tables