Deep Representation Learning for Social Network Analysis
Social network analysis is an important problem in data mining. A fundamental
step for analyzing social networks is to encode network data into
low-dimensional representations, i.e., network embeddings, so that the network
topology structure and other attribute information can be effectively
preserved. Network representation learning facilitates further applications such
as classification, link prediction, anomaly detection and clustering. In
addition, techniques based on deep neural networks have attracted great
interest over the past few years. In this survey, we conduct a comprehensive
review of current literature in network representation learning utilizing
neural network models. First, we introduce the basic models for learning node
representations in homogeneous networks. Meanwhile, we will also introduce some
extensions of the base models in tackling more complex scenarios, such as
analyzing attributed networks, heterogeneous networks and dynamic networks.
Then, we introduce the techniques for embedding subgraphs. After that, we
present the applications of network representation learning. At the end, we
discuss some promising research directions for future work.
Towards combinatorial clustering: preliminary research survey
The paper describes clustering problems from the combinatorial viewpoint. A
brief systemic survey is presented including the following: (i) basic
clustering problems (e.g., classification, clustering, sorting, clustering with
an order over clusters), (ii) basic approaches to assessment of objects and
object proximities (i.e., scales, comparison, aggregation issues), (iii) basic
approaches to evaluation of local quality characteristics for clusters and
total quality characteristics for clustering solutions, (iv) clustering as
a multicriteria optimization problem, (v) generalized modular clustering
framework, (vi) basic clustering models/methods (e.g., hierarchical clustering,
k-means clustering, minimum spanning tree based clustering, clustering as
assignment, detection of clique/quasi-clique based clustering, correlation
clustering, network communities based clustering). Special attention is
targeted to formulation of clustering as multicriteria optimization models.
Combinatorial optimization models are used as auxiliary problems (e.g.,
assignment, partitioning, knapsack problem, multiple choice problem,
morphological clique problem, searching for consensus/median for structures).
Numerical examples illustrate problem formulations, solving methods, and
applications. The material can be used as follows: (a) a research survey, (b) a
foundation for designing the structure/architecture of composite modular
clustering software, (c) a bibliography reference collection, and (d) a
tutorial.
Comment: 102 pages, 66 figures, 67 tables
Measuring Two-Event Structural Correlations on Graphs
Real-life graphs usually have various kinds of events happening on them,
e.g., product purchases in online social networks and intrusion alerts in
computer networks. The occurrences of events on the same graph could be
correlated, exhibiting either attraction or repulsion. Such structural
correlations can reveal important relationships between different events.
Unfortunately, correlation relationships on graph structures are not well
studied and cannot be captured by traditional measures. In this work, we design
a novel measure for assessing two-event structural correlations on graphs.
Given the occurrences of two events, we choose uniformly a sample of "reference
nodes" from the vicinity of all event nodes and employ the Kendall's tau rank
correlation measure to compute the average concordance of event density
changes. Significance can be efficiently assessed by tau's nice property of
being asymptotically normal under the null hypothesis. In order to compute the
measure in large scale networks, we develop a scalable framework using
different sampling strategies. The complexity of these strategies is analyzed.
Experiments on real graph datasets with both synthetic and real events
demonstrate that the proposed framework is not only efficacious, but also
efficient and scalable.
Comment: VLDB201
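The rank-correlation step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's measure: it only computes plain Kendall's tau (the tau-a variant, assuming no ties) between two lists of event densities, together with the standard normal approximation under the null hypothesis of independence. The density values and the function names `kendall_tau_a` and `tau_z_score` are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1   # pair ordered the same way in both lists
        elif product < 0:
            score -= 1   # pair ordered in opposite ways
    n = len(xs)
    return score / (n * (n - 1) / 2)


def tau_z_score(tau, n):
    """z-score of tau under independence (normal approximation, no ties)."""
    variance = 2 * (2 * n + 5) / (9 * n * (n - 1))
    return tau / math.sqrt(variance)


# Hypothetical densities of two events around five reference nodes.
density_a = [0.10, 0.40, 0.35, 0.80, 0.05]
density_b = [0.20, 0.30, 0.50, 0.90, 0.10]

tau = kendall_tau_a(density_a, density_b)
z = tau_z_score(tau, len(density_a))
print(f"tau = {tau:.2f}, z = {z:.2f}")
```

A positive tau suggests attraction between the two events and a negative tau suggests repulsion; with only five reference nodes the normal approximation is rough and serves here only to show the arithmetic.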
Mining Attribute-structure Correlated Patterns in Large Attributed Graphs
In this work, we study the correlation between attribute sets and the
occurrence of dense subgraphs in large attributed graphs, a task we call
structural correlation pattern mining. A structural correlation pattern is a
dense subgraph induced by a particular attribute set. Existing methods are not
able to extract relevant knowledge regarding how vertex attributes interact
with dense subgraphs. Structural correlation pattern mining combines aspects of
frequent itemset and quasi-clique mining problems. We propose statistical
significance measures that compare the structural correlation of attribute sets
against their expected values using null models. Moreover, we evaluate the
interestingness of structural correlation patterns in terms of size and
density. An efficient algorithm that combines search and pruning strategies in
the identification of the most relevant structural correlation patterns is
presented. We apply our method for the analysis of three real-world attributed
graphs: a collaboration, a music, and a citation network, verifying that it
provides valuable knowledge in a feasible time.
Comment: VLDB201
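The notion of a subgraph induced by an attribute set can be sketched as follows. This is only an illustration of the inducing step with a plain edge-density score; the paper's structural correlation and significance measures are not reproduced here. The toy graph, the attribute names, and the function `induced_density` are hypothetical.

```python
def induced_density(edges, vertex_attributes, attribute_set):
    """Return the vertices carrying every attribute in `attribute_set`
    and the edge density of the subgraph they induce (undirected graph)."""
    required = set(attribute_set)
    members = {v for v, attrs in vertex_attributes.items() if required <= set(attrs)}
    n = len(members)
    if n < 2:
        return members, 0.0
    # Count edges with both endpoints inside the induced vertex set.
    internal = sum(1 for u, v in edges if u in members and v in members)
    return members, 2 * internal / (n * (n - 1))


# Hypothetical attributed graph: vertices are users, attributes are music tastes.
vertex_attributes = {
    "a": {"jazz", "rock"},
    "b": {"jazz"},
    "c": {"jazz", "pop"},
    "d": {"rock"},
}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

jazz_members, jazz_density = induced_density(edges, vertex_attributes, {"jazz"})
rock_members, rock_density = induced_density(edges, vertex_attributes, {"rock"})
print(sorted(jazz_members), jazz_density)
print(sorted(rock_members), rock_density)
```

In this toy graph the "jazz" vertices form a triangle, while the two "rock" vertices are not connected, so only the first attribute set induces a dense subgraph.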
Deep Learning on Graphs: A Survey
Deep learning has been shown to be successful in a number of domains, ranging
from acoustics, images, to natural language processing. However, applying deep
learning to the ubiquitous graph data is non-trivial because of the unique
characteristics of graphs. Recently, substantial research efforts have been
devoted to applying deep learning methods to graphs, resulting in beneficial
advances in graph analysis techniques. In this survey, we comprehensively
review the different types of deep learning methods on graphs. We divide the
existing methods into five categories based on their model architectures and
training strategies: graph recurrent neural networks, graph convolutional
networks, graph autoencoders, graph reinforcement learning, and graph
adversarial methods. We then provide a comprehensive overview of these methods
in a systematic manner mainly by following their development history. We also
analyze the differences and compositions of different methods. Finally, we
briefly outline the applications in which they have been used and discuss
potential future research directions.
Comment: Accepted by Transactions on Knowledge and Data Engineering. 24 pages,
11 figures
Graph Embedding Techniques, Applications, and Performance: A Survey
Graphs, such as social networks, word co-occurrence networks, and
communication networks, occur naturally in various real-world applications.
Analyzing them yields insight into the structure of society, language, and
different patterns of communication. Many approaches have been proposed to
perform the analysis. Recently, methods which use the representation of graph
nodes in vector space have gained traction from the research community. In this
survey, we provide a comprehensive and structured analysis of various graph
embedding techniques proposed in the literature. We first introduce the
embedding task and its challenges such as scalability, choice of
dimensionality, and features to be preserved, and their possible solutions. We
then present three categories of approaches based on factorization methods,
random walks, and deep learning, with examples of representative algorithms in
each category and analysis of their performance on various tasks. We evaluate
these state-of-the-art methods on a few common datasets and compare their
performance against one another. Our analysis concludes by suggesting some
potential applications and future directions. We finally present the
open-source Python library we developed, named GEM (Graph Embedding Methods,
available at https://github.com/palash1992/GEM), which provides all presented
algorithms within a unified interface to foster and facilitate research on the
topic.
Comment: Submitted to Knowledge Based Systems for review
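Of the three categories named in the abstract above, the random-walk one is the easiest to illustrate briefly. The sketch below generates uniform random walks over an adjacency list, the kind of node sequences that DeepWalk-style methods pass to a skip-gram model. It does not use or reproduce the GEM library's interface; the toy graph and the function names `random_walk` and `generate_walks` are hypothetical.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Return a uniform random walk of at most `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adjacency, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adjacency, node, length, rng))
    return walks


# Hypothetical undirected toy graph stored as an adjacency list.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = generate_walks(toy_graph, walks_per_node=2, length=5)
print(len(walks), "walks, first:", walks[0])
```

Each walk plays the role of a sentence and each node the role of a word; an embedding model trained on these sequences places nodes that co-occur in walks close together in vector space.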
SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint
Domains such as scientific workflows and business processes exhibit data
models with complex relationships between objects. This relationship is
typically represented as sequences, where each data item is annotated with
multi-dimensional attributes. There is a need to analyze this data for
operational insights. For example, in business processes, users are interested
in clustering process traces into smaller subsets to discover less complex
process models. This requires expensive computation of similarity metrics
between sequence-based data. Related work on dimension reduction and embedding
methods does not take into account the multi-dimensional attributes of data, and
does not address the interpretability of data in the embedding space (i.e., by
favoring vector-based representation). In this work, we introduce Summarized, a
framework for efficient analysis on sequence-based multi-dimensional data using
intuitive and user-controlled summarizations. We introduce summarization
schemes that provide tunable trade-offs between the quality and efficiency of
analysis tasks and derive an error model for summary-based similarity under an
edit-distance constraint. Evaluations using real-world datasets show the
effectiveness of our framework.
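The similarity metric that the abstract above describes as expensive can be sketched with the classic dynamic program for edit distance. This is the plain Levenshtein distance over sequences of activity labels, not the summary-based similarity or the error model introduced by the framework. The two traces and the function name `edit_distance` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# Hypothetical process traces, each a sequence of activity labels.
trace_1 = ["receive", "check", "approve", "ship"]
trace_2 = ["receive", "approve", "ship", "invoice"]

distance = edit_distance(trace_1, trace_2)
print(distance)
```

The computation takes time proportional to the product of the two trace lengths, which is why comparing all pairs of long traces becomes costly and why working on shorter summaries can help.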
GROUPS-NET: Group Meetings Aware Routing in Multi-Hop D2D Networks
In the next generation cellular networks, device-to-device (D2D)
communication is already considered a fundamental feature. A problem of
multi-hop D2D networks is how to define forwarding algorithms that achieve,
at the same time, high delivery ratio and low network overhead. In this paper,
we aim to understand group meetings' properties by looking at their structure
and regularity with the final goal of applying such knowledge in the design of
a forwarding algorithm for D2D multi-hop networks. We introduce a forwarding
protocol, namely GROUPS-NET, which is aware of social group meetings and their
evolution over time. Our algorithm is parameter-calibration free and does not
require any knowledge about the social network structure of the system. In
particular, unlike state-of-the-art algorithms, GROUPS-NET does not
need community detection, which is a complex and expensive task. We validate
our algorithm using different publicly available data sources. In real
large-scale scenarios, our algorithm achieves approximately the same delivery
ratio as the state-of-the-art solution with up to 40% less network overhead.
Comment: arXiv admin note: text overlap with arXiv:1512.0482
Node Embedding with Adaptive Similarities for Scalable Learning over Graphs
Node embedding is the task of extracting informative and descriptive features
over the nodes of a graph. The importance of node embeddings for graph
analytics, as well as learning tasks such as node classification, link
prediction and community detection, has led to increased interest in the
problem and to a number of recent advances. Much like PCA in the feature
domain, node embedding is an inherently unsupervised task; lacking
metadata for validation, practical methods may require standardization and
limited use of tunable hyperparameters. Finally, node embedding methods
are faced with maintaining scalability in the face of large-scale real-world
graphs of ever-increasing sizes. In the present work, we propose an adaptive
node embedding framework that adjusts the embedding process to a given
underlying graph, in a fully unsupervised manner. To achieve this, we adopt the
notion of a tunable node similarity matrix that assigns weights on paths of
different length. The design of the multilength similarities ensures that the
resulting embeddings also inherit interpretable spectral properties. The
proposed model is carefully studied, interpreted, and numerically evaluated
using stochastic block models. Moreover, an algorithmic scheme is proposed for
training the model parameters efficiently and in an unsupervised manner. We
perform extensive node classification, link prediction, and clustering
experiments on many real world graphs from various domains, and compare with
state-of-the-art scalable and unsupervised node embedding alternatives. The
proposed method enjoys superior performance in many cases, while also yielding
interpretable information on the underlying structure of the graph.
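The idea of a similarity matrix that weights paths of different lengths can be sketched as a weighted sum of adjacency-matrix powers, S = theta_1 A + theta_2 A^2 + ... + theta_K A^K. This form is an assumption made for illustration: the abstract does not state which matrix is raised to powers or how the weights are constrained or learned. The toy graph, the weights, and the function names `matmul` and `multilength_similarity` are hypothetical.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def multilength_similarity(adjacency, weights):
    """Return sum_k weights[k-1] * A^k for k = 1..len(weights)."""
    n = len(adjacency)
    similarity = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adjacency]  # starts as A^1
    for index, theta in enumerate(weights):
        for i in range(n):
            for j in range(n):
                similarity[i][j] += theta * power[i][j]
        if index + 1 < len(weights):
            power = matmul(power, adjacency)  # advance to the next power
    return similarity


# Hypothetical path graph 0 - 1 - 2 with weights for path lengths 1 and 2.
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
S = multilength_similarity(path_graph, [0.5, 0.25])
for row in S:
    print(row)
```

Nodes 0 and 2 are not adjacent, yet they receive a nonzero similarity through the length-2 term; changing the weights shifts the emphasis between short and long paths.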
An Automated System for Discovering Neighborhood Patterns in Ego Networks
Social network analysis has often focused on the topology of the
network without considering the characteristics of the individuals involved in
it. Less attention is given to studying the behavior of individuals, even though
they are the basic entities of a graph. Given a mobile social network graph, what
are good features to extract key information from the nodes? How many distinct
neighborhood patterns exist for ego nodes? What clues does such information
provide to study nodes over a long period of time?
In this report, we develop an automated system in order to discover the
occurrences of prototypical ego-centric patterns from data. We aim to provide a
data-driven instrument to be used in behavioral sciences for graph
interpretations. We analyze social networks derived from real-world data
collected with smart-phones. We select 13 well-known network measures,
especially those concerned with ego graphs. We form eight feature subsets and
then assess their performance using unsupervised clustering techniques to
discover distinguishing ego-centric patterns. From clustering analysis, we
discover that eight distinct neighborhood patterns have emerged. This
categorization allows concise analysis of users' data as they change over time.
The results provide a fine-grained analysis for the contribution of different
feature sets to detect unique clustering patterns. Last, as a case study, two
datasets are studied over long periods to demonstrate the utility of this
method. The study shows the effectiveness of the proposed approach in
discovering important trends from data.