Scalable Modeling of Conversational-role based Self-presentation Characteristics in Large Online Forums
Online discussion forums are complex webs of overlapping subcommunities
(macrolevel structure, across threads) in which users enact different roles
depending on which subcommunity they are participating in within a particular
time point (microlevel structure, within threads). This sub-network structure
is implicit in massive collections of threads. To uncover this structure, we
develop a scalable algorithm based on stochastic variational inference and
leverage topic models (LDA) along with mixed membership stochastic block (MMSB)
models. We evaluate our model on three large-scale datasets,
Cancer-ThreadStarter (22K users and 14.4K threads), Cancer-NameMention (15.1K
users and 12.4K threads), and StackOverFlow (1.19 million users and 4.55 million
threads). Qualitatively, we demonstrate that our model can provide useful
explanations of microlevel and macrolevel user presentation characteristics in
different communities using the topics discovered from posts. Quantitatively,
we show that our model does better than MMSB and LDA in predicting user reply
structure within threads. In addition, we demonstrate via synthetic data
experiments that the proposed active sub-network discovery model is stable and
recovers the original parameters of the experimental setup with high
probability.
Link Prediction in Social Networks: the State-of-the-Art
In social networks, link prediction, which infers missing links in current
networks and the appearance or dissolution of links in future networks, is
important for mining and analyzing the evolution of social networks. In the
past decade, much work has been done on link prediction in social networks. The goal of
this paper is to comprehensively review, analyze and discuss the
state of the art of link prediction in social networks. A systematic
categorization of link prediction techniques and problems is presented. Then link
prediction techniques and problems are analyzed and discussed. Typical
applications of link prediction are also addressed. Achievements and roadmaps
of some active research groups are introduced. Finally, some future challenges
of link prediction in social networks are discussed.
Comment: 38 pages, 13 figures, Science China: Information Science, 201
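As a concrete toy instance of the similarity-based family such a survey covers, the sketch below scores non-adjacent node pairs by the Jaccard coefficient of their neighborhoods; the graph and node names are illustrative, not from the paper.

```python
from itertools import combinations

def jaccard_scores(adj):
    """Score each non-adjacent node pair by the Jaccard similarity
    of their neighbor sets; higher scores suggest likelier links."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # existing link, nothing to predict
        union = len(adj[u] | adj[v])
        if union:
            scores[(u, v)] = len(adj[u] & adj[v]) / union
    return scores

# Toy graph: triangle {a, b, c} plus a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
scores = jaccard_scores(adj)
# (a, d) and (b, d) share the neighbor c, so both score 0.5.
```

Real link prediction pipelines rank these scores and evaluate against held-out edges; this sketch only shows the scoring step.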
Graph Embedding Techniques, Applications, and Performance: A Survey
Graphs, such as social networks, word co-occurrence networks, and
communication networks, occur naturally in various real-world applications.
Analyzing them yields insight into the structure of society, language, and
different patterns of communication. Many approaches have been proposed to
perform the analysis. Recently, methods which use the representation of graph
nodes in vector space have gained traction from the research community. In this
survey, we provide a comprehensive and structured analysis of various graph
embedding techniques proposed in the literature. We first introduce the
embedding task and its challenges such as scalability, choice of
dimensionality, and features to be preserved, and their possible solutions. We
then present three categories of approaches based on factorization methods,
random walks, and deep learning, with examples of representative algorithms in
each category and analysis of their performance on various tasks. We evaluate
these state-of-the-art methods on a few common datasets and compare their
performance against one another. Our analysis concludes by suggesting some
potential applications and future directions. We finally present the
open-source Python library we developed, named GEM (Graph Embedding Methods,
available at https://github.com/palash1992/GEM), which provides all presented
algorithms within a unified interface to foster and facilitate research on the
topic.
Comment: Submitted to Knowledge Based Systems for review
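To make the factorization category concrete, here is a minimal sketch (not the GEM implementation) that embeds nodes via a truncated SVD of the adjacency matrix; the toy graph of two disjoint triangles is illustrative.

```python
import numpy as np

def svd_embedding(A, d):
    """Embed nodes as the top-d left singular vectors of the adjacency
    matrix, scaled by the square roots of the singular values."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :d] * np.sqrt(S[:d])

# Two disjoint triangles: nodes 0-2 and nodes 3-5.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0

X = svd_embedding(A, d=2)
# Nodes in the same triangle receive identical embedding rows, since
# the top singular vectors are constant within each triangle.
```

Factorization methods in the survey's taxonomy differ mainly in which matrix they factorize (adjacency, Laplacian, higher-order proximity); the decomposition step looks like the above.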
A Survey of Heterogeneous Information Network Analysis
Most real systems consist of a large number of interacting, multi-typed
components, yet most contemporary research models them as homogeneous
networks, without distinguishing the different types of objects and links in
the networks. Recently, more and more researchers have begun to consider these
interconnected, multi-typed data as heterogeneous information networks and to
develop structural analysis approaches that leverage the rich semantic meaning
of the types of objects and links in the networks. Compared to the widely
studied homogeneous networks, heterogeneous information networks contain
richer structural and semantic information, which provides many opportunities
as well as many challenges for data mining. In this paper, we
provide a survey of heterogeneous information network analysis. We will
introduce basic concepts of heterogeneous information network analysis, examine
its developments on different data mining tasks, discuss some advanced topics,
and point out some future research directions.
Comment: 45 pages, 12 figures
Deep Representation Learning for Social Network Analysis
Social network analysis is an important problem in data mining. A fundamental
step for analyzing social networks is to encode network data into
low-dimensional representations, i.e., network embeddings, so that the network
topology structure and other attribute information can be effectively
preserved. Network representation learning facilitates further applications
such as classification, link prediction, anomaly detection, and clustering. In
addition, techniques based on deep neural networks have attracted great
interest over the past few years. In this survey, we conduct a comprehensive
review of current literature in network representation learning utilizing
neural network models. First, we introduce the basic models for learning node
representations in homogeneous networks. We also introduce some extensions of
the base models for tackling more complex scenarios, such as attributed
networks, heterogeneous networks, and dynamic networks. Then, we introduce the
techniques for embedding subgraphs. After that, we present the applications of
network representation learning. At the end, we discuss some promising
research directions for future work.
Dynamic Node Embeddings from Edge Streams
Networks evolve continuously over time with the addition, deletion, and
changing of links and nodes. Such temporal networks (or edge streams) consist
of a sequence of timestamped edges and are seemingly ubiquitous. Despite the
importance of accurately modeling the temporal information, most embedding
methods ignore it entirely or approximate the temporal network using a sequence
of static snapshot graphs. In this work, we propose using the notion of
temporal walks for learning dynamic embeddings from temporal networks. Temporal
walks capture the temporally valid interactions (e.g., flow of information,
spread of disease) in the dynamic network in a lossless fashion. Based on the
notion of temporal walks, we describe a general class of embeddings called
continuous-time dynamic network embeddings (CTDNEs) that completely avoid the
issues and problems that arise when approximating the temporal network as a
sequence of static snapshot graphs. Unlike previous work, CTDNEs learn dynamic
node embeddings directly from the temporal network at the finest temporal
granularity and thus use only temporally valid information. As such, CTDNEs
naturally support online learning of the node embeddings in a streaming
real-time fashion. Finally, the experiments demonstrate the effectiveness of
this class of embedding methods that leverage temporal walks as it achieves an
average gain in AUC of 11.9% across all methods and graphs.
Comment: IEEE Transactions on Emerging Topics in Computational Intelligence
(TETCI)
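The core idea of a temporally valid walk can be sketched as follows; this is an illustrative simplification, not the authors' implementation, and the timestamped edge stream is hypothetical.

```python
import random

def temporal_walk(edges, start, length, seed=0):
    """Sample a temporally valid walk: each step must use an edge whose
    timestamp is >= the timestamp of the previously traversed edge."""
    rng = random.Random(seed)
    out = {}
    for u, v, t in edges:  # group outgoing edges by source node
        out.setdefault(u, []).append((v, t))
    walk, node, now = [start], start, float("-inf")
    for _ in range(length):
        choices = [(v, t) for v, t in out.get(node, []) if t >= now]
        if not choices:
            break  # no temporally valid continuation
        node, now = rng.choice(choices)
        walk.append(node)
    return walk

# Timestamped edge stream as (u, v, t) triples.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "a", 3), ("b", "a", 0)]
walk = temporal_walk(edges, "a", 3)
# From "b" the t=0 edge is skipped because 0 < 1, so the walk can only
# proceed a -> b -> c -> a, respecting time ordering.
```

Embedding methods in this class feed such walks to a skip-gram-style objective in place of static random walks; the sampler above is only the walk-generation step.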
Network Representation Learning: Consolidation and Renewed Bearing
Graphs are a natural abstraction for many problems where nodes represent
entities and edges represent a relationship across entities. An important area
of research that has emerged over the last decade is the use of graphs as a
vehicle for non-linear dimensionality reduction in a manner akin to previous
efforts based on manifold learning with uses for downstream database
processing, machine learning and visualization. In this systematic yet
comprehensive experimental survey, we benchmark several popular network
representation learning methods operating on two key tasks: link prediction and
node classification. We examine the performance of 12 unsupervised embedding
methods on 15 datasets. To the best of our knowledge, the scale of our study --
both in terms of the number of methods and number of datasets -- is the largest
to date.
Our results reveal several key insights about work-to-date in this space.
First, we find that certain baseline methods (task-specific heuristics, as well
as classic manifold methods) that have often been dismissed or are not
considered by previous efforts can compete on certain types of datasets if they
are tuned appropriately. Second, we find that recent methods based on matrix
factorization offer a small but relatively consistent advantage over
alternative methods (e.g., random-walk based methods) from a qualitative
standpoint. Specifically, we find that MNMF, a community-preserving embedding
method, is the most competitive method for the link prediction task, while
NetMF is the most competitive baseline for node classification. Third, no
single method completely outperforms the other embedding methods on both the
node classification and link prediction tasks. We also present several
drill-down analyses that reveal settings under which certain algorithms
perform well (e.g., the role of neighborhood context on performance) --
guiding the end-user.
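The pairwise AUC used to compare link prediction methods can be computed directly from scores assigned to held-out positive (true edge) and negative (non-edge) pairs; the scores below are hypothetical, not from the benchmark.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive pair
    outscores a randomly chosen negative pair (ties count as half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores from some embedding-based link predictor.
pos = [0.9, 0.8, 0.4]
neg = [0.5, 0.3, 0.1]
score = auc(pos, neg)  # 8 wins out of 9 pairs -> 8/9
```

A perfect ranking gives AUC 1.0 and random scoring gives 0.5, which is why benchmark studies like this one report AUC alongside classification metrics.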
COSINE: Compressive Network Embedding on Large-scale Information Networks
There has recently been a surge in approaches that learn low-dimensional
embeddings of nodes in networks. For the many large-scale real-world networks,
it is inefficient for existing approaches to store large numbers of parameters
in memory and update them edge after edge. Based on the observation that nodes
with similar neighborhoods will be close to each other in the embedding space,
we propose the COSINE (COmpresSIve NE) algorithm, which reduces the memory
footprint and accelerates the training process by sharing parameters among
similar nodes. COSINE applies
graph partitioning algorithms to networks and builds parameter-sharing
dependencies among nodes based on the partitioning result. By sharing
parameters among similar nodes, COSINE injects prior knowledge about
higher-order structural information into the training process, which makes
network embedding more efficient and effective. COSINE can be applied to any
embedding lookup method
and learn high-quality embeddings with limited memory and shorter training
time. We conduct experiments of multi-label classification and link prediction,
where baselines and our model have the same memory usage. Experimental results
show that COSINE gives the baselines up to a 23% improvement on classification
and up to a 25% improvement on link prediction. Moreover, the training time of
all representation learning methods using COSINE decreases by 30% to 70%.
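The parameter-sharing idea can be sketched as an embedding lookup with one row per partition group rather than one per node; the partition assignment below is hypothetical (in practice it would come from a graph partitioner such as METIS), and this is a simplification of the paper's scheme.

```python
import numpy as np

def shared_lookup(partition, num_groups, dim, seed=0):
    """One embedding row per group instead of per node: nodes assigned
    to the same partition group share their parameters."""
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(num_groups, dim))  # compressed parameters

    def embed(node):
        return table[partition[node]]

    return table, embed

# Hypothetical partitioning of 6 nodes into 2 structural groups.
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
table, embed = shared_lookup(partition, num_groups=2, dim=4)
# Memory: 2 x 4 parameters stored instead of 6 x 4, and a gradient
# update for node 0 also moves nodes 1 and 2.
```

The compression ratio grows with nodes-per-group, which is where the memory and training-time savings reported in the abstract come from.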
Review on Graph Feature Learning and Feature Extraction Techniques for Link Prediction
The problem of link prediction has recently attracted considerable attention
from the research community. Given a graph, which is an abstraction of the
relationships among entities, the task of link prediction is to anticipate
future connections among entities in the graph, given its current state.
Extensive studies have examined this problem from different aspects and
proposed various methods, some of which might work very well for a specific
application but not as a global solution. This work presents an extensive
review of state-of-the-art methods and algorithms proposed on this subject and
categorizes them into four main categories: similarity-based methods,
probabilistic methods, relational models, and learning-based methods.
Additionally, a collection of network data sets has been presented in this
paper, which can be used to study link prediction. To the best of our
knowledge, this survey is the first comprehensive study that considers all of
the mentioned challenges and solutions for link prediction in graphs,
including unsupervised and supervised techniques and their evolution over
recent years.
Comment: 31 pages, 7 figures
GrAMME: Semi-Supervised Learning using Multi-layered Graph Attention Models
Modern data analysis pipelines are becoming increasingly complex due to the
presence of multi-view information sources. While graphs are effective in
modeling complex relationships, in many scenarios a single graph is rarely
sufficient to succinctly represent all interactions, and hence multi-layered
graphs have become popular. Though this leads to richer representations,
extending solutions from the single-graph case is not straightforward.
Consequently, there is a strong need for novel solutions to solve classical
problems, such as node classification, in the multi-layered case. In this
paper, we consider the problem of semi-supervised learning with multi-layered
graphs. Though deep network embeddings, e.g. DeepWalk, are widely adopted for
community discovery, we argue that feature learning with random node
attributes, using graph neural networks, can be more effective. To this end, we
propose to use attention models for effective feature learning, and develop two
novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer
dependencies for building multi-layered graph embeddings. Using empirical
studies on several benchmark datasets, we evaluate the proposed approaches and
demonstrate significant performance improvements in comparison to
state-of-the-art network embedding strategies. The results also show that using
simple random features is an effective choice, even in cases where explicit
node attributes are not available.
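A minimal single-graph, single-layer sketch of neighbor attention over random node features; this is an illustrative simplification, not the GrAMME-SG or GrAMME-Fusion architectures, and the graph and dimensions are hypothetical.

```python
import numpy as np

def attention_layer(A, X, W):
    """One graph-attention step: each node aggregates projected neighbor
    features, weighted by softmax-normalized dot-product scores."""
    H = X @ W                               # project input features
    scores = H @ H.T                        # pairwise attention logits
    mask = np.where(A > 0, 0.0, -1e9)       # attend only to neighbors
    logits = scores + mask
    alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ H                        # attention-weighted average

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # a star
X = rng.normal(size=(3, 8))   # random node attributes, as in the paper
W = rng.normal(size=(8, 4))   # learnable projection (here just random)
H_out = attention_layer(A, X, W)
# Node 1's only neighbor is node 0, so its output equals node 0's
# projected features.
```

Stacking such layers per graph layer and then fusing or sharing attention across layers is, roughly, the design space the two proposed architectures explore.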