Link Prediction in Social Networks: the State-of-the-Art
In social networks, link prediction, which predicts missing links in current
networks and new or dissolving links in future networks, is important for
mining and analyzing the evolution of social networks. In the past decade, much
work has been done on link prediction in social networks. The goal of
this paper is to comprehensively review, analyze and discuss the
state-of-the-art of link prediction in social networks. A systematic
categorization of link prediction techniques and problems is presented. Then link
prediction techniques and problems are analyzed and discussed. Typical
applications of link prediction are also addressed. Achievements and roadmaps
of some active research groups are introduced. Finally, some future challenges
of the link prediction in social networks are discussed.
Comment: 38 pages, 13 figures, Science China: Information Science, 201
Predicting Anchor Links between Heterogeneous Social Networks
People usually get involved in multiple social networks to enjoy new services
or to fulfill their needs. Many new social networks try to attract users of
other existing networks to increase the number of their users. Once a user
(called source user) of a social network (called source network) joins a new
social network (called target network), a new inter-network link (called anchor
link) is formed between the source and target networks. In this paper, we
concentrate on predicting the formation of such anchor links between
heterogeneous social networks. Unlike conventional link prediction problems in
which the formation of a link between two existing users within a single
network is predicted, in anchor link prediction, the target user is missing and
will be added to the target network once the anchor link is created. To solve
this problem, we use meta-paths as a powerful tool for utilizing heterogeneous
information in both the source and target networks. To this end, we propose an
effective general meta-path-based approach called Connector and Recursive
Meta-Paths (CRMP). By using those two different categories of meta-paths, we
model different aspects of social factors that may affect a source user to join
the target network, resulting in the formation of a new anchor link. Extensive
experiments on real-world heterogeneous social networks demonstrate the
effectiveness of the proposed method against the recent methods.
Comment: To be published in "Proceedings of the 2016 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining (ASONAM)"
Reciprocal versus Parasocial Relationships in Online Social Networks
Many online social networks are fundamentally directed, i.e., they consist of
both reciprocal edges (i.e., edges that have already been linked back) and
parasocial edges (i.e., edges that haven't been linked back). Thus,
understanding the structures and evolutions of reciprocal edges and parasocial
ones, exploring the factors that influence parasocial edges to become
reciprocal ones, and predicting whether a parasocial edge will turn into a
reciprocal one are basic research problems.
However, there have been few systematic studies about such problems. In this
paper, we bridge this gap using a novel large-scale Google+ dataset crawled by
ourselves as well as one publicly available social network dataset. First, we
compare the structures and evolutions of reciprocal edges and those of
parasocial edges. For instance, we find that reciprocal edges are more likely
to connect users with similar degrees while parasocial edges are more likely to
link ordinary users (e.g., users with low degrees) and popular users (e.g.,
celebrities). However, the impacts of reciprocal edges linking ordinary and
popular users on the network structures increase slowly as the social networks
evolve. Second, we observe that factors including user behaviors, node
attributes, and edge attributes all have significant impacts on the formation
of reciprocal edges. Third, in contrast to previous studies that treat
reciprocal edge prediction as either a supervised or a semi-supervised learning
problem, we identify that reciprocal edge prediction is better modeled as an
outlier detection problem. Finally, we perform extensive evaluations with the
two datasets, and we show that our proposal outperforms previous reciprocal
edge prediction approaches.
Comment: Social Network Analysis and Mining, Springer, 201
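The reciprocal/parasocial distinction defined in this abstract can be illustrated with a minimal sketch. It assumes the network is given as a list of directed `(follower, followee)` pairs; the function name `split_edges` and the toy user names are hypothetical and not taken from the paper, and the sketch covers only the edge labelling, not the authors' prediction method.

```python
def split_edges(edges):
    """Split directed edges into reciprocal and parasocial sets.

    An edge (u, v) is reciprocal if the reverse edge (v, u) also exists;
    otherwise it is parasocial. Self-loops are ignored.
    """
    edge_set = {(u, v) for (u, v) in edges if u != v}
    reciprocal = {(u, v) for (u, v) in edge_set if (v, u) in edge_set}
    parasocial = edge_set - reciprocal
    return reciprocal, parasocial


# Toy directed network (hypothetical data).
toy_edges = [
    ("fan", "star"),    # not linked back
    ("alice", "bob"),   # linked back by the next edge
    ("bob", "alice"),
    ("bob", "star"),    # not linked back
]
reciprocal, parasocial = split_edges(toy_edges)
print(sorted(reciprocal))
print(sorted(parasocial))
```

Predicting whether a parasocial edge becomes reciprocal then amounts to scoring each pair in `parasocial`.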
Anxious Depression Prediction in Real-time Social Data
Mental well-being and social media have been closely related domains of
study. In this research, a novel model, the AD prediction model, is proposed for
anxious depression prediction in real-time tweets. This mixed
anxiety-depressive disorder is predominantly associated with erratic thought
processes, restlessness and sleeplessness. Based on linguistic cues and user
posting patterns, the feature set is defined using a 5-tuple vector <word,
timing, frequency, sentiment, contrast>. An anxiety-related lexicon is built to
detect the presence of anxiety indicators. The time and frequency of tweets are
analyzed for irregularities, and opinion polarity analysis is done to find
inconsistencies in posting behaviour. The model is trained using three
classifiers (multinomial naïve Bayes, gradient boosting, and random forest),
and majority voting is performed using an ensemble voting classifier. Preliminary
results are evaluated on the tweets of 100 sampled users, and the proposed model
achieves a classification accuracy of 85.09%.
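The majority-voting step mentioned in this abstract can be sketched as a hard vote over the per-classifier labels. This is only an illustration: the function name `majority_vote` and the prediction lists are hypothetical, the labels are made up rather than taken from the paper, and the three base classifiers are assumed to have been trained elsewhere.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard majority vote.

    predictions: list of label lists, one list per classifier,
    all of the same length (one label per sample).
    Returns the most common label for each sample.
    """
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the top label
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical binary outputs (1 = anxious depression indicated) for 4 users.
naive_bayes = [1, 0, 1, 0]
gradient_boosting = [1, 1, 0, 0]
random_forest = [0, 1, 1, 0]

final = majority_vote([naive_bayes, gradient_boosting, random_forest])
print(final)  # [1, 1, 1, 0]
```

With three binary classifiers a tie cannot occur, which is one reason an odd number of voters is convenient.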
mvn2vec: Preservation and Collaboration in Multi-View Network Embedding
Multi-view networks are broadly present in real-world applications. In the
meantime, network embedding has emerged as an effective representation learning
approach for networked data. Therefore, we are motivated to study the problem
of multi-view network embedding with a focus on the optimization objectives
that are specific and important in embedding this type of network. In our
practice of embedding real-world multi-view networks, we explicitly identify
two such objectives, which we refer to as preservation and collaboration. The
in-depth analysis of these two objectives is discussed throughout this paper.
In addition, the mvn2vec algorithms are proposed to (i) study how varying the extent
of preservation and collaboration can impact embedding learning and (ii)
explore the feasibility of achieving better embedding quality by modeling them
simultaneously. With experiments on a series of synthetic datasets, a
large-scale internal Snapchat dataset, and two public datasets, we confirm the
validity and importance of preservation and collaboration as two objectives for
multi-view network embedding. These experiments further demonstrate that better
embedding can be obtained by simultaneously modeling the two objectives, while
not over-complicating the model or requiring additional supervision. The code
and the processed datasets are available at
http://yushi2.web.engr.illinois.edu/
Learning multi-faceted representations of individuals from heterogeneous evidence using neural networks
Inferring latent attributes of people online is an important social computing
task, but requires integrating the many heterogeneous sources of information
available on the web. We propose learning individual representations of people
using neural nets to integrate rich linguistic and network evidence gathered
from social media. The algorithm is able to combine diverse cues, such as the
text a person writes, their attributes (e.g. gender, employer, education,
location) and social relations to other people. We show that by integrating
both textual and network evidence, these representations offer improved
performance at four important tasks in social media inference on Twitter:
predicting (1) gender, (2) occupation, (3) location, and (4) friendships for
users. Our approach scales to large datasets, and the learned representations
can be used as general features in, and have the potential to benefit, a large
number of downstream tasks including link prediction, community detection, or
probabilistic reasoning over social networks.
Link Prediction in Multiplex Networks based on Interlayer Similarity
Some networked systems can be better modelled by a multilayer structure where
the individual nodes develop relationships in multiple layers. Multilayer
networks with similar nodes across layers are also known as multiplex networks.
This manuscript proposes a novel framework for predicting forthcoming or
missing links in multiplex networks. The link prediction problem in multiplex
networks is how to predict links in one of the layers, taking into account the
structural information of other layers. The proposed link prediction framework
is based on interlayer similarity and proximity-based features extracted from
the layer for which the link prediction is considered. To this end, commonly
used proximity-based features such as Adamic-Adar and Jaccard Coefficient are
considered. These features, which were originally proposed to predict
missing links in monolayer networks, do not require learning, and thus are
simple to compute. The proposed method introduces a systematic approach to take
into account interlayer similarity for the link prediction purpose.
Experimental results on both synthetic and real multiplex networks reveal the
effectiveness of the proposed method and show that it outperforms
state-of-the-art algorithms proposed for the link prediction problem in
multiplex networks.
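The two proximity features named in this abstract, the Jaccard Coefficient and Adamic-Adar, can be computed on a single layer as in the minimal sketch below. It assumes an undirected layer stored as a dictionary of neighbor sets; the helper names and the toy graph are hypothetical. The interlayer-similarity weighting proposed in the paper is not specified in the abstract and is not reproduced here.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)|, or 0.0 if both have no neighbors."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree(w)) over common neighbors w of u and v."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1  # guard against log(1) = 0
    )


# Toy single layer (hypothetical data).
layer = build_adjacency(
    [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
)
print(jaccard(layer, "a", "d"))      # common {b, c}, union {b, c, e}: 2/3
print(adamic_adar(layer, "a", "d"))  # b and c both have degree 3: 2 / ln(3)
```

Neither score needs training, which is why such features are cheap to compute for every candidate pair in a layer.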
Supervised Rank Aggregation for Predicting Influence in Networks
Much work in Social Network Analysis has focused on the identification of the
most important actors in a social network. This has resulted in several
measures of influence and authority. While most of such sociometrics (e.g.,
PageRank) are driven by intuitions based on an actor's location in a network,
asking for the "most influential" actors in itself is an ill-posed question,
unless it is put in context with a specific measurable task. Constructing a
predictive task of interest in a given domain provides a mechanism to
quantitatively compare different measures of influence. Furthermore, when we
know what type of actionable insight to gather, we need not rely on a single
network centrality measure. A combination of measures is more likely to capture
various aspects of the social network that are predictive and beneficial for
the task. Towards this end, we propose an approach to supervised rank
aggregation, driven by techniques from Social Choice Theory. We illustrate the
effectiveness of this method through experiments on Twitter and citation
networks.
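To make the idea of combining several influence rankings concrete, the sketch below uses the Borda count, a classic positional rule from social choice theory. This is an assumption made for illustration: the abstract does not say which aggregation rule the authors use, and this unsupervised version omits the supervised component of their approach. The function name `borda_aggregate` and the example rankings are hypothetical.

```python
def borda_aggregate(rankings):
    """Aggregate rankings with the Borda count.

    rankings: list of lists, each ordering the same items from most to
    least influential. An item at position i in a ranking of n items
    earns n - 1 - i points. Ties are broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return ordered, scores


# Hypothetical rankings of four actors under three centrality measures.
by_pagerank = ["a", "b", "c", "d"]
by_degree = ["b", "a", "d", "c"]
by_betweenness = ["a", "c", "b", "d"]

consensus, points = borda_aggregate([by_pagerank, by_degree, by_betweenness])
print(consensus)  # ['a', 'b', 'c', 'd']
print(points)     # {'a': 8, 'b': 6, 'c': 3, 'd': 1}
```

A supervised variant would weight each input ranking by how well it predicts the task of interest instead of treating all measures equally.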
CONE: Community Oriented Network Embedding
Detecting communities has long been popular in the research on networks. It
is usually modeled as an unsupervised clustering problem on graphs, based on
heuristic assumptions about community characteristics, such as edge density and
node homogeneity. In this work, we doubt the universality of these widely
adopted assumptions and compare human labeled communities with machine
predicted ones obtained via various mainstream algorithms. Based on supportive
results, we argue that communities are defined by various social patterns and
unsupervised learning based on heuristics is incapable of capturing all of
them. Therefore, we propose to inject supervision into community detection
through Community Oriented Network Embedding (CONE), which leverages limited
ground-truth communities as examples to learn an embedding model aware of the
social patterns underlying them. Specifically, a deep architecture is developed
by combining recurrent neural networks with random-walks on graphs towards
capturing social patterns directed by ground-truth communities. Generic
clustering algorithms on the embeddings of other nodes produced by the learned
model then effectively reveal more communities that share similar social
patterns with the ground-truth ones.
Comment: 10 pages, accepted by IJCNN 201
Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey
Topic modeling is one of the most powerful techniques in text mining for data
mining, latent data discovery, and finding relationships among data and text
documents. Researchers have published many articles in the field of topic
modeling and applied it in various fields such as software engineering, political
science, medical and linguistic science, etc. There are various methods for
topic modeling, of which Latent Dirichlet Allocation (LDA) is one of the most
popular. Researchers have proposed various models based
on LDA in topic modeling. According to previous work, this paper can be
very useful and valuable for introducing LDA approaches in topic modeling. In
this paper, we investigated scholarly articles (published between 2003 and 2016)
highly related to topic modeling based on LDA to discover the research development,
current trends and intellectual structure of topic modeling. Also, we summarize
challenges and introduce well-known tools and datasets in topic modeling based on
LDA.
Comment: arXiv admin note: text overlap with arXiv:1505.07302 by other author