A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure.
Tripartite Graph Clustering for Dynamic Sentiment Analysis on Social Media
The growing popularity of social media (e.g., Twitter) allows users to easily
share information with each other and influence others by expressing their own
sentiments on various subjects. In this work, we propose an unsupervised
tri-clustering framework, which analyzes both user-level and tweet-level
sentiments through co-clustering of a tripartite graph. A compelling feature of
the proposed framework is that the quality of sentiment clustering of tweets,
users, and features can be mutually improved by joint clustering. We further
investigate the evolution of user-level sentiments and latent feature vectors
in an online framework and devise an efficient online algorithm to sequentially
update the clustering of tweets, users and features with newly arrived data.
The online framework not only provides better quality of both dynamic
user-level and tweet-level sentiment analysis, but also improves the
computational and storage efficiency. We verified the effectiveness and
efficiency of the proposed approaches on the November 2012 California ballot
Twitter data.
Comment: A short version is in the Proceedings of the 2014 ACM SIGMOD International
Conference on Management of Data.
Feature-rich networks: going beyond complex network topologies.
The growing availability of multirelational data gives rise to an opportunity for novel characterization of complex real-world relations, supporting the proliferation of diverse network models such as Attributed Graphs, Heterogeneous Networks, Multilayer Networks, Temporal Networks, Location-aware Networks, Knowledge Networks, Probabilistic Networks, and many other task-driven and data-driven models. In this paper, we present an overview of these models and their main applications, described under the common denomination of Feature-rich Networks, i.e., models where the expressive power of the network topology is enhanced by exposing one or more peculiar features. The aim is also to sketch a scenario that can inspire the design of novel feature-rich network models, which in turn can support innovative methods able to exploit the full potential of mining complex network structures in domain-specific applications.
JNET: Learning User Representations via Joint Network Embedding and Topic Embedding
User representation learning is vital to capture diverse user preferences,
while it is also challenging as user intents are latent and scattered among
complex and different modalities of user-generated data, thus, not directly
measurable. Inspired by the concept of user schema in social psychology, we
take a new perspective to perform user representation learning by constructing
a shared latent space to capture the dependency among different modalities of
user-generated data. Both users and topics are embedded to the same space to
encode users' social connections and text content, to facilitate joint modeling
of different modalities, via a probabilistic generative framework. We evaluated
the proposed solution on large collections of Yelp reviews and StackOverflow
discussion posts, with their associated network structures. The proposed model
outperformed several state-of-the-art topic modeling based user models with
better predictive power in unseen documents, and state-of-the-art network
embedding based user models with improved link prediction quality in unseen
nodes. The learnt user representations also prove useful in content
recommendation, e.g., expert finding in StackOverflow.
Combating User Misbehavior on Social Media
Social media encourages user participation and facilitates users' self-expression like never before. While enriching user behavior in a variety of ways, many social media platforms have become breeding grounds for user misbehavior. In this dissertation we focus on understanding and combating three specific types of user misbehavior that widely exist on social media: spamming, manipulation, and distortion.
First, we address the challenge of detecting spam links. Rather than rely on traditional blacklist-based or content-based methods, we examine the behavioral factors of both who is posting the link and who is clicking on the link. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. We find that this purely behavioral approach can achieve good performance for robust behavior-based spam link detection.
Next, we deal with uncovering manipulated behavior of link sharing. We propose a four-phase approach to model, identify, characterize, and classify organic and organized groups who engage in link sharing. The key motivating insight is that group-level behavioral signals can distinguish manipulated user groups. We find that levels of organized behavior vary by link type and that the proposed approach achieves good performance measured by commonly-used metrics.
Finally, we investigate a particular distortion behavior: making bullshit (BS) statements on social media. We explore the factors impacting the perception of BS and what leads users to ultimately perceive and call a post BS. We begin by preparing a crowdsourced collection of real social media posts that have been called BS. We then build a classification model that can determine what posts are more likely to be called BS. Our experiments suggest our classifier has the potential of leveraging linguistic cues for detecting social media posts that are likely to be called BS.
We complement these three studies with a cross-cutting investigation of learning user topical profiles, which can shed light on the subjects each user is associated with and thereby improve the understanding of the connection between users and misbehavior. Concretely, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints, and we show how these footprints can be embedded in a generalized optimization framework.
Through extensive experiments on millions of real social media posts, we find our proposed models can effectively combat user misbehavior on social media.
Unveiling human interactions : approaches and techniques toward the discovery and representation of interactions in networks
The abstract is in the attachment.
The role of geographic knowledge in sub-city level geolocation algorithms
Geolocation of microblog messages has been largely investigated in the literature.
Many solutions have been proposed that achieve good results at the
city-level. Existing approaches are mainly data-driven (i.e., they rely on a
training phase). However, the development of algorithms for geolocation at
sub-city level is still an open problem also due to the absence of good training
datasets. In this thesis, we investigate the role that external geographic knowledge
can play in geolocation approaches. We show how different geographical
data sources can be combined with a semantic layer to achieve reasonably
accurate sub-city level geolocation. Moreover, we propose a knowledge-based
method, called Sherloc, to accurately geolocate messages at sub-city level, by
exploiting the presence in the message of toponyms possibly referring to the
specific places in the target geographical area. Sherloc exploits the semantics
associated with toponyms contained in gazetteers and embeds them into a
metric space that captures the semantic distance among them. This allows
toponyms to be represented as points and indexed by a spatial access method,
allowing us to identify the semantically closest terms to a microblog message,
that also form a cluster with respect to their spatial locations. In contrast to
state-of-the-art methods, Sherloc requires no prior training, is not limited
to geolocating on a fixed spatial grid, and experimentally demonstrates its
ability to infer the location at sub-city level with higher accuracy.