25,043 research outputs found
Extracting user spatio-temporal profiles from location based social networks
Report de RecercaLocation Based Social Networks (LBSN) like Twitter or Instagram are a good source for user spatio-temporal behavior. These social network provide a low rate sampling of user's location information during large intervals of time that can be used to discover complex behaviors, including mobility profiles, points of interest or unusual events. This information is important for different domains like mobility route planning, touristic recommendation systems or city planning.
Other approaches have used the data from LSBN to categorize areas of a city depending on the categories of the places that people visit or to discover user behavioral patterns from their visits. The aim of this paper is to analyze how the spatio-temporal behavior of a large number of users in a well limited geographical area can be segmented in different profiles. These behavioral profiles are obtained by means of clustering algorithms that show the different behaviors that people have when living and visiting a city.
The data analyzed was obtained from the public data feeds of Twitter and Instagram inside the area of the city of Barcelona for a period of several months. The analysis of these data shows that these kind of algorithms can be successfully applied to data from any city (or any general area) to discover useful profiles that can be described on terms of the city singular places and areas and their temporal relationships. These profiles can be used as a basis for making decisions in different application domains, specially those related with mobility inside and outside a city.Preprin
FlashProfile: A Framework for Synthesizing Data Profiles
We address the problem of learning a syntactic profile for a collection of
strings, i.e. a set of regex-like patterns that succinctly describe the
syntactic variations in the strings. Real-world datasets, typically curated
from multiple sources, often contain data in various syntactic formats. Thus,
any data processing task is preceded by the critical step of data format
identification. However, manual inspection of data to identify the different
formats is infeasible in standard big-data scenarios.
Prior techniques are restricted to a small set of pre-defined patterns (e.g.
digits, letters, words, etc.), and provide no control over granularity of
profiles. We define syntactic profiling as a problem of clustering strings
based on syntactic similarity, followed by identifying patterns that succinctly
describe each cluster. We present a technique for synthesizing such profiles
over a given language of patterns, that also allows for interactive refinement
by requesting a desired number of clusters.
Using a state-of-the-art inductive synthesis framework, PROSE, we have
implemented our technique as FlashProfile. Across tasks over large
real datasets, we observe a median profiling time of only s.
Furthermore, we show that access to syntactic profiles may allow for more
accurate synthesis of programs, i.e. using fewer examples, in
programming-by-example (PBE) workflows such as FlashFill.Comment: 28 pages, SPLASH (OOPSLA) 201
Interests Diffusion in Social Networks
Understanding cultural phenomena on Social Networks (SNs) and exploiting the
implicit knowledge about their members is attracting the interest of different
research communities both from the academic and the business side. The
community of complexity science is devoting significant efforts to define laws,
models, and theories, which, based on acquired knowledge, are able to predict
future observations (e.g. success of a product). In the mean time, the semantic
web community aims at engineering a new generation of advanced services by
defining constructs, models and methods, adding a semantic layer to SNs. In
this context, a leapfrog is expected to come from a hybrid approach merging the
disciplines above. Along this line, this work focuses on the propagation of
individual interests in social networks. The proposed framework consists of the
following main components: a method to gather information about the members of
the social networks; methods to perform some semantic analysis of the Domain of
Interest; a procedure to infer members' interests; and an interests evolution
theory to predict how the interests propagate in the network. As a result, one
achieves an analytic tool to measure individual features, such as members'
susceptibilities and authorities. Although the approach applies to any type of
social network, here it is has been tested against the computer science
research community.
The DBLP (Digital Bibliography and Library Project) database has been elected
as test-case since it provides the most comprehensive list of scientific
production in this field.Comment: 30 pages 13 figs 4 table
Tweeting your Destiny: Profiling Users in the Twitter Landscape around an Online Game
Social media has become a major communication channel for communities
centered around video games. Consequently, social media offers a rich data
source to study online communities and the discussions evolving around games.
Towards this end, we explore a large-scale dataset consisting of over 1 million
tweets related to the online multiplayer shooter Destiny and spanning a time
period of about 14 months using unsupervised clustering and topic modelling.
Furthermore, we correlate Twitter activity of over 3,000 players with their
playtime. Our results contribute to the understanding of online player
communities by identifying distinct player groups with respect to their Twitter
characteristics, describing subgroups within the Destiny community, and
uncovering broad topics of community interest.Comment: Accepted at IEEE Conference on Games 201
Recommended from our members
A Bayesian mixture model for term re-occurrence and burstiness
This paper proposes a model for term reoccurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using
a mixture of exponential distributions. Parameter
estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term’s re-occurrence rate and withindocument burstiness. The model works for all kinds of terms, be it rare content
word, medium frequency term or frequent function word. A measure is proposed to account for the term’s importance based on its distribution pattern in the corpus
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
Detecting Real-World Influence Through Twitter
In this paper, we investigate the issue of detecting the real-life influence
of people based on their Twitter account. We propose an overview of common
Twitter features used to characterize such accounts and their activity, and
show that these are inefficient in this context. In particular, retweets and
followers numbers, and Klout score are not relevant to our analysis. We thus
propose several Machine Learning approaches based on Natural Language
Processing and Social Network Analysis to label Twitter users as Influencers or
not. We also rank them according to a predicted influence level. Our proposals
are evaluated over the CLEF RepLab 2014 dataset, and outmatch state-of-the-art
ranking methods.Comment: 2nd European Network Intelligence Conference (ENIC), Sep 2015,
Karlskrona, Swede
- …