Adaptive Representations for Tracking Breaking News on Twitter
Twitter is often the most up-to-date source for finding and tracking breaking
news stories. Therefore, there is considerable interest in developing filters
for tweet streams in order to track and summarize stories. This is a
non-trivial text analytics task as tweets are short, and standard retrieval
methods often fail as stories evolve over time. In this paper we examine the
effectiveness of adaptive mechanisms for tracking and summarizing breaking news
stories. We evaluate the effectiveness of these mechanisms on a number of
recent news events for which manually curated timelines are available.
Assessments based on ROUGE metrics indicate that adaptive approaches are
best suited for tracking evolving stories on Twitter.
Comment: 8 pages
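The abstract above reports ROUGE scores against manually curated timelines. As a rough illustration of what such a score measures, here is a minimal sketch of unigram-overlap ROUGE-1. It assumes whitespace tokenization and lowercasing, and the function name `rouge_1` is hypothetical; it is not the evaluation code used in the paper.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 between a candidate summary and one reference.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: a tracked tweet scored against one timeline entry.
scores = rouge_1(
    "police confirm explosion downtown",
    "police confirm an explosion in downtown area",
)
```

Recall is the usual headline number for summarization, since it measures how much of the curated timeline the tracked tweets cover.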
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
Describing and Understanding Neighborhood Characteristics through Online Social Media
Geotagged data can be used to describe regions in the world and discover
local themes. However, not all data produced within a region is necessarily
specifically descriptive of that area. To surface the content that is
characteristic for a region, we present the geographical hierarchy model (GHM),
a probabilistic model based on the assumption that data observed in a region is
a random mixture of content that pertains to different levels of a hierarchy.
We apply the GHM to a dataset of 8 million Flickr photos in order to
discriminate between content (i.e., tags) that specifically characterizes a
region (e.g., neighborhood) and content that characterizes surrounding areas or
more general themes. Knowledge of the discriminative and non-discriminative
terms used throughout the hierarchy enables us to quantify the uniqueness of a
given region and to compare similar but distant regions. Our evaluation
demonstrates that our model improves upon traditional Naive Bayes
classification by 47% and hierarchical TF-IDF by 27%. We further highlight the
differences and commonalities with human reasoning about what is locally
characteristic for a neighborhood, distilled from ten interviews and a survey
that covered themes such as time, events, and prior regional knowledge.
Comment: Accepted in WWW 2015, Florence, Italy
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
We propose a method for embedding two-dimensional locations in a continuous
vector space using a neural network-based model incorporating mixtures of
Gaussian distributions, presenting two model variants for text-based
geolocation and lexical dialectology. Evaluated over Twitter data, the proposed
model outperforms conventional regression-based geolocation and provides a
better estimate of uncertainty. We also show the effectiveness of the
representation for predicting words from location in lexical dialectology, and
evaluate it using the DARE dataset.
Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP
2017), September 2017, Copenhagen, Denmark
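The abstract above describes modeling two-dimensional locations with mixtures of Gaussian distributions. The sketch below shows only the mixture density and its negative log-likelihood for one coordinate pair. It assumes axis-aligned (diagonal-covariance) components and hand-set parameters; in a mixture density network these parameters would be produced by a neural network, which is omitted here. All function names are hypothetical and this is not the authors' implementation.

```python
import math


def gaussian_2d_diag(point, mean, std):
    """Density of an axis-aligned 2D Gaussian evaluated at `point`."""
    density = 1.0
    for x, m, s in zip(point, mean, std):
        density *= math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return density


def mixture_density(point, weights, means, stds):
    """Weighted sum of component densities; weights are assumed to sum to 1."""
    return sum(
        w * gaussian_2d_diag(point, m, s)
        for w, m, s in zip(weights, means, stds)
    )


def negative_log_likelihood(point, weights, means, stds):
    """Loss a mixture density network would minimize for one observed location."""
    return -math.log(mixture_density(point, weights, means, stds))


# Hypothetical two-component mixture over (latitude, longitude)-like coordinates.
weights = [0.7, 0.3]
means = [(0.0, 0.0), (5.0, 5.0)]
stds = [(1.0, 1.0), (2.0, 2.0)]
near = mixture_density((0.1, -0.1), weights, means, stds)
far = mixture_density((20.0, 20.0), weights, means, stds)
```

A broad or multi-modal mixture signals uncertainty about the location, which a single-point regression output cannot express.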
"When and Where?": Behavior Dominant Location Forecasting with Micro-blog Streams
The proliferation of smartphones and wearable devices has increased the
availability of large amounts of geospatial streams to provide significant
automated discovery of knowledge in pervasive environments, but the most prominent
information related to changing interests has not yet been adequately exploited.
In this paper, we provide a novel algorithm to exploit the dynamic fluctuations
in users' points of interest while forecasting the future place of visit with
fine granularity. Our proposed algorithm is based on the dynamic formation of
collective personality communities using different languages, opinions,
geographical and temporal distributions for finding out optimized equivalent
content. We performed extensive empirical experiments involving real-time
streams derived from 0.6 million micro-blog stream tuples comprising 1945
social person fusion, with a graph algorithm and a feed-forward neural network model
as the predictive classification model. Lastly, the framework achieves 62.10%
mean average precision on 120,000 embeddings on unlabeled users and,
surprisingly, an 85.92% improvement over the state-of-the-art approach.
Comment: Accepted as a full paper in the 2nd International Workshop on Social
Computing co-located with ICDM, 2018 Singapore
GIS Investigation of Crime Prediction with an Operationalized Tweet Corpus
Social media, as the de facto communication channel, is being used to disseminate one’s diurnal self-revelations. These disclosures often contain double-talk, peculiar insights, or contextual information about real-world events. Natural language processing is regularly used to uncover both obvious and latent knowledge claims within disclosures published amid this complex environment. For example, a perpetrator with first-hand knowledge of their criminal incident uses social media to post critical information about it. A geographic information system (GIS) is capable of large-scale point data analysis and possesses methods that enable dataset processing, evaluation, and automatic spatial visualization. Such an artifact, fused with traditional environmental criminology theory and social media, erects guidelines, tools, and models for substantive construction and evaluation of GIS crime analysis solutions. Provided the social media stream is timely and correctly processed, corrective action can be taken. The construction of a natural language processing social media annotation pipeline identifies latent indicators extracted from a social media corpus and is an integral part of societal mishap prediction. Spatial visualizations and regression analyses were used to describe and evaluate project artifacts. As a result, a social media corpus was operationalized and subsequently used as a proxy for a traditional environmental criminology risk layer in the construction of a social media GIS crime analysis artifact. Using such multi-domain collaboration, the artifact was able to improve crime incident prediction, with an overall R-squared increase of 21.94%. This result is the state of the art; there are no other results to compare it to.