Toward Geo-social Information Systems: Methods and Algorithms
The widespread adoption of GPS-enabled tagging of social media content via
smartphones and social media services (e.g., Facebook, Twitter, Foursquare) provides
a new window into the spatio-temporal activities of hundreds of millions of people.
These "footprints" open new possibilities for understanding how people can organize
for societal impact and lay the foundation for new crowd-powered geo-social systems.
However, there are key challenges to delivering on this promise: the slow adoption
of location sharing, the inherent bias in the users who do share location, imbalanced
location granularity, and the need to respect location privacy, among many others. With these
challenges in mind, this dissertation aims to develop the framework, algorithms, and
methods for a new class of geo-social information systems. The dissertation is structured
in two main parts: the first focuses on understanding the capacity of existing
footprints; the second demonstrates the potential of new geo-social information systems
through two concrete prototypes.
First, we investigate the capacity of using these geo-social footprints to build new
geo-social information systems. (i): we propose and evaluate a probabilistic framework
for estimating a microblog user's location based purely on the content of the
user's posts. With the help of a classification component for automatically identifying
words in tweets with a strong local geo-scope, the location estimator places 51%
of Twitter users within 100 miles of their actual location. (ii): we investigate a set of
22 million check-ins across 220,000 users and report a quantitative assessment of human
mobility patterns by analyzing the spatial, temporal, social, and textual aspects
associated with these footprints. Concretely, we observe that users follow simple, reproducible
mobility patterns. (iii): we compare a set of 35 million publicly shared check-ins with a set of over 400 million private query logs recorded by a commercial
hotel search engine. Although generated by users with fundamentally different intentions,
we find that common conclusions may be drawn from both data sources, indicating
that publicly shared location information can viably complement (and, in
some cases, replace) privately held location information.
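The content-based location estimator in (i) can be illustrated with a minimal Python sketch. It assumes that a per-word city distribution p(city | word) has already been estimated from users with known locations and that only words classified as having a strong local geo-scope are kept. The word table, city coordinates, and function names below are hypothetical and do not reproduce the dissertation's model or data. A user's city is scored as the sum over local words of p(city | word), weighted by each word's share of the user's local-word occurrences. Accuracy is the fraction of users placed within 100 miles (great-circle distance) of their true location.

```python
import math
from collections import Counter

# Hypothetical p(city | word) for words already classified as "local";
# in practice these would be estimated from users with known locations.
WORD_CITY_PROB = {
    "howdy": {"Houston": 0.6, "Dallas": 0.4},
    "rockets": {"Houston": 0.9, "Los Angeles": 0.1},
    "lakers": {"Los Angeles": 0.95, "Houston": 0.05},
}

# Hypothetical city centroids as (latitude, longitude) in degrees.
CITY_COORDS = {
    "Houston": (29.76, -95.37),
    "Dallas": (32.78, -96.80),
    "Los Angeles": (34.05, -118.24),
}

EARTH_RADIUS_MILES = 3958.8


def estimate_location(tweets):
    """Return the most probable city for a user, or None if no local word occurs."""
    words = Counter(w for t in tweets for w in t.lower().split())
    local_total = sum(c for w, c in words.items() if w in WORD_CITY_PROB)
    if local_total == 0:
        return None
    scores = Counter()
    for w, c in words.items():
        for city, p in WORD_CITY_PROB.get(w, {}).items():
            # p(city | user) ~ sum_w p(city | w) * p(w); here p(w) is normalized
            # over local words only, which does not change the argmax.
            scores[city] += p * c / local_total
    return scores.most_common(1)[0][0]


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def accuracy_within(predictions, truths, radius=100.0):
    """Fraction of users whose estimated city lies within `radius` miles of the truth."""
    hits = sum(
        1 for pred, true_loc in zip(predictions, truths)
        if pred is not None and haversine_miles(CITY_COORDS[pred], true_loc) <= radius
    )
    return hits / len(truths)
```

For example, a user who posted "howdy y'all" and "go rockets" is placed in Houston, with a score of about 0.75 against 0.2 for Dallas and 0.05 for Los Angeles. Users with no local words get no estimate and count as misses.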
Second, we introduce two prototypes of new geo-social information systems
that utilize the collective intelligence from the emerging geo-social footprints.
Concretely, we propose an activity-driven search system and a local expert finding
system, both of which take advantage of this collective intelligence. Specifically, we study
location-based activity patterns revealed through location sharing services and find
that these activity patterns can identify semantically related locations and help with
both unsupervised location clustering and supervised location categorization with
high confidence. Based on these results, we show how activity-driven semantic organization
of locations may be naturally incorporated into location-based web search.
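One way to make the idea of activity patterns concrete is to describe each location by the hour-of-day distribution of its check-ins and to compare locations with cosine similarity. This is a minimal Python sketch under that assumption. The abstract does not state which features or similarity measure the dissertation uses, and the venues and check-in hours below are hypothetical. Locations visited at similar times of day, such as two cafés that are busy in the morning, end up closer to each other than to venues active at night. Unsupervised clustering or supervised categorization could build on this kind of signal.

```python
import math


def hourly_profile(checkin_hours):
    """Normalize a list of check-in hours (0-23) into a 24-bin distribution."""
    counts = [0] * 24
    for h in checkin_hours:
        counts[h % 24] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(target, profiles):
    """Return the name of the location whose activity profile is closest to target's."""
    return max(
        (name for name in profiles if name != target),
        key=lambda name: cosine(profiles[target], profiles[name]),
    )


# Hypothetical check-in hours for four venues.
checkins = {
    "cafe_a": [7, 8, 8, 9, 10],
    "cafe_b": [7, 8, 9, 9, 11],
    "bar_a": [21, 22, 23, 23, 0],
    "bar_b": [20, 22, 23, 0, 1],
}
profiles = {name: hourly_profile(hours) for name, hours in checkins.items()}
```

With these toy profiles, each café's nearest neighbor is the other café and each bar's is the other bar. The daytime and nighttime venues have a similarity of zero because they share no active hours.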
In addition, we propose a local expert finding system that identifies top local experts
for a topic in a location. Concretely, the system utilizes the semantic labels that people
assign to each other and people's locations in current location-based social networks, and it can
identify top local experts with high precision. We also observe that the proposed
local authority metrics, which utilize collective intelligence from expert candidates' core
audience (list labelers), significantly improve local expert finding performance
compared with the more intuitive approach that considers only candidates' locations.
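The contrast between audience-based local authority and ranking by candidate location alone can be sketched in Python. The scoring here is an assumption for illustration and not the dissertation's exact metrics:

- topical authority is the number of list labels that match the topic;
- local authority is the share of a candidate's list labelers within a radius of the query location;
- the final score is the product of the two.

The class, function names, 50-mile radius, and data are all hypothetical.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


@dataclass
class Candidate:  # hypothetical record for an expert candidate
    name: str
    location: tuple  # candidate's own (lat, lon)
    labels: list  # topic labels assigned by list labelers
    labeler_locations: list = field(default_factory=list)  # labelers' (lat, lon)


def topical_authority(c, topic):
    """Number of list labels on the candidate that match the topic."""
    return sum(1 for label in c.labels if label == topic)


def local_authority(c, query_loc, radius=50.0):
    """Share of the candidate's core audience (labelers) near the query location."""
    if not c.labeler_locations:
        return 0.0
    near = sum(1 for loc in c.labeler_locations
               if haversine_miles(loc, query_loc) <= radius)
    return near / len(c.labeler_locations)


def rank_experts(candidates, topic, query_loc, k=10):
    """Rank by topical authority times audience-based local authority."""
    scored = [(topical_authority(c, topic) * local_authority(c, query_loc), c.name)
              for c in candidates]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:k]]


def rank_by_candidate_location(candidates, topic, query_loc, radius=50.0, k=10):
    """Baseline: topical authority, restricted to candidates located near the query."""
    nearby = [c for c in candidates
              if haversine_miles(c.location, query_loc) <= radius
              and topical_authority(c, topic) > 0]
    nearby.sort(key=lambda c: (-topical_authority(c, topic), c.name))
    return [c.name for c in nearby[:k]]
```

Suppose a Houston-based candidate has many "food" labels but all of them come from labelers in New York. That candidate tops the location-only baseline for a Houston query but drops out under the audience-based score. This illustrates the intuition behind using the core audience rather than the candidate's own location.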
Recommended from our members
Exploiting Social Media Sources for Search, Fusion and Evaluation
The web contains heterogeneous information that is generated with different characteristics and presented via different media. Social media, as one of the largest content carriers, has generated information from millions of users worldwide, rapidly creating material in many forms, such as comments, images, tags, videos, and ratings. In social applications, the formation of online communities fosters conversations on a substantially broader range of topics, as well as unfiltered opinions about subjects that are rarely covered in public media. Information accrued on social platforms therefore presents a unique opportunity to augment web sources such as Wikipedia or news pages, which are usually more formal. The goal of this dissertation is to investigate in depth how social data can be exploited and applied in the context of three fundamental information retrieval (IR) tasks: search, fusion, and evaluation. Improving search performance has consistently been a major focus in the IR community. Given the in-depth discussions and active interactions contained in social media, we present approaches to incorporating this type of data to improve search on general web corpora. In particular, we propose two graph-based frameworks, social anchor and information network, to associate related web and social content, so that information sources with diverse characteristics can complement each other in a unified manner. We investigate how the enriched representation can potentially reduce vocabulary mismatch and improve retrieval effectiveness. Presenting social media content to users is particularly valuable for queries about time-sensitive events or community opinions. Current major search engines commonly blend results from different search services (or verticals) into core web results. Motivated by this real-world need, we explore ways to merge results from different web and social services into a single ranked list. 
We present an optimization framework for fusion in which the impact of documents, ranked lists, and verticals can be modeled simultaneously to maximize performance. Evaluating search system performance has largely relied on creating reusable test collections in IR. Traditional ways of creating evaluation sets can require substantial manual effort. To reduce this effort, we explore an approach to automating the collection of query and relevance judgment pairs using a high-quality social media source, Community Question Answering (CQA). Our approach is based on the idea that CQA services provide platforms for users to raise questions and share answers, thereby encoding the associations between real user information needs and real user assessments. To demonstrate the effectiveness of our approaches, we conduct extensive retrieval and fusion experiments and verify the reliability of the new CQA-based evaluation test sets.
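The abstract does not detail the optimization framework for fusion, so this Python sketch swaps in a different, well-known baseline: weighted reciprocal rank fusion (RRF). Each document's score is the sum, over the input ranked lists, of w / (k + rank). The per-vertical weights only loosely stand in for modeling the influence of verticals. The vertical names, the weights, and k = 60 (the constant commonly used with RRF) are assumptions.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights=None, k=60):
    """Merge ranked lists with (optionally weighted) reciprocal rank fusion.

    ranked_lists: dict mapping a vertical name to a list of doc ids, best first.
    weights: optional dict mapping a vertical name to a non-negative weight
             (verticals without an entry get weight 1.0).
    Returns all doc ids sorted by fused score, ties broken by doc id.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for vertical, docs in ranked_lists.items():
        w = weights.get(vertical, 1.0)
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += w / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A document that appears in several verticals, even at modest ranks, tends to rise above documents that appear in only one. Increasing a vertical's weight pulls its top results upward in the merged list.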
Leveraging Social Media and Web of Data for Crisis Response Coordination
There is an ever-increasing number of social media users (1B+ Facebook users, 500M+ Twitter users) who share their observations and opinions, aided by ubiquitous mobile access (6B+ mobile phone subscribers). In addition, the Web of Data and existing knowledge bases keep on growing at a rapid pace. In this scenario, we have unprecedented opportunities to improve crisis response by extracting social signals, creating spatio-temporal mappings, performing analytics on social and Web of Data, and supporting a variety of applications. Such applications can help provide situational awareness during an emergency, improve preparedness, and assist during the rebuilding/recovery phase of a disaster. Data mining can provide valuable insights to support emergency responders and other stakeholders during a crisis. However, there are a number of challenges, and existing computing technology may not work in all cases. Therefore, our objective here is to characterize such data mining tasks and the challenges that need further research attention.
CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises
Locating timely, useful information during crises and mass emergencies is critical for those forced to make potentially life-altering decisions. As the use of Twitter to broadcast useful information during such situations becomes more widespread, the problem of finding it becomes more difficult. We describe an approach toward improving recall when sampling Twitter communications, which can lead to greater situational awareness during crisis situations. First, we create a lexicon of crisis-related terms that frequently appear in relevant messages posted during different types of crisis situations. Next, we demonstrate how we use the lexicon to automatically identify new terms that describe a given crisis. Finally, we explain how to efficiently query Twitter to extract crisis-related messages during emergency events. In our experiments, adding the crisis lexicon to a set of crisis-specific keywords manually chosen by experts leads to substantial improvements in recall; it also helps to preserve the original distribution of message types.
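Lexicon-based sampling of the kind described above can be sketched in Python. Each message is matched against a set of terms, which may span several words, and recall is measured against messages labeled relevant. The terms below are a small hypothetical sample, not entries from CrisisLex itself. The tokenizer is a simplifying assumption. The example compares an expert-chosen hashtag used alone with the same hashtag plus lexicon terms.

```python
import re

# A small hypothetical sample of crisis-related terms (not actual CrisisLex entries).
CRISIS_TERMS = ["death toll", "evacuated", "donate", "rescue", "flood"]


def tokenize(text):
    """Lowercase and split into word-like tokens, keeping hashtags and mentions."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def matches(message, terms):
    """True if any (possibly multi-word) term occurs as a contiguous token run."""
    tokens = tokenize(message)
    for term in terms:
        parts = tokenize(term)
        n = len(parts)
        if n and any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1)):
            return True
    return False


def recall(messages, relevant_ids, terms):
    """Fraction of relevant messages (given by index) retrieved by the term set."""
    retrieved = {i for i, m in enumerate(messages) if matches(m, terms)}
    return len(retrieved & relevant_ids) / len(relevant_ids)
```

Consider four toy messages, three of them relevant. The hypothetical hashtag "#riverflood" alone retrieves one relevant message. Adding the lexicon terms retrieves all three and still excludes the unrelated one.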
- …