Using Sensor Metadata Streams to Identify Topics of Local Events in the City
In this paper, we study the emerging Information Retrieval (IR) task of local event retrieval using sensor metadata streams. Sensor metadata streams include information such as crowd density from video processing, audio classifications, and social media activity. We propose to use these metadata streams to identify the topics of local events within a city, where each event topic corresponds to a set of terms representing a type of event such as a concert or a protest. We develop a supervised approach that maps sensor metadata observations to an event topic. In addition to using a variety of sensor metadata observations about the current status of the environment as learning features, our approach incorporates background features to model cyclic event patterns. Through experimentation with data collected from two locations in a major Spanish city, we show that our approach markedly outperforms an alternative baseline. We also show that modelling background information improves event topic identification.
Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinion from two French politicians over time. We
therefore review state of the art NLP-based ML algorithms to automatically
annotate tweets using a manual initiation step as bootstrap. This paper focuses
on key issues in active learning while building a large annotated data set
from noisy data, where the noise is introduced by human annotators, the
abundance of data, and the label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can be considered as the
bearing point to not only improve automatic systems for Opinion Mining (OM) and
Topic Classification but also to reduce noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time
Crowd-powered conversational assistants have been shown to be more robust
than automated systems, but at the cost of higher response latency and
monetary expense. A promising direction is to combine the two approaches for
high-quality, low-latency, and low-cost solutions. In this paper, we introduce
Evorus, a crowd-powered conversational assistant built to automate itself over
time by (i) allowing new chatbots to be easily integrated to automate more
scenarios, (ii) reusing prior crowd answers, and (iii) learning to
automatically approve response candidates. Our 5-month-long deployment with 80
participants and 281 conversations shows that Evorus can automate itself
without compromising conversation quality. Crowd-AI architectures have long
been proposed as a way to reduce cost and latency for crowd-powered systems;
Evorus demonstrates how automation can be introduced successfully in a deployed
system. Its architecture allows future researchers to innovate further on
the underlying automated components in the context of a deployed open-domain
dialog system.
Comment: 10 pages. To appear in the Proceedings of the Conference on Human
Factors in Computing Systems 2018 (CHI'18