11,740 research outputs found
Graph-based Event Extraction from Twitter
International audienceEvent detection on Twitter has become an attractive and challenging research field due to the popularity and the peculiarities of tweets. Detecting which tweets describe a specific event and clustering them is one of the main challenging tasks related to Social Media currently addressed in the NLP community. Existing approaches have mainly focused on detecting spikes in clusters around specific keywords or Named Entities (NE). However, one of the main drawbacks of such approaches is the difficulty in understanding when the same keywords describe different events. In this paper, we propose a novel approach that exploits NE mentions in tweets and their entity context to create a temporal event graph. Then, using simple graph theory techniques and a PageRank-like algorithm, we process the event graphs to detect clusters of tweets describing the same events. Experiments on two gold standard datasets show that our approach achieves state-of-the-art results both in terms of evaluation performances and the quality of the detected events
Empirical Methodology for Crowdsourcing Ground Truth
The process of gathering ground truth data through human annotation is a
major bottleneck in the use of information extraction methods for populating
the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the
attempt to solve the issues related to volume of data and lack of annotators.
Typically these practices use inter-annotator agreement as a measure of
quality. However, in many domains, such as event detection, there is ambiguity
in the data, as well as a multitude of perspectives of the information
examples. We present an empirically derived methodology for efficiently
gathering of ground truth data in a diverse set of use cases covering a variety
of domains and annotation tasks. Central to our approach is the use of
CrowdTruth metrics that capture inter-annotator disagreement. We show that
measuring disagreement is essential for acquiring a high quality ground truth.
We achieve this by comparing the quality of the data aggregated with CrowdTruth
metrics with majority vote, over a set of diverse crowdsourcing tasks: Medical
Relation Extraction, Twitter Event Identification, News Event Extraction and
Sound Interpretation. We also show that an increased number of crowd workers
leads to growth and stabilization in the quality of annotations, going against
the usual practice of employing a small number of annotators.Comment: in publication at the Semantic Web Journa
Recommended from our members
Detecting Important Life Events on Twitter Using Frequent Semantic and Syntactic Subgraphs
Identifying global events from social media has been the focus of much research in recent years. However, the identification of personal life events poses new requirements and challenges that have received relatively little research attention. In this paper we explore a new approach for life event identification, where we expand social media posts into both semantic, and syntactic networks of content. Frequent graph patterns are mined from these networks and used as features to enrich life-event classifiers. Results show that our approach significantly outperforms the best performing baseline in accuracy (by 4.48% points) and F-measure (by 4.54% points) when used to identify five major life events identified from the psychology literature: Getting Married, Having Children, Death of a Parent, Starting School, and Falling in Love. In addition, our results show that, while semantic graphs are effective at discriminating the theme of the post (e.g. the topic of marriage), syntactic graphs help identify whether the post describes a personal event (e.g. someone getting married)
- …