Search CORE

898 research outputs found

Extracting semantic entities and events from sports tweets

Author: Breslin John G.
Choudhury Smitashree
Publication venue
Publication date: 01/01/2011
Field of study

Large volumes of user-generated content on practically every major issue and event are being created on the microblogging site Twitter. This content can be combined and processed to detect events, entities and popular moods to feed various knowledge-intensive practical applications. On the downside, these content items are very noisy and highly informal, making it difficult to extract sense out of the stream. In this paper, we exploit various approaches to detect the named entities and significant micro-events from users’ tweets during a live sports event. Here we describe how combining linguistic features with background knowledge and the use of Twitter-specific features can achieve high, precise detection results (f-measure = 87%) in different datasets. A study was conducted on tweets from cricket matches in the ICC World Cup in order to augment the event-related non-textual media with collective intelligence

CiteSeerX

Open Research Online (The Open University)

スタンス分類における領域知識の活用に関する研究

Author: Sasaki Akira
Publication venue
Publication date: 31/08/2018
Field of study

Tohoku University乾健太郎課

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Empirical Methodology for Crowdsourcing Ground Truth

Author: Aroyo Lora
Dumitrache Anca
Inel Oana
Ortiz Carlos
Sips Robert-Jan
Timmermans Benjamin
Welty Chris
Publication venue
Publication date: 24/09/2018
Field of study

The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to volume of data and lack of annotators. Typically these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives of the information examples. We present an empirically derived methodology for efficiently gathering of ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high quality ground truth. We achieve this by comparing the quality of the data aggregated with CrowdTruth metrics with majority vote, over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction and Sound Interpretation. We also show that an increased number of crowd workers leads to growth and stabilization in the quality of annotations, going against the usual practice of employing a small number of annotators.Comment: in publication at the Semantic Web Journa

arXiv.org e-Print Archive

TripleSent: a triple store of events associated with their prototypical sentiment

Author: Desmet Bart
Hoste Veronique
Lefever Els
van der Waart van Gulik Stephan
Publication venue: IARIA
Publication date: 01/01/2016
Field of study

The current generation of sentiment analysis systems is limited in their real-world applicability because they cannot detect utterances that implicitly carry positive or negative sentiment. We present early stage research ideas to address this inability with the development of a dynamic triple store of events associated with their prototypical sentiment

Ghent University Academic Bibliography

Archivsystem Ask23

Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use

Author: Adda Gilles
Couillault Alain
Fort Karen
Mariani Joseph
Sagot Benoît
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/07/2014
Field of study

International audienceThis article is a position paper about Amazon Mechanical Turk, the use of which has been steadily growing in language processing in the past few years. According to the mainstream opinion expressed in articles of the domain, this type of on-line working platforms allows to develop quickly all sorts of quality language resources, at a very low price, by people doing that as a hobby. We shall demonstrate here that the situation is far from being that ideal. Our goal here is manifold: 1- to inform researchers, so that they can make their own choices, 2- to develop alternatives with the help of funding agencies and scientific associations, 3- to propose practical and organizational solutions in order to improve language resources development, while limiting the risks of ethical and legal issues without letting go price or quality, 4- to introduce an Ethics and Big Data Charter for the documentation of language resourc

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1