1,677 research outputs found

Empirical Methodology for Crowdsourcing Ground Truth

Author: Aroyo Lora
Dumitrache Anca
Inel Oana
Ortiz Carlos
Sips Robert-Jan
Timmermans Benjamin
Welty Chris
Publication date: 24/09/2018

The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to volume of data and lack of annotators. Typically these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives on the information examples. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high-quality ground truth. We achieve this by comparing the quality of the data aggregated with CrowdTruth metrics against that aggregated with majority vote, over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction and Sound Interpretation. We also show that an increased number of crowd workers leads to growth and stabilization in the quality of annotations, going against the usual practice of employing a small number of annotators. Comment: in publication at the Semantic Web Journal

arXiv.org e-Print Archive

Understanding Events: A Diversity-driven Human-Machine Approach

Author: Inel Oana
Publication date: 09/03/2022

VU Research Portal

Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

Author: Biega Asia J.
Roy Rishiraj Saha
Schmidt Jana
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding. Comment: ECIR 2020 Short Paper

arXiv.org e-Print Archive

MPG.PuRe

Language engineering - a champion for European culture

Author: Banus E.
Diver J.
Elio B.
Simpkins N.
Publication date: 01/10/1996

Language is key to culture. It is a direct cultural medium as well as a means of recording and providing access to non-lingual elements of culture. Language is also fundamental to a sense of cultural identity. For this reason, it is vital, in a changing Europe, that we preserve the multi-lingual character of our society in order to move successfully towards closer co-operation at a political, economic, and social level. Language engineering is the application of knowledge of language to the development of computer software which can recognise, understand, interpret, and generate human language in all its forms. The paper provides a high-level view of the ‘state of the art’ in language engineering and indicates ways in which it will have a profound impact on our culture in the future. It shows how advances in language engineering are an important aid in maintaining cultural diversity in a multi-lingual European society, while enabling the development of social cohesion across cultural and national divides. It addresses issues raised by the prospect of the Multi-lingual Information Society, including education, human communication with technology and information management, as well as aspects of digital cities such as tele-presence in digital libraries, virtual art galleries and electronic museums. The paper raises the issue of language as a factor in cultural domination, showing the contribution that language engineering can make towards countering it. The paper also raises a number of controversial issues concerning the likely benefits arising from the ways in which language is likely to influence the culture of Europe.

Open Research Online (The Open University)

Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

Author: Anantharam Pramod
Balasuriya Lakshika
Ferrucci David
Kimmig Angelika
McMahon Connor
Meng Lingling
Perera Sujan
Sheth Amit
Wijeratne Sanjaya
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition for utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques. Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770

arXiv.org e-Print Archive

Crossref

Scholar Commons - Institutional Repository of the University of South Carolina

CORE

Big data and the SP theory of intelligence

Author: Wolff J. Gerard
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

This article is about how the "SP theory of intelligence" and its realisation in the "SP machine" may, with advantage, be applied to the management and analysis of big data. The SP system -- introduced in the article and fully described elsewhere -- may help to overcome the problem of variety in big data: it has potential as "a universal framework for the representation and processing of diverse kinds of knowledge" (UFK), helping to reduce the diversity of formalisms and formats for knowledge and the different ways in which they are processed. It has strengths in the unsupervised learning or discovery of structure in data, in pattern recognition, in the parsing and production of natural language, in several kinds of reasoning, and more. It lends itself to the analysis of streaming data, helping to overcome the problem of velocity in big data. Central in the workings of the system is lossless compression of information: making big data smaller and reducing problems of storage and management. There is potential for substantial economies in the transmission of data, for big cuts in the use of energy in computing, for faster processing, and for smaller and lighter computers. The system provides a handle on the problem of veracity in big data, with potential to assist in the management of errors and uncertainties in data. It lends itself to the visualisation of knowledge structures and inferential processes. A high-parallel, open-source version of the SP machine would provide a means for researchers everywhere to explore what can be done with the system and to create new versions of it. Comment: Accepted for publication in IEEE Access

arXiv.org e-Print Archive

CiteSeerX