
67 research outputs found

Flexible NLP Pipelines for Digital Humanities Research

Author: Smink Wouter A.C.
Sools Anneke M.
van der Zwaan Janneke M.
Veldkamp Bernard P.
Westerhof Gerben J.
Wiegersma Sytske
Publication date: 01/08/2017

University of Twente Research Information

Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

Publication venue: European Language Resources Association
Publication date: 01/01/2016

No abstract available

Enlighten

Mining Social Science Publications for Survey Variables

Author: Mutschke Peter
Zielinski Andrea
Publication venue: 'Anatomische Gesellschaft'
Publication date: 01/01/2017

Research in social science is usually based on survey data, where individual research questions relate to observable concepts (variables). However, because standards for data citation are lacking, reliably identifying the variables used in a publication is often difficult. In this paper, we present a work-in-progress study that addresses the variable detection task with supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set that includes terminological concepts and similarity metric scores. We also present preliminary results on a small dataset specifically designed for this task, which show modest improvements over the baseline.

SSOAR - Social Science Open Access Repository

Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

Author: Andoni A.
Beyer K.
Broder A. Z.
Brown P. F.
Fried D.
Le Q.
Mikolov T.
Mu Y.
Muja M.
Petrović S.
Riezler S.
Salton G.
Wang J.
Weber R.
Yang L.
Yao X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/10/2016

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be both more effective and more efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
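The vocabulary-mismatch problem described above can be illustrated with a toy sketch contrasting term-based candidate generation with exact (brute-force) k-NN search. The association table `ASSOC`, its scores, and the per-term scoring in `similarity` are hypothetical stand-ins chosen for illustration; they are not the similarity function used in the paper, and the approximate k-NN algorithm the abstract reports is not shown.

```python
from heapq import nlargest

# Hypothetical, hand-set term-association scores standing in for learned
# associations; they are NOT the similarity function used in the paper.
ASSOC = {
    ("car", "automobile"): 0.8,
    ("automobile", "car"): 0.8,
}


def term_based_candidates(query, docs):
    """Candidate generation by exact term overlap: indices of docs that share
    at least one token with the query."""
    q_terms = set(query.split())
    return [i for i, doc in enumerate(docs) if q_terms & set(doc.split())]


def similarity(query, doc):
    """For each query term, take the best exact (1.0) or associated match in
    the document, and sum these per-term scores."""
    d_terms = set(doc.split())
    score = 0.0
    for qt in query.split():
        best = 1.0 if qt in d_terms else 0.0
        for dt in d_terms:
            best = max(best, ASSOC.get((qt, dt), 0.0))
        score += best
    return score


def brute_force_knn(query, docs, k):
    """Exact k-NN: score every document, keep the k best, drop zero scores."""
    scored = ((similarity(query, doc), i) for i, doc in enumerate(docs))
    return [i for score, i in nlargest(k, scored) if score > 0]


docs = ["automobile maintenance tips", "bicycle repair guide", "weather forecast"]
print(term_based_candidates("car repair", docs))  # [1]
print(brute_force_knn("car repair", docs, k=2))   # [1, 0]
```

Document 0 shares no exact term with the query, so term-based search never proposes it, while the association-aware k-NN search ranks it second. Because the exact search scores every document for every query, its cost grows with collection size, which is the slowness an approximate search is meant to avoid.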

arXiv.org e-Print Archive

Crossref

Scipedia

Reltextrank: An open source framework for building relational syntactic-semantic text pair representations

Author: Alessandro Moschitti
Aliaksei Severyn
Kateryna Tymoshenko
Massimo Nicosia
Publication date: 01/01/2017

Crossref

Open Access Repository

Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications

Author: Mutschke Peter
Zielinski Andrea
Publication venue: 'Deutsche Zeitschrift Für Sportmedizin/German Journal of Sports Medicine'
Publication date: 01/01/2018

In this paper, we describe our effort to create a new corpus for evaluating the detection and linking of so-called survey variables in social science publications (e.g., "Do you believe in Heaven?"). The task is to recognize survey variable mentions in a given text, disambiguate them, and link them to the corresponding variable in a knowledge base. Because there are generally hundreds of candidate variables to link to, and because mentions can take a wide variety of forms, this is a challenging NLP task. The contribution of our work is the first gold standard corpus for the variable detection and linking task. We describe the annotation guidelines and the annotation process. The resulting corpus is multilingual (German and English) and includes manually curated word and phrase alignments. It also includes text samples that could not be assigned to any variable, denoted as negative examples. Using the new dataset, we evaluate several state-of-the-art text classification and textual similarity methods. The annotated corpus is made available along with an open-source baseline system for variable mention identification and linking.
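The linking step, including the negative examples that match no variable, can be sketched as similarity-based candidate ranking with a rejection threshold. This is a minimal sketch assuming token-set Jaccard similarity, a hypothetical `threshold` of 0.3, and hypothetical variable ids; it is not the paper's baseline system or any of the methods it evaluates.

```python
import re


def tokens(text):
    """Lowercased word tokens, with punctuation such as '?' dropped."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def link_mention(mention, variables, threshold=0.3):
    """Return the id of the most similar variable, or None when no candidate
    reaches the threshold (i.e. the mention is treated as a negative example)."""
    best_id, best_score = None, 0.0
    for var_id, question in variables.items():
        score = jaccard(mention, question)
        if score > best_score:
            best_id, best_score = var_id, score
    return best_id if best_score >= threshold else None


# Hypothetical knowledge-base entries: variable id -> question text.
variables = {
    "V1": "Do you believe in Heaven?",
    "V2": "How often do you attend religious services?",
}
print(link_mention("respondents who believe in heaven", variables))  # V1
print(link_mention("household income in euros", variables))          # None
```

With hundreds of real candidates, a purely lexical score like this one is easily fooled by paraphrase, which is one reason the abstract evaluates stronger classification and textual similarity methods.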

SSOAR - Social Science Open Access Repository

Multi-Level Alignments as an Extensible Representation Basis for Textual Entailment Algorithms

Author: Ido Dagan
Kathrin Eichler
Lili Kotlerman
Meni Adler
Sebastian Padó
Tae-Gil Noh
Vered Shwartz
Vivi Nastase
Publication date: 01/01/2015

A major problem in research on Textual Entailment (TE) is the high implementation effort for TE systems. Recently, interoperable standards for annotation and preprocessing have been proposed. The algorithmic level, by contrast, remains unstandardized, which makes component re-use in this area very difficult in practice. In this paper, we introduce multi-level alignments as a central, powerful representation for TE algorithms that encourages modular, reusable, multilingual algorithm development. We demonstrate that a pilot open-source implementation of multi-level alignment with minimal features is competitive with state-of-the-art open-source TE engines in three languages.
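To make the idea of a shared alignment representation concrete, here is a minimal sketch assuming a simplified link structure: each link connects a span of the text to a span of the hypothesis and records which level or resource proposed it. The `AlignmentLink` class, its fields, the level labels, and the `hypothesis_coverage` feature are illustrative assumptions, not the data model or features of the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentLink:
    """One link between a text span and a hypothesis span (token offsets,
    end-exclusive), tagged with the level or resource that proposed it."""
    text_span: tuple[int, int]
    hyp_span: tuple[int, int]
    level: str       # hypothetical labels, e.g. "lexical" or "phrasal"
    strength: float  # confidence in [0, 1]


def hypothesis_coverage(links, hyp_len, min_strength=0.5):
    """Fraction of hypothesis tokens covered by at least one link whose
    strength is at or above min_strength."""
    covered = set()
    for link in links:
        if link.strength >= min_strength:
            covered.update(range(*link.hyp_span))
    return len(covered) / hyp_len if hyp_len else 0.0


# T: "A man is playing a guitar"   H: "A person plays an instrument"
links = [
    AlignmentLink((1, 2), (1, 2), "lexical", 0.9),  # man -> person
    AlignmentLink((3, 4), (2, 3), "lexical", 0.8),  # playing -> plays
    AlignmentLink((5, 6), (4, 5), "lexical", 0.6),  # guitar -> instrument
    AlignmentLink((0, 1), (0, 1), "lexical", 0.3),  # A -> A (below threshold)
]
print(hypothesis_coverage(links, hyp_len=5))  # 0.6
```

The design point is the separation of concerns: any aligner (lexical resource, paraphrase table, and so on) can contribute links to the same structure, and a downstream decision component can compute features such as coverage without knowing which aligner produced each link.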

Archivio della ricerca - Fondazione Bruno Kessler

Open Access Repository