
    Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities

    Over the last decade we have made great progress in entity linking (EL) systems, but performance may vary depending on the context and, arguably, there are even principled limitations preventing a "perfect" EL system. This also suggests that there may be applications for which current "imperfect" EL is already very useful, which makes finding the "right" application as important as building the "right" EL system. We investigate the Digital Humanities use case, where scholars spend a considerable amount of time selecting relevant source texts. We developed WideNet, a semantically enhanced search tool that leverages the strengths of (imperfect) EL without getting in the way of its expert users. We evaluate this tool in two historical case studies that aim to collect references to historical periods in parliamentary debates from the last two decades; the first targeted the Dutch Golden Age, the second World War II. The case studies conclude with a critical reflection on the utility of WideNet for this kind of research, after which we outline how such a real-world application can help to improve EL technology in general. Comment: Accepted for presentation at SEMANTiCS '1
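    The core design idea, using a deliberately recall-oriented entity linker and leaving precision to the expert user, can be pictured with a short sketch. The sketch below is not WideNet itself; the linker interface, seed-entity set, and confidence threshold are all assumptions made for illustration.

```python
# Illustrative sketch (not the WideNet implementation): recall-oriented corpus
# selection with a noisy entity linker. The linker interface, seed entities,
# and threshold are assumptions made for this example.
from collections import defaultdict

def select_candidate_documents(documents, entity_linker, seed_entities, min_confidence=0.3):
    """Keep any document whose linked entities overlap a seed set of
    period-related entities. A low threshold favours recall; the expert
    user filters false positives downstream."""
    selected = defaultdict(list)
    for doc_id, text in documents.items():
        # entity_linker is assumed to return (mention, entity_id, confidence) triples.
        for mention, entity_id, confidence in entity_linker(text):
            if confidence >= min_confidence and entity_id in seed_entities:
                selected[doc_id].append((mention, entity_id, confidence))
    return dict(selected)
```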

    Searching Spontaneous Conversational Speech

    The ACM SIGIR Workshop on Searching Spontaneous Conversational Speech was held as part of the 2007 ACM SIGIR Conference in Amsterdam. The workshop program was a mix of elements, including a keynote speech, paper presentations and panel discussions. This brief report describes the organization of the workshop and summarizes the discussions.

    Word-to-Word Models of Translational Equivalence

    Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is noisy. This article presents methods for biasing statistical translation models to reflect these properties. Analysis of the expected behavior of these biases in the presence of sparse data predicts that they will result in more accurate models. The prediction is confirmed by evaluation with respect to a gold standard: translation models that are biased in this fashion are significantly more accurate than a baseline knowledge-poor model. This article also shows how a statistical translation model can take advantage of various kinds of pre-existing knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that are informed by pre-existing knowledge about the model domain combine the best of both the rationalist and empiricist traditions.
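    One simple way to picture the one-to-one bias described above is greedy "competitive linking" over word co-occurrence counts: the highest-scoring word pairs are linked first, and each word may participate in at most one link. The toy sketch below illustrates that idea only; it is not the article's exact estimation procedure, and scoring by raw co-occurrence counts is a simplifying assumption.

```python
# Toy illustration of the one-to-one bias via greedy competitive linking.
# Scoring by raw co-occurrence counts is a simplification for this example.
from collections import Counter
from itertools import product

def cooccurrence_scores(bitext):
    """Count how often each (source, target) word pair co-occurs in aligned
    sentence pairs. `bitext` is a list of (source_tokens, target_tokens)."""
    pair_counts = Counter()
    for src, tgt in bitext:
        for s, t in product(set(src), set(tgt)):
            pair_counts[(s, t)] += 1
    return pair_counts

def competitive_linking(bitext):
    """Greedy one-to-one linking: accept the highest-scoring pairs first,
    skipping any pair whose source or target word is already linked."""
    scores = cooccurrence_scores(bitext)
    linked_src, linked_tgt, links = set(), set(), {}
    for (s, t), _count in scores.most_common():
        if s not in linked_src and t not in linked_tgt:
            links[s] = t
            linked_src.add(s)
            linked_tgt.add(t)
    return links

# Example with two toy sentence pairs.
bitext = [("the cat".split(), "le chat".split()),
          ("the dog".split(), "le chien".split())]
print(competitive_linking(bitext))  # e.g. {'the': 'le', 'cat': 'chat', 'dog': 'chien'}
```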

    Machine translation evaluation resources and methods: a survey

    We introduce a survey of Machine Translation (MT) evaluation that covers both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. More advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: lexical similarity and linguistic features. The lexical similarity methods include edit distance, precision, recall, F-measure, and word order. The linguistic features can be divided into syntactic and semantic features. The syntactic features include part-of-speech tags, phrase types, and sentence structures; the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only recently. We also introduce methods for evaluating MT evaluation itself, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works \cite{GALEprogram2009, EuroMatrixProject2007} in several aspects: it covers recent developments in MT evaluation measures, classifies measures from manual to automatic, introduces the recent QE tasks of MT, and organizes the content concisely.
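    The lexical-similarity metrics mentioned above share a common core: count how many hypothesis tokens also occur in the reference and derive precision, recall, and an F-measure from the overlap. The minimal sketch below shows only that shared core; real metrics add n-gram matching, synonym handling, word-order penalties, and more.

```python
# Minimal sketch of the lexical-similarity core: token-level precision,
# recall, and F-measure of a hypothesis against a single reference.
from collections import Counter

def precision_recall_f1(hypothesis_tokens, reference_tokens):
    hyp, ref = Counter(hypothesis_tokens), Counter(reference_tokens)
    overlap = sum((hyp & ref).values())            # clipped token matches
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f1

print(precision_recall_f1("the cat sat on the mat".split(),
                          "the cat is on the mat".split()))
# -> (0.833..., 0.833..., 0.833...)
```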

    Translation and the Internet: Evaluating the Quality of Free Online Machine Translators

    The late 1990s saw the advent of free online machine translators such as Babelfish, Google Translate and Transtext. Professional opinion regarding the quality of the translations they provide oscillates wildly, from «laughably bad» (Ali, 2007) to «a tremendous success» (Yang and Lange, 1998). While the literature on commercial machine translators is vast, there are only a handful of studies, mostly in blog format, that evaluate and rank free online machine translators. This paper offers a review of the most significant contributions in that field, with an emphasis on two key issues: (i) the need for a ranking system; (ii) the results of a ranking system devised by the authors of this paper. Our small-scale evaluation of the performance of ten free machine translators (FMTs) in «league table» format shows what a user can expect from an individual FMT in terms of translation quality. Our rankings are a first tentative step towards allowing users to make an informed choice of the most appropriate FMT for their source text and thus obtain higher FMT target-text quality.
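    A «league table» of this kind can be assembled, in the simplest case, by averaging per-text quality scores for each system and sorting. The sketch below is purely illustrative and does not reproduce the authors' scoring scheme; the system names and ratings shown are made up.

```python
# Illustrative league table: rank MT systems by mean per-text quality score.
# The scoring scheme, system names, and ratings are assumptions for this example.
def league_table(scores_by_system):
    """Map each system name to its mean score and rank systems descending."""
    means = {name: sum(s) / len(s) for name, s in scores_by_system.items() if s}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

ranking = league_table({
    "Google Translate": [4.1, 3.8, 4.4],   # hypothetical ratings
    "Babelfish":        [3.2, 2.9, 3.5],
    "Transtext":        [2.8, 3.0, 2.6],
})
for rank, (system, mean_score) in enumerate(ranking, start=1):
    print(f"{rank}. {system}: {mean_score:.2f}")
```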