Dictionary writing system (DWS) plus corpus query package (CQP): the case of TshwaneLex

Author: DE PAUW Guy
de Schryver Gilles-Maurice
Publication date: 01/01/2007

In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching.

Ghent University Academic Bibliography

Towards Universal Semantic Tagging

Author: Abzianidze Lasha
Bos Johan
Publication date: 29/09/2017

The paper proposes the task of universal semantic tagging: tagging word tokens with language-neutral, semantically informative tags. We argue that the task, with its independent nature, contributes to better semantic analysis for wide-coverage multilingual text. We present the initial version of the semantic tagset and show that (a) the tags provide semantically fine-grained information, and (b) they are suitable for cross-lingual semantic parsing. An application of the semantic tagging in the Parallel Meaning Bank supports both of these points as the tags contribute to formal lexical semantics and their cross-lingual projection. As a part of the application, we annotate a small corpus with the semantic tags and present a new baseline result for universal semantic tagging. Comment: 9 pages, International Conference on Computational Semantics (IWCS

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Cardinal Virtues: Extracting Relation Cardinalities from Text

Author: Darari Fariz
Mirza Paramita
Razniewski Simon
Weikum Gerhard
Publication date: 01/01/2017

Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting cardinalities and discuss the specific challenges that set it apart from standard IE. We present a distant supervision method using conditional random fields. A preliminary evaluation results in precision between 3% and 55%, depending on the difficulty of relations. Comment: 5 pages, ACL 2017 (short paper

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

Target-Side Context for Discriminative Models in Statistical Machine Translation

Author: Bojar Ondřej
Fraser Alexander
Junczys-Dowmunt Marcin
Tamchyna Aleš
Publication date: 01/01/2016

Discriminative translation models utilizing source context have been shown to help statistical machine translation performance. We propose a novel extension of this work using target context information. Surprisingly, we show that this model can be efficiently integrated directly in the decoding process. Our approach scales to large training data sizes and results in consistent improvements in translation quality on four language pairs. We also provide an analysis comparing the strengths of the baseline source-context model with our extended source-context and target-context model and we show that our extension allows us to better capture morphological coherence. Our work is freely available as part of Moses. Comment: Accepted as a long paper for ACL 201

arXiv.org e-Print Archive

Crossref

Biblio at Institute of Formal and Applied Linguistics

#Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds

Author: Bagasheva A.
Caleffi P.-M.
Cassell J.
Cook P.
Croft W.
Cunha E.
Eisenstein J.
Eisenstein J.
Giegerich H. J.
Hacken P.
Hong L.
Hu Y.
Lee C.-y.
Lerman K.
Lin Y.-R.
Lui M.
Léturgie A.
Medler D. A.
Milroy J.
Nguyen T.
Owoputi O.
Ritter A.
Ritter A.
Weng L.
Yang J.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/10/2015

Compounding of natural language units is a very common phenomenon. In this paper, we show, for the first time, that Twitter hashtags, which could be considered as correlates of such linguistic units, undergo compounding. We identify reasons for this compounding and propose a prediction model that can identify with 77.07% accuracy whether a pair of hashtags compounding in the near future (i.e., 2 months after compounding) shall become popular. At longer times T = 6, 10 months the accuracies are 77.52% and 79.13% respectively. This technique has strong implications for trending hashtag recommendation since newly formed hashtag compounds can be recommended early, even before the compounding has taken place. Further, humans can predict compounds with an overall accuracy of only 48.7% (treated as baseline). Notably, while humans can discriminate the relatively easier cases, the automatic framework is successful in classifying the relatively harder cases. Comment: 14 pages, 4 figures, 9 tables, published in CSCW (Computer-Supported Cooperative Work and Social Computing) 2016. In Proceedings of 19th ACM conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2016

arXiv.org e-Print Archive

Crossref