Correlation-based Intrinsic Evaluation of Word Vector Representations

Author: Dyer Chris
Faruqui Manaal
Tsvetkov Yulia
Publication date: 01/01/2016

We introduce QVEC-CCA, an intrinsic evaluation metric for word vector representations based on correlations of learned vectors with features extracted from linguistic resources. We show that QVEC-CCA scores are an effective proxy for a range of extrinsic semantic and syntactic tasks. We also show that the proposed evaluation obtains higher and more consistent correlations with downstream tasks than existing approaches to intrinsic evaluation of word vectors that are based on word similarity. Comment: RepEval 2016, 5 page

arXiv.org e-Print Archive

Crossref

Supersense tagging with inter-annotator disagreement

Author: Johannsen Anders
Martínez Alonso Héctor
Plank B.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016

ARTS repository - University of Groningen

From Thesaurus to Framenet

Author: Braasch Anna
Nimb Sanni
Olsen Sussi
Pedersen Bolette Sandford
Søgaard Anders
Publication venue: Lexical Computing CZ
Publication date: 01/01/2017

Copenhagen University Research Information System

Lexical Semantic Recognition

Author: Hershcovich Daniel
Kranzlein Michael
Liu Nelson F.
Schneider Nathan
Publication date: 01/01/2021

In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence. We hypothesize that a unified lexical semantic recognition task is an effective way to encapsulate previously disparate styles of annotation, including multiword expression identification / classification and supersense tagging. Using the STREUSLE corpus, we train a neural CRF sequence tagger and evaluate its performance along various axes of annotation. As the label set generalizes that of previous tasks (PARSEME, DiMSUM), we additionally evaluate how well the model generalizes to those test sets, finding that it approaches or surpasses existing models despite training only on STREUSLE. Our work also establishes baseline models and evaluation metrics for integrated and accurate modeling of lexical semantics, facilitating future work in this area. Comment: 11 pages, 3 figures; to appear at MWE 202

arXiv.org e-Print Archive

Copenhagen University Research Information System

Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish

Author: Plank Barbara
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

Named Entity Recognition (NER) has advanced greatly with the introduction of deep neural architectures. However, the success of these methods depends on large amounts of training data. The scarcity of publicly available human-labeled datasets has resulted in limited evaluation of existing NER systems, as is the case for Danish. This paper studies the effectiveness of cross-lingual transfer for Danish, evaluates its complementarity to limited gold data, and sheds light on the performance of Danish NER. Comment: Published at NoDaLiDa 2019; updated (system, data and repository details

arXiv.org e-Print Archive

The IT University of Copenhagen's Repository

Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016), August 11, 2016, Berlin, Germany

Author: Friedrich Annemarie
Tomanek Katrin
Publication date: 01/01/2016

OPUS Augsburg

A Corpus of Preposition Supersenses

Author: Conger Kathryn
Green Meredith
Hwang Jena D.
O'Gorman Tim
Palmer Martha
Schneider Nathan
Srikumar Vivek
Suresh Abhijit
Publication date: 11/08/2016

Edinburgh Research Explorer

The Lacunae of Danish Natural Language Processing

Author: Derczynski Leon
Kirkedal Andreas Søeborg
Plank Barbara
Schluter Natalie
Publication date: 01/01/2019

The IT University of Copenhagen's Repository

Fra begrebsordbog til sprogteknologisk ressource: verber, semantiske roller og rammer – et pilotstudie

Author: Nimb Sanni
Pedersen Bolette Sandford
Publication venue: Nordisk Forening for Leksikografi
Publication date: 29/11/2018

This paper describes a method of compiling a lexicon of Danish semantic frames within the model of the Berkeley FrameNet (BFN). Large groups of near-synonymous verbs and verbal nouns, including multiword units, within the domains of communication and cognition are identified and extracted from the source manuscript of a newly published Danish thesaurus. Each word or expression is then assigned an appropriate frame from BFN. The fact that words within the same domain all belong to a manageable subset of frames in BFN makes it possible to map a high number of words to their corresponding frames simultaneously. In a forthcoming annotation project where words within the same two domains are already identified in the corpus, the idea is to pre-annotate with the frames in our lexicon, leaving human annotators to confirm the frame afterwards and to test whether the BFN semantic roles described for English can be identified in the Danish text. Our method reveals some interesting divergences between the semantic divisions established in the thesaurus and those found in BFN, showing that the two resources contribute different types of linguistic information and thereby constitute a useful supplement to one another.

Tidsskrift.dk (Det Kongelige Bibliotek)