Correlation-based Intrinsic Evaluation of Word Vector Representations
We introduce QVEC-CCA--an intrinsic evaluation metric for word vector
representations based on correlations of learned vectors with features
extracted from linguistic resources. We show that QVEC-CCA scores are an
effective proxy for a range of extrinsic semantic and syntactic tasks. We also
show that the proposed evaluation obtains higher and more consistent
correlations with downstream tasks, compared to existing approaches to
intrinsic evaluation of word vectors that are based on word similarity.
Comment: RepEval 2016, 5 pages
Lexical Semantic Recognition
In lexical semantics, full-sentence segmentation and segment labeling of
various phenomena are generally treated separately, despite their
interdependence. We hypothesize that a unified lexical semantic recognition
task is an effective way to encapsulate previously disparate styles of
annotation, including multiword expression identification / classification and
supersense tagging. Using the STREUSLE corpus, we train a neural CRF sequence
tagger and evaluate its performance along various axes of annotation. As the
label set generalizes that of previous tasks (PARSEME, DiMSUM), we additionally
evaluate how well the model generalizes to those test sets, finding that it
approaches or surpasses existing models despite training only on STREUSLE. Our
work also establishes baseline models and evaluation metrics for integrated and
accurate modeling of lexical semantics, facilitating future work in this area.
Comment: 11 pages, 3 figures; to appear at MWE 202
Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish
Named Entity Recognition (NER) has advanced greatly with the introduction of
deep neural architectures. However, the success of these methods depends on
large amounts of training data. The scarcity of publicly-available
human-labeled datasets has resulted in limited evaluation of existing NER
systems, as is the case for Danish. This paper studies the effectiveness of
cross-lingual transfer for Danish, evaluates its complementarity to limited
gold data, and sheds light on the performance of Danish NER.
Comment: Published at NoDaLiDa 2019; updated (system, data and repository details)
Fra begrebsordbog til sprogteknologisk ressource: verber, semantiske roller og rammer – et pilotstudie [From thesaurus to language technology resource: verbs, semantic roles and frames – a pilot study]
This paper describes a method for compiling a lexicon of Danish semantic frames within the model of the Berkeley FrameNet (BFN). Large groups of near-synonymous verbs and verbal nouns, including multiword units, within the domains of communication and cognition are identified and extracted from the source manuscript of a newly published Danish thesaurus. Each word or expression is then assigned an appropriate frame from BFN. Because words within the same domain all belong to a manageable subset of BFN frames, a large number of words can be mapped to their corresponding frames simultaneously. In a forthcoming annotation project, where words within the same two domains have already been identified in the corpus, the plan is to pre-annotate with the frames in our lexicon and then have human annotators confirm the frame and test whether the BFN semantic roles described for English can be identified in the Danish text. Our method reveals some interesting divergences between the semantic divisions established in the thesaurus and those found in BFN, showing that the two resources contribute different types of linguistic information and thereby usefully supplement one another.