524 research outputs found

Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

Author: Berend Gábor
Publication date: 21/12/2016

In this paper we propose and carefully evaluate a sequence labeling framework that relies solely on sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition across a variety of languages. Our model relies on only a few thousand sparse coding-derived features, without modifying the word representations employed for the different tasks. The proposed model has favorable generalization properties, as it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e. 150 sentences per language.

arXiv.org e-Print Archive

SZTE Publicatio Repozitórium - SZTE - Repository of Publications
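The abstract of "Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling" above describes sparse indicator features derived from dense word representations. The following is a minimal sketch of the last step only, assuming a sparse coefficient vector for a word has already been computed by some sparse coding procedure (not shown). The function name `indicator_features`, the sign-aware `P`/`N` feature naming, and the tolerance `tol` are illustrative assumptions, not details confirmed by the abstract.

```python
def indicator_features(sparse_code, tol=1e-8):
    """Turn a sparse coefficient vector into string-valued indicator features.

    Assumption: each nonzero coefficient yields one feature that records the
    coefficient's index and sign, but not its magnitude.
    """
    features = []
    for index, coeff in enumerate(sparse_code):
        if abs(coeff) <= tol:
            continue  # zero coefficients produce no feature
        sign = "P" if coeff > 0 else "N"
        features.append(f"{sign}{index}")
    return features


# Hypothetical sparse code for a single word (five dictionary atoms).
example_code = [0.0, 0.73, 0.0, -0.12, 0.0]
print(indicator_features(example_code))
```

Features of this kind could then be passed to any sequence labeler that accepts categorical inputs; which labeler the paper uses is not stated in the abstract.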

Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

Author: Fei Hao
Ji Donghong
Zhang Meishan
Publication date: 01/01/2020

Much research effort is devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available, as for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, obtaining competitive performance remains challenging. Cross-lingual SRL is one promising way to address the problem and has made great advances with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective and that the automatically generated pseudo datasets significantly improve target-language SRL performance. Comment: Accepted at ACL 202

arXiv.org e-Print Archive

Crossref

Semantic Tagging with Deep Residual Networks

Author: Bjerva Johannes
Bos Johan
Plank Barbara
Publication date: 31/10/2016

We propose a novel semantic tagging task, sem-tagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets). Our tagger uses both word and character representations and includes a novel residual bypass architecture. We evaluate the tagset both intrinsically on the new task of semantic tagging and on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an auxiliary loss function predicting our semantic tags, significantly outperforms prior results on English Universal Dependencies POS tagging (95.71% accuracy on UD v1.2 and 95.67% accuracy on UD v1.3). Comment: COLING 2016, camera-ready version

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

VBN

Dissertations of the University of Groningen

Ridge Regression, Hubness, and Zero-Shot Learning

Author: Hara Kazuo
Matsumoto Yuji
Shigeto Yutaro
Shimbo Masashi
Suzuki Ikumi
Publication date: 03/07/2015

This paper discusses the effect of hubness in zero-shot learning when ridge regression is used to find a mapping between the example space and the label space. Contrary to the existing approach, which attempts to find a mapping from the example space to the label space, we show that mapping labels into the example space is desirable to suppress the emergence of hubs in the subsequent nearest neighbor search step. Assuming a simple data model, we prove that the proposed approach indeed reduces hubness. This was verified empirically on the tasks of bilingual lexicon extraction and image labeling: hubness was reduced in both tasks, and accuracy improved accordingly. Comment: To be presented at ECML/PKDD 201

arXiv.org e-Print Archive

Crossref
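The abstract of "Ridge Regression, Hubness, and Zero-Shot Learning" above refers to hubs emerging in the nearest neighbor search step. As an illustration of what a hub is, the sketch below counts how often each label vector is the nearest neighbor of a query; a label that attracts a disproportionate share of queries acts as a hub. The function name `nearest_label_counts` and the toy 2-D points are hypothetical, and the abstract does not say which hubness measure the paper uses. The ridge regression mapping itself is not shown.

```python
import math
from collections import Counter


def nearest_label_counts(queries, labels):
    """Count, for each label vector, how many queries have it as nearest neighbor."""
    counts = Counter({index: 0 for index in range(len(labels))})
    for query in queries:
        nearest = min(
            range(len(labels)),
            key=lambda index: math.dist(query, labels[index]),
        )
        counts[nearest] += 1
    return counts


# Hypothetical toy data: label 0 sits near the centre of the queries,
# so it ends up as the nearest neighbor of most of them.
labels = [(0.0, 0.0), (5.0, 5.0), (-5.0, 5.0)]
queries = [(0.1, 0.2), (1.0, 1.0), (-1.0, 1.0), (2.0, 2.0), (4.0, 4.0)]

counts = nearest_label_counts(queries, labels)
for index in range(len(labels)):
    print(f"label {index}: nearest neighbor of {counts[index]} queries")
```

In this toy setup label 0 is retrieved for four of the five queries while label 2 is never retrieved. The abstract's claim is that the direction of the learned mapping affects how strongly this kind of imbalance appears.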

Language classification from bilingual word embedding graphs

Author: Eger Steffen
Hoenen Armin
Mehler Alexander
Publication date: 10/10/2016

We study the role of the second language in bilingual word embeddings in monolingual semantic evaluation tasks. We find strongly and weakly positive correlations between downstream task performance and second language similarity to the target language. Additionally, we show how bilingual word embeddings can be employed for the task of semantic language classification and that joint semantic spaces vary in meaningful ways across second languages. Our results support the hypothesis that semantic language similarity is influenced by both structural similarity and geography/contact. Comment: To be published at Coling 201

arXiv.org e-Print Archive

TUbiblio