Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling
In this paper we propose and carefully evaluate a sequence labeling framework that relies solely on sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance on both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies on only a few thousand sparse-coding-derived features, without any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties, retaining over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e. 150 sentences per language.
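The core idea above, turning dense word embeddings into sparse indicator features via sparse coding, can be sketched with a plain ISTA solver for the lasso objective. This is an illustrative sketch with synthetic data; the dictionary, dimensions, and regularization strength are hypothetical and not taken from the paper:

```python
import numpy as np

def sparse_code(X, D, lam=0.5, n_iter=300):
    """Sparse-code the rows of X over dictionary D (atoms in rows) via ISTA,
    i.e. proximal gradient descent on 0.5*||X - A @ D||^2 + lam*||A||_1."""
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        A = A - ((A @ D - X) @ D.T) / L                        # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft threshold
    return A

rng = np.random.default_rng(0)
E = rng.normal(size=(50, 20))                  # hypothetical dense embeddings
D = rng.normal(size=(100, 20))                 # hypothetical overcomplete dictionary
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm atoms
A = sparse_code(E, D)
indicators = A != 0  # binary indicator features: one per active atom
```

Each word then fires only the handful of indicator features whose dictionary atoms participate in its reconstruction, which is what keeps the feature set small.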
Analysing the semantic content of static Hungarian embedding spaces
Word embeddings can encode semantic features and have achieved many recent successes in solving NLP tasks. Although word embeddings perform well on several downstream tasks, there is no trivial way to extract lexical information from them. We propose a transformation that amplifies desired semantic features in the basis of the embedding space. We generate these semantic features via a distantly supervised approach, making them applicable to Hungarian embedding spaces. We propose the Hellinger distance for performing the transformation to an interpretable embedding space. Furthermore, we extend our research to sparse word representations as well, since sparse representations are considered to be highly interpretable.
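The Hellinger distance mentioned above is a bounded metric between discrete probability distributions. A minimal sketch (the toy distributions are hypothetical, not data from the paper):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions:
    0 for identical distributions, 1 for disjoint supports."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

# Toy word-over-semantic-feature distributions (hypothetical values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
d = hellinger(p, q)
```

Because it operates on square-rooted probabilities, it is well suited to comparing sparse, non-negative co-occurrence distributions, which may be why it is preferred here over a plain Euclidean distance.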
Zero-shot word sense disambiguation via mappings between language-specific transformer models
In this paper we present a procedure that performs word sense disambiguation in a zero-shot manner, relying on language-specific transformer models. The proposed method builds cross-lingual knowledge transfer on monolingual pretrained models dedicated to processing the source language, for which training data are available, and the target language, for which they are not. The connection between the languages is established by learning a mapping between the hidden layers of the monolingual transformer models. Our results show that models created this way, trained exclusively on English sense-annotated texts, achieve significantly better performance than the application of a multilingual masked language model.
Utilizing word embeddings for part-of-speech tagging
In this paper, we illustrate the power of distributed word representations for the part-of-speech tagging of Hungarian texts. We trained CRF models for POS tagging that use features derived from the sparse coding of the word embeddings of Hungarian words as signals. We show that, relying on such a representation, it is possible to avoid creating language-specific features while achieving reliable performance. We evaluated our models on all the subsections of the Szeged Treebank, using both the MSD and the universal morphology tag sets. Furthermore, we also report results for inter-subcorpus experiments.
Regularization of word embeddings for multi-word expression identification
In this paper we compare the effects of applying various state-of-the-art word representation strategies to the task of multi-word expression (MWE) identification. In particular, we analyze the strengths and weaknesses of using ℓ1-regularized sparse word embeddings for identifying MWEs. Our earlier study demonstrated the effectiveness of regularized word embeddings in other sequence labeling tasks, i.e. part-of-speech tagging and named entity recognition, but it has not yet been rigorously evaluated for the identification of MWEs.
Using latent semantic distributions during the pretraining of language models
Our paper presents a variant of language model pretraining in which the masking objective is not the reconstruction of randomly selected tokens but the prediction of their semantic category. Fine-tuning the models created this way on a variety of benchmarks, we find that they achieve significantly better results than their conventional counterparts.
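One simple way to obtain semantic-category labels for masked positions is nearest-centroid assignment of token embeddings to a small set of latent categories. This is a hypothetical sketch with synthetic embeddings and centroids; the paper's actual procedure for deriving the categories is not specified here:

```python
import numpy as np

rng = np.random.default_rng(2)
emb = rng.normal(size=(1000, 32))       # hypothetical static token embeddings
centroids = rng.normal(size=(10, 32))   # hypothetical latent semantic categories

# Each token is labeled with the id of its nearest category centroid;
# masked positions then predict this coarse label instead of the token id.
dists = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=-1)
labels = dists.argmin(axis=1)
```

Predicting a 10-way category instead of a full-vocabulary token makes the masked objective coarser, which is the design choice the abstract credits for the improved fine-tuning results.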