Analysis of the Effects of Cross-Lingual Transfer Learning, Compression, and Whitening in Natural Language Encoding
Doctoral thesis (Information Science), Tohoku University
Comparative Analysis of Word Embeddings for Capturing Word Similarities
Distributed language representations have become the most widely used technique
for representing language in natural language processing tasks. Most natural
language processing models based on deep learning use pre-trained distributed
word representations, commonly called word embeddings. Choosing the
highest-quality word embeddings is of crucial importance for such models, yet
selecting the appropriate embeddings is a perplexing task because the projected
embedding space is not intuitive to humans. In this paper, we explore different
approaches for creating distributed word representations. We perform an
intrinsic evaluation of several state-of-the-art word embedding methods,
analysing how well they capture word similarities on existing benchmark
datasets of word-pair similarities. Specifically, we conduct a correlation
analysis between ground-truth word similarities and the similarities obtained
by the different word embedding methods.
Comment: Part of the 6th International Conference on Natural Language
Processing (NATP 2020)
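The intrinsic evaluation described above reduces to a concrete computation: score each benchmark word pair with the cosine similarity of its embeddings, then correlate those scores with the human ratings. A minimal sketch, assuming a generic embeddings mapping and a WordSim-353-style benchmark of (word1, word2, human_score) triples; the toy vectors below are placeholders, not the paper's data:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(embeddings, benchmark):
    """Spearman correlation between human similarity ratings and
    embedding cosine similarities, skipping out-of-vocabulary pairs."""
    human, model = [], []
    for w1, w2, score in benchmark:
        if w1 in embeddings and w2 in embeddings:
            human.append(score)
            model.append(cosine(embeddings[w1], embeddings[w2]))
    rho, _ = spearmanr(human, model)
    return rho

# Toy data for illustration; real evaluations use pre-trained vectors
# (e.g. word2vec, GloVe, fastText) and benchmarks such as WordSim-353.
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=50) for w in ["car", "auto", "fruit"]}
benchmark = [("car", "auto", 8.9), ("car", "fruit", 2.1)]
print(f"Spearman rho: {evaluate(embeddings, benchmark):.3f}")
```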
Lightweight Adaptation of Neural Language Models via Subspace Embedding
Traditional neural word embeddings usually depend on a rich and diverse
vocabulary. As a result, language models devote a large share of their
parameters to the word embedding matrix; in multilingual language models in
particular, the embeddings generally account for a significant part of the
overall learnable parameters. In this work, we present a new compact embedding
structure that reduces the memory footprint of pre-trained language models at a
cost of up to 4% absolute accuracy. The embedding vectors are reconstructed
from a set of subspace embeddings together with an assignment procedure that
exploits the contextual relationships among tokens in the pre-trained language
model. We calibrate the subspace embedding structure to masked language models
and evaluate it on similarity, textual entailment, sentence, and paraphrase
tasks. Our experimental evaluation shows that the subspace embeddings achieve
compression rates beyond 99.8% relative to the original embeddings of the
language models on the XNLI and GLUE benchmark suites.
Comment: 5 pages, Accepted as a Main Conference Short Paper at CIKM 2023
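The compression arithmetic can be illustrated with a product-quantization-style reconstruction, one standard way to realise subspace embeddings: each token's vector is the concatenation of one small codeword per subspace, so storage drops from a full V-by-d float table to a few codebooks plus integer assignments. The paper derives its assignments from contextual relationships among tokens; the random assignment below is only a placeholder for that step, and all sizes are assumed for illustration:

```python
import numpy as np

V, d = 250_000, 768        # assumed vocabulary size and embedding width
n_sub, k = 4, 256          # subspaces and codewords per subspace
d_sub = d // n_sub         # width of each subspace slice

# Compact structure: one small codebook per subspace plus an integer
# codeword assignment per (token, subspace) pair.
codebooks = np.random.randn(n_sub, k, d_sub).astype(np.float32)
assignments = np.random.randint(0, k, size=(V, n_sub), dtype=np.uint8)

def reconstruct(token_id):
    """Rebuild a full d-dimensional embedding by concatenating the
    assigned codeword from each subspace codebook."""
    parts = [codebooks[s, assignments[token_id, s]] for s in range(n_sub)]
    return np.concatenate(parts)

vec = reconstruct(42)
assert vec.shape == (d,)

# Storage comparison: float32 table vs. codebooks + uint8 assignments.
full = V * d * 4
compact = codebooks.nbytes + assignments.nbytes
print(f"compression: {1 - compact / full:.4%} saved")  # ~99.77% here
```

With these assumed sizes the compact structure needs under 2 MB versus roughly 768 MB for the full table, which is the regime of the 99.8%-plus compression rates the abstract reports.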
Subproduct systems and Cartesian systems; new results on factorial languages and their relations with other areas
We point out that a sequence of natural numbers is the dimension sequence of
a subproduct system if and only if it is the cardinality sequence of a word
system (or factorial language). Determining such sequences is, therefore,
reduced to a purely combinatorial problem in the combinatorics of words. A
corresponding (and equivalent) result for graded algebras has been known in
abstract algebra, but this connection with pure combinatorics has not yet been
noticed by the product systems community. We also introduce Cartesian systems,
which can be seen either as a set-theoretic version of subproduct systems or as
an abstract version of word systems. Applying this, we provide several new results
on the cardinality sequences of word systems and the dimension sequences of
subproduct systems.
Comment: New title; added references; to appear in Journal of Stochastic
Analysis
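To make the combinatorial side concrete: a factorial language is a set of words closed under taking factors (contiguous subwords), and its cardinality sequence counts the words of each length. A small sketch, using the standard example of binary words avoiding the factor 11, whose cardinality sequence is the Fibonacci numbers; the abstract itself names no specific language, so this example is only illustrative:

```python
from itertools import product

def is_factorial(language):
    """A language is factorial iff every factor (contiguous subword)
    of every member is itself a member."""
    return all(w[i:j] in language
               for w in language
               for i in range(len(w))
               for j in range(i, len(w) + 1))

# Example: binary words containing no occurrence of the factor "11".
max_len = 8
language = {"".join(t)
            for n in range(max_len + 1)
            for t in product("01", repeat=n)
            if "11" not in "".join(t)}

assert is_factorial(language)

# Cardinality sequence: number of words of each length n.
counts = [sum(len(w) == n for w in language) for n in range(max_len + 1)]
print(counts)  # [1, 2, 3, 5, 8, 13, 21, 34, 55] -- Fibonacci numbers
```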