
133 research outputs found

Learning constructions from bilingual exposure: Computational studies of argument structure acquisition

Author: Matusevych Yevgen
Publication venue: LOT Netherlands Graduate School of Linguistics
Publication date: 01/01/2016

Subject of Public Opinion: Theoretical and Methodical Aspects of Determination

Author: Matusevych Volodymyr
Publication venue: Kiev Institute of Business and Technology LLC
Publication date: 08/04/2010

The article presents the theoretical and methodological grounds for identifying the subject of public opinion. The author finds that the functional features of public opinion also determine the features of its subjects. These features characterize the subject's range, structure, and organization, and how it influences human behavior and the activity of the social institutions that have the status of objects of public opinion.

SSOAR - Social Science Open Access Repository

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

Authors: Goldwater Sharon; Kamper Herman; Matusevych Yevgen
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 05/02/2021

Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs. In a word discrimination task on six target languages, all of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision. When using only a few training languages, the multilingual CAE performs better, but with more training languages the other multilingual models perform similarly. Using more training languages is generally beneficial, but improvements are marginal on some languages. We present probing experiments which show that the CAE encodes more phonetic, word duration, language identity and speaker information than the other multilingual models.

Comment: 11 pages, 7 figures, 8 tables. arXiv admin note: text overlap with arXiv:2002.02109. Submitted to the IEEE Transactions on Audio, Speech and Language Processing

arXiv.org e-Print Archive

Edinburgh Research Explorer

Quantifying cross-linguistic influence with a computational model: A study of case-marking comprehension

Authors: Alishahi Afra; Backus Ad; Matusevych Yevgen
Publication venue: John Benjamins Publishing Company
Publication date: 23/05/2017

Cross-linguistic influence (CLI) is one of the key phenomena in bilingual and second language learning. We propose a method for quantifying CLI in the use of linguistic constructions with the help of a computational model, which acquires constructions in two languages from bilingual input. We focus on the acquisition of case-marking cues in Russian and German and simulate two experiments that employ a picture-choice task tapping into the mechanisms of sentence interpretation. Our model yields behavioral patterns similar to those of humans, and these patterns can be explained by the amount of CLI: high amounts of negative CLI lead to the misinterpretation of participant roles in Russian and German object-verb-subject sentences. Finally, we make two novel predictions about the acquisition of case-marking cues in Russian and German. Most importantly, our simulations suggest that a high degree of positive CLI may facilitate the interpretation of object-verb-subject sentences.

Crossref

Edinburgh Research Explorer

Tilburg University Repository

Multilingual Acoustic Word Embedding Models for Processing Zero-Resource Languages

Authors: Goldwater Sharon; Kamper Herman; Matusevych Yevgen
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/02/2020

Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in "zero-resource" speech search, indexing and discovery systems. Here we propose to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. For this transfer learning approach, we consider two multilingual recurrent neural network models: a discriminative classifier trained on the joint vocabularies of all training languages, and a correspondence autoencoder trained to reconstruct word pairs. We test these using a word discrimination task on six target zero-resource languages. When trained on seven well-resourced languages, both models perform similarly and outperform unsupervised models trained on the zero-resource languages. With just a single training language, the second model works better, but performance depends more on the particular training-testing language pair.

Comment: 5 pages, 4 figures, 1 table; accepted to ICASSP 2020. arXiv admin note: text overlap with arXiv:1811.0040

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Acoustic Word Embeddings for Zero-Resource Languages Using Self-Supervised Contrastive Learning and Multilingual Adaptation

Authors: Jacobs Christiaan; Kamper Herman; Matusevych Yevgen
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/03/2021

Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length speech segments. For zero-resource languages where labelled data is not available, one AWE approach is to use unsupervised autoencoder-based recurrent models. Another recent approach is to use multilingual transfer: a supervised AWE model is trained on several well-resourced languages and then applied to an unseen zero-resource language. We consider how a recent contrastive learning loss can be used in both the purely unsupervised and multilingual transfer settings. Firstly, we show that terms from an unsupervised term discovery system can be used for contrastive self-supervision, resulting in improvements over previous unsupervised monolingual AWE models. Secondly, we consider how multilingual AWE models can be adapted to a specific zero-resource language using discovered terms. We find that self-supervised contrastive adaptation outperforms adapted multilingual correspondence autoencoder and Siamese AWE models, giving the best overall results in a word discrimination task on six zero-resource languages.

Comment: Accepted to SLT 202

arXiv.org e-Print Archive

Edinburgh Research Explorer

Analyzing Autoencoder-Based Acoustic Word Embeddings

Authors: Goldwater Sharon; Kamper Herman; Matusevych Yevgen
Publication date: 03/04/2020

Recent studies have introduced methods for learning acoustic word embeddings (AWEs), fixed-size vector representations of words which encode their acoustic features. Despite the widespread use of AWEs in speech processing research, they have only been evaluated quantitatively in their ability to discriminate between whole word tokens. To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs. Here we analyze basic properties of AWE spaces learned by a sequence-to-sequence encoder-decoder model in six typologically diverse languages. We first show that these AWEs preserve some information about words' absolute duration and speaker. At the same time, the representation space of these AWEs is organized such that the distance between words' embeddings increases with those words' phonetic dissimilarity. Finally, the AWEs exhibit a word onset bias, similar to patterns reported in various studies on human speech processing and lexical access. We argue this is a promising result and encourage further evaluation of AWEs as a potentially useful tool in cognitive science, which could provide a link between speech processing and lexical memory.

Comment: 6 pages, 7 figures, accepted to BAICS workshop (ICLR2020)

arXiv.org e-Print Archive

Edinburgh Research Explorer

Cumulative frequency can explain cognate facilitation in language models

Authors: Matusevych Yevgen; Pickering Martin J.; Winther Irene
Publication date: 01/01/2021

Edinburgh Research Explorer