30 research outputs found

    Learning constructions from bilingual exposure: Computational studies of argument structure acquisition


    Improved acoustic word embeddings for zero-resource languages using multilingual transfer

    Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs. In a word discrimination task on six target languages, all of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision. When using only a few training languages, the multilingual CAE performs better, but with more training languages the other multilingual models perform similarly. Using more training languages is generally beneficial, but improvements are marginal on some languages. We present probing experiments which show that the CAE encodes more phonetic, word duration, language identity and speaker information than the other multilingual models.
    Comment: 11 pages, 7 figures, 8 tables. arXiv admin note: text overlap with arXiv:2002.02109. Submitted to the IEEE Transactions on Audio, Speech and Language Processing
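    The word discrimination task used to evaluate these embeddings can be sketched in a few lines: candidate segment pairs are ranked by cosine similarity, and average precision is computed over the same-word pairs. A minimal pure-Python sketch (the similarity values and labels below are illustrative, not from the paper):

    ```python
    import math

    def cosine(u, v):
        """Cosine similarity between two embedding vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    def average_precision(similarities, same_word):
        """Rank pairs by similarity (descending) and average the precision
        at each rank where a same-word pair appears."""
        ranked = sorted(zip(similarities, same_word), key=lambda p: -p[0])
        hits, precisions = 0, []
        for rank, (_, same) in enumerate(ranked, start=1):
            if same:
                hits += 1
                precisions.append(hits / rank)
        return sum(precisions) / len(precisions)

    # Toy example: three embedding pairs, two of which are the same word.
    sims = [0.9, 0.8, 0.2]        # cosine similarities of the pairs
    labels = [True, True, False]  # True = both segments are the same word
    print(average_precision(sims, labels))  # perfect ranking gives 1.0
    ```

    A model that ranks even one different-word pair above a same-word pair scores below 1.0, which is what makes this metric a sensitive probe of embedding quality.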

    Quantifying cross-linguistic influence with a computational model: A study of case-marking comprehension

    Cross-linguistic influence (CLI) is one of the key phenomena in bilingual and second language learning. We propose a method for quantifying CLI in the use of linguistic constructions with the help of a computational model, which acquires constructions in two languages from bilingual input. We focus on the acquisition of case-marking cues in Russian and German and simulate two experiments that employ a picture-choice task tapping into the mechanisms of sentence interpretation. Our model yields behavioral patterns similar to those of humans, and these patterns can be explained by the amount of CLI: high amounts of negative CLI lead to the misinterpretation of participant roles in Russian and German object-verb-subject sentences. Finally, we make two novel predictions about the acquisition of case-marking cues in Russian and German. Most importantly, our simulations suggest that a high degree of positive CLI may facilitate the interpretation of object-verb-subject sentences.

    Analyzing Autoencoder-Based Acoustic Word Embeddings

    Recent studies have introduced methods for learning acoustic word embeddings (AWEs)---fixed-size vector representations of words which encode their acoustic features. Despite the widespread use of AWEs in speech processing research, they have only been evaluated quantitatively on their ability to discriminate between whole word tokens. To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs. Here we analyze basic properties of AWE spaces learned by a sequence-to-sequence encoder-decoder model in six typologically diverse languages. We first show that these AWEs preserve some information about words' absolute duration and speaker. At the same time, the representation space of these AWEs is organized such that the distance between words' embeddings increases with those words' phonetic dissimilarity. Finally, the AWEs exhibit a word onset bias, similar to patterns reported in various studies on human speech processing and lexical access. We argue that this is a promising result and encourage further evaluation of AWEs as a potentially useful tool in cognitive science, which could provide a link between speech processing and lexical memory.
    Comment: 6 pages, 7 figures, accepted to BAICS workshop (ICLR 2020)
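    The relation between embedding distance and phonetic dissimilarity can be probed by correlating the two quantities across word pairs. A minimal sketch of the two ingredients, assuming edit distance over phone strings as the dissimilarity proxy and a plain Pearson correlation (data and names here are illustrative, not the authors' code):

    ```python
    import math

    def levenshtein(a, b):
        """Edit distance between two symbol strings, a simple proxy
        for phonetic dissimilarity between word forms."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,          # deletion
                               cur[j - 1] + 1,       # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    def pearson(xs, ys):
        """Pearson correlation between paired samples."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)
    ```

    A well-organized AWE space would show a positive correlation between `levenshtein(w1, w2)` and the distance between the corresponding embeddings.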

    Multilingual Acoustic Word Embedding Models for Processing Zero-Resource Languages

    Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in "zero-resource" speech search, indexing and discovery systems. Here we propose to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. For this transfer learning approach, we consider two multilingual recurrent neural network models: a discriminative classifier trained on the joint vocabularies of all training languages, and a correspondence autoencoder trained to reconstruct word pairs. We test these using a word discrimination task on six target zero-resource languages. When trained on seven well-resourced languages, both models perform similarly and outperform unsupervised models trained on the zero-resource languages. With just a single training language, the second model works better, but performance depends more on the particular training--testing language pair.
    Comment: 5 pages, 4 figures, 1 table; accepted to ICASSP 2020. arXiv admin note: text overlap with arXiv:1811.0040
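    The correspondence autoencoder described above is trained on pairs of spoken instances of the same word; constructing those pairs from word-labelled segments is a simple grouping step. A hypothetical sketch of that pair mining (not the authors' code; segment identifiers are illustrative):

    ```python
    from collections import defaultdict
    from itertools import combinations

    def same_word_pairs(segments):
        """Given (word_label, segment_id) tuples, return all pairs of
        segment ids that share a word label -- the training pairs a
        correspondence autoencoder reconstructs between."""
        by_word = defaultdict(list)
        for word, seg in segments:
            by_word[word].append(seg)
        pairs = []
        for segs in by_word.values():
            pairs.extend(combinations(segs, 2))
        return pairs

    # Toy labelled data: two instances of "cat", one of "dog".
    data = [("cat", "c1"), ("cat", "c2"), ("dog", "d1")]
    print(same_word_pairs(data))  # [('c1', 'c2')]
    ```

    In the multilingual setting, this mining runs per training language; the zero-resource target language contributes no labels at all.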

    Acoustic Word Embeddings for Zero-Resource Languages Using Self-Supervised Contrastive Learning and Multilingual Adaptation

    Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length speech segments. For zero-resource languages where labelled data is not available, one AWE approach is to use unsupervised autoencoder-based recurrent models. Another recent approach is to use multilingual transfer: a supervised AWE model is trained on several well-resourced languages and then applied to an unseen zero-resource language. We consider how a recent contrastive learning loss can be used in both the purely unsupervised and multilingual transfer settings. Firstly, we show that terms from an unsupervised term discovery system can be used for contrastive self-supervision, resulting in improvements over previous unsupervised monolingual AWE models. Secondly, we consider how multilingual AWE models can be adapted to a specific zero-resource language using discovered terms. We find that self-supervised contrastive adaptation outperforms adapted multilingual correspondence autoencoder and Siamese AWE models, giving the best overall results in a word discrimination task on six zero-resource languages.
    Comment: Accepted to SLT 202
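    A contrastive objective of the kind described treats one discovered instance of a word as the anchor, another instance as the positive, and segments of other words as negatives. A minimal InfoNCE-style loss in pure Python (the temperature value and similarity scores are illustrative assumptions, not the paper's exact loss):

    ```python
    import math

    def contrastive_loss(sim_to_candidates, positive_index, temperature=0.1):
        """InfoNCE-style loss: given an anchor's similarities to a set of
        candidates, exactly one of which (positive_index) is the same word,
        return the negative log-probability of picking the positive."""
        logits = [s / temperature for s in sim_to_candidates]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        return -math.log(exps[positive_index] / sum(exps))

    # The loss shrinks as the positive becomes more similar than the negatives.
    print(contrastive_loss([0.9, 0.1, 0.0], 0))  # positive ranked first: low loss
    print(contrastive_loss([0.1, 0.9, 0.0], 0))  # a negative ranked first: high loss
    ```

    Minimizing this loss pulls same-word segments together and pushes different-word segments apart, which is exactly the geometry the word discrimination task rewards.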

    Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection

    The cognitive mechanisms needed to account for the English past tense have long been a subject of debate in linguistics and cognitive science. Neural network models were proposed early on, but were shown to have clear flaws. Recently, however, Kirov and Cotterell (2018) showed that modern encoder-decoder (ED) models overcome many of these flaws. They also presented evidence that ED models demonstrate humanlike performance in a nonce-word task. Here, we look more closely at the behaviour of their model in this task. We find that (1) the model exhibits instability across multiple simulations in terms of its correlation with human data, and (2) even when results are aggregated across simulations (treating each simulation as an individual human participant), the fit to the human data is not strong---worse than an older rule-based model. These findings hold up through several alternative training regimes and evaluation measures. Although other neural architectures might do better, we conclude that there is still insufficient evidence to claim that neural nets are a good cognitive model for this task.
    Comment: Accepted at ACL 201
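    The rule-based alternative mentioned above maps a verb stem to its regular past-tense form by rule rather than by learned association. The orthographic core of such a rule can be sketched in a few lines (a toy illustration of the idea only, not the rule-based model the paper compares against, which operates over phonological representations):

    ```python
    def regular_past(verb):
        """Toy regular past-tense rule: handles final -e, consonant+y,
        and the default -ed suffix. Irregulars are out of scope."""
        vowels = "aeiou"
        if verb.endswith("e"):
            return verb + "d"             # bake -> baked
        if verb.endswith("y") and len(verb) > 1 and verb[-2] not in vowels:
            return verb[:-1] + "ied"      # try -> tried
        return verb + "ed"                # walk -> walked

    for v in ["bake", "try", "play", "walk"]:
        print(v, "->", regular_past(v))
    ```

    The debate the abstract summarizes is precisely whether such explicit rules, or the distributed representations of an encoder-decoder network, better match human behaviour on nonce words.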