
Representing Low-Resource Languages and Dialects: Improved Neural Methods for Spoken Language Processing

Author: Bartelds Martijn
Publication venue: University of Groningen
Publication date: 01/01/2023

Languages are fundamental to human communication and serve as a means to express social and cultural values. However, many people treat languages as homogeneous entities, disregarding the fact that they are often composed of multiple varieties. These language varieties may be tied to certain geographical locations or to the cultural identity of their speakers. Studying language variation can thus provide valuable insights into how language varieties relate to their linguistic communities. Most language varieties do not correspond to administrative boundaries, such as provinces or states within nations, and neighboring varieties often transition gradually. In this dissertation, we presented a new method to describe and model linguistic diversity. Specifically, we leveraged deep learning (artificial neural network) models to quantify differences between the pronunciations of speakers from different language varieties. This new method assesses the differences between language varieties more accurately and efficiently than previously used methods. Additionally, we investigated the use of these neural network models to develop speech technology that helps empower language varieties. We developed an audio-based search algorithm that can automatically identify occurrences of a spoken search term in a large collection of spoken materials, improving access to resources that would normally require manual annotation. Furthermore, we presented approaches to improve speech recognition performance for several language varieties from different language families. This technology could, for example, be used to generate subtitles for videos or television broadcasts. This can be a promising step towards the important goal of developing speech technology that is inclusive of the world’s languages.

Proceedings - University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Improving Cross-domain Authorship Attribution by Combining Lexical and Syntactic Features: Notebook for PAN at CLEF 2019

Author: Bartelds Martijn
de Vries Wietse
Publication date: 23/07/2019

Dissertations of the University of Groningen

A New Acoustic-Based Pronunciation Distance Measure

Author: Bartelds Martijn
Liberman Mark
Richter Caitlin
Wieling Martijn
Publication venue: Frontiers Media SA
Publication date: 29/05/2020

We present an acoustic distance measure for comparing pronunciations, and apply the measure to assess foreign accent strength in American English by comparing the speech of non-native speakers of American English to a collection of native American-English speakers. An acoustic-only measure is valuable because it does not require the time-consuming and error-prone process of phonetically transcribing speech samples, which is necessary for current edit distance-based approaches. We minimize speaker variability in the data set by employing speaker-based cepstral mean and variance normalization, and compute word-based acoustic distances using the dynamic time warping algorithm. Our results indicate a strong correlation of r = −0.71 (p < 0.0001) between the acoustic distances and human judgments of native-likeness provided by more than 1,100 native American-English raters. The convenient acoustic measure thus performs only slightly worse than the state-of-the-art transcription-based approach (r = −0.77). We also report the results of several small experiments which show that the acoustic measure is sensitive not only to segmental differences, but also to intonational and durational differences. However, it is not immune to unwanted differences caused by using a different recording device.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen
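
The measure described in the abstract above combines two standard steps: speaker-based cepstral mean and variance normalization (CMVN) and dynamic time warping (DTW) between realizations of the same word. A minimal pure-Python sketch of both follows. It assumes that per-frame feature vectors (for example MFCCs) have already been extracted and segmented into words. The Euclidean frame distance, the division by combined sequence length, and the hypothetical `accent_distance` aggregation are illustrative assumptions, not the authors' exact implementation.

```python
import math
from typing import Dict, List, Sequence

Frame = List[float]


def cmvn(frames: Sequence[Frame]) -> List[Frame]:
    """Speaker-based CMVN: `frames` holds the feature frames of ONE speaker.

    Each feature dimension is shifted to zero mean and scaled to unit
    variance, computed over that speaker's frames only."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid division by zero for constant dimensions
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """DTW alignment cost between two realizations of a word.

    The total cost is divided by the combined length (assumed normalization)
    so that longer words do not automatically receive larger distances."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def accent_distance(speaker_words: Dict[str, Sequence[Frame]],
                    native_words: Dict[str, Sequence[Sequence[Frame]]]) -> float:
    """Hypothetical aggregation: for each word, average the DTW distance to every
    native realization of that word, then average over all words."""
    per_word = []
    for word, frames in speaker_words.items():
        refs = native_words[word]
        per_word.append(sum(dtw_distance(frames, r) for r in refs) / len(refs))
    return sum(per_word) / len(per_word)
```

Because DTW lets one frame align with several frames in the other sequence, a slower realization of the same trajectory receives a distance of zero. Only genuine spectral differences add to the cost.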

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

Author: Bartelds Martijn
Nissim Malvina
de Vries Wietse
Wieling Martijn
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/08/2021

For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this process. We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language. By combining the new lexical layers with the fine-tuned Transformer layers, we achieve high task performance for both target languages. With high language similarity, 10 MB of data appears sufficient to achieve substantial monolingual transfer performance. Monolingual BERT-based models generally achieve higher downstream task performance after retraining the lexical layer than multilingual BERT does, even when the target language is included in the multilingual model.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen
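
The combination step described in the abstract above (a new lexical layer plus independently fine-tuned Transformer layers) amounts to assembling one set of weights from two models. The sketch below assumes that parameters are stored as name-to-tensor dictionaries with Hugging Face-style BERT key names. Everything under the hypothetical prefix `bert.embeddings.` is treated as the lexical layer; exactly which parameters the authors retrained is not specified here.

```python
from typing import Dict, TypeVar

T = TypeVar("T")  # parameter type, e.g. a tensor; kept generic for the sketch

# Assumed key prefix of the lexical (embedding) layer in both models.
LEXICAL_PREFIX = "bert.embeddings."


def combine_layers(lexical_state: Dict[str, T],
                   finetuned_state: Dict[str, T],
                   lexical_prefix: str = LEXICAL_PREFIX) -> Dict[str, T]:
    """Merge two parameter dictionaries into one model.

    Lexical-layer parameters come from the model retrained on the target
    language; all other parameters (Transformer layers, task head) come from
    the model fine-tuned on the source-language POS-tagging task."""
    combined = {k: v for k, v in finetuned_state.items()
                if not k.startswith(lexical_prefix)}
    lexical = {k: v for k, v in lexical_state.items()
               if k.startswith(lexical_prefix)}
    if not lexical:
        raise ValueError(f"no parameters found under prefix {lexical_prefix!r}")
    combined.update(lexical)
    return combined
```

With PyTorch, such dictionaries would typically come from `state_dict()`, and the merged result could be loaded with `load_state_dict()`. The tokenizer used at inference time must then match the vocabulary of the retrained lexical layer.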

Measuring foreign accent strength using an acoustic distance measure

Author: Bartelds Martijn
Liberman Mark
Richter Caitlin
de Vries Wietse
Wieling Martijn
Publication venue: Haskins Press
Publication date: 01/01/2021

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen


Neural representations for modeling variation in speech

Author: Bartelds Martijn
de Vries Wietse
Liberman Mark
Richter Caitlin
Sanal Faraz
Wieling Martijn
Publication venue: Elsevier BV
Publication date: 26/01/2022

Variation in speech is often quantified by comparing phonetic transcriptions of the same utterance. However, manually transcribing speech is time-consuming and error-prone. As an alternative, we therefore investigate the extraction of acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between Norwegian dialect speakers. For comparison with several earlier studies, we evaluate how well these differences match human perception by comparing them with available human judgements of similarity. We show that speech representations extracted from a specific type of neural model (i.e. Transformers) lead to a better match with human perception than two earlier approaches based on phonetic transcriptions and MFCC-based acoustic features. We furthermore find that features from the neural models are generally better extracted from one of the middle hidden layers than from the final layer. We also demonstrate that neural speech representations capture not only segmental differences, but also intonational and durational differences that cannot adequately be represented by the set of discrete symbols used in phonetic transcriptions.
Comment: Submitted to Journal of Phonetics

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen
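
The layer-wise finding in the abstract above suggests a simple selection procedure. Compute word-based distances from each hidden layer, correlate them with the human judgements, and keep the layer with the strongest correlation. The following is a hedged sketch that assumes the per-layer distances are already computed (for instance, by aligning frame-level embeddings with DTW). It ranks layers by the magnitude of Pearson's r, because distances are expected to correlate negatively with similarity or native-likeness ratings. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_layer(distances_per_layer: Dict[int, Sequence[float]],
               human_scores: Sequence[float]) -> Tuple[int, float]:
    """Return (layer index, r) for the layer whose distances correlate most
    strongly (largest |r|) with the human judgements."""
    scored = {layer: pearson_r(d, human_scores)
              for layer, d in distances_per_layer.items()}
    layer = max(scored, key=lambda k: abs(scored[k]))
    return layer, scored[layer]
```

In this toy setup, a middle layer whose distances fall exactly as the human similarity ratings rise would be selected with r = −1.0.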