Unsupervised Stemming based Language Model for Telugu Broadcast News Transcription
In Indian languages, native speakers can understand new words formed by
combining or modifying root words with tense and/or gender markers. Due to
data insufficiency, an Automatic Speech Recognition (ASR) system may not
accommodate all the words of the language in its language model, irrespective
of the size of the text corpus. It also becomes computationally challenging
when the volume of data grows rapidly due to morphological changes to root
words. In this paper, a new unsupervised method is proposed for the Indian
language Telugu, based on an existing unsupervised method for Hindi, to handle
Out-of-Vocabulary (OOV) words in the language model. By applying smoothing and
interpolation to pre-processed data with supervised and unsupervised stemming,
several issues in language modeling for Telugu are addressed.
We observe that the Witten-Bell and Kneser-Ney smoothing techniques perform
well compared to other techniques on pre-processed data from supervised
learning. The ASR's accuracy is improved by 0.76% and 0.94% with supervised
and unsupervised stemming respectively.
Comment: first draft
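The Witten-Bell smoothing mentioned above interpolates a higher-order estimate with a lower-order one, weighting the backoff by the number of distinct word types observed after each history. A minimal bigram sketch in plain Python (the function name and training sentences are illustrative, not from the paper):

```python
from collections import Counter, defaultdict

def train_witten_bell(sentences):
    """Train a toy bigram LM with Witten-Bell smoothing.

    sentences: list of token lists. Returns (p, vocab) where
    p(w, h) is the smoothed probability of word w after history h.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        for h, w in zip(tokens, tokens[1:]):
            bigrams[h][w] += 1
    total = sum(unigrams.values())
    vocab = set(unigrams)

    def p_unigram(w):
        # Lower-order (unigram) MLE estimate used as the backoff.
        return unigrams[w] / total

    def p(w, h):
        followers = bigrams[h]
        t = len(followers)            # T(h): distinct types seen after h
        c = sum(followers.values())   # c(h): total bigram count for h
        if c == 0:
            return p_unigram(w)       # unseen history: back off entirely
        # Witten-Bell: (c(h,w) + T(h) * P_uni(w)) / (c(h) + T(h))
        return (followers[w] + t * p_unigram(w)) / (c + t)

    return p, vocab
```

Because the backoff weight T(h)/(c(h)+T(h)) grows with the number of novel continuations, histories that tend to precede many different words lean more on the unigram estimate, which is the property that helps with OOV-heavy, morphologically rich text.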
What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure
In recent times, BERT-based transformer models have become an inseparable
part of the 'tech stack' of text processing models. Similar progress is being
observed in the speech domain, with a multitude of models achieving
state-of-the-art results by using audio transformer models to encode speech.
This raises the question of what these audio transformer models are learning.
Moreover, although the standard methodology is to choose the last-layer
embedding for any downstream task, is it the optimal choice? We try to
answer these questions for two recent audio transformer models, Mockingjay
and wav2vec 2.0. We compare them on a comprehensive set of language delivery
and structure features including audio, fluency and pronunciation features.
Additionally, we probe the audio models' understanding of textual surface,
syntax, and semantic features and compare them to BERT. We do this over
exhaustive settings covering native, non-native, synthetic, read and
spontaneous speech datasets.