SPPAS: a tool for the phonetic segmentations of Speech
SPPAS is a tool that produces automatic annotations, including utterance, word, syllabic and phonemic segmentations, from a recorded speech sound and its transcription. SPPAS is distributed under the terms of the GNU Public License. It was successfully applied during the Evalita 2011 campaign, on Italian map-task dialogues. It can also handle French, English and Chinese, and there is an easy way to add other languages. The paper describes the development of resources and free tools, consisting of acoustic models, phonetic dictionaries, and libraries and programs to deal with these data. All of them are publicly available.
Universal Phone Recognition with a Multilingual Allophone System
Multilingual models can improve language processing, particularly for low
resource situations, by sharing parameters across languages. Multilingual
acoustic models, however, generally ignore the difference between phonemes
(sounds that can support lexical contrasts in a particular language) and their
corresponding phones (the sounds that are actually spoken, which are language
independent). This can lead to performance degradation when combining a variety
of training languages, as identically annotated phonemes can actually
correspond to several different underlying phonetic realizations. In this work,
we propose a joint model of both language-independent phone and
language-dependent phoneme distributions. In multilingual ASR experiments over
11 languages, we find that this model improves testing performance by 2%
phoneme error rate absolute in low-resource conditions. Additionally, because
we are explicitly modeling language-independent phones, we can build a
(nearly-)universal phone recognizer that, when combined with the PHOIBLE large,
manually curated database of phone inventories, can be customized into 2,000
language dependent recognizers. Experiments on two low-resourced indigenous
languages, Inuktitut and Tusom, show that our recognizer achieves phone
accuracy improvements of more than 17%, moving a step closer to speech
recognition for all languages in the world. Comment: ICASSP 202
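The relationship between language-independent phones and language-dependent phonemes described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy inventories, the `ALLOPHONES` table, and the choice of max-pooling phone scores into phoneme scores are not stated in the abstract and do not come from PHOIBLE.

```python
import math

# Toy allophone maps (hypothetical, not taken from PHOIBLE): each phoneme of a
# language lists the universal phones that can realize it. "p_h" stands for an
# aspirated p.
ALLOPHONES = {
    "eng": {"/p/": ["p", "p_h"], "/t/": ["t"], "/a/": ["a"]},
    "hin": {"/p/": ["p"], "/p_h/": ["p_h"], "/t/": ["t"], "/a/": ["a"]},
}

def phoneme_logits(phone_logits, language):
    """Pool shared phone scores into phoneme scores for one language.

    Assumption: a phoneme's score is the maximum over its allophones.
    """
    return {
        phoneme: max(phone_logits[phone] for phone in phones)
        for phoneme, phones in ALLOPHONES[language].items()
    }

def softmax(logits):
    """Turn a dict of scores into a probability distribution."""
    peak = max(logits.values())
    exps = {key: math.exp(value - peak) for key, value in logits.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}

# One frame of scores from a (hypothetical) shared phone recognizer.
phone_logits = {"p": 1.0, "p_h": 2.0, "t": 0.5, "a": -1.0}

eng = softmax(phoneme_logits(phone_logits, "eng"))
hin = softmax(phoneme_logits(phone_logits, "hin"))
```

With the same phone scores, the toy English map folds the aspirated phone into /p/, while the toy Hindi map keeps it as a separate phoneme. Swapping the allophone table is what customizes one shared phone recognizer to a given language in this sketch.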
Inequity in Popular Voice Recognition Systems Regarding African Accents
With new age speakers such as the Echo Dot and Google Home, everyone should have equal opportunity to use them. Yet, for many popular voice recognition systems, the only accents that have wide support are those from Europe, Latin America, and Asia. This can be frustrating for users who have dialects or accents which are poorly understood by common tools like Amazon's Alexa. As such devices become more like household appliances, researchers are becoming increasingly aware of bias and inequity in Speech Recognition, as well as other sub-fields of Artificial Intelligence. The addition of African accents can potentially diversify smart speaker customer bases worldwide. My research project can help developers include accents from the African diaspora as they build these systems. In this work, we measure recognition accuracy for under-represented dialects across a variety of speech recognition systems and analyze the results in terms of standard performance metrics. After collecting audio files from different voices across the African diaspora, we discuss key findings and generate guidelines for developing implementations of current voice recognition systems that are fairer for all.
Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers
Machine Learning (ML) algorithms are used to train computers to perform a
variety of complex tasks and improve with experience. Computers learn how to
recognize patterns, make unintended decisions, or react to a dynamic
environment. Certain trained machines may be more effective than others because
they are based on more suitable ML algorithms or because they were trained
through superior training sets. Although ML algorithms are known and publicly
released, training sets may not be reasonably ascertainable and, indeed, may be
guarded as trade secrets. While much research has been performed about the
privacy of the elements of training sets, in this paper we focus our attention
on ML classifiers and on the statistical information that can be unconsciously
or maliciously revealed from them. We show that it is possible to infer
unexpected but useful information from ML classifiers. In particular, we build
a novel meta-classifier and train it to hack other classifiers, obtaining
meaningful information about their training sets. This kind of information
leakage can be exploited, for example, by a vendor to build more effective
classifiers or to simply acquire trade secrets from a competitor's apparatus,
potentially violating its intellectual property rights.
Towards Zero-shot Learning for Automatic Phonemic Transcription
Automatic phonemic transcription tools are useful for low-resource language
documentation. However, due to the lack of training sets, only a tiny fraction
of languages have phonemic transcription tools. Fortunately, multilingual
acoustic modeling provides a solution given limited audio training data. A more
challenging problem is to build phonemic transcribers for languages with zero
training data. The difficulty of this task is that phoneme inventories often
differ between the training languages and the target language, making it
infeasible to recognize unseen phonemes. In this work, we address this problem
by adopting the idea of zero-shot learning. Our model is able to recognize
unseen phonemes in the target language without any training data. In our model,
we decompose phonemes into corresponding articulatory attributes such as vowel
and consonant. Instead of predicting phonemes directly, we first predict
distributions over articulatory attributes, and then compute phoneme
distributions with a customized acoustic model. We evaluate our model by
training it using 13 languages and testing it using 7 unseen languages. We find
that it achieves 7.7% better phoneme error rate on average over a standard
multilingual model. Comment: AAAI 202
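The attribute-based decomposition described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the attribute list, the `SIGNATURES` table, and the scoring rule (summing the scores of a phoneme's attributes, then applying a softmax) are all hypothetical.

```python
import math

# Hypothetical attribute signatures: each phoneme is described by the set of
# articulatory attributes it carries.
SIGNATURES = {
    "b": {"consonant", "voiced", "bilabial", "plosive"},
    "d": {"consonant", "voiced", "alveolar", "plosive"},
    "m": {"consonant", "voiced", "bilabial", "nasal"},
    "n": {"consonant", "voiced", "alveolar", "nasal"},
}

def phoneme_distribution(attribute_logits, inventory):
    """Score each phoneme in the target inventory from attribute scores.

    Assumption: a phoneme's score is the sum of the scores of its attributes;
    a softmax over the inventory then gives the phoneme distribution.
    """
    scores = {
        phoneme: sum(attribute_logits[attr] for attr in SIGNATURES[phoneme])
        for phoneme in inventory
    }
    peak = max(scores.values())
    exps = {phoneme: math.exp(s - peak) for phoneme, s in scores.items()}
    total = sum(exps.values())
    return {phoneme: e / total for phoneme, e in exps.items()}

# One frame of attribute scores from a (hypothetical) attribute predictor.
attribute_logits = {
    "vowel": -3.0, "consonant": 3.0, "voiced": 2.0, "bilabial": 2.0,
    "alveolar": -2.0, "plosive": -1.0, "nasal": 2.0,
}

# Suppose "m" never occurred in training, but "b" (bilabial) and "n" (nasal)
# did. Its attributes were still learned, so it can be scored.
target_inventory = ["b", "d", "m", "n"]
distribution = phoneme_distribution(attribute_logits, target_inventory)
best = max(distribution, key=distribution.get)
```

Because the predictor only has to learn attributes, a phoneme missing from the training languages can still receive a score as long as each of its attributes appeared in some training phoneme.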