679 research outputs found
Artificial Neural Network (ANN) in a Small Dataset to determine Neutrality in the Pronunciation of English as a Foreign Language in Filipino Call Center Agents
Artificial Neural Networks (ANNs) have continued to be efficient models in solving classification problems. In this paper, we explore the use of an ANN with a small dataset to accurately classify whether Filipino call center agents’ pronunciations are neutral or not, based on their employer’s standards. Isolated utterances of the ten most commonly used words in the call center were recorded from eleven agents, creating a dataset of 110 utterances. Two learning specialists were consulted to establish ground truths, and Cohen’s Kappa was computed as 0.82, validating the reliability of the dataset. The first thirteen Mel-Frequency Cepstral Coefficients (MFCCs) were then extracted from each word, and an ANN was trained with ten-fold stratified cross-validation. Experimental results on the model recorded a classification accuracy of 89.60%, supported by an overall F-score of 0.92.
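The dataset validation above rests on Cohen's Kappa, which corrects raw inter-rater agreement for agreement expected by chance. A minimal pure-Python sketch of that statistic (the rater labels below are hypothetical, not the paper's data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Inter-rater agreement corrected for chance: (po - pe) / (1 - pe)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies
    ca, cb = Counter(rater_a), Counter(rater_b)
    pe = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical neutral(1)/non-neutral(0) judgments from two specialists
a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(cohens_kappa(a, b))  # ≈ 0.583 (moderate agreement)
```

A Kappa of 0.82, as reported in the abstract, indicates near-perfect agreement on common interpretation scales, which is why it is taken to validate the ground-truth labels.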
Availability-Based Production Predicts Speakers' Real-time Choices of Mandarin Classifiers
Speakers often face choices as to how to structure their intended message
into an utterance. Here we investigate the influence of contextual
predictability on the encoding of linguistic content manifested by speaker
choice in a classifier language. In English, a numeral modifies a noun directly
(e.g., three computers). In classifier languages such as Mandarin Chinese, it
is obligatory to use a classifier (CL) with the numeral and the noun (e.g.,
three CL.machinery computer, three CL.general computer). While different nouns
are compatible with different specific classifiers, there is a general
classifier "ge" (CL.general) that can be used with most nouns. When the
upcoming noun is less predictable, the use of a more specific classifier would
reduce surprisal at the noun and thus potentially facilitate comprehension
(predicted by Uniform Information Density, Levy & Jaeger, 2007), but the use of
that more specific classifier may be dispreferred from a production standpoint
if accessing the general classifier is always available (predicted by
Availability-Based Production; Bock, 1987; Ferreira & Dell, 2000). Here we use
a picture-naming experiment to show that Availability-Based Production predicts
speakers' real-time choices of Mandarin classifiers. Comment: To appear in proceedings of CogSci 201
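The Uniform Information Density reasoning above turns on surprisal at the noun: a specific classifier narrows the set of likely upcoming nouns, raising the noun's conditional probability and lowering its surprisal. A minimal sketch with hypothetical probabilities (the values are illustrative, not from the paper):

```python
import math

def surprisal(p):
    # Surprisal in bits: how unexpected a word is given its context
    return -math.log2(p)

# Hypothetical conditional probabilities for the noun "computer"
p_after_general  = 0.05  # after "three ge"  (general CL: many nouns possible)
p_after_specific = 0.40  # after "three tai" (machinery CL: few nouns possible)

print(surprisal(p_after_general))   # ≈ 4.32 bits
print(surprisal(p_after_specific))  # ≈ 1.32 bits
```

UID predicts speakers should prefer the specific classifier when the noun is unpredictable, spreading information more evenly; Availability-Based Production predicts the general classifier wins whenever it is easier to retrieve. The experiment adjudicates between the two.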
From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding
The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). As key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual (x) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer. We study two setups which differ by type and language coverage of the pre-trained embeddings. Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
Fix it where it fails: Pronunciation learning by mining error corrections from speech logs
The pronunciation dictionary, or lexicon, is an essential component in an automatic speech recognition (ASR) system in that incorrect pronunciations cause systematic misrecognitions. It typically consists of a list of word-pronunciation pairs written by linguists, and a grapheme-to-phoneme (G2P) engine to generate pronunciations for words not in the list. The hand-generated list can never keep pace with the growing vocabulary of a live speech recognition system, and the G2P is usually of limited accuracy. This is especially true for proper names whose pronunciations may be influenced by various historical or foreign-origin factors. In this paper, we propose a language-independent approach to detect misrecognitions and their corrections from voice search logs. We learn previously unknown pronunciations from this data, and demonstrate that they significantly improve the quality of a production-quality speech recognition system. Index Terms: speech recognition, pronunciation learning, data extraction, logistic regression
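As an illustrative sketch of the log-mining idea (not the paper's actual method, which uses logistic regression over richer features), consecutive queries in a voice-search session that are close in both time and edit distance can be flagged as likely (misrecognition, correction) pairs:

```python
def edit_distance(a, b):
    # Standard Levenshtein distance via dynamic programming
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def correction_candidates(session, max_gap_s=30, max_dist=3):
    """Pair consecutive (timestamp, query) entries that are close in time
    and spelling: likely (misrecognition, correction) candidates.
    Thresholds are illustrative assumptions, not values from the paper."""
    pairs = []
    for (t1, q1), (t2, q2) in zip(session, session[1:]):
        if t2 - t1 <= max_gap_s and 0 < edit_distance(q1, q2) <= max_dist:
            pairs.append((q1, q2))
    return pairs

session = [(0, "call stephan curry"), (5, "call stephen curry"), (120, "weather")]
print(correction_candidates(session))
# [('call stephan curry', 'call stephen curry')]
```

From such pairs, the corrected word's audio can be re-aligned to learn the pronunciation the lexicon was missing, which is the gap the paper's approach targets.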