679 research outputs found
Artificial Neural Network (ANN) in a Small Dataset to determine Neutrality in the Pronunciation of English as a Foreign Language in Filipino Call Center Agents
Artificial Neural Networks (ANNs) have continued to be efficient models in solving classification problems. In this paper, we explore the use of an ANN with a small dataset to accurately classify whether Filipino call center agents’ pronunciations are neutral or not, based on their employer’s standards. Isolated utterances of the ten most commonly used words in the call center were recorded from eleven agents, creating a dataset of 110 utterances. Two learning specialists were consulted to establish ground truths, and Cohen’s Kappa was computed as 0.82, validating the reliability of the dataset. The first thirteen Mel-Frequency Cepstral Coefficients (MFCCs) were then extracted from each word, and an ANN was trained with ten-fold stratified cross-validation. Experimental results on the model recorded a classification accuracy of 89.60%, supported by an overall F-score of 0.92.
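The dataset validation above rests on Cohen's Kappa, which corrects raw inter-rater agreement for agreement expected by chance. A minimal pure-Python sketch of that statistic (the rater labels below are hypothetical, not the paper's data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Inter-rater agreement corrected for chance: (po - pe) / (1 - pe)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies
    ca, cb = Counter(rater_a), Counter(rater_b)
    pe = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical neutral(1)/non-neutral(0) judgments from two specialists
a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(cohens_kappa(a, b))  # ≈ 0.583 (moderate agreement)
```

A Kappa of 0.82, as reported in the abstract, indicates near-perfect agreement on common interpretation scales, which is why it is taken to validate the ground-truth labels.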
Availability-Based Production Predicts Speakers' Real-time Choices of Mandarin Classifiers
Speakers often face choices as to how to structure their intended message
into an utterance. Here we investigate the influence of contextual
predictability on the encoding of linguistic content manifested by speaker
choice in a classifier language. In English, a numeral modifies a noun directly
(e.g., three computers). In classifier languages such as Mandarin Chinese, it
is obligatory to use a classifier (CL) with the numeral and the noun (e.g.,
three CL.machinery computer, three CL.general computer). While different nouns
are compatible with different specific classifiers, there is a general
classifier "ge" (CL.general) that can be used with most nouns. When the
upcoming noun is less predictable, the use of a more specific classifier would
reduce surprisal at the noun and thus potentially facilitate comprehension
(predicted by Uniform Information Density, Levy & Jaeger, 2007), but the use of
that more specific classifier may be dispreferred from a production standpoint
if accessing the general classifier is always available (predicted by
Availability-Based Production; Bock, 1987; Ferreira & Dell, 2000). Here we use
a picture-naming experiment to show that Availability-Based Production predicts
speakers' real-time choices of Mandarin classifiers. Comment: To appear in proceedings of CogSci 201
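The Uniform Information Density reasoning above turns on surprisal at the noun: a specific classifier narrows the set of likely upcoming nouns, raising the noun's conditional probability and lowering its surprisal. A minimal sketch with hypothetical probabilities (the values are illustrative, not from the paper):

```python
import math

def surprisal(p):
    # Surprisal in bits: how unexpected a word is given its context
    return -math.log2(p)

# Hypothetical conditional probabilities for the noun "computer"
p_after_general  = 0.05  # after "three ge"  (general CL: many nouns possible)
p_after_specific = 0.40  # after "three tai" (machinery CL: few nouns possible)

print(surprisal(p_after_general))   # ≈ 4.32 bits
print(surprisal(p_after_specific))  # ≈ 1.32 bits
```

UID predicts speakers should prefer the specific classifier when the noun is unpredictable, spreading information more evenly; Availability-Based Production predicts the general classifier wins whenever it is easier to retrieve. The experiment adjudicates between the two.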
From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding
The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). As key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual (x) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. To tackle the challenge, we propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer. We study two setups which differ by type and language coverage of the pre-trained embeddings. Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
Fix it where it fails: Pronunciation learning by mining error corrections from speech logs
The pronunciation dictionary, or lexicon, is an essential component in an automatic speech recognition (ASR) system in that incorrect pronunciations cause systematic misrecognitions. It typically consists of a list of word-pronunciation pairs written by linguists, and a grapheme-to-phoneme (G2P) engine to generate pronunciations for words not in the list. The hand-generated list can never keep pace with the growing vocabulary of a live speech recognition system, and the G2P is usually of limited accuracy. This is especially true for proper names whose pronunciations may be influenced by various historical or foreign-origin factors. In this paper, we propose a language-independent approach to detect misrecognitions and their corrections from voice search logs. We learn previously unknown pronunciations from this data, and demonstrate that they significantly improve the quality of a production-quality speech recognition system. Index Terms: speech recognition, pronunciation learning, data extraction, logistic regression
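As an illustrative sketch of the log-mining idea (not the paper's actual method, which uses logistic regression over richer features), consecutive queries in a voice-search session that are close in both time and edit distance can be flagged as likely (misrecognition, correction) pairs:

```python
def edit_distance(a, b):
    # Standard Levenshtein distance via dynamic programming
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def correction_candidates(session, max_gap_s=30, max_dist=3):
    """Pair consecutive (timestamp, query) entries that are close in time
    and spelling: likely (misrecognition, correction) candidates.
    Thresholds are illustrative assumptions, not values from the paper."""
    pairs = []
    for (t1, q1), (t2, q2) in zip(session, session[1:]):
        if t2 - t1 <= max_gap_s and 0 < edit_distance(q1, q2) <= max_dist:
            pairs.append((q1, q2))
    return pairs

session = [(0, "call stephan curry"), (5, "call stephen curry"), (120, "weather")]
print(correction_candidates(session))
# [('call stephan curry', 'call stephen curry')]
```

From such pairs, the corrected word's audio can be re-aligned to learn the pronunciation the lexicon was missing, which is the gap the paper's approach targets.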