314 research outputs found

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study

    Several high-resource Text to Speech (TTS) systems currently produce natural, well-established human-like speech. In contrast, low-resource languages, including Arabic, have very limited TTS systems due to the lack of resources. We propose a fully unsupervised method for building TTS, including automatic data selection and pre-training/fine-tuning strategies for TTS training, using broadcast news as a case study. We show how a carefully selected, smaller amount of data can enable a TTS system to generate more natural speech than a system trained on a bigger dataset. We propose different approaches for: 1) the data: we applied automatic annotations using DNSMOS, automatic vowelization, and automatic speech recognition (ASR) to fix transcription errors; 2) the model: we used transfer learning from a high-resource language in the TTS model and fine-tuned it with one hour of broadcast recordings, then used this model to guide a FastSpeech2-based Conformer model for duration. Our objective evaluation shows a 3.9% character error rate (CER), while the ground truth has a 1.3% CER. In the subjective evaluation, where 1 is bad and 5 is excellent, our FastSpeech2-based Conformer model achieved a mean opinion score (MOS) of 4.4 for intelligibility and 4.2 for naturalness, and many annotators recognized the voice of the broadcaster, which proves the effectiveness of our proposed unsupervised method.

    Evaluating the predictions of objective intelligibility metrics for modified and synthetic speech

    Several modification algorithms that alter natural or synthetic speech with the goal of improving intelligibility in noise have been proposed recently. A key requirement of many modification techniques is the ability to predict intelligibility, both offline during algorithm development, and online, in order to determine the optimal modification for the current noise context. While existing objective intelligibility metrics (OIMs) have good predictive power for unmodified natural speech in stationary and fluctuating noise, little is known about their effectiveness for other forms of speech. The current study evaluated how well seven OIMs predict listener responses in three large datasets of modified and synthetic speech which together represent 396 combinations of speech modification, masker type and signal-to-noise ratio. The chief finding is a clear reduction in predictive power for most OIMs when faced with modified and synthetic speech. Modifications introducing durational changes are particularly harmful to intelligibility predictors. OIMs that measure masked audibility tend to over-estimate intelligibility in the presence of fluctuating maskers relative to stationary maskers, while OIMs that estimate the distortion caused by the masker to a clean speech prototype exhibit the reverse pattern

    The language and literacy skills and behaviours of two middle primary severely to profoundly hearing impaired students in the school environment

    Much research has shown that the hearing impaired population typically achieve only very low levels of literacy. Many researchers have examined the language and literacy deficits of the hearing impaired population in order to explain this. Nevertheless, a recent study has shown that hearing impaired children's preschool language and literacy development may occur along a similar pathway to that of their hearing peers. The present study aimed to investigate the language and literacy skills, behaviours and interactions of two severely to profoundly hearing impaired middle primary boys in the context of their mainstream school. Both qualitative and quantitative data sources were accessed, which included background records, interviews, standardised testing, sample analyses and observations in the school environment. The boys were reported as having strong visual skills. Results showed that whilst they displayed delays in receptive language and metalinguistic awareness both boys were able to read, but with different levels of achievement: one showed delays in both word recognition and comprehension; the other demonstrated particularly strong word recognition but less highly developed comprehension. There were also differences between the boys in their levels of writing and social language. Nevertheless, whilst one of them showed appropriate social language and interaction skills, they were both often excluded by their hearing peers. Various peer, teacher and environmental factors were identified within the school setting which may have interfered with the boys' social interactions and language and literacy learning. These findings are interpreted in terms of theories of language and literacy acquisition in hearing impaired children and their integration into mainstream settings. Some implications for educational practice and further research are presented.

    Children's acoustic and linguistic adaptations of peers with hearing impairment

    Purpose: This study aims to examine the clear speaking strategies used by older children when interacting with a peer with hearing loss, focusing on both acoustic and linguistic adaptations in speech. Method: The Grid task, a problem-solving task developed to elicit spontaneous interactive speech, was used to obtain a range of global acoustic and linguistic measures. Eighteen 9- to 14-year-old children with normal hearing (NH) performed the task in pairs, once with a friend with NH, and once with a friend with a hearing impairment (HI). Results: In HI-directed speech, children increased their fundamental frequency range and mid-frequency intensity, decreased the number of words per phrase, and expanded their vowel space area by increasing F1 and F2 range, relative to NH-directed speech. However, participants did not appear to make changes to their articulation rate, the lexical frequency of content words, or to lexical diversity, when talking to their friend with HI compared to their friend with NH. Conclusions: Older children show evidence of listener-oriented adaptations to their speech production; although their speech production systems are still developing, they are able to make speech adaptations to benefit the needs of a peer with HI, even without being given specific instruction to do so.

    English as an Academic Lingua Franca in Spanish Tertiary Education: An Analysis of the use of Pragmatic Strategies in English-Medium Lectures

    Over the last decade, a linguistic shift has been especially noticeable in higher education owing to the growing use of English as a medium of instruction (EMI) in European universities. There is therefore an undeniable need to know more about the everyday practices of those who take part in international academic activities using English as a vehicle of communication. Numerous studies have previously been carried out on English used as a lingua franca (ELF) in academic settings. However, there is a relative lack of empirical studies on this use of English in Spanish universities compared with similar studies in European academic institutions (Mauranen, 2006b; Björkman, 2010, 2011b, 2013). This research sets out to study English-medium instruction practices across different disciplines at the Universidad de Zaragoza (Spain), focusing on the type of pragmatic strategies that participants use to facilitate comprehension. These linguistic practices are analysed in order to shed light on the impact that English has on communicative effectiveness in these teaching and learning environments. The results derive from the analysis of a corpus of 12 lectures delivered in English as a medium of instruction, recorded in two different degree programmes. These are complemented by semi-structured interviews with the lecturers and a small corpus of PowerPoint presentation slides that the same lecturers used to deliver their classes. A discourse-pragmatic approach and an ethnographically oriented methodology were used to analyse these three data sets. The study therefore uses both data triangulation and methodological triangulation, yielding quantitative as well as qualitative results.
    The results of the study show 13 different pragmatic strategies used in the recorded lectures to fulfil communicative functions such as enhancing explicitness, clarifying, and negotiating meaning and/or acceptable language use. The data analysis reveals that the pragmatic strategies observed in the corpus are used mainly to pre-empt potential communicative problems, but also to remedy production problems that overtly hinder communication and to co-construct understanding. Supporting existing studies on English used as a vehicular language for instruction, the results reveal a highly contextual and situational use of pragmatic strategies.

    Exploring Intelligent Personal Assistants in Second Language Acquisition

    Souheila Moussalli, Ph.D., Concordia University, 2022. The goal of this dissertation is to investigate Intelligent Personal Assistants (IPAs), voice-controlled services that can complete various functions by orally interacting with their users, as pedagogical tools in English second language classrooms to assess their pedagogical suitability. This dissertation begins with a review of the literature focusing on the importance of using technology in the language classroom. The remainder is divided into three manuscript-based chapters in which each manuscript addresses one aspect of the general research questions: (a) What are language learners’ perceptions of the use of IPAs as learning tools? (Manuscript A); (b) Can IPAs understand different language learners, and can these learners understand IPAs? (Manuscript B); and (c) Can IPAs help English language learners improve their receptive and productive skills? (Manuscript C). The first manuscript investigates the use of IPAs and users’ perceptions of the technology as a language learning tool. It examines a number of variables such as the IPAs’ ease of use, options for learner self-regulation (defined as learners’ ability to understand and control their learning environment), learner motivation and, more importantly, opportunities for learner input and output practice. The second manuscript explores IPAs’ ability to interact with learners of English who have different accents. The focus is on exploring the IPAs’ ability to understand speech at different levels of accentedness, and vice versa: to explore learners’ ability to understand the synthesized speech.
    The third manuscript investigates whether the pedagogical use of IPAs can lead to improvements in learners’ phonological awareness, perception and production of the allomorphy that characterizes regular past tense -ed marking in English (for example, depending on the preceding phonological environment, the suffix -ed can be pronounced as in talk/t/, play/d/ and add/id/). This dissertation contributes to our knowledge of learner experience and attitudes towards IPAs, as it can further unfold the potential and limitations of the technology. As far as second language phonology/pronunciation is concerned, the dissertation breaks new ground in research since little is known about IPAs and their pedagogical potential for the development of second language listening and speaking skills.

    The impact of shared knowledge on speakers’ prosody

    How does the knowledge shared by interlocutors during interaction modify the way speakers speak? Specifically, how does prosody change when speakers know that their addressees do not share the same knowledge as them? We studied these effects in an interactive paradigm in which French speakers gave instructions to addressees about where to place a cross between different objects (e.g., You put the cross between the red mouse and the red house). We manipulated (i) whether the two interlocutors shared or did not necessarily share the same objects and (ii) the informational status of referents. We were interested in two types of prosodic variations: global prosodic variations that affect entire utterances (i.e., pitch range and speech rate variations) and more local prosodic variations that encode informational status of referents (i.e., prosodic phrasing for French). We found that participants spoke more slowly and with larger pitch excursions in the not-shared knowledge condition than in the shared knowledge condition while they did not prosodically encode the informational status of referents regardless of the knowledge condition. Results demonstrated that speakers kept track of what the addressee knew, and that they adapted their global prosody to their interlocutors. This made the task too cognitively demanding to allow the prosodic encoding of the informational status of referents. Our findings are in line with the idea that complex reasoning usually implicated in constructing a model of the addressee co-exists with speaker-internal constraints such as cognitive load to affect speakers' prosody during interaction.