6 research outputs found

    Speech Enhancement for Automatic Analysis of Child-Centered Audio Recordings

    The analysis of child-centred daylong naturalistic audio recordings has become a de facto research protocol in the scientific study of child language development. Researchers are increasingly using these recordings to understand the linguistic environment a child encounters in her routine interactions with the world. The recordings are captured by a microphone that the child wears throughout the day. Being naturalistic, they contain many unwanted sounds from everyday life, which degrade the performance of speech analysis tasks. The purpose of this thesis is to investigate the utility of speech enhancement (SE) algorithms in the automatic analysis of such recordings. To this end, several classical signal processing and modern machine learning-based SE methods were employed 1) as a denoiser for speech corrupted with additive noise sampled from real-life child-centred daylong recordings and 2) as a front-end for the downstream speech processing tasks of addressee classification (infant- vs. adult-directed speech) and automatic syllable count estimation from speech. The downstream tasks were conducted on data derived from a set of geographically, culturally, and linguistically diverse child-centred daylong audio recordings. Denoising performance was evaluated through objective quality metrics (spectral distortion and instrumental intelligibility) and through downstream task performance. Finally, the objective evaluation results were compared with the downstream task results to determine whether objective metrics can serve as a reasonable proxy for selecting an SE front-end for a downstream task. The results show that a recently proposed Long Short-Term Memory (LSTM)-based progressive learning architecture provides the largest performance gains in the downstream tasks compared with the other SE methods and the baseline. Classical signal processing-based SE methods also achieve competitive performance. The comparison of objective assessment and downstream task performance revealed no predictive relationship between task-independent objective metrics and downstream task performance.
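    The denoising setup in this abstract (clean speech corrupted with additive noise) can be illustrated with a minimal sketch. The abstract does not say how the noise level was controlled; mixing at a target signal-to-noise ratio is an assumption made here for illustration, and the function names `mix_at_snr` and `measured_snr_db` are hypothetical, not taken from the thesis.

```python
import math


def mix_at_snr(speech, noise, target_snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the target SNR in dB.

    Assumption: both inputs are equal-length sequences of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise segment is silent")
    # Choose gain g so that p_speech / (g^2 * p_noise) == 10^(target_snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to `clean`, treating the difference as noise."""
    p_clean = sum(c * c for c in clean)
    p_error = sum((y - c) ** 2 for c, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_error)


# Toy signals: a sine wave as "speech" and a faster cosine as "noise".
speech = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
noise = [math.cos(2 * math.pi * 37 * t / 200) for t in range(200)]
noisy = mix_at_snr(speech, noise, target_snr_db=5.0)
```

    A denoiser under evaluation would take `noisy` as input, and its output would be compared against `speech` with an objective metric.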

    Developing a cross-cultural annotation system and metacorpus for studying infants' real world language experience

    Recent issues around reproducibility, best practices, and cultural bias affect naturalistic observational approaches as much as experimental approaches, but this area has received less attention. Here, we present a new approach that leverages cross-laboratory, collaborative, interdisciplinary efforts to examine important psychological questions. We illustrate this approach with a particular project that examines similarities and differences in children's early experiences with language. This project develops a comprehensive start-to-finish analysis pipeline by developing a flexible and systematic annotation system and implementing this system across a sampling from a metacorpus of audio recordings of diverse language communities. This resource is publicly available for use, sensitive to cultural differences, and flexible enough to address a variety of research questions. It is also uniquely suited for use in the development of tools for automated analysis.
    Affiliations: Soderstrom, Melanie (University of Manitoba, Canada); Casillas, Marisa (University of Chicago, United States); Bergelson, Elika (University of Duke, United States); Rosemberg, Celia Renata (Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Saavedra 15. Centro Interdisciplinario de Investigaciones en Psicología Matemática y Experimental Dr. Horacio J. A. Rimoldi, Argentina); Alam, Florencia (Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Saavedra 15. Centro Interdisciplinario de Investigaciones en Psicología Matemática y Experimental Dr. Horacio J. A. Rimoldi, Argentina); Warlaumont, Anne S. (University of California at Los Angeles, United States); Bunce, John (California State University, United States)

    Phonetic transcription of spontaneous children's speech with the aid of software : a systematic review

    The aim of the study was to identify, synthesize and classify the software currently available to assist with the phonetic transcription of the spontaneous speech of preschool children, for the purpose of evaluating children's language development. A systematic review was performed of articles published over a 10-year period (June 2010 to June 2020), with no restrictions on location or language, using the Cochrane, Pubmed and Web of Science databases. The terms used in the search strategies were "phonological", "phonetic", "transcription", "computer" and "software". The studies were selected by two independent reviewers using pre-defined search strategies. The initial search, after the exclusion of duplicates, returned 534 articles. Screening of titles and abstracts left 46 articles related to the theme, which were then read in full. After full-text reading, 24 articles were included in the study. The results revealed a total of seven software tools available for the phonetic transcription of spontaneous speech from preschoolers, used for different analyses: LENA and Timestamper (for babbling and pre-linguistic vocalizations), ELAN (for gestural communication, extralinguistic elements and the situational context), Phon (for phonetic and phonological analyses), CLAN and SALT (for morphosyntactic, grammatical and semantic aspects) and Praat (for acoustic measurements). This systematic review concludes that there are advantages to using software for phonetic transcription, sample storage, and child language analysis, especially concerning standardization and reliability for spontaneous speech samples. Phonetic transcription still relies on the ability and subjectivity of a human transcriber. The tools found in the software support the use of phonetic symbols, audio segmentation and alignment with the written transcription, and the analysis of speech data.

    Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech

    Automatic word count estimation (WCE) from audio recordings can be used to quantify the amount of verbal communication in a recording environment. One key application of WCE is to measure language input heard by infants and toddlers in their natural environments, as captured by daylong recordings from microphones worn by the infants. Although WCE is nearly trivial for high-quality signals in high-resource languages, daylong recordings are substantially more challenging due to the unconstrained acoustic environments and the presence of near- and far-field speech. Moreover, many use cases of interest involve languages for which reliable ASR systems or even well-defined lexicons are not available. A good WCE system should also perform similarly for low- and high-resource languages in order to enable unbiased comparisons across different cultures and environments. Unfortunately, the current state-of-the-art solution, the LENA system, is based on proprietary software and has only been optimized for American English, limiting its applicability. In this paper, we build on existing work on WCE and present the steps we have taken towards a freely available system for WCE that can be adapted to different languages or dialects with a limited amount of orthographically transcribed speech data. Our system is based on language-independent syllabification of speech, followed by a language-dependent mapping from syllable counts (and a number of other acoustic features) to the corresponding word count estimates. We evaluate our system on samples from daylong infant recordings from six different corpora consisting of several languages and socioeconomic environments, all manually annotated with the same protocol to allow direct comparison. We compare a number of alternative techniques for the two key components in our system: speech activity detection and automatic syllabification of speech. As a result, we show that our system can reach relatively consistent WCE accuracy across multiple corpora and languages (with some limitations). In addition, the system outperforms LENA on three of the four corpora consisting of different varieties of English. We also demonstrate how an automatic neural network-based syllabifier, when trained on multiple languages, generalizes well to novel languages beyond the training data, outperforming two previously proposed unsupervised syllabifiers as a feature extractor for WCE.
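    The second stage described in this abstract, a language-dependent mapping from syllable counts to word counts, can be illustrated with a minimal sketch. It assumes a plain least-squares line fitted on a handful of transcribed segments and uses syllable counts only; the abstract states that other acoustic features are also used and does not specify the form of the mapping. The function names and the toy numbers below are hypothetical.

```python
def fit_linear_mapping(syllable_counts, word_counts):
    """Fit words ~= slope * syllables + intercept by ordinary least squares."""
    n = len(syllable_counts)
    if n < 2 or n != len(word_counts):
        raise ValueError("need at least two paired observations")
    mean_x = sum(syllable_counts) / n
    mean_y = sum(word_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in syllable_counts)
    if sxx == 0:
        raise ValueError("syllable counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(syllable_counts, word_counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_word_count(syllable_count, slope, intercept):
    """Map an automatic syllable count to a word count estimate (never negative)."""
    return max(0.0, slope * syllable_count + intercept)


# Toy adaptation data: automatic syllable counts and manually transcribed
# word counts for four segments (invented numbers, for illustration only).
syllables = [10, 20, 30, 40]
words = [8, 15, 22, 29]
slope, intercept = fit_linear_mapping(syllables, words)
estimate = estimate_word_count(50, slope, intercept)
```

    Refitting `slope` and `intercept` on a small transcribed sample of a new language is what would make such a mapping language-dependent, while the syllable counter itself stays unchanged.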
