Is voice a marker for autism spectrum disorder? A systematic review and meta-analysis
Individuals with Autism Spectrum Disorder (ASD) tend to show distinctive, atypical acoustic patterns of speech. These behaviours affect social interactions and social development and could represent a non-invasive marker for ASD. We systematically reviewed the literature quantifying acoustic patterns in ASD. Search terms were: (prosody OR intonation OR inflection OR intensity OR pitch OR fundamental frequency OR speech rate OR voice quality OR acoustic) AND (autis* OR Asperger). Results were filtered to include only empirical studies quantifying acoustic features of vocal production in ASD, with a sample size > 2, and the inclusion of a neurotypical comparison group and/or correlations between acoustic measures and severity of clinical features. We identified 34 articles, including 30 univariate studies and 15 multivariate machine-learning studies. We performed meta-analyses of the univariate studies, identifying significant differences in mean pitch and pitch range between individuals with ASD and comparison participants (Cohen’s d of 0.4-0.5 and discriminatory accuracy of about 61-64%). The multivariate studies reported higher accuracies than the univariate studies (63-96%); however, the methods used and the acoustic features investigated were too diverse to permit meta-analysis. We conclude that multivariate studies of acoustic patterns are a promising but as yet unsystematic avenue for establishing ASD markers. We outline three recommendations for future studies: open data, open methods, and theory-driven research.
Extraction and Classification of Acoustic Features from Italian Speaking Children with Autism Spectrum Disorders
Autism Spectrum Disorders (ASD) are a group of complex developmental conditions whose effects and severity show high intra-individual variability. However, one of the main symptoms shared across the spectrum is impaired social interaction, which can be explored through acoustic analysis of speech production. In this paper, we compare 14 Italian-speaking children with ASD and 14 typically developing peers. We extracted and selected acoustic features related to prosody, voice quality, loudness, and spectral distribution using the eGeMAPS parameter set provided by the openSMILE feature-extraction toolkit. We implemented four supervised machine-learning methods to evaluate classification performance. Our findings show that Decision Trees (DTs) and Support Vector Machines (SVMs) are the best-performing methods. The DT models reach 100% recall on all trials, meaning they correctly recognise autistic features; however, half of the DT models overfit, while the SVMs are more consistent. A further outcome of this work is a speech-processing pipeline for extracting Italian speech biomarkers typical of ASD, whose results we compare with studies based on other languages. A better understanding of this topic can support clinicians in diagnosing the disorder.
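As a toy illustration of the kind of supervised pipeline this abstract describes, the sketch below classifies synthetic feature vectors (invented stand-ins for eGeMAPS functionals such as mean pitch and pitch range; all values are made up) with a simple nearest-centroid rule and reports leave-one-out recall. A real pipeline would extract eGeMAPS features from audio with openSMILE and use stronger learners such as DTs or SVMs.

```python
# Minimal sketch: nearest-centroid classification of synthetic "acoustic
# feature" vectors with leave-one-out evaluation. Feature values are
# invented for illustration, not taken from the study.
import math
import random

random.seed(0)

def make_sample(shift):
    # three stand-in features: mean pitch (Hz), pitch range (Hz), loudness (a.u.)
    return [random.gauss(200 + shift, 15),
            random.gauss(80 + shift, 10),
            random.gauss(60, 5)]

# 14 samples per group, as in the study; the "asd" group is shifted upward
data = [(make_sample(25), "asd") for _ in range(14)] + \
       [(make_sample(0), "td") for _ in range(14)]

def centroid(rows):
    # component-wise mean of a list of feature vectors
    return [sum(col) / len(col) for col in zip(*rows)]

def loo_predictions(data):
    # leave-one-out: recompute class centroids without the held-out sample
    preds = []
    for i, (x, _) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        cents = {lab: centroid([f for f, l in rest if l == lab])
                 for lab in ("asd", "td")}
        preds.append(min(cents, key=lambda lab: math.dist(x, cents[lab])))
    return preds

preds = loo_predictions(data)
truth = [y for _, y in data]
recall_asd = sum(p == t == "asd" for p, t in zip(preds, truth)) / 14
print(f"leave-one-out recall for the ASD class: {recall_asd:.2f}")
```

The nearest-centroid rule stands in for the paper's classifiers only to keep the sketch dependency-free; the leave-one-out loop is the part worth noting, since small samples (14 per group) make held-out evaluation essential to detect the overfitting the abstract mentions.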
Defining and distinguishing infant behavioral states using acoustic cry analysis: is colic painful?
Background: To characterize acoustic features of an infant's cry and use machine learning to provide an objective measurement of behavioral state in a cry-translator, and to apply the cry-translation algorithm to colic, hypothesizing that these cries sound painful. Methods: Assessment of 1000 cries in a mobile app (ChatterBaby™). Training of a cry-translation algorithm by evaluating >6000 acoustic features to predict whether an infant cry was due to pain (vaccinations, ear-piercings), fussy, or hunger states. Use of the algorithm to predict the behavioral state of infants with reported colic. Results: The cry-translation algorithm was 90.7% accurate in identifying pain cries and achieved 71.5% accuracy in discriminating cries of fussiness, hunger, and pain. The ChatterBaby cry-translation algorithm overwhelmingly predicted that colic cries were most likely from pain, compared with fussy and hungry states. Colic cries had an average pain rating of 73%, significantly greater than the pain measurements found for fussiness and hunger (p < 0.001, two-sample t test). Colic cries outranked pain cries on measures of acoustic intensity, including energy, length of voiced periods, and fundamental frequency/pitch, while fussy and hungry cries showed reduced intensity measures compared with pain and colic. Conclusions: Acoustic features of cries are consistent across a diverse infant population and can be utilized as objective markers of pain, hunger, and fussiness. The ChatterBaby algorithm detected significant acoustic similarities between colic and painful cries, suggesting that they may share a neuronal pathway.
Analysis of atypical prosodic patterns in the speech of people with Down syndrome
The speech of people with Down syndrome (DS) shows prosodic features which are distinct from those observed in the oral productions of typically developing (TD) speakers. Although a different prosodic realization does not necessarily imply wrong expression of prosodic functions, atypical expression may hinder communication skills. The focus of this work is to ascertain whether this can be the case in individuals with DS. To do so, we analyze the acoustic features that best characterize the utterances of speakers with DS when expressing prosodic functions related to emotion, turn-end, and phrasal chunking, comparing them with those used by TD speakers. An oral corpus of speech utterances was recorded using the PEPS-C prosodic competence evaluation tool. We use automatic classifiers to show that the prosodic features that best predict prosodic functions in TD speakers are less informative in speakers with DS. Although atypical features are observed in speakers with DS when producing prosodic functions, the intended prosodic function can be identified by listeners and, in most cases, the features correctly discriminate the function with analytical methods. However, a greater difference between the minimal pairs presented in the PEPS-C test is found for TD speakers than for DS speakers. The proposed methodological approach provides, on the one hand, an identification of the set of features that distinguish the prosodic productions of DS and TD speakers and, on the other, a set of target features for therapy with speakers with DS. Funded by Ministerio de Economía, Industria y Competitividad - Fondo Europeo de Desarrollo Regional (grant TIN2017-88858-C2-1-R) and Junta de Castilla y León (grant VA050G18).
Early Human Vocalization Development: A Collection of Studies Utilizing Automated Analysis of Naturalistic Recordings and Neural Network Modeling
Understanding early human vocalization development is a key part of understanding the origins of human communication. What are the characteristics of early human vocalizations and how do they change over time? What mechanisms underlie these changes? This dissertation is a collection of three papers that take a computational approach to addressing these questions, using neural network simulation and automated analysis of naturalistic data. The first paper uses a self-organizing neural network to automatically derive holistic acoustic features characteristic of prelinguistic vocalizations. A supervised neural network is used to classify vocalizations into human-judged categories and to predict the age of the child vocalizing. The study represents a first step toward taking a data-driven approach to describing infant vocalizations, and its classification performance represents progress toward automated analysis tools for coding infant vocalization types. The second paper is a computational model of early vocal motor learning. It adapts a popular type of neural network, the self-organizing map, to control a vocal tract simulator and to make learning dependent on whether the model's actions are reinforced. The model learns both to control production of sound at the larynx (phonation), an early-developing skill that is a prerequisite for speech, and to produce vowels that gravitate toward the vowels in a target language (either English or Korean) for which it is reinforced. The model provides a computationally specified explanation for how neuromotor representations might be acquired in infancy through the combination of exploration, reinforcement, and self-organized learning. The third paper utilizes automated analysis to uncover patterns of vocal interaction between child and caregiver that unfold over the course of day-long, totally naturalistic recordings. The participants include 16- to 48-month-old children with and without autism. Results are consistent with the idea that there is a social feedback loop wherein children produce speech-related vocalizations, these are preferentially responded to by adults, and this contingency of adult response shapes future child vocalizations. Differences in components of this feedback loop are observed in autism, as well as with different maternal education levels.
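The self-organizing map mentioned in this abstract can be illustrated with a minimal sketch: a 1-D map whose node weights adapt toward randomly drawn inputs, with the best-matching unit and its index-neighbours pulled most strongly. The two-cluster input distribution (standing in for two vowel targets) and all numeric values are invented for illustration; the dissertation couples a SOM to a vocal-tract simulator, which is far beyond this sketch.

```python
# Minimal 1-D self-organizing map: each training step finds the
# best-matching unit (BMU) and moves it and its neighbours toward the input.
import random

random.seed(1)

n_nodes = 10
weights = [random.uniform(0.0, 1.0) for _ in range(n_nodes)]

def train_step(weights, x, lr=0.2, radius=1):
    # best-matching unit: node whose weight is closest to the input
    bmu = min(range(len(weights)), key=lambda i: abs(weights[i] - x))
    # pull the BMU and its index-neighbours toward the input
    for i in range(len(weights)):
        if abs(i - bmu) <= radius:
            weights[i] += lr * (x - weights[i])

# inputs drawn from two clusters, e.g. two target "vowel" values
for _ in range(2000):
    x = random.gauss(0.3, 0.02) if random.random() < 0.5 else random.gauss(0.8, 0.02)
    train_step(weights, x)

# after training, the map should allocate nodes near both clusters
print(sorted(round(w, 2) for w in weights))
```

The neighbourhood update is what makes this self-organizing rather than plain clustering: because index-neighbours move together, nearby nodes come to represent nearby inputs, giving the topology-preserving map the dissertation builds on.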
Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment
Parents fulfill a pivotal role in early childhood development of social and communication skills. In children with autism, the development of these skills can be delayed. Applied behavioral analysis (ABA) techniques have been created to aid in skill acquisition. Among these, pivotal response treatment (PRT) has been empirically shown to foster improvements. Research into PRT implementation has also shown that parents can be trained to be effective interventionists for their children. The current difficulty in PRT training is how to disseminate training to parents who need it, and how to support and motivate practitioners after training.

Evaluation of the parents' fidelity to implementation is often undertaken using video probes that depict the dyadic interaction occurring between the parent and the child during PRT sessions. These videos are time-consuming for clinicians to process, and often result in only minimal feedback for the parents. Current trends in technology could be utilized to alleviate the manual cost of extracting data from the videos, affording greater opportunities for providing clinician-created feedback as well as automated assessments. The naturalistic context of the video probes, along with the dependence on ubiquitous recording devices, creates a difficult scenario for classification tasks: the domain of the PRT video probes can be expected to have high levels of both aleatory and epistemic uncertainty. Addressing these challenges requires examination of the multimodal data along with implementation and evaluation of classification algorithms. This is explored through the use of a new dataset of PRT videos.

The relationship between the parent and the clinician is important. The clinician can provide support and help build self-efficacy in addition to providing knowledge and modeling of treatment procedures. Facilitating this relationship alongside automated feedback not only provides the opportunity to present expert feedback to the parent, but also allows the clinician to aid in personalizing the classification models. By utilizing a human-in-the-loop framework, clinicians can help address the uncertainty in the classification models by providing additional labeled samples. This allows the system to improve classification and provides a person-centered approach to extracting multimodal data from PRT video probes.
The role of HG in the analysis of temporal iteration and interaural correlation
Models and Analysis of Vocal Emissions for Biomedical Applications
The MAVEBA Workshop proceedings, published on a biennial basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and classification of vocal pathologies.
The emotional component of Infant Directed-Speech: A cross-cultural study using machine learning
Background: Infant-directed speech (IDS) is part of an interactive loop that plays an important role in infants' cognitive and social development. The use of IDS is universal, and IDS is composed of linguistic and emotional components. However, whether the emotional component has similar acoustic characteristics across languages has not been studied automatically. Methods: We performed a cross-cultural study using automatic social signal processing (SSP) techniques to compare IDS across languages. Our speech corpus consisted of audio-recorded vocalizations from parents during interactions with their infants between the ages of 4 and 18 months. It included six databases covering five languages: English, French, Hebrew (two databases: mothers/fathers), Italian, and Brazilian Portuguese. We used an automatic classifier that exploits the acoustic characteristics of speech and machine-learning methods (Support Vector Machines, SVM) to distinguish emotional from non-emotional IDS. Results: Automated classification of emotional IDS was possible for all languages and speakers (father and mother). The uni-language condition (classifier trained and tested on the same language) produced moderate to excellent classification results, all significantly different from chance (P < 1 × 10−10). More interestingly, the cross-over condition (IDS classifier trained on one language and tested on another) also produced classification results significantly different from chance (P < 1 × 10−10). Conclusion: The automated classification of emotional and non-emotional components of IDS is possible from acoustic characteristics regardless of language. The results in the cross-over condition support the hypothesis that the emotional component shares similar acoustic characteristics across languages.
Driver frustration detection from audio and video in the wild
We present a method for detecting driver frustration from both the video and audio streams captured during the driver's interaction with an in-vehicle voice-based navigation system. The video is of the driver's face when the machine is speaking, and the audio is of the driver's voice when he or she is speaking. We analyze a dataset of 20 drivers that contains 596 audio epochs (audio clips, 1 to 15 s long) and 615 video epochs (video clips, 1 to 45 s long). The dataset is balanced across two age groups, two vehicle systems, and both genders. The model was trained and tested subject-independently using 4-fold cross-validation. We achieve an accuracy of 77.4% for detecting frustration from a single audio epoch and 81.2% from a single video epoch. We then treat the video and audio epochs as a sequence of interactions and use decision fusion to characterize the trade-off between decision time and classification accuracy, improving prediction accuracy to 88.5% after 9 epochs.
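A back-of-the-envelope sketch shows why fusing decisions over several epochs helps: if each epoch's classifier were independently correct with probability p, a majority vote over n epochs would be correct with the binomial tail probability computed below. The per-epoch accuracy 0.774 is the audio figure reported above; the independence assumption is a simplification for illustration, not the paper's fusion scheme.

```python
# Majority-vote accuracy over n independent per-epoch decisions,
# each correct with probability p (binomial tail probability).
from math import comb

def majority_vote_accuracy(p, n):
    # probability that more than half of n decisions are correct
    # (n is taken to be odd so there are no ties)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n // 2) + 1, n + 1))

single = 0.774
fused = majority_vote_accuracy(single, 9)
print(f"single audio epoch: {single:.3f}, majority of 9 epochs: {fused:.3f}")
```

Under this independence assumption the fused figure comes out higher than the reported 88.5%; in practice consecutive epochs from the same driver are correlated, so the empirical gain from fusion is smaller than the binomial idealization suggests.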