
626 research outputs found

Universal and language-specific processing : the case of prosody

Author: Ip Martin Ho Kwan
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/01/2019

A key question in the science of language is how speech processing can be influenced by both language-universal and language-specific mechanisms (Cutler, Klein, & Levinson, 2005). My graduate research aimed to address this question by adopting a cross-language approach to compare languages with different phonological systems. Of all components of linguistic structure, prosody is often considered to be one of the most language-specific dimensions of speech. This can have significant implications for our understanding of language use, because much of speech processing is specifically tailored to the structure and requirements of the native language. However, it is still unclear whether prosody may also play a universal role across languages, and very few comparative attempts have been made to explore this possibility. In this thesis, I examined both the production and perception of prosodic cues to prominence and phrasing in native speakers of English and Mandarin Chinese. In focus production, our research revealed that English and Mandarin speakers were alike in how they used prosody to encode prominence, but there were also systematic language-specific differences in the exact degree to which they enhanced the different prosodic cues (Chapter 2). This, however, was not the case in focus perception, where English and Mandarin listeners were alike in the degree to which they used prosody to predict upcoming prominence, even though the precise cues in the preceding prosody could differ (Chapter 3). Further experiments examining prosodic focus prediction in the speech of different talkers have demonstrated functional cue equivalence in prosodic focus detection (Chapter 4). Likewise, our experiments have also revealed both cross-language similarities and differences in the production and perception of juncture cues (Chapter 5). Overall, prosodic processing is the result of a complex but subtle interplay of universal and language-specific structure.

Western Sydney ResearchDirect

Pushing the envelope: Evaluating speech rhythm with different envelope extraction techniques

Authors: Cai CQ; Macintyre AD; Scott SK
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 23/03/2022

The amplitude of the speech signal varies over time, and the speech envelope is an attempt to characterise this variation in the form of an acoustic feature. Although tacitly assumed, the similarity between the speech envelope-derived time series and that of phonetic objects (e.g., vowels) remains empirically unestablished. The current paper, therefore, evaluates several speech envelope extraction techniques, such as the Hilbert transform, by comparing different acoustic landmarks (e.g., peaks in the speech envelope) with manual phonetic annotation in a naturalistic and diverse dataset. Joint speech tasks are also introduced to determine which acoustic landmarks are most closely coordinated when voices are aligned. Finally, the acoustic landmarks are evaluated as predictors for the temporal characterisation of speaking style using classification tasks. The landmarks that corresponded most closely to annotated vowel onsets were peaks in the first derivative of a human audition-informed envelope, consistent with converging evidence from neural and behavioural data. However, differences also emerged based on language and speaking style. Overall, the results show that both the choice of speech envelope extraction technique and the form of speech under study affect how sensitively an engineered feature captures aspects of speech rhythm, such as the timing of vowels.

UCL Discovery

Max-Planck-Institute for Psycholinguistics: Annual Report 2003

Authors: Johnson E.; Matsuo A.
Publication venue: MPI for Psycholinguistics
Publication date: 01/01/2003

MPG.PuRe

Investigating the build-up of precedence effect using reflection masking

Authors: Buchholz Jörg; Hartcher-O'Brien Jessica
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2006

The auditory processing level involved in the build-up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] has been investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to address lower levels of auditory signal processing, this represents a bottom-up approach to the build-up of precedence. Three conditioner configurations measuring a possible build-up of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5 to 15 ms. No build-up of reflection suppression was observed for any of the conditioner configurations. Build-up of template (a decrease in RMT for two of the conditioners), on the other hand, was found to be delay dependent. For five of six listeners, at reflection delays of 2.5 and 15 ms, RMT decreased relative to the baseline. For 5- and 10-ms delays, no change in threshold was observed. It is concluded that the low-level auditory processing involved in RMT is not sufficient to realize a build-up of reflection suppression. This confirms suggestions that higher-level processing is involved in the build-up of the precedence effect (PE). The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels.

Online Research Database In Technology

MPG.PuRe

Effects of errorless learning on the acquisition of velopharyngeal movement control

Authors: Ma E; Masters R; Whitehill T; Wong WK
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2012

Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session). The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). Nasality level of the participants' speech was measured by nasometer and reflected by nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners but in reversed order. Errors were defined by the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (17.7% vs. 50.7%) and a higher mean nasalance score (46.7% vs. 31.3%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America

HKU Scholars Hub

Children's Sensitivity to Pitch Variation in Language

Author: Quam Carolyn
Publication venue: ScholarlyCommons
Publication date: 01/01/2010

Children acquire consonant and vowel categories by 12 months, but take much longer to learn to interpret perceptible variation. This dissertation considers children's interpretation of pitch variation. Pitch operates, often simultaneously, at different levels of linguistic structure. English-learning children must disregard pitch at the lexical level, since English is not a tone language, while still attending to pitch for its other functions. Chapters 1 and 5 outline the learning problem and suggest ways children might solve it. Chapter 2 demonstrates that 2.5-year-olds know pitch cannot differentiate words in English. Chapter 3 finds that not until age 4–5 do children correctly interpret pitch cues to emotions. Chapter 4 demonstrates some sensitivity between 2.5 and 5 years to the pitch cue to lexical stress, but continuing difficulties at the older ages. These findings suggest a late trajectory for interpretation of prosodic variation; throughout, I propose explanations for this protracted time-course.

ScholarlyCommons@Penn

Sound pressure distribution in a long, narrow hallway: Measurements versus results from a computer model with scattering from surface roughness and diffraction

Authors: Christensen Claus Lynge; Rathsam Jonathan; Rindel Jens Holger; Wang Lily M.
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2005

Online Research Database In Technology

AN ANALYSIS OF INTONATION PATTERNS IN ECUADORIAN CUENCANO SPANISH: A SP_ToBI DESCRIPTION

Author: Portocarrero Alex Jamil 1984-
Publication venue: 'University of Saskatchewan Library'
Publication date: 04/11/2019

El Cantado Cuencano ('Cuencano singing') constitutes the hallmark of Cuenca citizens. This colloquially described intonational feature is what makes Cuencano Spanish one of the most prosodically interesting Andean dialects in Ecuador. There is, however, a lack of scientific research on this dialect's intonation, which can be considered under-documented up to this point. Therefore, the main objective of the present study was to begin to analyze and document Cuencano Spanish intonation patterns. In addition, this research aimed to provide scientific evidence and draw plausible conclusions to support or refute the impressionistic observations about the Indigenous origins of Cuencano singing. A sample of 550 utterances produced by 5 male and 5 female participants was collected in order to conduct this research. The sample comprised 11 categories: declarative statements, yes/no questions, exclamative statements, wh-questions, imperatives, lists, conditionals, tag-questions, interjections, negative statements, and vocatives. The tokens were analyzed using Praat and labeled by implementing the Spanish version of the Tones and Break Indices system (Sp_ToBI). It was found that the presence of the emphatic pitch accent labeled as L+^H* and the high-frequency appearance of bitonal pitch accents, such as L+H* and H+L*, in almost every token in the data set suggest that Cuencanos speak with varying degrees of tonal emphasis. This translates into a mixture of a substantial number of rising and falling tones found in Cuencanos' speech. These findings account for the appearance of the highly marked singing quality of Cuencano Spanish or Cantado Cuencano. They may also be linked to impressionistic descriptions, such as esdrujulizacion, and the influence that Indigenous languages and culture had on Cuencano Spanish.

University of Saskatchewan Research Archive

On Automatic Diagnosis of Alzheimer's Disease based on Spontaneous Speech Analysis and Emotional Temperature

Authors: Alonso Jesús B.; Barroso Nora; Ecay-Torres Miriam; Egiraun Harkaitz; Faundez-Zanuy Marcos; Henriquez P.; Lopez-de-Ipiña Karmele; Martinez-Lage Pablo; Solé-Casals Jordi; Travieso Carlos M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Alzheimer's disease is the most prevalent form of progressive degenerative dementia; it has a high socio-economic impact in Western countries. Therefore it is one of the most active research areas today. Alzheimer's is sometimes diagnosed by excluding other dementias, and definitive confirmation is only obtained through a post-mortem study of the brain tissue of the patient. The work presented here is part of a larger study that aims to identify novel technologies and biomarkers for early Alzheimer's disease detection, and it focuses on evaluating the suitability of a new approach for early diagnosis of Alzheimer's disease by non-invasive methods. The purpose is to examine, in a pilot study, the potential of applying Machine Learning algorithms to speech features obtained from suspected Alzheimer's sufferers in order to help diagnose this disease and determine its degree of severity. Two human capabilities relevant in communication have been analyzed for feature selection: Spontaneous Speech and Emotional Response. The experimental results obtained were very satisfactory and promising for the early diagnosis and classification of Alzheimer's disease patients.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

RIUVic