A speech envelope landmark for syllable encoding in human superior temporal gyrus.
The most salient acoustic features in speech are the modulations in its intensity, captured by the amplitude envelope. Perceptually, the envelope is necessary for speech comprehension. Yet, the neural computations that represent the envelope and their linguistic implications are heavily debated. We used high-density intracranial recordings, while participants listened to speech, to determine how the envelope is represented in human speech cortical areas on the superior temporal gyrus (STG). We found that a well-defined zone in middle STG detects acoustic onset edges (local maxima in the envelope rate of change). Acoustic analyses demonstrated that the timing of acoustic onset edges cues syllabic nucleus onsets, while their slope cues syllabic stress. Synthesized amplitude-modulated tone stimuli showed that steeper slopes elicited greater responses, confirming cortical encoding of amplitude change, not absolute amplitude. Overall, STG encoding of the timing and magnitude of acoustic onset edges underlies the perception of speech temporal structure.
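The onset-edge landmark described above (a local maximum in the envelope's rate of change) can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis pipeline: the Hilbert-envelope method, the 10 ms smoothing window, and the 10% peak threshold are all assumptions chosen for the sketch.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks

def detect_onset_edges(signal, sr, smooth_ms=10.0):
    """Locate acoustic onset edges: local maxima in the rate of change
    of the amplitude envelope. Returns (times_s, slopes)."""
    # Amplitude envelope via the analytic signal
    envelope = np.abs(hilbert(signal))
    # Moving-average smoothing to suppress pitch-rate ripple
    # (10 ms window is an illustrative choice, not from the study)
    win = max(1, int(sr * smooth_ms / 1000))
    envelope = np.convolve(envelope, np.ones(win) / win, mode="same")
    # First derivative of the envelope = rate of amplitude change
    d_env = np.diff(envelope) * sr
    # Onset edges are peaks in the positive rate of change:
    # peak time cues the syllabic nucleus, peak height cues stress
    peaks, props = find_peaks(np.maximum(d_env, 0.0),
                              height=0.1 * np.max(np.abs(d_env)))
    return peaks / sr, props["peak_heights"]
```

Applied to a signal that is silent for 0.5 s and then becomes a tone, the detector should report an edge near 0.5 s, with steeper onsets yielding larger slope values, mirroring the encoding result reported above.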
Improving the Speech Intelligibility By Cochlear Implant Users
In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations for most patients with profound hearing loss in both ears in a quiet background. However, CI users still have serious problems understanding speech in noisy and reverberant environments. In addition, bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes further raise the difficulty of hearing in noisy conditions for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors, such as the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in a six-talker babble background. These results reveal several promising methods for improving speech intelligibility for CI patients. We also investigate the benefits of voice conversion for improving speech intelligibility for CI users, motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of conversation. Research has shown that when adults are familiar with someone’s voice, they can more accurately, and even more quickly, process and understand what the person is saying. This effect, known as the “familiar talker advantage,” motivated us to examine its benefit for CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.
Mandarin Tones Recognition by Segments of Fundamental Frequency Contours
ABSTRACT Mandarin is a tonal language with four lexical tones (tone 1 to tone 4), each carrying a distinct fundamental frequency (f0) contour: high level, rising, falling then rising, and falling, respectively. To process a signal by lexical tone, the tone must first be identified. We therefore sought an efficient approach to identifying Mandarin tones from segments of f0 contours. In this study, three male and three female participants recorded the speech materials. All participants were native Mandarin speakers with no history of speech or hearing disorders, and all passed articulation and voice assessments. Two target syllables (/ti/ and /tu/), each produced in the four lexical tones, were used as materials. In our experiment, we analysed signal features and acoustic characteristics including the range and mean of f0, and then attempted to predict the tone from the segment features. The results revealed that segments of the contours could not identify the corresponding tone reliably. The approach of this study may therefore not provide a way for hearing devices to predict Mandarin tones before signal processing; further study of prediction from segments of f0 contours is required.
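The segment features named above (range and mean of f0) and a naive contour-to-tone match can be sketched as follows. This is an illustrative sketch only: the tone templates and the nearest-template rule are assumptions for the example, not the study's method (which, as reported, did not classify tones reliably from segments).

```python
import numpy as np

# Canonical Mandarin tone shapes as normalized f0 templates
# (tone 1: high level, tone 2: rising, tone 3: dipping, tone 4: falling).
# These templates are illustrative assumptions, not fitted to any data.
_TEMPLATES = {
    1: np.full(10, 1.0),
    2: np.linspace(0.0, 1.0, 10),
    3: np.concatenate([np.linspace(0.5, 0.0, 5), np.linspace(0.0, 0.8, 5)]),
    4: np.linspace(1.0, 0.0, 10),
}

def segment_features(f0):
    """Per-segment acoustic features mentioned in the abstract:
    the range and mean of f0 (in Hz)."""
    f0 = np.asarray(f0, dtype=float)
    return {"range": float(f0.max() - f0.min()), "mean": float(f0.mean())}

def classify_tone(f0):
    """Nearest-template match on a length- and register-normalized
    f0 contour. Returns the predicted tone number (1-4)."""
    f0 = np.asarray(f0, dtype=float)
    # Resample the contour to the template length
    x = np.interp(np.linspace(0, 1, 10), np.linspace(0, 1, len(f0)), f0)
    # Normalize to [0, 1] so speaker register does not dominate
    span = x.max() - x.min()
    x = (x - x.min()) / span if span > 0 else np.full(10, 1.0)
    dists = {t: float(np.sum((x - tpl) ** 2)) for t, tpl in _TEMPLATES.items()}
    return min(dists, key=dists.get)
```

On clean synthetic contours this matcher behaves as expected (a rising ramp maps to tone 2, a falling ramp to tone 4); the study's negative result concerns real, segmented contours, where such simple features proved insufficient.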
A syllable-based investigation of coarticulation
Coarticulation has long been investigated in Speech Sciences and Linguistics (Kühnert & Nolan, 1999). This thesis explores coarticulation through a syllable-based model (Y. Xu, 2020). First, it is hypothesised that consonant and vowel are synchronised at the syllable onset for the sake of reducing temporal degrees of freedom, and that such synchronisation is the essence of coarticulation. Previous examinations of CV alignment mainly report onset asynchrony (Gao, 2009; Shaw & Chen, 2019). The first study of this thesis tested the synchrony hypothesis using articulatory and acoustic data in Mandarin. Departing from conventional approaches, a minimal-triplet paradigm was applied, in which the CV onsets were determined through consonant and vowel minimal pairs, respectively. Both articulatory and acoustic results showed that CV articulation started in close temporal proximity, supporting the synchrony hypothesis. The second study extended the research to English and to syllables with cluster onsets. Using acoustic data in conjunction with deep learning, supporting evidence was found for co-onset, in contrast to the widely reported c-center effect (Byrd, 1995). Secondly, the thesis investigated a mechanism that can maximise synchrony, Dimension-Specific Sequential Target Approximation (DSSTA), which is highly relevant to what is commonly known as coarticulation resistance (Recasens & Espinosa, 2009). Evidence from the first two studies shows that, when conflicts arise from competing CV articulation requirements, the CV gestures can be fulfilled by the same articulator on separate dimensions simultaneously. Last but not least, the final study tested the hypothesis that resyllabification is the result of coarticulation asymmetry between onset and coda consonants. It was found that neural-network-based models could infer the syllable affiliation of consonants, and that the inferred resyllabified codas had a coarticulatory structure similar to that of canonical onset consonants. In conclusion, this thesis found that many coarticulation-related phenomena, including local vowel-to-vowel anticipatory coarticulation, coarticulation resistance, and resyllabification, stem from the articulatory mechanism of the syllable.
Temporal relation between top-down and bottom-up processing in lexical tone perception
Speech perception entails both top-down processing that relies primarily on language experience and bottom-up processing that depends mainly on instant auditory input. Previous models of speech perception often claim that bottom-up processing occurs in an early time window, whereas top-down processing takes place in a late time window after stimulus onset. In this paper, we evaluated the temporal relation of both types of processing in lexical tone perception. We conducted a series of event-related potential (ERP) experiments that recruited Mandarin participants and adopted three experimental paradigms, namely dichotic listening, lexical decision with phonological priming, and semantic violation. By systematically analyzing the lateralization patterns of the early and late ERP components observed in these experiments, we discovered that auditory processing of pitch variations in tones, as a bottom-up effect, elicited greater right-hemisphere activation, whereas linguistic processing of lexical tones, as a top-down effect, elicited greater left-hemisphere activation. We also found that both types of processing co-occurred in both the early (around 200 ms) and late (around 300–500 ms) time windows, which supports a parallel model of lexical tone perception. Unlike the previous view that language processing is special and performed by dedicated neural circuitry, our study shows that language processing can be decomposed into general cognitive functions (e.g., sensory and memory) and shares neural resources with these functions.
Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis
Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The “OPERA” hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope); (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing; (3) Emotion: the musical activities that engage this network elicit strong positive emotion; (4) Repetition: the musical activities that engage this network are frequently repeated; and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met, neural plasticity drives the networks in question to function with higher precision than is needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.