Mismatch Negativity (MMN) reveals inefficient auditory ventral stream function in chronic auditory comprehension impairments
Background: Auditory discrimination is significantly impaired in Wernicke’s aphasia (WA) and thought to be causatively related to the language comprehension impairment which characterises the condition. This study used mismatch negativity (MMN) to investigate the neural responses corresponding to successful and impaired auditory discrimination in WA.
Methods: Behavioural auditory discrimination thresholds for CVC syllables and pure tones were measured in WA (n=7) and control (n=7) participants. Threshold results were used to develop multiple-deviant MMN oddball paradigms containing deviants which were either perceptibly or non-perceptibly different from the standard stimuli. MMN analysis investigated differences associated with group, condition, and perceptibility, as well as the relationship between MMN responses and comprehension (within which behavioural auditory discrimination profiles were examined).
Results: MMN waveforms were observable in response to both perceptible and non-perceptible auditory changes. Perceptibility was distinguished by MMN amplitude only in the pure tone condition. The WA group could be distinguished from controls by an increased MMN response latency to CVC stimulus change. Correlation analyses revealed a relationship between behavioural CVC discrimination and MMN amplitude in the control group, where greater amplitude corresponded to better discrimination. The WA group displayed the inverse effect: both discrimination accuracy and auditory comprehension scores were reduced with increased MMN amplitude. In the WA group, a further correlation was observed between the lateralisation of the MMN response and CVC discrimination accuracy: the greater the bilateral involvement, the better the discrimination accuracy.
Conclusions: The results from this study provide further evidence for the nature of the auditory comprehension impairment in WA and indicate that the auditory discrimination deficit is grounded in a reduced ability to engage in efficient hierarchical processing and the construction of invariant auditory objects. The correlation results suggest that people with chronic WA may rely on an inefficient, noisy right-hemisphere auditory stream when attempting to process speech stimuli.
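The multiple-deviant oddball design described in the Methods can be sketched in a few lines. The trial count, deviant probability, and minimum spacing below are illustrative assumptions, not the study's actual parameters, and the spacing constraint (at least two standards before every deviant) is a common convention in MMN designs rather than something stated in the abstract.

```python
import random

def oddball_sequence(standard, deviants, n_trials=400,
                     deviant_prob=0.2, min_gap=2, seed=0):
    """Generate a pseudorandom multiple-deviant oddball trial sequence.

    Each deviant trial is preceded by at least `min_gap` standard
    trials, and deviant trials are split as evenly as possible across
    the deviant stimuli.
    """
    rng = random.Random(seed)
    n_dev = int(n_trials * deviant_prob)
    n_std = n_trials - n_dev
    if n_std < min_gap * n_dev:
        raise ValueError("not enough standard trials for the requested gap")
    # Split deviant trials evenly across deviant types, then shuffle.
    dev_pool = [deviants[i % len(deviants)] for i in range(n_dev)]
    rng.shuffle(dev_pool)
    # Give every deviant a run of at least `min_gap` preceding standards,
    # then scatter the remaining standards randomly over those runs.
    gaps = [min_gap] * n_dev
    for _ in range(n_std - min_gap * n_dev):
        gaps[rng.randrange(n_dev)] += 1
    seq = []
    for gap, dev in zip(gaps, dev_pool):
        seq.extend([standard] * gap)
        seq.append(dev)
    return seq
```

Building the sequence from gap-prefixed runs guarantees the spacing constraint by construction, avoiding a rejection-sampling loop over random permutations.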
Relation between acoustic and articulatory dimensions of speech sounds
In their daily communication, speakers produce speech by pushing a controlled air stream past their vocal folds and through a vocal tract configuration formed by a set of articulators, which ultimately results in a certain acoustic output. In this sense, speech and, specifically, speech sounds can be understood as a relation between articulatory and acoustic dimensions. This idea is supported by recent neuroimaging results which suggest that sensory representations of speech sounds are stored across auditory and somatosensory cortices and are characterized by neural auditory-somatosensory mappings. The overall aim of the current dissertation is to improve our understanding of the functional nature of this relation. To this end, this thesis investigates the influence of a stronger linguo-palatal contact on speakers' ability to employ multiple concurrent compensatory strategies during the production of vowels and fricatives. During the data analysis, speakers' individual as well as average compensatory behavior is investigated by means of generalized additive mixed models (GAMM) and a supervised classification algorithm (random forest). A framework is then developed that makes it possible to estimate the extent of spectral adaptations in vowels and fricatives and to draw a direct comparison between these sounds. The experimental results are discussed in the context of current speech production theories and agree with the overall idea that speech sounds are perceptuo-motor units comprising articulatory movements which are shaped by perceptual properties and selected for their functional value for communication.
Concatenative Speech Synthesis: A Framework for Reducing Perceived Distortion when Using the TD-PSOLA Algorithm
This thesis presents the design and evaluation of an approach to concatenative speech synthesis using the Time-Domain Pitch-Synchronous OverLap-Add (TD-PSOLA) signal processing algorithm. Concatenative synthesis systems make use of pre-recorded speech segments stored in a speech corpus. At synthesis time, the 'best' segments available to synthesise the new utterance are chosen from the corpus using a process known as unit selection. During the synthesis process, the pitch and duration of these segments may be modified to generate the desired prosody. The TD-PSOLA algorithm provides an efficient and essentially successful solution for performing these modifications, although some perceptible distortion, in the form of 'buzziness', may be introduced into the speech signal.
Despite the popularity of the TD-PSOLA algorithm, little formal research has been undertaken to address this recognised problem of distortion. The approach in this thesis was developed to reduce the perceived distortion that is introduced when TD-PSOLA is applied to speech. To investigate the occurrence of this distortion, a psychoacoustic evaluation of the effect of pitch modification using the TD-PSOLA algorithm is presented. Subjective experiments in the form of a set of listening tests were undertaken using word-level stimuli that had been manipulated using TD-PSOLA. The data collected from these experiments were analysed for patterns of co-occurrence or correlations to investigate where this distortion may occur. From this, parameters were identified which may have contributed to increased distortion. These parameters were concerned with the relationship between the spectral content of individual phonemes, the extent of pitch manipulation, and aspects of the original recordings.
Based on these results, a framework was designed for use in conjunction with TD-PSOLA to minimise the possible causes of distortion. The framework consisted of a novel speech corpus design, a signal processing distortion measure, and a selection process for especially problematic phonemes. Rather than being phonetically balanced, the corpus is balanced to the needs of the signal processing algorithm, containing more of the adversely affected phonemes. The aim is to reduce the potential extent of pitch modification of such segments, and hence produce synthetic speech with less perceptible distortion. The signal processing distortion measure was developed to allow the prediction of perceptible distortion in pitch-modified speech. Different weightings were estimated for individual phonemes, trained using the experimental data collected during the listening tests. The potential benefit of such a measure for existing unit selection processes in a corpus-based system using TD-PSOLA is illustrated. Finally, the special-case selection process was developed for highly problematic voiced fricative phonemes to minimise the occurrence of perceived distortion in these segments. The success of the framework, in terms of generating synthetic speech with reduced distortion, was evaluated. A listening test showed that the TD-PSOLA-balanced speech corpus may be capable of generating pitch-modified synthetic sentences with significantly less distortion than those generated using a typical phonetically balanced corpus. The voiced fricative selection process was also shown to produce pitch-modified versions of these phonemes with less perceived distortion than a standard selection process. The listening test then indicated that the signal processing distortion measure was able to predict the resulting amount of distortion at the sentence level after the application of TD-PSOLA, suggesting that it may be beneficial to include such a measure in existing unit selection processes. The framework was found to be capable of producing speech with reduced perceptible distortion in certain situations, although the effects seen at the sentence level were less than those seen in the previous investigative experiments that made use of word-level stimuli. This suggests that the effect of the TD-PSOLA algorithm cannot always be easily anticipated due to the highly dynamic nature of speech, and that the reduction of perceptible distortion in TD-PSOLA-modified speech remains a challenge to the speech community.
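The pitch-modification step that TD-PSOLA performs can be illustrated with a minimal sketch: two-period Hann-windowed grains are extracted at analysis pitch marks and overlap-added at synthesis marks spaced by the rescaled period. This is a textbook simplification (nearest-mark grain selection, no duration modification), not the thesis's implementation, and the pitch marks are assumed to be supplied by a separate epoch detector.

```python
import math

def td_psola(signal, marks, factor):
    """Minimal TD-PSOLA pitch modification.

    `signal` is a list of samples, `marks` a sorted list of analysis
    pitch-mark sample indices (one per glottal period), and `factor`
    the pitch-scaling factor (>1 raises pitch). Two-period Hann-windowed
    grains taken at the analysis marks are overlap-added at synthesis
    marks spaced by the local period divided by `factor`.
    """
    def period_at(i):
        if i + 1 < len(marks):
            return marks[i + 1] - marks[i]
        return marks[i] - marks[i - 1]

    out = [0.0] * len(signal)
    t = float(marks[0])                  # first synthesis mark
    while t <= marks[-1]:
        # The nearest analysis mark supplies the grain for this mark.
        i = min(range(len(marks)), key=lambda k: abs(marks[k] - t))
        p = period_at(i)
        centre, dst0 = marks[i], int(round(t))
        for n in range(-p, p + 1):
            src, dst = centre + n, dst0 + n
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                w = 0.5 * (1.0 + math.cos(math.pi * n / p))  # Hann window
                out[dst] += w * signal[src]
        t += p / factor                  # step to the next synthesis mark
    return out
```

With `factor` 1 and marks spaced one period apart, the overlapping Hann windows sum to one and the interior of the signal is reconstructed exactly; raising or lowering the pitch repeats or skips grain overlap, which is where the 'buzziness' discussed above can enter.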
It’s All About Context: Investigating the Effects of Consonant and Vowel Environment on Vowel-Evoked Envelope Following Responses
The envelope following response (EFR) has proven useful for studying brainstem speech processing. Previous work, however, demonstrates that its amplitude varies across stimuli. This thesis investigates whether this variation is attributable to the consonant or vowel context of the stimulus, or to some interaction of the two. Experiment 1 evoked EFRs in 30 participants using seven English vowels embedded in four CVC environments. A strong effect of vowel and a minor effect of consonant on EFR amplitude were found. In Experiment 2, 64 listeners heard four different tokens of one of four possible English vowels (16 participants per vowel), embedded in the same CVC environments as before. A significant three-way interaction between vowel, vowel trial, and consonant was found, indicating that the EFR is highly sensitive to subtle acoustic differences in stimuli. To utilize the EFR effectively in research, future studies should carefully explore the mechanisms driving these complex context effects.
Neural Attractors and Phonological Grammar
This volume collects three articles which constitute the bulk of my PhD research. The overarching theme of the volume is the role of attractors, a concept from dynamical systems theory, in the neural realization of phonological grammar.
The motivation for this line of inquiry begins with the claim that the study of language should provide some insight into the workings of the human mind/brain. Indeed, this is one of the few mantras shared by linguists of the seemingly irreconcilable "Generative" and "Cognitive" schools (e.g. Chomsky 2002; Lakoff 1988). Given this apparent consensus, it is perhaps surprising that no breakthrough in our understanding of the brain can yet be attributed to some insight from the study of language.
An analysis and critique of this state of affairs is given by Poeppel & Embick (2005), who identify (amongst other things) that we currently have no way of relating the ontologies of linguistics and neuroscience. This Ontological Incommensurability Problem (OIP) can be resolved, they argue, by the use of a Linking Hypothesis, which spells out linguistic computations at the relevant level of algorithmic abstraction, such that the neuroscientist need only find the exact implementations of those algorithms in the brain. If such a hypothesis were sufficiently complete then it could, in principle, predict the kinds of neural configurations required for natural language processing, using linguistic theories as their starting point. In this way, we could finally realize the long sought-after goal of cashing in theories of language for understanding of the human brain. Simultaneously, a Linking Hypothesis also has the potential to unearth lower-level explanations for linguistic phenomena, for example where those explanations might depend on purely neurobiological notions (e.g. neuronal morphology, synaptic density, metabolic efficiency, etc.).
Comparing malleability of phonetic category between [i] and [u]
This study reports a differential category retuning effect between [i] and [u]. Two groups of American listeners were exposed to ambiguous vowels ([i/u]) within words that index a phoneme /i/ (e.g., athl[i/u]t) (i-group) or /u/ (e.g., aftern[i/u]n) (u-group). Before and after the exposure these listeners categorized sounds from a [bip]-[bup] continuum. The i-group significantly increased /bip/ responses after exposure, but the u-group did not change their responses significantly. These results suggest that the way mental representation handles phonetic variation may influence the malleability of each category, highlighting the complex relationship among the distribution of sounds, their mental representation, and speech perception.
Tones in Zhangzhou: Pitch and Beyond
This study draws on various approaches (field linguistics, auditory and acoustic phonetics, and statistics) to explore and explain the nature of tones in Zhangzhou, an under-described Southern Min variety. Several original findings emerged from the analyses of data from 21 speakers.
The realisations of Zhangzhou tones are multidimensional. The single parameter of pitch/F0 is not sufficient to characterise tonal contrasts in either monosyllabic or polysyllabic settings in Zhangzhou. Instead, various parameters, including pitch/F0, duration, vowel quality, voice quality, and syllable coda type, interact in a complicated but consistent way to code tonal distinctions.
Zhangzhou has eight tones rather than the seven proposed in previous studies. This finding resulted from examining the realisations of diverse parameters across three different contexts (isolation, phrase-initial, and phrase-final) rather than classifying tones in citation form and in terms of the preservation of Middle Chinese tonal categories. Tonal contrasts in Zhangzhou can be neutralised across different linguistic contexts. Identifying the number of tonal contrasts based simply on tonal realisations in the citation environment is not sufficient. Instead, examining tonal realisations across different linguistic contexts beyond monosyllables is imperative for understanding the nature of tone.
Tone sandhi in Zhangzhou is syntactically relevant. The tone sandhi domain is not phonologically determined but rather is aligned with a syntactic phrase XP. Within a given XP, the realisations of tones at non-phrase-final positions undergo alternation both phonologically and phonetically. Nevertheless, the alternations are sensitive only to the phrase boundaries and are not affected by the internal structure of syntactic phrases.
Tone sandhi in Zhangzhou is phonologically inert but phonetically sensitive. The realisations of Zhangzhou tones in disyllabic phrases are not categorically affected by their surrounding tones but are phonetically sensitive to surrounding environments. For instance, the pitch/F0 onsets of phrase-final tones are largely sensitive to the pitch/F0 offsets of preceding tones and appear to have diverse variants.
The mappings between Zhangzhou citation and disyllabic tones are morphologically conditioned. Phrase-initial tones are largely not related to the citation tones at either the phonological or the phonetic level, while phrase-final tones are categorically related to the citation tones but are phonetically not quite the same because of predictable sensitivity to surrounding environments. Each tone in Zhangzhou can thus be regarded as a single morpheme having two alternating allomorphs (tonemes): one for non-phrase-final variants and one for variants in citation and phrase-final contexts, both of which are listed in the mental lexicon of native Zhangzhou speakers but are phonetically distant on the surface.
In summary, the realisations of Zhangzhou tones are multidimensional, involving a variety of segmental and suprasegmental parameters. The interactions of Zhangzhou tones are complicated, involving phonetics, phonology, syntax, and morphology. Neutralisation of Zhangzhou tonal contrasts occurs across different contexts, including citation, phrase-final, and non-phrase-final. Thus, researchers must go beyond pitch to understand tone thoroughly as a phenomenon in Southern Min.