Production and perception of Libyan Arabic vowels
PhD Thesis. This study investigates the production and perception of Libyan Arabic (LA)
vowels by native speakers and the relation between these major aspects of speech. The
aim was to provide a detailed acoustic and auditory description of the vowels available in
the LA inventory and to compare the phonetic features of these vowels with those of
other Arabic varieties.
A review of the relevant literature showed that the LA dialect has not been
investigated experimentally. The small number of studies conducted in the last few
decades have been based mainly on impressionistic accounts. This study consists of two
main investigations: one concerned with vowel production and the other with vowel
perception. In terms of production, the study focused on gathering the data necessary to
define the vowel inventory of the dialect and to explore the qualitative and quantitative
characteristics of the vowels contained in this inventory. Twenty native speakers of LA
were recorded while reading target monosyllabic words in carrier sentences. Acoustic and
auditory analyses were used in order to provide a fairly comprehensive and objective
description of the vocalic system of LA. The results showed that phonologically short and
long Arabic vowels vary significantly in quality as well as quantity, a finding which is
increasingly being reported in experimental studies of other Arabic dialects. Short vowels
in LA tend to be more centralised than has been reported for other Arabic vowels,
especially with regard to short /a/. The study also looked at the effect of voicing in
neighbouring consonants and vowel height on vowel duration, and the findings were
compared to those of other varieties/languages.
The perception part of the study explored the extent to which listeners use the
same acoustic cues of length and quality in vowel perception that are evident in their
production. This involved the use of continua from synthesised vowels which varied
along duration and/or formant frequency dimensions. The continua were randomised and
played to 20 native listeners who took part in an identification task. The results show that,
when it comes to perception, Arabic listeners still rely mainly on quantity for the
distinction between phonologically long and short vowels. That is, when presented with
stimuli containing conflicting acoustic cues (formant frequencies that are typical of long
vowels but with short duration or formant frequencies that are typical of short vowels but
with long duration), listeners reacted consistently to duration rather than formant
frequency.
The results of both parts of the study provided some understanding of the LA
vowel system. The production data allowed for a detailed description of the phonetic
characteristics of LA vowels, and the acoustic space that they occupy was compared with
those of other Arabic varieties. The perception data showed that production and
perception do not always go hand in hand and that primary acoustic cues for the
identification of vowels are dialect- and language-specific.
Asymmetric discrimination of non-speech tonal analogues of vowels
Published in final edited form as: J Exp Psychol Hum Percept Perform. 2019 February; 45(2): 285–300. doi:10.1037/xhp0000603. Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences due to the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with non-speech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with two-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in one or both of these two acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in non-speech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited. Accepted manuscript.
Forming New Vowel Categories in Second Language Speech: The Case of Polish Learners' Production of English /I/ and /e/
The paper concentrates on the formation of L2 English vowel categories in the speech of Polish learners. More specifically, it compares the distribution of two English categories, /I/ and /e/, relative to neighbouring Polish vowels. Forty-three participants recorded Polish and English vowels in a /bVt/ context. The first two formants were measured at the vowel midpoint and plotted on a vowel plane. The results reveal that while a separate /I/ category is formed fairly effectively in Polish learners' pronunciation of English, the category of /e/ is almost completely subsumed by the Polish vowel /ϵ/.
Listeners normalize speech for contextual speech rate even without an explicit recognition task
Speech can be produced at different rates. Listeners take this rate variation into account by normalizing vowel duration for contextual speech rate: An ambiguous Dutch word /m?t/ is perceived as short /mAt/ when embedded in a slow context, but long /ma:t/ in a fast context. Whilst some have argued that this rate normalization involves low-level automatic perceptual processing, there is also evidence that it arises at higher-level cognitive processing stages, such as decision making. Prior research on rate-dependent speech perception has only used explicit recognition tasks to investigate the phenomenon, involving both perceptual processing and decision making. This study tested whether speech rate normalization can be observed without explicit decision making, using a cross-modal repetition priming paradigm. Results show that a fast precursor sentence makes an embedded ambiguous prime (/m?t/) sound (implicitly) more /a:/-like, facilitating lexical access to the long target word "maat" in an (explicit) lexical decision task. This result suggests that rate normalization is automatic, taking place even in the absence of an explicit recognition task. Thus, rate normalization is placed within the realm of everyday spoken conversation, where explicit categorization of ambiguous sounds is rare.
Verification of feature regions for stops and fricatives in natural speech
The presence of acoustic cues and their importance in speech perception have
long remained debatable topics. In spite of several studies that exist in this
field, very little is known about what exactly humans perceive in speech. This
research takes a novel approach towards understanding speech perception. A
new method, named three-dimensional deep search (3DDS), was developed
to explore the perceptual cues of 16 consonant-vowel (CV) syllables, namely
/pa/, /ta/, /ka/, /ba/, /da/, /ga/, /fa/, /Ta/, /sa/, /Sa/, /va/, /Da/, /za/,
/Za/, from naturally produced speech. A verification experiment was then
conducted to further verify the findings of the 3DDS method. For this purpose,
the time-frequency coordinate that defines each CV was filtered out
using the short-time Fourier transform (STFT), and perceptual tests were
then conducted. A comparison between unmodified speech sounds and those
without the acoustic cues was made. In most of the cases, the scores dropped
from 100% to chance levels even at 12 dB SNR. This clearly emphasizes the
importance of features in identifying each CV. The results confirm earlier
findings that stops are characterized by a short-duration burst preceding the
vowel by 10 cs in the unvoiced case, and appearing almost coincident
with the vowel in the voiced case. As has been previously hypothesized,
we confirmed that the F2 transition plays no significant role in consonant
identification. 3DDS analysis labels the /sa/ and /za/ perceptual features
as an intense frication noise around 4 kHz, preceding the vowel by 15–20
cs, with the /za/ feature being around 5 cs shorter in duration than that
of /sa/; the /Sa/ and /Za/ events are found to be frication energy near 2
kHz, preceding the vowel by 17–20 cs. /fa/ has a relatively weak burst and
frication energy over a wide band including 2–6 kHz, while /va/ has a cue
in the 1.5 kHz mid-frequency region preceding the vowel by 7–10 cs. New
information is established regarding /Da/ and /Ta/, especially with regard
to the nature of their significant confusions.
English-learning infants' perception of word stress patterns
Adult speakers of different free stress languages (e.g., English, Spanish) differ both in their sensitivity to lexical stress and in their processing of suprasegmental and vowel quality cues to stress. In a head-turn preference experiment with a familiarization phase, both 8-month-old and 12-month-old English-learning infants discriminated between initial stress and final stress among lists of Spanish-spoken disyllabic nonwords that were segmentally varied (e.g., [ˈnila, ˈtuli] vs. [luˈta, puˈki]). This is evidence that English-learning infants are sensitive to lexical stress patterns, instantiated primarily by suprasegmental cues, during the second half of the first year of life.
Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech
Whistled speech is a little-studied local use of language shaped by several
cultures of the world either for distant dialogues or for rendering traditional
songs. This practice consists of an emulation of the voice thanks to a simple
modulated pitch. It is therefore the result of a transformation of the vocal
signal that implies simplifications in the frequency domain. The whistlers
adapt their productions to the way each language combines the qualities of
height perceived simultaneously by the human ear in the complex frequency
spectrum of the spoken or sung voice (pitch, timbre). As a consequence, this
practice underlines key acoustic cues for the intelligibility of the concerned
languages. The present study provides an analysis of the acoustic and phonetic
features selected by whistled speech in several traditions either in purely
oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an
instrument like a leaf (Akha, Hmong). It underlines the convergences with the
strategies of the singing voice to reach the audience or to render the phonetic
information carried by the vowel (tone, identity) and some aesthetic effects
like ornamentation.
The listening talker: A review of human and algorithmic context-induced modifications of speech
Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.