NORMALIZACIJA VOKALSKIH FORMANATA U HRVATSKOME I SRPSKOME JEZIKU
The aim of this study was to compare the results of a traditional formant analysis of vowels with the results of normalization systems, using Croatian and Serbian speech as an example. Male native speakers of Croatian and Serbian were analysed (N=92). Traditional formant analyses express differences among the analysed groups of speakers that are caused by linguistic and sociolinguistic factors, but also by physiological ones. Since vowel formant values are influenced by many factors, including idiosyncratic physiological characteristics of the vocal tract, normalization removes the between-speaker variability that is caused by physiological differences. The dialectal, inter-linguistic and/or sociolinguistic differences among the speakers are thereby isolated in a scientifically more objective way. The results show that, within each language, normalized formant values are more tightly grouped and more centralized (especially for the vowels [a] and [i]) than non-normalized values. The contrastive analysis shows that, relative to the Serbian vowels, Croatian [i], [o] and [u] are more closed and more front, [a] is more closed and more back, and [e] is more open and more front. This study exemplifies the advantage of normalization systems in the interpretation of acoustic results.
Normalization of the formant values revealed a higher degree of centralization for all vowels in both languages compared with the non-normalized formant values, while the contrastive analysis between the languages pointed to differences in frontness and backness and in openness and closeness for all vowels.
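The abstract does not say which normalization systems were used. As a minimal sketch of the general idea, the following assumes Lobanov-style z-scoring, one common speaker-intrinsic procedure, in which each speaker's formant values are centred on that speaker's own mean and divided by that speaker's own standard deviation. The function name, the token layout and all formant values are hypothetical and serve only as illustration; they are not data from the study.

```python
from statistics import mean, pstdev


def lobanov_normalize(tokens):
    """Z-score F1 and F2 within each speaker (Lobanov-style sketch).

    tokens: list of (speaker, vowel, f1_hz, f2_hz) tuples.
    Returns a list of (speaker, vowel, f1_z, f2_z) in the same order.
    Assumes every speaker has at least two distinct values per formant,
    otherwise the standard deviation would be zero.
    """
    # Collect each speaker's raw formant values.
    by_speaker = {}
    for speaker, _vowel, f1, f2 in tokens:
        by_speaker.setdefault(speaker, []).append((f1, f2))

    # Per-speaker mean and (population) standard deviation for F1 and F2.
    stats = {}
    for speaker, values in by_speaker.items():
        f1s = [v[0] for v in values]
        f2s = [v[1] for v in values]
        stats[speaker] = (mean(f1s), pstdev(f1s), mean(f2s), pstdev(f2s))

    # Express every token relative to its own speaker's distribution.
    normalized = []
    for speaker, vowel, f1, f2 in tokens:
        m1, s1, m2, s2 = stats[speaker]
        normalized.append((speaker, vowel, (f1 - m1) / s1, (f2 - m2) / s2))
    return normalized


# Hypothetical data: speaker B has the same vowel pattern as speaker A,
# but every formant is 20% higher (as a shorter vocal tract might produce).
TOKENS = [
    ("A", "a", 750, 1300),
    ("A", "i", 280, 2250),
    ("A", "u", 310, 750),
    ("B", "a", 900, 1560),
    ("B", "i", 336, 2700),
    ("B", "u", 372, 900),
]

if __name__ == "__main__":
    for row in lobanov_normalize(TOKENS):
        print(row)
```

In this constructed case the two speakers differ only by a constant scale factor, so their normalized values coincide: the purely physiological difference disappears, and only differences in the vowel pattern itself would remain visible.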
A Comparative Study of Spectral Peaks Versus Global Spectral Shape as Invariant Acoustic Cues for Vowels
The primary objective of this study was to compare two sets of vowel spectral features, formants and global spectral shape parameters, as invariant acoustic cues to vowel identity. Both automatic vowel recognition experiments and perceptual experiments were performed to evaluate these two feature sets. First, these features were compared using the static spectrum sampled in the middle of each steady-state vowel versus features based on dynamic spectra. Second, the role of dynamic and contextual information was investigated in terms of improvements in automatic vowel classification rates. Third, several speaker normalizing methods were examined for each of the feature sets. Finally, perceptual experiments were performed to determine whether vowel perception is more correlated with formants or global spectral shape.
Results of the automatic vowel classification experiments indicate that global spectral shape features contain more information than do formants. For both feature sets, dynamic features are superior to static features. Spectral features spanning a time interval beginning with the start of the on-glide region of the acoustic vowel segment and ending at the end of the off-glide region of the acoustic vowel segment are required for maximum vowel recognition accuracy. Speaker normalization of both static and dynamic features can also be used to improve the automatic vowel recognition accuracy.
Results of the perceptual experiments with synthesized vowel segments indicate that if formants are kept fixed, global spectral shape can, at least for some conditions, be modified such that the synthetic speech token will be perceived according to spectral shape cues rather than formant cues. This result implies that overall spectral shape may be more important perceptually than the spectral prominences represented by the formants.
The results of this research contribute to a fundamental understanding of the information-encoding process in speech. The signal processing techniques used and the acoustic features found in this study can also be used to improve the preprocessing of acoustic signals in the front-end of automatic speech recognition systems.
Envelhecimento vocal: estudo acústico-articulatório das alterações de fala com a idade
Background: Although the aging process causes specific alterations in the
speech organs, the knowledge about the age effects in speech production is still
dispersed and incomplete. Objective: To provide a broader view of the age-related
segmental and suprasegmental speech changes in European Portuguese (EP),
considering new aspects besides static acoustic features, such as dynamic and
articulatory data. Method: Two databases, with speech data of Portuguese
adult native speakers obtained through standardized recording and segmentation
procedures, were devised: i) an acoustic database containing all EP oral
vowels produced in a similar context (read speech), and also a sample of semi-spontaneous
speech (image description) collected from a large sample of adults
between the ages of 35 and 97; and ii) another with articulatory data (ultrasound
(US) tongue images synchronized with speech) for all EP oral vowels produced in
similar contexts (pseudowords and isolated) collected from young ([21-35]) and
older ([55-73]) adults. Results: Based on the curated databases, various aspects
of the aging speech were analyzed. Acoustically, the aging speech is characterized
by: 1) longer vowels (in both genders); 2) a tendency for F0 to decrease
in women and slightly increase in men; 3) lower vowel formant frequencies in
females; 4) a significant reduction of the vowel acoustic space in men; 5) vowels
with higher trajectory slope of F1 (in both genders); 6) shorter descriptions with
higher pause time for males; 7) faster speech and articulation rate for females;
and 8) lower HNR for females in semi-spontaneous speech. In addition, the decrease
in total speech duration is associated with non-severe depression symptoms and
age. Older adults tended to present more depressive symptoms that could impact
the amount of speech produced. Concerning the articulatory data, the tongue
tends to be higher and more advanced with aging for almost all vowels, meaning
that the vowel articulatory space tends to be higher, more advanced, and larger in older
females. Conclusion: This study provides new information on aging speech for
a language other than English. These results corroborate that speech changes
with age and present different patterns between genders, and also suggest that
speakers might develop specific articulatory adjustments with aging.
Programa Doutoral em Gerontologia e Geriatri
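The abstract reports a reduction of the vowel acoustic space without saying how that space was measured. One common way to quantify it, assumed here purely for illustration, is the area of the polygon spanned by the corner vowels in the F2/F1 plane, which the shoelace formula gives directly. The function name and all formant values below are hypothetical and are not taken from the thesis.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    points: list of (x, y) vertices in order around the polygon,
    here (F2, F1) pairs in Hz, so the result is in Hz squared.
    """
    twice_area = 0.0
    n = len(points)
    for k in range(n):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical corner-vowel means as (F2, F1) in Hz, ordered [i], [a], [u].
WIDE_SPACE = [(2200, 300), (1400, 750), (800, 320)]
REDUCED_SPACE = [(2100, 320), (1400, 700), (850, 340)]

if __name__ == "__main__":
    print("wide:", polygon_area(WIDE_SPACE))
    print("reduced:", polygon_area(REDUCED_SPACE))
```

A smaller area for one group of speakers than for another would correspond to the kind of vowel space reduction described above, although other measures of the acoustic space are equally possible.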
Vowel normalisation : an interface between acoustic and linguistic descriptions of speaker characteristics in Australian English
This thesis examines existing normalisation procedures against the background
of a theoretical model of inter-speaker formant variability, which
describes observed formant differences in three major categories: phonetic
variation, non-uniform variation, and uniform variation. A new
normalisation strategy based on this model is proposed which involves
the removal of uniform and non-uniform components of inter-speaker
variation in order to isolate phonetic variation. The nature of this nonuniformity
is subject to empirical investigation. Working along the above
strategy, the method adopted in this thesis is to initially acquire a phonetically
stable vowel database, which is then screened for phonetic variations
through a rigorous phonetic control procedure. The resulting
data, now considered to be phonetically homogeneous, are used for exploring
two essential domains of inter-speaker variability that contribute
to the designing of a future normalisation procedure: (1) By applying
uniform transformations using a variety of published scaling parameters,
the most effective uniform scaling parameters are identified. (2)
Non-uniform inter-speaker variation patterns are analysed and compared
with the published results of Fant (1975). A major discovery is that
non-uniform inter-speaker variation patterns obtained from phonetically
controlled data are grossly different from those observed by Fant.
The present database comprises 594 vowels in the /h_d/ word context
(11 phonemic monophthongs x 9 speakers x 6 repetitions), and the speakers
include 4 adult females, 3 adult males and 2 children (male).
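The database size and the notion of a uniform transformation can be sketched briefly. The first part checks the token count stated in the abstract. The second part assumes that a uniform transformation multiplies all of a speaker's formants by a single factor, and derives that factor from the ratio of a reference mean to the speaker's own mean. That choice of scaling parameter, the function names and the example values are hypothetical; the abstract does not list the published parameters that the thesis compares.

```python
# Token count stated in the abstract: vowels x speakers x repetitions.
N_VOWELS, N_REPETITIONS = 11, 6
N_SPEAKERS = 4 + 3 + 2  # adult females + adult males + children
N_TOKENS = N_VOWELS * N_SPEAKERS * N_REPETITIONS


def scale_factor(speaker_mean_hz, reference_mean_hz):
    """One scaling factor per speaker (hypothetical choice of parameter)."""
    return reference_mean_hz / speaker_mean_hz


def uniform_scale(formants_hz, factor):
    """Apply the same factor to every formant of a token."""
    return tuple(f * factor for f in formants_hz)


if __name__ == "__main__":
    print(N_TOKENS)
    # Hypothetical speaker whose mean F3 is 3000 Hz, mapped onto a
    # hypothetical reference mean of 2400 Hz.
    k = scale_factor(3000.0, 2400.0)
    print(uniform_scale((600.0, 1800.0, 3000.0), k))
```

Because one factor is applied to every formant of every vowel, a transformation of this kind cannot remove differences that vary by vowel or by formant, which is the non-uniform component the thesis examines separately.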
Modelling phonologization: vowel reduction and epenthesis in Lunigiana dialects
Building upon wave-theoretic
assumptions, this dissertation provides a formal description of the
relationship between diatopic/diachronic micro-variation and
phonologization. In particular, an analysis is performed of the
phonetic/phonological properties of unstressed vowel reduction and vowel
insertion in two Northern Italian dialects: Carrarese and Pontremolese.
These dialects are argued to represent two frozen stages of these
processes’ diffusion, Carrarese representing the diachronic stage
Pontremolese has already gone through. Indeed, Pontremolese displays
non-etymological vocoids that show the phonetic and phonological
characteristics of epenthetic vowels and that, crucially, can be
considered the phonologized correlates of Carrarese’s intrusive vocoids.
These, in turn, should rather be considered articulatorily/perceptually
driven vowel-like releases. A formal account of this diatopic,
diachronic and grammatical relationship is given that supports a modular
grammar architecture, in which phonetics and phonology constitute two
autonomous modules. Within such an architecture, the lateral forces
(government and licensing) developed by standard Government Phonology
are translated into violable constraints and inserted in a BiPhon
grammar. In this optimality-theoretic grammar, the phonetics-phonology
interface is managed by a set of cue constraints that map acoustic
dimensions (formant structures) onto phonological primitives (elements).
Furthermore, to integrate morphological information in the phonological
forms, the Coloured Containment Theory is resorted to.
Language Use in Past and Presen
Modelling phonologization: vowel reduction and epenthesis in Lunigiana dialects
Within a linguistic continuum, the further from the irradiation centre, the later a language is affected by a change; the later a language is reached by a change, the milder the outcomes. Building upon these wave-theoretic assumptions, this dissertation provides a formal description of the relationship between diatopic/diachronic micro-variation and phonologization. In particular, an analysis is performed of the phonetic/phonological properties of unstressed vowel reduction and vowel insertion in two Northern Italian dialects: Carrarese and Pontremolese. These dialects are argued to represent two frozen stages of these processes’ diffusion, Carrarese representing the diachronic stage Pontremolese has already gone through. Indeed, Pontremolese displays non-etymological vocoids that show the phonetic and phonological characteristics of epenthetic vowels and that, crucially, can be considered the phonologized correlates of Carrarese’s intrusive vocoids. These, in turn, should rather be considered articulatorily/perceptually driven vowel-like releases. A formal account of this diatopic, diachronic and grammatical relationship is given that supports a modular grammar architecture, in which phonetics and phonology thus constitute two autonomous modules. Within such an architecture, the lateral forces (government and licensing) developed by standard Government Phonology are translated into violable constraints and inserted in a BiPhon grammar. In this optimality-theoretic grammar, the phonetics-phonology interface is managed by a set of cue constraints that map acoustic dimensions (formant structures) onto phonological primitives (elements). Furthermore, to integrate morphological information in the phonological forms, the Coloured Containment Theory is resorted to. This dissertation is of relevance to anyone interested in diatopic/diachronic micro-variation, phonologization, phonological theory and Italian dialectology.
Combining research methods for an experimental study of West Central Bavarian vowels in adults and children
The overall goal of this thesis was to systematically measure the defining vowel characteristics of the West Central Bavarian (WCB) dialect for an acoustically based analysis of the Bavarian vowel system. At the same time, it investigated to what extent these characteristics are being preserved across generations, and whether a sound change in progress can be observed in which young speakers show more characteristics of Standard German (SG) than old speakers on some Bavarian vowel attributes. To address these aims, we made acoustic recordings of WCB-speaking adults and WCB-speaking primary school children, which were then compared with each other in an apparent-time analysis. For a more accurate view of changes in progress, we combined this apparent-time comparison with longitudinal data from the WCB children, obtained at annual intervals over three years. The acoustic data were supplemented by articulatory data from ultrasound recordings of a subset of the same WCB-speaking children at two timepoints one year apart.
Analyses of the acoustic data revealed both adult/child and longitudinal changes in the direction of the standard, visible in the children’s tendency towards a merger of two open vowels and a collapse of a long/short consonant contrast, neither of which exists in SG. There was some evidence that children, in comparison with adults, were beginning to develop both tenseness and rounding contrasts, which occur in SG but not in WCB. There were no observed changes to the pattern of opening and closing diphthongs, which differ markedly between the two varieties. Likewise, no changes were observed in the WCB front vowel that resulted historically from /l/-vocalization, for which articulatory data from a subset of the children were related to the acoustic measures.
The general conclusion is that WCB change is most likely to occur as a consequence of exaggerating phonetic variation that already happens to be in the direction of the standard, and that internal factors motivated by general principles of vowel change might therefore play a more decisive role in inducing a shift than external factors such as dialect contact.
Short-term accommodation of Hong Kong English speakers towards native English accents and the effect of language attitudes
Accommodation, also known as convergence, refers to a process whereby a speaker changes the way he or she speaks to be more similar to another speaker. This dissertation focuses on two themes: language attitudes and short-term accommodation. A study using the matched-guise method is conducted to examine Hong Kong people’s attitudes towards British English, American English and Hong Kong English (henceforth HKE). Results suggest that after the handover British English is still rated as the most prestigious English variety in Hong Kong. HKE is also found to have a high level of acceptance in terms of social attractiveness.
For short-term accommodation, two studies are conducted to investigate the phonetic convergence of HKE speakers towards native English accents, and the effect of language attitudes on convergence. Study 2 consists of a group of HKE speakers completing separate map tasks with a Received Pronunciation speaker and a General American English speaker. Their pronunciations of the THOUGHT vowel, the PATH vowel, rhoticity, fricative /z/ and fricative /θ/ are examined before, during and after the map tasks. The results suggest that the HKE speakers produce more fricative [z] and converge on rhoticity after exposure to the native accents. However, divergence is found on the PATH vowel and fricative /θ/, and maintenance is found on the THOUGHT vowel. These findings suggest that the HKE speakers tend to converge on the linguistic features which are more salient to them. Study 3 examines the effect of language attitudes on speech convergence, and no correlation is found between language attitudes and the HKE speakers’ convergence on rhoticity.
Finally, the hybrid exemplar-based model is proposed to explain the complex results of the three studies. It provides a framework for speech accommodation which covers speech perception and production, and includes social factors as important elements in the model.
Caractéristiques acoustiques des voyelles fermées tendues, relâchées et allongées en français québécois
Honour roll of the Faculté des études supérieures et postdoctorales, 2013-2014.
This study aims to describe acoustically the tense, lax and lengthened variants of the close vowels /i y u/ in Quebec French which, under stress, are found in open syllables, closed syllables and syllables closed by a lengthening consonant, respectively. We analysed 1350 tokens of these variants taken from the speech of 30 speakers from Rouyn-Noranda, Saguenay and Quebec. Their duration was measured, and the fundamental frequency and the centre frequency of the first three formants (F1, F2, F3) were then estimated at 25, 50 and 75% of that duration. Tense variants exhibit the lowest F1 values and lax variants the highest, with the lengthened variants falling in between. Over the course of the vowel, the tense and lengthened variants become tenser, whereas the lax variants centralize. The lengthened variants show the largest trajectories in the F1/F2 plane.
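The sampling scheme described above, with measurements at 25, 50 and 75% of each vowel's duration, can be sketched as follows. This is only an illustration of how such time points are derived from a vowel's onset and offset; the function name and the example times are hypothetical, and the abstract does not say which software was used for the measurements.

```python
def measurement_times(onset_s, offset_s, fractions=(0.25, 0.50, 0.75)):
    """Time points (in seconds) at the given fractions of a vowel's duration.

    onset_s and offset_s are the segment boundaries in seconds.
    """
    if offset_s <= onset_s:
        raise ValueError("offset must come after onset")
    duration = offset_s - onset_s
    return [onset_s + fraction * duration for fraction in fractions]


if __name__ == "__main__":
    # Hypothetical vowel lasting from 1.00 s to 1.20 s in a recording.
    print(measurement_times(1.00, 1.20))
```

Sampling at three points rather than only at the midpoint is what makes it possible to describe a trajectory in the F1/F2 plane, as the abstract does for the lengthened variants.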
Caractéristiques acoustiques des voyelles fermées tendues, relâchées et allongées en français québécois
The aim of this contribution is to describe acoustically the tense, lax and lengthened variants of the close vowels /i y u/ in Quebec French, which, under stress, occur in open syllables, in closed syllables, and in syllables closed by a lengthening consonant, respectively. A total of 1350 tokens extracted from the speech of 30 speakers from Rouyn-Noranda, Saguenay and Québec were analysed. Their duration was measured, and the fundamental frequency and the centre frequency of the first three formants (F1, F2, F3) were then estimated at 25, 50 and 75% of that duration. The tense variants show the lowest F1 and the lax variants the highest F1, with the lengthened variants falling in between. Over the course of the vowel, the tense and lengthened variants become tenser, whereas the lax variants centralize. The lengthened variants show the largest trajectories in an F1/F2 plot.