Speaker identity indicators in the domain of the temporal modulation of the speech signal
This diploma thesis aims to contribute to the field of speaker recognition in the domain of temporal changes in the speech signal. After a brief introduction to forensic phonetics, it outlines the approaches and factors which help or hinder successful recognition. The focus then shifts to the temporal structure of speech and the approaches currently used to analyse it. The practical section of the thesis consists of an experiment designed to assess the contribution of certain temporal measures to speaker recognition. The variables used are %V (the proportion of vocalic intervals within a sentence), ΔV and ΔC (the standard deviation of the duration of vocalic and consonantal intervals, respectively, within a sentence), VarcoV and VarcoC (the previous variables normalised for average interval duration) and the Pairwise Variability Indices (PVI), both vocalic and consonantal, raw and normalised. In addition, a further variable, LAR (the inverse of the distance between the midpoints of two successive vocalic intervals), is used to capture the local articulation rate and especially final deceleration in the utterances. Whereas the former variables are not very successful in distinguishing the speakers, LAR seems very well suited for capturing speaker idiosyncrasies, although further research, especially on a larger sample, will be needed before it can be put to practical use... Institute of Phonetics, Faculty of Arts.
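As a rough illustration of the measures listed in this abstract, the following Python sketch computes them from interval durations given in seconds. It is not the thesis's own implementation: the function names, the choice of the population standard deviation, the scaling of %V, Varco and the normalised PVI by 100, and the input layout are all assumptions made for the example.

```python
from statistics import mean, pstdev


def percent_v(vowel_durs, consonant_durs):
    """%V: share of vocalic interval duration in the sentence, in percent."""
    return 100.0 * sum(vowel_durs) / (sum(vowel_durs) + sum(consonant_durs))


def delta(durs):
    """Delta-V or Delta-C: standard deviation of interval durations.
    Assumption: population standard deviation."""
    return pstdev(durs)


def varco(durs):
    """VarcoV or VarcoC: standard deviation normalised by the mean duration."""
    return 100.0 * pstdev(durs) / mean(durs)


def rpvi(durs):
    """Raw PVI: mean absolute difference between successive intervals."""
    return mean(abs(a - b) for a, b in zip(durs, durs[1:]))


def npvi(durs):
    """Normalised PVI: each difference is divided by the mean of the pair."""
    return 100.0 * mean(
        abs(a - b) / ((a + b) / 2.0) for a, b in zip(durs, durs[1:])
    )


def local_articulation_rate(vowel_intervals):
    """LAR: inverse of the distance between successive vocalic midpoints.
    vowel_intervals is a list of (start, end) times in seconds (assumed layout)."""
    mids = [(start + end) / 2.0 for start, end in vowel_intervals]
    return [1.0 / (b - a) for a, b in zip(mids, mids[1:])]
```

Each function takes the durations of one sentence. LAR returns one value per pair of neighbouring vowels, so a slowing down towards the end of an utterance shows up as falling values at the end of the list.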
Acoustic Correlates of Word Stress as A Cue to Accent Strength
Due to the clear interference of their mother tongue prosody, many Czech learners produce their English with a conspicuous foreign accent. The goal of the present study is to investigate the acoustic cues that differentiate stressed and unstressed syllabic nuclei and identify individual details concerning their contribution to the specific sound of Czech English. Speech production of sixteen female non-professional Czech and British speakers was analysed with the sounds segmented on a word and phone level and with both canonical and actual stress positions manually marked. Prior to analyses the strength of the foreign accent was assessed in a perception test. Subsequently, stressed and unstressed vowels were measured with respect to their duration, amplitude, fundamental frequency and spectral slope. Our results show that, in general, Czech speakers use much less acoustic marking of stress than the British subjects. The difference is most prominent in the domains of fundamental frequency and amplitude. The Czech speakers also deviate from the canonical placement of stress, shifting it frequently to the first syllable. On the other hand, they seem to approximate the needed durational difference quite successfully. These outcomes support the concept of language interference since they correspond with the existing linguistic knowledge about Czech and English word stress. The study adds specific details concerning the extent of this interference in four acoustic dimensions
Spectral Characteristics of Schwa in Czech Accented English
The English central mid lax vowel (i.e., schwa) often contributes considerably to the sound differences between native and non-native speech. Many foreign speakers of English fail to reduce certain underlying vowels to schwa, which, on the suprasegmental level of description, affects the perceived rhythm of their speech. However, the problem of capturing quantitatively the differences between native and non-native schwa poses difficulties that, to this day, have been tackled only partially. We offer a technique of measurement in the acoustic domain that has not been probed properly as yet: the distribution of acoustic energy in the vowel spectrum. Our results show that spectral slope features measured in weak vowels discriminate between Czech and British speakers of English quite reliably. Moreover, the measurements of formant bandwidths turned out to be useful for the same task, albeit less direc
Speech Melody Properties in English, Czech and Czech English: Reference and Interference
Two major objectives were set for the present study: to provide reference data for the description of Czech and English F0 contours, and to investigate the limits of the ‘interference hypothesis’ on Czech English data. Altogether, the production of 40 speakers in 2392 breath-group F0 contours was analyzed. The speech of 32 professional speakers of English and Czech provides reference values for various acoustic correlates of pitch level, pitch span and downtrend gradient. These values were subsequently used as a benchmark for a confirmation of the interference hypothesis through comparison with a further sample of 8 non-professional speakers of English and Czech-accented English. The native English speakers of both genders produced significantly higher pitch level indicators, wider pitch span and a steeper downtrend gradient than the reference native speakers of Czech. Although the pitch level of the Czech-accented material lies in between the two reference groups, the pitch span of this group is the narrowest, which indicates that factors of foreign-accentedness other than simply interference are in effect
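The abstract does not say how pitch level, pitch span and downtrend gradient were computed. The Python sketch below shows one plausible operationalisation for a single breath-group contour, under stated assumptions: level is taken as the median F0 in Hz, span as the maximum-to-minimum range in semitones, and the gradient as the least-squares slope in semitones per second. The function names and these choices are illustrative and are not taken from the study.

```python
import math
from statistics import mean, median


def to_semitones(f0_hz, reference_hz=1.0):
    """Convert F0 values in Hz to semitones relative to reference_hz."""
    return [12.0 * math.log2(f / reference_hz) for f in f0_hz]


def pitch_level(f0_hz):
    """Assumed level measure: median F0 in Hz over voiced frames."""
    return median(f0_hz)


def pitch_span(f0_hz):
    """Assumed span measure: max-to-min range in semitones."""
    st = to_semitones(f0_hz)
    return max(st) - min(st)


def downtrend_gradient(times_s, f0_hz):
    """Assumed gradient measure: least-squares slope of the contour
    in semitones per second (negative for a falling contour)."""
    st = to_semitones(f0_hz)
    t_mean, st_mean = mean(times_s), mean(st)
    numerator = sum((t - t_mean) * (y - st_mean) for t, y in zip(times_s, st))
    denominator = sum((t - t_mean) ** 2 for t in times_s)
    return numerator / denominator
```

The inputs are expected to contain voiced frames only, since unvoiced frames have no F0 value. A percentile-based range would be a more outlier-robust alternative to the maximum-to-minimum span used here.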
Spectral Measurements of Vowels for Speaker Identification in Czech
The expansion of telecommunication has increased the availability of speech recordings which can be used in criminal investigations. Forensic science is a multidisciplinary approach that provides scientific grounds for assessing the evidence in such investigations. Forensic phonetics explores segmental (vocalic, consonantal) and suprasegmental (prosodic) speech parameters that discriminate among speakers. There is, however, a gap between technical data-driven and linguistically informed approaches, which we attempt to bridge in this study by examining Czech vowels through rigorous computational means. Seven different methods of quantifying vocalic spectral slope were compared for the purposes of speaker identification. In forensics, the use of spectral slope is mainly limited to long-term average spectra, which are easy to obtain but have some serious drawbacks. Therefore, in this study, short-term spectra of Czech vowels were used: although their extraction is more laborious, they provide more speaker-specific information. Of the seven methods tested, two functions predefined in the software performed unsatisfactorily, while a combination of modified band density difference and band density ratio was able to differentiate among all of our speakers. The effect of vowel quality on these measures was also investigated.
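The abstract names band density difference and band density ratio but does not define them or the modifications applied. The Python sketch below illustrates the general idea only, assuming that "band density" means the mean spectral power of a short vowel frame within a frequency band. The function names, the Hann window, the naive DFT and the band limits in the usage note are assumptions, not the study's method.

```python
import cmath
import math


def power_spectrum(frame, sample_rate):
    """Return (frequencies, power) of a Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def band_density(freqs, power, low_hz, high_hz):
    """Mean power of the spectral bins with low_hz <= f < high_hz."""
    values = [p for f, p in zip(freqs, power) if low_hz <= f < high_hz]
    return sum(values) / len(values)


def band_density_ratio(freqs, power, low_band, high_band):
    """Linear ratio of mean power in the low band to that in the high band."""
    return band_density(freqs, power, *low_band) / band_density(
        freqs, power, *high_band
    )


def band_density_difference_db(freqs, power, low_band, high_band):
    """The same comparison expressed as a difference in decibels."""
    return 10.0 * math.log10(
        band_density_ratio(freqs, power, low_band, high_band)
    )
```

For a frame sampled at 8000 Hz, calling `band_density_difference_db(freqs, power, (0, 1000), (1000, 4000))` gives a positive value when energy is concentrated below 1000 Hz, which corresponds to a steeper downward spectral slope. The naive DFT keeps the example dependency-free. For real recordings an FFT routine would be the practical choice.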
Speaker identification in the temporal domain of speech
This thesis aims to describe thoroughly the temporal characteristics of spoken Czech by means of phone durations and their changes under the influence of several prosodic and segmental factors, such as position in a higher unit (syllable, word or prosodic phrase), length of the higher unit, segmental environment, structure of the syllable, or phrase-final lengthening. The speech material comes from a semi-spontaneous corpus of scripted dialogues comprising 4046 utterances by 34 speakers. The descriptions are then used to create a rule-based temporal model, which provides a baseline for analysing local articulation rate contours and their speaker-specificity. The results indicate that systematic speaker-specific differences can be found in the segmental domain as well as in the temporal contours. The speaker identification potential of articulation rate and global temporal features is also assessed. Keywords: temporal characteristics, temporal modelling, phone duration, speaker identification, Czec