Pausing strategies with regard to speech style
Speech is occasionally interrupted by silent and filled pauses of various lengths. Pauses have many different functions in spontaneous speech (e.g. breathing, marking syntactic boundaries, signaling speech planning difficulties, providing time for self-repair). The aim of the study was, on the one hand, to analyze the interrelation between the temporal pattern and the syntactic position of silent pauses (SPs). On the other hand, filled pauses (FPs) were analyzed according to their phonetic realization, as were combinations of SPs and FPs. The effect of speech style on pausing strategies was also analyzed. A narrative recording and a conversational recording from 10 speakers (aged 20 to 35 years; 5 male, 5 female) were selected from the Hungarian Spontaneous Speech Database for the study. The material was manually annotated, silent pauses were categorized, and the durations of the pauses were then extracted. Results showed that the position of silent and filled pauses affects their duration. Speech style did not influence the frequency of pauses. However, silent and filled pauses were longer in narratives than in conversations. Results suggest that pausing strategies are similar in general; however, the timing patterns of pauses may depend on various factors, e.g. speech style.
Forensic-based speaker profiling
Based on a speaker's voice, in addition to recognizing known persons, we are able to build a profile of unknown persons, that is, to estimate general information such as sex, age, body build, or the speaker's mood. Previous research has confirmed a strong relationship between vocal tract length and the speaker's physical characteristics, such as age, sex, and body height. Based on this relationship, we assume that the acoustic features of human speech encode cues to the physical build of the speaker. In the present study we examine the validity of this relationship using learning algorithms. We analyze how successfully the speaker's sex, age, body weight, and body mass can be estimated automatically from speech. To estimate the physical properties we use acoustic features extracted from speech: prosody-based, speech-quality-based, and spectral-based. The results show that the estimation of sex, body mass, and body weight is highly accurate, while the estimation of age is less so.
Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings
In forensic voice comparison, speaker embeddings have become widely popular in the last 10 years. Most pretrained speaker embeddings are trained on English corpora, because these are easily accessible. Thus, language dependency can be an important factor in automatic forensic voice comparison, especially when the target language is linguistically very different. There are numerous commercial systems available, but their models are mainly trained on a different language (mostly English) than the target language. In the case of a low-resource language, developing a corpus for forensic purposes containing enough speakers to train deep learning models is costly. This study aims to investigate whether a model pre-trained on an English corpus can be used on a target low-resource language (here, Hungarian) that differs from the language the model was trained on. Also, multiple samples are often not available from the offender (unknown speaker). Therefore, samples are compared pairwise with and without speaker enrollment for suspect (known) speakers. Two corpora are applied that were developed especially for forensic purposes, and a third that is meant for traditional speaker verification. Two deep learning based speaker embedding vector extraction methods are used: the x-vector and ECAPA-TDNN. Speaker verification was evaluated in the likelihood-ratio (LR) framework. A comparison is made between the language combinations (modeling, LR calibration, evaluation). The results were evaluated by the minCllr and EER metrics. It was found that a model pre-trained on a different language, but on a corpus with a very large number of speakers, performs well on samples with language mismatch. The effects of sample duration and speaking style were also examined. It was found that the longer the duration of the sample in question, the better the performance. Also, there is no real difference when various speaking styles are applied.
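The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes each trial already has a natural-log likelihood ratio; the function names `cllr` and `eer` and the toy scores are hypothetical and are not taken from the study. The sketch computes plain Cllr; minCllr is usually obtained by first applying an optimal monotone calibration to the scores, which is not shown here.

```python
import math


def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost (Cllr) for natural-log LRs.

    target_llrs: log LRs of same-speaker trials (hypothetical input).
    nontarget_llrs: log LRs of different-speaker trials (hypothetical input).
    """
    # Same-speaker trials are penalised by log2(1 + 1/LR).
    pen_tar = sum(math.log2(1 + math.exp(-l)) for l in target_llrs) / len(target_llrs)
    # Different-speaker trials are penalised by log2(1 + LR).
    pen_non = sum(math.log2(1 + math.exp(l)) for l in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (pen_tar + pen_non)


def eer(target_scores, nontarget_scores):
    """Approximate equal error rate by sweeping thresholds over observed scores."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker score falls below the threshold.
        fnr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker score reaches the threshold.
        fpr = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(fnr - fpr)
        if gap < best_gap:
            best_gap, best_eer = gap, (fnr + fpr) / 2
    return best_eer


if __name__ == "__main__":
    # Toy scores, purely illustrative.
    same = [2.0, 3.0, 0.5]
    diff = [-1.0, 0.0, 1.0]
    print(cllr(same, diff), eer(same, diff))
```

A system whose log LRs are all zero carries no information and yields a Cllr of 1; lower values of both metrics indicate better performance.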
f0 characteristics in reading aloud and spontaneous speech
A large number of studies have investigated the differences in f0 characteristics between reading aloud (RA) and spontaneous speech (SpS) in various languages. Their basic assumption was that the different production strategies lead to differences in prosodic features; however, their results were not consistent as to which speech style was realized with a higher mean f0 or a larger f0 range. Hungarian data have only been analyzed on small numbers of speakers. Therefore, our goals are: (i) to provide a comparison of the f0 characteristics of RA and SpS in Hungarian based on a large sample (82 subjects), and (ii) to analyze the individual differences behind the general tendencies. Mean f0 and pitch range (of the interpausal units) were higher in RA, while the f0 range was larger in SpS. Interspeaker differences played an important role in the mean f0 results: no speech-style-specific difference was found in women, while such a difference was apparent in men.