    Speaker Identification for Swiss German with Spectral and Rhythm Features

    We present results of speech rhythm analysis for automatic speaker identification, expanding on previous experiments that used similar methods for language identification. Features describing the rhythmic properties of salient changes in signal components are extracted and used in a speaker identification task to determine the extent to which they capture speaker variability. We also test the performance of state-of-the-art but simple-to-extract frame-based features. The paper focuses on evaluation on one corpus (Swiss German, TEVOID) using support vector machines. Results suggest that the general spectral features can provide very good performance on this dataset, whereas the rhythm features are less successful in the task, indicating either that they are unsuitable for this task or that the result is specific to this dataset.
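    As a rough illustration of the classification setup, the sketch below pools frame-based spectral features into one vector per utterance and trains one-vs-rest linear SVMs. Every choice here is an assumption for illustration, not the paper's method: the pooling (per-dimension mean and standard deviation), the linear kernel, the Pegasos-style sub-gradient training, and the hypothetical function names `utterance_vector`, `train_linear_svm`, `train_one_vs_rest` and `identify`. Feature extraction and the TEVOID data are not shown; the input is any (frames x dimensions) NumPy array.

```python
import numpy as np


def utterance_vector(frames: np.ndarray) -> np.ndarray:
    """Pool an (n_frames, n_dims) matrix of frame-based features into one
    fixed-length vector: per-dimension means followed by standard deviations."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def train_linear_svm(X: np.ndarray, y: np.ndarray, lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> np.ndarray:
    """Binary linear SVM (labels -1/+1) trained by Pegasos-style stochastic
    sub-gradient descent on the L2-regularized hinge loss. A constant 1 is
    appended to each sample so the last weight acts as a (regularized) bias."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)             # decreasing step size
            margin = y[i] * (Xb[i] @ w)       # evaluated before the update
            w *= 1.0 - eta * lam              # shrinkage from the L2 term
            if margin < 1.0:                  # hinge loss active: move toward sample
                w += eta * y[i] * Xb[i]
    return w


def train_one_vs_rest(X: np.ndarray, speakers, **kwargs) -> dict:
    """One binary SVM per speaker: that speaker (+1) versus all others (-1)."""
    speakers = np.asarray(speakers)
    return {spk: train_linear_svm(X, np.where(speakers == spk, 1.0, -1.0), **kwargs)
            for spk in np.unique(speakers)}


def identify(models: dict, x: np.ndarray):
    """Return the speaker whose SVM gives the highest decision value for x."""
    xb = np.append(x, 1.0)
    return max(models, key=lambda spk: models[spk] @ xb)


if __name__ == "__main__":
    # Synthetic demo: three "speakers" whose 4-dim frames differ in mean.
    rng = np.random.default_rng(1)
    centres = {"spk0": [4, 0, 0, 0], "spk1": [0, 4, 0, 0], "spk2": [0, 0, 4, 0]}
    X, labels = [], []
    for spk, mu in centres.items():
        for _ in range(10):  # ten training utterances of 100 frames each
            X.append(utterance_vector(rng.normal(mu, 1.0, size=(100, 4))))
            labels.append(spk)
    models = train_one_vs_rest(np.array(X), labels)
    test = utterance_vector(rng.normal(centres["spk1"], 1.0, size=(100, 4)))
    print(identify(models, test))
```

    In practice a library implementation (for example scikit-learn's `SVC`) combined with feature standardization would be the usual choice; the hand-rolled trainer only keeps the sketch dependency-light.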

    Speech rhythm: a metaphor?

    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence that would most straightforwardly underpin prominence alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm to varying degrees and in different ways depending on the language, and it is this analogical process that allows speech to be matched to external rhythms.

    Dynamics of short-term cross-dialectal accommodation. A study on Grison and Zurich German

    This study investigates whether rhythmic features are subject to accommodation between Grison and Zurich German (henceforth GRG and ZHG) speakers, as was previously observed for vowel formants. Cross-dialectal rhythmic accommodation and the factors that evoke or inhibit it (e.g., acoustic distance vs. dialect markedness, new vs. previously heard words) were examined in a corpus of pre- and post-dialogue recordings produced by 18 pairs of GRG and ZHG speakers. Three rhythmic measures were designed, based on cross-dialectal timing differences related to intervocalic sonorant gemination, open syllable lengthening, and reduction of word-final vowels.

    Listeners use temporal information to identify French- and English-accented speech

    Which acoustic cues can listeners use to identify speakers’ linguistic origins in foreign-accented speech? We investigated accent identification performance in signal-manipulated speech, where (a) Swiss German listeners heard native German speech to which we transplanted segment durations of French-accented German and English-accented German, and (b) Swiss German listeners heard 6-band noise-vocoded French-accented and English-accented German speech to which we transplanted native German segment durations. Thus, the foreign-accent cues in the stimuli consisted only of temporal information in (a) and only of strongly degraded spectral information in (b). Findings suggest that listeners were able to identify the linguistic origin of French and English speakers in their foreign-accented German speech based on temporal features alone, as well as based on strongly degraded spectral features alone. When comparing these results to previous research, we found an additive trend of temporal and spectral cues: identification performance tended to be higher when both cues were present in the signal. Acoustic measures of temporal variability could not easily explain the perceptual results. However, listeners were drawn towards some of the native German segmental cues in condition (a), which biased responses towards ‘French’ when stimuli featured uvular /r/s and towards ‘English’ when they contained vocalized /r/s or lacked /r/.

    CIVIL Corpus: Voice Quality for Speaker Forensic Comparison

    The most frequent way in which criminals disguise their voices involves changes in phonation type, but such changes are difficult to maintain for a long time. This disguise mechanism severely hampers identification. Currently, the CIVIL corpus comprises 60 Spanish speakers. Each subject performs three tasks (spontaneous conversation, carrier sentences and reading) using modal, falsetto and creak(y) phonation. Two recording sessions, one month apart, were conducted for each speaker, who was recorded with microphone, telephone and electroglottography. This is the first open-access corpus of disguised voices in Spanish. Its main purpose is to find biometric traces that remain in the voice despite disguise.

    Mothers Reveal More of Their Vocal Identity When Talking to Infants

    Voice timbre – the unique acoustic information in a voice by which its speaker can be recognized – is particularly critical in mother-infant interaction. Correct identification of vocal timbre is necessary for infants to recognize their mothers as familiar both before and after birth, providing a basis for social bonding between infant and mother. The exact mechanisms underlying infant voice recognition remain ambiguous and have predominantly been studied in terms of the infant’s cognitive voice recognition abilities. Here, we show – for the first time – that caregivers actively maximize their chances of being correctly recognized by presenting more details of their vocal timbre through adjustments to their voices known as infant-directed speech (IDS) or baby talk, a vocal register that is widespread across most of the world’s cultures. Using acoustic modelling (k-means clustering of Mel Frequency Cepstral Coefficients) of IDS in comparison with adult-directed speech (ADS), we found in two cohorts of speakers (US English and Swiss German mothers) that voice timbre clusters in IDS are significantly larger than comparable clusters in ADS. This effect leads to a more detailed representation of timbre in IDS, with subsequent benefits for recognition. Critically, automatic speaker identification using a Gaussian mixture model based on Mel Frequency Cepstral Coefficients performed significantly better in two experiments when trained with IDS than with ADS. We argue that IDS has evolved as part of an adaptive set of evolutionary strategies that promote indexical signalling by caregivers to their offspring, thereby promoting social bonding via voice and the acquisition of linguistic systems.
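    The speaker-identification step can be sketched as maximum-likelihood scoring of MFCC frames against per-speaker Gaussian models. This is a deliberately simplified sketch, not the authors’ setup: each speaker is modelled by a single diagonal-covariance Gaussian (a one-component GMM) instead of a multi-component mixture trained with EM, MFCC extraction is assumed to have happened already, and the names `fit_speaker_model`, `avg_log_likelihood` and `identify_speaker` are hypothetical.

```python
import numpy as np


def fit_speaker_model(frames: np.ndarray):
    """Fit a single diagonal-covariance Gaussian (a one-component GMM) to an
    (n_frames, n_coeffs) matrix of MFCC frames from one speaker."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6   # variance floor avoids division by zero
    return mean, var


def avg_log_likelihood(frames: np.ndarray, model) -> float:
    """Average per-frame log-density of `frames` under a diagonal Gaussian:
    log N(x) = -0.5 * sum_d [log(2*pi*var_d) + (x_d - mean_d)**2 / var_d]."""
    mean, var = model
    log_dens = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var, axis=1)
    return float(log_dens.mean())


def identify_speaker(models: dict, frames: np.ndarray):
    """Maximum-likelihood decision: the enrolled speaker whose model scores
    the test frames highest."""
    return max(models, key=lambda spk: avg_log_likelihood(frames, models[spk]))


if __name__ == "__main__":
    # Synthetic stand-in for 13-coefficient MFCC frames from two speakers.
    rng = np.random.default_rng(0)
    centres = {"mother_a": np.zeros(13), "mother_b": np.full(13, 2.0)}
    models = {m: fit_speaker_model(rng.normal(c, 1.0, size=(500, 13)))
              for m, c in centres.items()}
    print(identify_speaker(models, rng.normal(centres["mother_b"], 1.0, size=(200, 13))))
```

    Fitting each mother’s model on IDS frames in one run and on ADS frames in another, then scoring the same held-out frames, would mirror the kind of IDS-versus-ADS comparison the abstract describes; a full system would replace `fit_speaker_model` with a proper mixture model (for example scikit-learn’s `GaussianMixture`).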

    A corpus study of durational rhythm measures in the Kalhori dialect of Kurdish

    One of the methods phoneticians use to identify between-sentence and between-speaker variability is the study of durational rhythmic features. In the present research, durational speech rhythm measures were analyzed to classify the speech rhythm of Kalhori, a variety of Kurdish, and to determine which measures best capture between-sentence and between-speaker rhythmic variability in Kalhori. To this end, two speaking styles (read and spontaneous) were explored. The analysis of the read corpus revealed that the rhythm of Kalhori Kurdish lies between stress-timed and syllable-timed. The results indicated that %V (the proportion of speech duration that is vocalic) was the most significant measure for distinguishing between-sentence rhythmic variability in the read corpus, while %V and rateSyl (syllable rate) were the most efficient measures for identifying between-speaker rhythmic variability in both the read and the spontaneous corpus.
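    The two measures highlighted in the abstract can be computed from a segmentation of speech into vocalic and consonantal intervals, as in the sketch below. It assumes such a segmentation already exists (durations in seconds, pauses removed), that the syllable count is supplied separately, and that rateSyl is expressed in syllables per second; these choices and the function names `percent_v` and `rate_syl` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

# A labelled interval: ("V" = vocalic or "C" = consonantal, duration in seconds).
Interval = Tuple[str, float]


def percent_v(intervals: List[Interval]) -> float:
    """%V: vocalic duration as a percentage of total (vocalic + consonantal) duration."""
    vocalic = sum(dur for label, dur in intervals if label == "V")
    total = sum(dur for _, dur in intervals)
    return 100.0 * vocalic / total


def rate_syl(n_syllables: int, intervals: List[Interval]) -> float:
    """rateSyl: syllables per second of pause-free speech (the unit is an assumption)."""
    total = sum(dur for _, dur in intervals)
    return n_syllables / total


if __name__ == "__main__":
    # Toy two-syllable utterance (CV.CV); durations invented for illustration.
    utterance = [("C", 0.10), ("V", 0.20), ("C", 0.05), ("V", 0.15)]
    print(percent_v(utterance))    # about 70.0
    print(rate_syl(2, utterance))  # about 4.0
```

    Computing both values per utterance and comparing their spread within and across speakers is one simple way to approach the between-speaker question the study addresses.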