A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification
For practical automatic speaker verification (ASV) systems, replay attacks
pose a real risk: an attacker who replays a pre-recorded speech signal of the
genuine speaker can easily fool an ASV system. An effective replay detection
method is therefore highly desirable. In this study, we investigate a major
difficulty in replay detection: the over-fitting problem caused by variability
factors in the speech signal. An F-ratio probing tool is proposed, and three
variability factors are investigated with it: speaker identity, speech content,
and playback and recording device. The analysis shows that the device is the
most influential factor and carries the highest over-fitting risk. A frequency
warping approach is studied to alleviate the over-fitting problem and is
verified on the ASVspoof 2017 database.
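The F-ratio probing idea can be illustrated with a minimal sketch. It assumes one fixed-length feature vector per utterance (for example, averaged log-power spectral bins) and one label array per probed factor; the function name `f_ratio`, the equal weighting of classes, and the small `eps` are illustrative choices, not necessarily the paper's exact formulation. Feature dimensions with a high F-ratio for device labels would be the ones most likely to let a detector over-fit to device characteristics.

```python
import numpy as np


def f_ratio(features, labels, eps=1e-12):
    """Per-dimension F-ratio of `features` with respect to the classes in `labels`.

    features: (N, D) array, e.g. one spectral feature vector per utterance (assumed).
    labels:   (N,) class IDs for the probed factor (speaker, content or device).
    Returns a (D,) array: variance of the class means around the global mean,
    divided by the average within-class variance (classes weighted equally).
    """
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([features[labels == c].var(axis=0) for c in classes])
    between = ((class_means - global_mean) ** 2).mean(axis=0)
    within = class_vars.mean(axis=0)
    return between / (within + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1, 2], 100)  # e.g. three hypothetical playback devices
    x = rng.normal(size=(300, 2))
    x[:, 0] += labels * 3.0             # only dimension 0 depends on the factor
    print(f_ratio(x, labels))
```

Running the same function with speaker, content, and device labels on the same features gives one F-ratio profile per factor, which can then be compared dimension by dimension.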
Weighted LDA techniques for I-vector based speaker verification
This paper introduces the Weighted Linear Discriminant Analysis (WLDA) technique, based upon the weighted pairwise Fisher criterion, for the purpose of improving i-vector speaker verification in the presence of high intersession variability. By taking advantage of the speaker-discriminative information available in the distances between pairs of speakers clustered in the development i-vector space, the WLDA technique is shown to improve speaker verification performance over traditional Linear Discriminant Analysis (LDA) approaches. A similar approach is also taken to extend the recently developed Source Normalised LDA (SNLDA) into Weighted SNLDA (WSNLDA), which likewise improves speaker verification performance in both matched and mismatched enrolment/verification conditions. Based upon the results presented in this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that both WLDA and WSNLDA are viable replacements for LDA and SNLDA in i-vector speaker verification.
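A weighted pairwise Fisher criterion can be sketched as follows, under stated assumptions: the between-class scatter is accumulated over pairs of speakers, each pair weighted by the class priors and by a function of the distance between the class means, and the projection solves the generalized eigenproblem S_b v = λ S_w v. The weighting function here, w(Δ) = erf(Δ/(2√2))/(2Δ²) applied to the Euclidean distance between means, and all function names are assumptions for illustration; the paper's own weighting functions and normalisation may differ. With a constant weight of 1, the pairwise scatter reduces exactly to the ordinary LDA between-class scatter, which makes a convenient sanity check.

```python
import math
import numpy as np


def erf_weight(delta: float) -> float:
    """Assumed pairwise weight w(delta) = erf(delta / (2*sqrt(2))) / (2*delta**2).

    Pairs of classes that are already far apart receive a smaller weight.
    Assumes delta > 0 (distinct class means).
    """
    return math.erf(delta / (2.0 * math.sqrt(2.0))) / (2.0 * delta ** 2)


def weighted_between_scatter(X, y, weight=erf_weight):
    """Sum over class pairs i<j of p_i * p_j * w(d_ij) * (m_i - m_j)(m_i - m_j)^T."""
    classes = np.unique(y)
    priors = [np.mean(y == c) for c in classes]
    means = [X[y == c].mean(axis=0) for c in classes]
    Sb = np.zeros((X.shape[1], X.shape[1]))
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            diff = means[i] - means[j]
            w = weight(float(np.linalg.norm(diff)))  # Euclidean distance (assumed)
            Sb += priors[i] * priors[j] * w * np.outer(diff, diff)
    return Sb


def within_scatter(X, y):
    """Prior-weighted within-class scatter: centred outer products summed, divided by N."""
    Sw = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Z = X[y == c] - X[y == c].mean(axis=0)
        Sw += Z.T @ Z
    return Sw / len(y)


def wlda(X, y, k, weight=erf_weight):
    """Return a (D, k) projection whose columns solve Sb v = lambda Sw v for the k largest lambda."""
    Sb = weighted_between_scatter(X, y, weight)
    Sw = within_scatter(X, y)
    L = np.linalg.cholesky(Sw)                            # Sw = L L^T (Sw must be positive definite)
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)  # equivalent symmetric eigenproblem
    top = np.argsort(evals)[::-1][:k]
    return L_inv.T @ evecs[:, top]                        # map back: v = L^{-T} u
```

Development i-vectors `X` with speaker labels `y` would be projected as `X @ wlda(X, y, k)`; passing `weight=lambda d: 1.0` recovers a standard LDA projection for comparison.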
Speaker Identification for Swiss German with Spectral and Rhythm Features
We present results of speech rhythm analysis for automatic speaker identification, extending previous experiments that used similar methods for language identification. Features describing the rhythmic properties of salient changes in signal components are extracted and used in a speaker identification task to determine the extent to which they capture speaker variability. We also test the performance of state-of-the-art but simple-to-extract frame-based features. The evaluation focuses on one corpus (Swiss German, TEVOID) and uses support vector machines. Results suggest that the general spectral features can provide very good performance on this dataset, whereas the rhythm features are less successful, indicating either that they are unsuitable for this task or that the result is specific to this dataset.
A cross-linguistic study of between-speaker variability in intensity dynamics in L1 and L2 spontaneous speech
Dynamic aspects of the amplitude envelope appear to reflect speaker-specific information. Intensity dynamics, characterized as the temporal displacement of acoustic energy associated with articulatory mouth-opening (positive) and mouth-closing (negative) gestures, were able to explain between-speaker variability in read productions by native speakers of Zürich German. This study examines positive and negative intensity dynamics in spontaneous speech produced by Dutch speakers using their native language and English. Acoustic analysis of informal monologues was performed to examine between-speaker variability. Negative dynamics explained a larger share of the between-speaker variability, supporting the idea that the mouth-closing movement is under less prosodic control. Furthermore, there was a significant effect of language on intensity dynamics. These findings suggest that speaker-specific information may still be embedded in these time-bound measures regardless of the language in use.
The relationship between acoustic indices of speech motor control variability and other measures of speech performance in dysarthria
Previous studies have suggested that variability indices based on information extracted from the acoustic signal are potentially useful in assessing dysarthric speech. Because data collection is easy, this method is especially applicable in the clinical setting. This study assessed the relationship between variability indices of sentence repetitions, obtained by Functional Data Analysis (FDA), and both intelligibility ratings and maximum performance tasks in groups of speakers with hypokinetic dysarthria and ataxic dysarthria. The results showed significant correlations between selected parameters, which varied with dysarthria type. For the speakers with ataxic dysarthria, the variability measure mainly reflected differences in intelligibility, whereas for the group with hypokinetic dysarthria there was a stronger relationship between variability indices and diadochokinetic (DDK) performance. The lack of stronger correlations between the variability measures and the intelligibility ratings and maximum performance tasks is possibly due to heterogeneity of severity across and within speaker groups. This study provides further evidence that variability measures such as those derived by FDA might be sensitive to the speech performance of speakers with dysarthria and can potentially differentiate between dysarthria types.
Rhythmic variability between speakers: articulatory, prosodic and linguistic factors
Between-speaker variability of acoustically measurable speech rhythm [%V, ΔV(ln), ΔC(ln), and Δpeak(ln)] was investigated when within-speaker variability of (a) articulation rate and (b) linguistic structural characteristics was introduced. To study (a), 12 speakers of Standard German read seven lexically identical sentences under five different intended tempo conditions (very slow, slow, normal, fast, very fast). To study (b), 16 speakers of Zurich Swiss German produced 16 spontaneous utterances each (256 in total), for which transcripts were made and then read by all speakers (4096 sentences; 16 speakers × 256 sentences). Between-speaker variability was tested using analysis of variance with repeated measures on within-speaker factors. Results revealed strong and consistent between-speaker variability, while within-speaker variability as a function of articulation rate and linguistic characteristics was typically not significant. It was concluded that between-speaker variability of acoustically measurable speech rhythm is strong and robust against various sources of within-speaker variability. Idiosyncratic articulatory movements were found to be the most plausible factor explaining between-speaker differences.
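The interval-based rhythm measures named above can be computed from a vocalic/consonantal segmentation. This is a minimal sketch assuming the intervals are already labelled "V" or "C" with durations in seconds, and that ΔV(ln) and ΔC(ln) are population standard deviations of log-transformed durations; the function name `rhythm_metrics` is hypothetical, and Δpeak(ln) is omitted because the abstract does not define the underlying peak intervals. The log transform makes the standard deviation independent of the time unit, since rescaling every duration only shifts its logarithm by a constant.

```python
import math
import statistics


def rhythm_metrics(intervals):
    """Compute %V, deltaV(ln) and deltaC(ln) from labelled intervals.

    intervals: sequence of (kind, duration) pairs, kind "V" (vocalic) or
    "C" (consonantal). Using the population standard deviation is an
    assumption of this sketch.
    """
    vowels = [d for kind, d in intervals if kind == "V"]
    consonants = [d for kind, d in intervals if kind == "C"]
    if not vowels or not consonants:
        raise ValueError("need at least one vocalic and one consonantal interval")
    return {
        # proportion of total duration that is vocalic, in percent
        "%V": 100.0 * sum(vowels) / (sum(vowels) + sum(consonants)),
        # variability of log-transformed interval durations
        "deltaV_ln": statistics.pstdev([math.log(d) for d in vowels]),
        "deltaC_ln": statistics.pstdev([math.log(d) for d in consonants]),
    }


if __name__ == "__main__":
    demo = [("C", 0.08), ("V", 0.12), ("C", 0.10), ("V", 0.06), ("C", 0.15)]
    print(rhythm_metrics(demo))
```

Per-speaker values computed this way would then enter the repeated-measures analysis as dependent variables.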
Factor analysis for speaker segmentation and improved speaker diarization
Speaker diarization includes two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering aims at grouping speech segments of the same speaker. In this work, the segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variations and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix, so that large differences between speaker factor vectors indicate a different speaker. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis system. Experiments on the COST278 multilingual broadcast news database show a relative reduction of 50% in boundary detection errors. The speaker error rate is reduced by 8% relative.
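The boundary-detection step can be sketched as a sliding-window comparison of speaker factor vectors. This sketch assumes the frame-level speaker factors have already been extracted (the eigenvoice-based extraction is not shown) and that a covariance matrix describing the remaining within-speaker, largely phonetic, variability is available; the window length, threshold, minimum gap, and function names are hypothetical. A boundary is proposed at local maxima of the Mahalanobis distance between the mean factor vectors of the left and right windows.

```python
import numpy as np


def boundary_scores(factors, cov, win):
    """Mahalanobis distance between mean speaker-factor vectors of adjacent windows.

    factors: (T, R) array of frame-level speaker factor vectors (assumed precomputed).
    cov:     (R, R) covariance of within-speaker (largely phonetic) variability (assumed).
    win:     window length in frames (hypothetical parameter).
    Frames closer than `win` to either end keep a score of 0.
    """
    cov_inv = np.linalg.inv(cov)
    scores = np.zeros(len(factors))
    for t in range(win, len(factors) - win):
        # score at t compares frames [t-win, t) with frames [t, t+win)
        diff = factors[t - win:t].mean(axis=0) - factors[t:t + win].mean(axis=0)
        scores[t] = np.sqrt(diff @ cov_inv @ diff)
    return scores


def pick_boundaries(scores, threshold, min_gap):
    """Keep local maxima above `threshold` that are at least `min_gap` frames apart.

    A boundary at t means a speaker change between frames t-1 and t.
    """
    boundaries = []
    for t in range(1, len(scores) - 1):
        is_peak = scores[t] >= scores[t - 1] and scores[t] >= scores[t + 1]
        if is_peak and scores[t] >= threshold:
            if not boundaries or t - boundaries[-1] >= min_gap:
                boundaries.append(t)
    return boundaries


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two synthetic "speakers" whose factor vectors sit around different means
    factors = np.vstack([rng.normal(0.0, 0.1, (100, 2)), rng.normal(5.0, 0.1, (100, 2))])
    scores = boundary_scores(factors, np.eye(2) * 0.01, win=20)
    print(pick_boundaries(scores, threshold=10.0, min_gap=20))
```

Because the covariance in the distance is meant to absorb within-speaker variation, directions in which the same speaker's factors normally fluctuate contribute less to the score than directions that separate speakers.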