4 research outputs found

Mapping across feature spaces in forensic voice comparison: the contribution of auditory-based voice quality to (semi-)automatic system testing

Authors: Foulkes Paul, French John Peter, Harrison Philip Thomas, Hughes Vincent, Kavanagh Colleen, San Segundo Fernandez Eugenia

In forensic voice comparison, there is increasing focus on the integration of automatic and phonetic methods to improve the validity and reliability of voice evidence to the courts. In line with this, we present a comparison of long-term measures of the speech signal to assess the extent to which they capture complementary speaker-specific information. Likelihood ratio-based testing was conducted using MFCCs and (linear and Mel-weighted) long-term formant distributions (LTFDs). Fusing automatic and semi-automatic systems yielded limited improvement in performance over the baseline MFCC system, indicating that these measures capture essentially the same speaker-specific information. The output from the best performing system was used to evaluate the contribution of auditory-based analysis of supralaryngeal (filter) and laryngeal (source) voice quality in system testing. Results suggest that the problematic speakers for the (semi-)automatic system are, to some extent, predictable from their supralaryngeal voice quality profiles, with the least distinctive speakers producing the weakest evidence and most misclassifications. However, the misclassified pairs were still easily differentiated via auditory analysis. Laryngeal voice quality may thus be useful in resolving problematic pairs for (semi-)automatic systems, potentially improving their overall performance.

White Rose Research Online

Speaker identification in courtroom contexts - Part I: Individual listeners compared to forensic voice comparison based on automatic-speaker-recognition technology

Authors: Basu Nabanita, Bali Agnes S, Weber Philip, Rosas-Aguilar Claudia, Edmond Gary, Martire Kristy A, Morrison Geoffrey Stewart
Publication venue: Elsevier BV
Publication date: 15/10/2022

Expert testimony is only admissible in common law if it will potentially assist the trier of fact to make a decision that they would not be able to make unaided. The present paper addresses the question of whether speaker identification by an individual lay listener (such as a judge) would be more or less accurate than the output of a forensic-voice-comparison system that is based on state-of-the-art automatic-speaker-recognition technology. Listeners listen to and make probabilistic judgements on pairs of recordings reflecting the conditions of the questioned- and known-speaker recordings in an actual case. Reflecting different courtroom contexts, listeners with different language backgrounds are tested: some are familiar with the language and accent spoken, some are familiar with the language but less familiar with the accent, and others are less familiar with the language. Also reflecting different courtroom contexts, in one condition listeners make judgements based only on listening, and in another condition listeners make judgements based on both listening to the recordings and considering the likelihood-ratio values output by the forensic-voice-comparison system. [Abstract copyright: Copyright © 2022 The Author(s). Published by Elsevier B.V. All rights reserved.]

Aston Publications Explorer

Individual Differences in Speech Production and Perception

Publication venue: Peter Lang, International Academic Publishers

Inter-individual variation in speech is a topic of increasing interest both in human sciences and speech technology. It can yield important insights into biological, cognitive, communicative, and social aspects of language. Written by specialists in psycholinguistics, phonetics, speech development, speech perception and speech technology, this volume presents experimental and modeling studies that provide the reader with a deep understanding of interspeaker variability and its role in speech processing, speech development, and interspeaker interactions. It discusses how theoretical models take into account individual behavior, explains why interspeaker variability enriches speech communication, and summarizes the limitations of the use of speaker information in forensics.

OAPEN Library

The definition of the relevant population and the collection of data for likelihood ratio-based forensic voice comparison

Author: Hughes Vincent
Publication date: 01/01/2014

Within the field of forensic speech science there is increasing acceptance of the likelihood ratio (LR) as the logically and legally correct framework for evaluating forensic voice comparison (FVC) evidence. However, only a small proportion of experts currently use the numerical LR in casework. This is due primarily to the difficulties involved in accounting for the inherent, and arguably unique, complexity of speech in a fully data-driven, numerical LR analysis. This thesis addresses two such issues: the definition of the relevant population and the amount of data required for system testing. Firstly, experiments are presented which explore the extent to which LRs are affected by different definitions of the relevant population with regard to sources of systematic sociolinguistic between-speaker variation (regional background, socio-economic class and age) using both linguistic-phonetic and ASR variables. Results show that different definitions of the relevant population can have a substantial effect on the magnitude of LRs, depending on the input variable. However, system validity results suggest that narrow controls over sociolinguistic sources of variation should be preferred to general controls. Secondly, experiments are presented which evaluate the effects of development, test and reference sample size on LRs. Consistent with general principles in statistics, more precise results are found using more data across all experiments. There is also considerable evidence of a relationship between sample size sensitivity and the dimensionality and speaker discriminatory power of the input variable. Further, there are potential trade-offs in the size of each set depending on which element of LR output the analyst is interested in. The results in this thesis will contribute towards improving the extent to which LR methods account for the linguistic-phonetic complexity of speech evidence. In accounting for this complexity, this work will also increase the practical viability of applying the numerical LR to FVC casework.

White Rose E-theses Online

White Rose Research Online