
Analysis of Voiceprint and Other Biometrics for Criminological and Security Applications

By Farbod Hosseyndoust Foomany


This Thesis examines the role and limitations of voice biometrics in the contexts of security and crime reduction. The main thrust of the Thesis is that despite the technical and non-technical hurdles that this research has identified and sought to overcome, voice can be an effective and sustainable biometric if used in the manner proposed here. It is contended that focused and continuous evaluation of the strength of systems within a solid framework is essential to the development and application of voice biometrics, and that special attention needs to be paid to human dimensions in system design and prior to deployment.

Through an interdisciplinary approach towards the theme reflected in the title, several scenarios are presented of the use of voice in security / crime reduction, crime investigation, forensics and surveillance contexts, together with issues surrounding their development and implementation. With a greater emphasis on security-oriented voice verification (due to the diversity of the usage scenarios and prospect of use), a new framework is presented for analysis of the reliability and security of voice verification.

This research not only calls for a standard evaluation scheme and analytical framework but also takes active steps to evaluate the prototype system within the framework under various conditions. Spoof attacks, noise, coding, distance and channel effects are among the factors that are studied. Moreover, an additional under-researched area, the detection of counterfeit signals, is also explored.

While numerous technical and design contributions made in this project are summarised in chapter 2, the research mainly aims to provide solid answers to the high-level strategic questions. The Thesis culminates in a synthesis chapter in which realistic expectations, design requirements and technical limitations of the use of voice for criminological and security applications are outlined and areas for further research are defined.


