2 research outputs found

The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems

Author: Nash John
Publication venue: University of York
Publication date: 01/09/2019

This thesis examines the influence of acoustic variability on automatic speaker recognition systems (ASRs) with three aims: (i) to measure ASR performance under five commonly encountered acoustic conditions; (ii) to contribute towards ASR system development through the provision of new research data; and (iii) to assess ASR suitability for forensic speaker comparison (FSC) application and investigative/pre-forensic use. The thesis begins with a literature review and an explanation of relevant technical terms. Five categories of research experiments then examine ASR performance, reflecting conditions that influence speech quantity (inhibitors) and speech quality (contaminants), while acknowledging that quality often influences quantity. The experiments pertain to net speech duration, signal-to-noise ratio (SNR), reverberation, frequency bandwidth and transcoding (codecs). The ASR system is placed under scrutiny, with examination of settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is examined in relation to baseline performance, and metrics help to indicate whether ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis discusses issues such as the complexity and fragility of the speech signal path, speaker variability, the difficulty of measuring conditions, and mitigation (thresholds and settings). The application of ASRs to casework is discussed with recommendations, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations on presenting ASR output as evidence in criminal trials.
In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, accepting that extraneous issues remain which require governance, such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application, with many acoustic conditions causing irrecoverable speech data loss that contributes to high error rates.

White Rose E-theses Online

Digital Forensic Analysis for Compressed Images and Videos

Author: Qadir Ghulam
Publication venue: Guildford
Publication date: 06/05/2020

The advancement of imaging devices and image manipulation software has made the tasks of tracking and protecting digital multimedia content increasingly difficult. In order to protect and verify the integrity of digital content, many active watermarking and passive forensic techniques have been developed for various image and video formats in the past decade or so. In this thesis, we focus on the research and development of digital image forensic techniques, particularly for the processing history recovery of JPEG2000 (J2K) images. J2K is a new and improved format introduced by the Joint Photographic Experts Group (JPEG). Unlike JPEG, it is based on the Discrete Wavelet Transform (DWT) and has a more complex coding system. However, the size-to-compression ratio of J2K is significantly better than that of JPEG, and the format can be used for storing CCTV data and for digital cinema applications. In this thesis, the novel use of Benford's Law for the analysis of J2K compressed images is investigated. Benford's Law is essentially a statistical law that has previously been used for the detection of financial and accounting fraud. Initial results obtained after testing 1,338 grayscale images show that the first digit probability distribution of the DWT coefficients follows Benford's Law. However, when images are compressed with J2K compression, the first digit probability graph starts to deviate from the Benford's Law curve. The compression can also be detected via the divergence factor derived from the graph. Furthermore, Benford's Law can be applied to the analysis of an image feature known as glare, by investigating the anomaly in the first digit probability curve of DWT coefficients. The results show that out of 1,338 images, 122 exhibit an irregular peak at digit 5, and each of these images contains glare. This can potentially be used as a tool to isolate images containing glare in large-scale image databases.
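As a rough illustration of the first-digit analysis described above, the following Python sketch compares the leading-digit distribution of a list of numbers against the Benford's Law curve, P(d) = log10(1 + 1/d). It is not the thesis's implementation: the function names are hypothetical, the input is assumed to be any list of finite numeric values (for example, DWT coefficients computed elsewhere), and the chi-square-style divergence is an assumed stand-in for the "divergence factor", whose definition the abstract does not give.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's Law probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading non-zero decimal digit of x (sign ignored); None for zero."""
    if x == 0:
        return None
    # Scientific notation puts the leading digit first, e.g. '3.052...e+02'.
    return int(f"{abs(x):.15e}"[0])


def first_digit_distribution(values):
    """Observed relative frequency of each leading digit 1..9."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(1, 10)]


def divergence(observed):
    """Chi-square-style distance from the Benford curve (assumed measure)."""
    expected = benford_expected()
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Stand-in data: powers of two are known to follow Benford's Law closely.
# In the setting of the abstract, these would be DWT coefficients instead.
values = [2 ** k for k in range(1000)]
observed = first_digit_distribution(values)
for digit, (obs, exp) in enumerate(zip(observed, benford_expected()), start=1):
    print(f"digit {digit}: observed {obs:.3f}, Benford {exp:.3f}")
print(f"divergence: {divergence(observed):.5f}")
```

Under these assumptions, a small divergence suggests the values follow the Benford curve, while a larger one, or a visible peak at a single digit, flags a deviation worth inspecting.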
This thesis also presents a novel J2K compression strength detection technique. The compression strength is classified into three categories, low, medium and high, which correspond to high, medium and low subjective image quality, respectively, ranging from 0 to 1 bits per pixel (bpp). The proposed technique employs a no-reference (NR) perceptual blur metric and double compression calibration to identify some heuristic rules that are then used to design an unsupervised classifier for determining the J2K compression strength of a given image. In our experiments, we use 100 images to identify the heuristic rules, followed by another set of 100 different images to test the performance of our method. The results show that the compression strength detection achieves an accuracy of approximately 90%. The thesis also presents a new benchmarking tool for video forensics known as Surrey University Library for Forensics Analysis (SULFA). The library is considered to be the first of its kind available to the research community and contains 150 untouched original videos obtained from three digital cameras of different makes and models, as well as a number of tampered videos and supporting ground-truth datasets that can be used for video forensic experiments and analysis.

University of Surrey