
    Forensic speaker recognition

    The aim of forensic speaker recognition is to establish links between individuals and criminal activities through audio speech recordings. This field is multidisciplinary, combining predominantly phonetics, linguistics, speech signal processing, and forensic statistics. On these bases, expert-based and automatic approaches have been developed to analyze the speaker's utterances on recordings, usually originating from anonymous calls, wiretapping procedures, and covert audio surveillance. Most forensic laboratories still opt for one or the other of these two approaches, even though, in many respects, they appear to be complementary. The main requirements for these methods are independence from the text, the ability to handle minimal-length recordings, and strong robustness to noise, transmission channels, and other variations in the recording conditions. Forensic speaker recognition can be considered a forerunner in the implementation of a logical inference framework to estimate the value of the evidence from the analytical results. The limits of forensic speaker recognition are the absence of a fixed and known number of highly discriminatory features in speech, the limited quality of the audio recordings captured in forensic conditions, and the application of recognition approaches in the absence of any known underlying model that accurately represents the speaker-dependent information.

    Human and Machine Speaker Recognition Based on Short Trivial Events

    Trivial events such as coughs, laughs, and sniffs are ubiquitous in human-to-human conversations. Compared to regular speech, these trivial events are usually short and unclear, so they are generally regarded as not speaker discriminative and are largely ignored by present speaker recognition research. However, these trivial events are highly valuable in some particular circumstances such as forensic examination: they are less subject to intentional change and so can be used to discover the genuine speaker from disguised speech. In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, by both human listeners and machines. In particular, the deep feature learning technique recently proposed by our group is used to analyze and recognize the trivial events, which leads to acceptable equal error rates (EERs) despite the extremely short durations (0.2-0.5 seconds) of these events. Comparing different types of events, 'hmm' seems more speaker discriminative. Comment: ICASSP 201
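
    The equal error rate mentioned in this abstract can be illustrated with a minimal sketch. It assumes each trial yields a numeric score where higher means "more likely the same speaker"; the function name and the threshold sweep are illustrative choices, not the authors' evaluation code.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) at the point where the false-accept and
    false-reject rates are closest. Hypothetical helper for illustration."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([target, nontarget]))
    best_gap, best_eer, best_thr = np.inf, None, None
    for thr in thresholds:
        frr = np.mean(target < thr)        # same-speaker trials rejected
        far = np.mean(nontarget >= thr)    # different-speaker trials accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2.0, float(thr)
    return best_eer, best_thr

if __name__ == "__main__":
    eer, thr = equal_error_rate([2.0, 1.5, 0.2, 1.0], [-1.0, 0.5, -0.3, 0.1])
    print(f"EER = {eer:.3f} at threshold {thr:.2f}")
```

    With few trials the two error rates rarely cross exactly, which is why the sketch reports their average at the closest point.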

    The correlation between auditory speech sensitivity and speaker recognition ability

    In various applications of forensic phonetics the question arises as to how far aural-perceptual speaker recognition performance is reliable. Therefore, it is necessary to examine the relationship between speaker recognition results and human perception/production abilities like musicality or speech sensitivity. In this study, performance in a speaker recognition experiment and a speech sensitivity test are correlated. The results show a moderately significant positive correlation between the two tasks. Generally, performance in the speaker recognition task was better than in the speech sensitivity test. Professionals in speech and singing yielded a more homogeneous correlation than non-experts. Training in speech as well as choir-singing seems to have a positive effect on performance in speaker recognition. It may be concluded, firstly, that in cases where the reliability of voice line-up results or the credibility of a testimony has to be considered, the speech sensitivity test could be a useful indicator. Secondly, the speech sensitivity test might be integrated into the canon of possible procedures for the accreditation of forensic phoneticians. Both tests may also be used in combination.

    Cross-entropy analysis of the information in forensic speaker recognition

    Proceedings of Odyssey 2008: The Speaker and Language Recognition Workshop, Stellenbosch, South Africa. In this work we analyze the average information supplied by a forensic speaker recognition system in an information-theoretic way. The objective is the transparent reporting of the performance of the system in terms of information, according to the needs of transparency and testability in forensic science. This analysis allows the derivation of a proper measure of goodness for forensic speaker recognition, the empirical cross-entropy (ECE), according to previous work in the literature. We also propose an intuitive representation, namely the ECE plot, which allows forensic scientists to explain the average information given by the evidence analysis process in a clear and intuitive way. Such a representation allows the forensic scientist to assess the evidence evaluation process independently of the prior information, which is the province of the court. Fact finders may then check the average information given by the evidence analysis once prior information is incorporated. An experimental example following the NIST SRE 2006 protocol is presented in order to highlight the adequacy of the proposed framework in the forensic inferential process. An example of the presentation of the average information supplied by the forensic analysis of the speech evidence in court is also provided, simulating a real case. This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01
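
    As a rough illustration of the measure named in this abstract, the sketch below computes empirical cross-entropy from likelihood ratios using the form commonly given in the literature: a prior-weighted average of log2(1 + 1/(LR x prior odds)) over same-speaker trials and log2(1 + LR x prior odds) over different-speaker trials. The function names and the example values are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def empirical_cross_entropy(target_lrs, nontarget_lrs, prior_hp):
    """ECE in bits for one prior probability of the same-speaker hypothesis.

    target_lrs: likelihood ratios from trials where the same-speaker
    hypothesis is true; nontarget_lrs: LRs from different-speaker trials.
    """
    target = np.asarray(target_lrs, dtype=float)
    nontarget = np.asarray(nontarget_lrs, dtype=float)
    prior_odds = prior_hp / (1.0 - prior_hp)
    # Penalty for same-speaker trials grows as the posterior odds fall.
    cost_target = np.mean(np.log2(1.0 + 1.0 / (target * prior_odds)))
    # Penalty for different-speaker trials grows as the posterior odds rise.
    cost_nontarget = np.mean(np.log2(1.0 + nontarget * prior_odds))
    return prior_hp * cost_target + (1.0 - prior_hp) * cost_nontarget

def ece_curve(target_lrs, nontarget_lrs, prior_log10_odds):
    """ECE over a grid of prior log10-odds, the x-axis of an ECE plot."""
    values = []
    for log_odds in prior_log10_odds:
        odds = 10.0 ** log_odds
        values.append(empirical_cross_entropy(
            target_lrs, nontarget_lrs, odds / (1.0 + odds)))
    return np.array(values)

if __name__ == "__main__":
    grid = np.linspace(-2.0, 2.0, 9)
    print(ece_curve([100.0, 20.0, 0.5], [0.01, 0.2, 3.0], grid))
```

    A system that always reports LR = 1 yields the entropy of the prior itself (1 bit at a prior of 0.5), so a curve lying below that reference indicates that the evidence evaluation adds information on average.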

    Forensic and Automatic Speaker Recognition System

    Automatic speaker recognition (ASR) systems have emerged as an important means of confirming identity in many business and e-commerce applications, as well as in forensics and law enforcement. Specialists trained in forensic recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, an approach referred to as structured listening. Algorithm-based systems for forensic speaker recognition have been developed by physical scientists and forensic linguists to reduce the probability of contextual bias or preconceived understanding when comparing a reference model with an unknown audio sample and any suspect. Many researchers continue to develop automatic algorithms in signal processing and machine learning so that improved performance lets automatic systems establish a speaker's identity as well as human listeners do. In this paper, I examine the literature on the identification of speakers by machines and humans, emphasizing the key technical trends that have emerged in automatic technology over the last decade. I focus on many aspects of automatic speaker recognition (ASR) systems, including speaker-specific features, speaker models, standard assessment data sets, and performance metrics.

    Automatic-type calibration of traditionally derived likelihood ratios: Forensic analysis of Australian English /o/ formant trajectories

    A traditional-style phonetic-acoustic forensic-speaker-recognition analysis was conducted on Australian English /o/ recordings. Different parametric curves were fitted to the formant trajectories of the vowel tokens, and cross-validated likelihood ratios were calculated using a single-stage generative multivariate kernel density formula. The outputs of different systems were compared using C_llr, a metric developed for automatic speaker recognition, and the cross-validated likelihood ratios were calibrated using a procedure developed for automatic speaker recognition. Calibration ameliorated some likelihood-ratio results which had offered strong support for a contrary-to-fact hypothesis.
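
    The log-likelihood-ratio cost C_llr used in this abstract can be sketched as follows, using its standard definition as the average of two mean penalties in bits. The abstract does not specify the calibration procedure, so the scale factor applied below is a hand-picked, hypothetical stand-in for a trained calibration, shown only to illustrate how shrinking an extreme contrary-to-fact likelihood ratio lowers the cost.

```python
import numpy as np

def cllr(target_log10_lrs, nontarget_log10_lrs):
    """Log-likelihood-ratio cost in bits (lower is better, 1.0 is neutral)."""
    target = 10.0 ** np.asarray(target_log10_lrs, dtype=float)
    nontarget = 10.0 ** np.asarray(nontarget_log10_lrs, dtype=float)
    # Same-speaker trials are penalised for small LRs.
    cost_target = np.mean(np.log2(1.0 + 1.0 / target))
    # Different-speaker trials are penalised for large LRs.
    cost_nontarget = np.mean(np.log2(1.0 + nontarget))
    return 0.5 * (cost_target + cost_nontarget)

if __name__ == "__main__":
    # Hypothetical log10 LRs: the -3.0 same-speaker result strongly
    # supports the contrary-to-fact hypothesis.
    same_speaker = np.array([2.0, 1.0, -3.0])
    different_speaker = np.array([-2.0, -1.0, -1.0])
    raw = cllr(same_speaker, different_speaker)

    # Assumed, untrained scale factor standing in for a real calibration.
    scale = 0.5
    shrunk = cllr(scale * same_speaker, scale * different_speaker)
    print(f"C_llr raw = {raw:.3f}, after scaling = {shrunk:.3f}")
```

    In practice the calibration parameters would be estimated from separate training data rather than chosen by hand.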

    Application of an Annular/Sphere Search Algorithm for Speaker Recognition

    In this work, an alternative search algorithm for vector quantization codebooks is applied as a way to improve the performance of an automatic speaker recognition system. The search algorithm is based on geometrical properties of the vector space, defining annular and spherical regions instead of using a full search method. The speaker recognition system is intended to identify a suspect among a small group of persons using low-quality recordings, working as a text-independent automatic speaker recognition system. Because the recognition rate is extremely important in forensic applications, the use of good discrimination algorithms can reduce the risk of bad decisions. The performance of the system under such conditions is reported. Despite the few speaker samples used for training, a high recognition rate was obtained, an improvement over the full search method.
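
    The abstract gives no algorithmic detail, so the following is only a sketch of one well-known annular constraint, not the authors' method. It relies on the inequality ||x - c|| >= | ||x|| - ||c|| |, which means a codeword whose norm differs from the test vector's norm by at least the best distance found so far cannot be nearer. All function names and the synthetic data are assumptions.

```python
import numpy as np

def full_search(codebook, x):
    """Exhaustive nearest-codeword search: (index, distance)."""
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

def annular_search(codebook, norms, x):
    """Nearest codeword, visiting only codewords inside a shrinking annulus.

    norms holds the precomputed Euclidean norm of each codeword.
    Returns (index, distance, number_of_distance_computations).
    """
    gaps = np.abs(norms - np.linalg.norm(x))
    best_idx, best_dist, checked = -1, np.inf, 0
    for i in np.argsort(gaps):       # closest norm first
        if gaps[i] >= best_dist:
            break                    # all remaining codewords are at least this far
        checked += 1
        d = float(np.linalg.norm(codebook[i] - x))
        if d < best_dist:
            best_idx, best_dist = int(i), d
    return best_idx, best_dist, checked

def identify_speaker(codebooks, frames):
    """Pick the speaker whose codebook gives the lowest mean distortion."""
    scores = {}
    for name, cb in codebooks.items():
        norms = np.linalg.norm(cb, axis=1)
        scores[name] = float(np.mean(
            [annular_search(cb, norms, f)[1] for f in frames]))
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 12-dimensional codebooks for two hypothetical speakers.
    codebooks = {"a": rng.normal(0.0, 1.0, (32, 12)),
                 "b": rng.normal(5.0, 1.0, (32, 12))}
    frames = codebooks["a"][:10] + 0.01 * rng.normal(size=(10, 12))
    print(identify_speaker(codebooks, frames)[0])
```

    Sorting the norm gaps for every query is done here for clarity only; a practical implementation would presumably keep the codeword norms pre-sorted and locate the annulus with a binary search.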

    A reduced search algorithm for speaker recognition

    In this work, a reduced search algorithm for vector quantization codebooks is applied as a way to reduce the risk of wrong decisions in an automatic speaker recognition system. Instead of a full search method, the algorithm is based on the geometrical properties of the vector space, reducing the search to those codebooks which are closer to the vector under test. The speaker recognition system is intended to identify a suspect among a small group of persons using low-quality recordings, working as a text-independent automatic speaker recognition system. It was found that the alternative search algorithm can be used to reduce the risk of wrong decisions, which is especially important in forensic applications.