737 research outputs found

    Speaker Recognition: Advancements and Challenges

    The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems

    This thesis examines the influence of acoustic variability on automatic speaker recognition systems (ASRs) with three aims: (i) to measure ASR performance under five commonly encountered acoustic conditions; (ii) to contribute towards ASR system development with the provision of new research data; (iii) to assess ASR suitability for forensic speaker comparison (FSC) application and investigative/pre-forensic use. The thesis begins with a literature review and explanation of relevant technical terms. Five categories of research experiments then examine ASR performance, reflective of conditions influencing speech quantity (inhibitors) and speech quality (contaminants), acknowledging that quality often influences quantity. The experiments pertain to net speech duration, signal-to-noise ratio (SNR), reverberation, frequency bandwidth and transcoding (codecs). The ASR system is placed under scrutiny with examination of settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is examined in relation to baseline performance, and metrics help to inform whether ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis provides discussion on issues such as the complexity and fragility of the speech signal path, speaker variability, difficulty in measuring conditions, and mitigation (thresholds and settings). The application of ASRs to casework is discussed with recommendations, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations regarding presenting ASR output as evidence in criminal trials.
In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, accepting that extraneous issues requiring governance endure, such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application, with many acoustic conditions causing irrecoverable speech data loss that contributes to high error rates.

    Joint factor analysis for forensic automatic speaker recognition

    Final degree project carried out in collaboration with the Faculté Sciences et Techniques de l'Ingénieur, Institut de Traitement des Signaux. Nowadays, under controlled recording conditions, state-of-the-art automatic speaker recognition systems show very good performance in discriminating between the voices of speakers. However, in investigative activities (e.g., anonymous calls and wire-tapping) the conditions in which recordings are made cannot be controlled and pose a challenge to automatic speaker recognition. Factors that introduce variability in the recordings include differences in the phone handset, in the transmission channel and in the recording devices. The strength of evidence, estimated using statistical models of within-source variability and between-sources variability, is expressed as a likelihood ratio, i.e., the ratio of the probabilities of observing the features of the questioned recording under two competing hypotheses: that the suspected speaker is the source of the questioned recording, and that the speaker at the origin of the questioned recording is not the suspected speaker. The main unresolved problem in forensic automatic speaker recognition today is that of handling mismatch in recording conditions. This mismatch has to be considered in the estimation of the likelihood ratio because it can introduce important errors. In this work, we use and analyze such a state-of-the-art system. The forensic automatic speaker recognition system consists of many parts, such as feature extraction and modeling. We have focused on the modeling part, training models which can be decomposed into two subspaces, the speaker subspace and the session subspace. This technique, called Joint Factor Analysis, is the state of the art in speaker verification systems.
Using this decomposition into two subspaces, we try to solve the problem of mismatched conditions by adapting the session subspace of the training recordings to a new session subspace (which is under different conditions). To estimate the speaker and session subspaces, we need several databases, e.g. one database containing the traces and another containing recordings from the suspect. These databases must be recorded in several conditions to simulate a real forensic case where mismatch is present. Examples of such recording conditions are cellular phones and the fixed telephone network. Finally, an evaluation of the system is presented at the end of the work. From this evaluation, we see which recording conditions degrade the results most, what effect the mismatch has on the results, and how much the adaptation can correct these effects.
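The likelihood ratio described in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a comparison score is modeled by one Gaussian under the same-speaker hypothesis and another under the different-speaker hypothesis; the function names and all numeric values are hypothetical and are not taken from the project, which uses Joint Factor Analysis models rather than this simplified score model.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(score, same_mean, same_std, diff_mean, diff_std):
    """LR = p(score | same speaker) / p(score | different speaker).

    The Gaussian parameters stand in for the within-source and
    between-sources models; they are assumptions of this sketch.
    """
    numerator = gaussian_pdf(score, same_mean, same_std)
    denominator = gaussian_pdf(score, diff_mean, diff_std)
    return numerator / denominator


# Hypothetical score and model parameters, chosen only for illustration.
lr = likelihood_ratio(score=2.0, same_mean=3.0, same_std=1.0,
                      diff_mean=0.0, diff_std=1.0)
print(f"LR = {lr:.2f}, log10 LR = {math.log10(lr):.2f}")
```

An LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis. Mismatched recording conditions would shift the score distributions, which is why the abstract stresses that mismatch must be accounted for when the ratio is estimated.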

    Applying linguistics: questions of language and law.

    This chapter, first published in Japanese, outlines the field of ‘forensic linguistics’ [as at 2001]. It contrasts the forensic linguistic approach to applying linguistics in legal contexts with a different tradition of analysis: that usually known as Critical Discourse Analysis and/or as Critical Legal Studies. The author examines ‘meaning’ issues in particular, as a way of showing how treatment of specific interpretive questions exposes problematic assumptions underpinning the notion of linguistic expertise. The chapter concludes with a suggestion that, in a period of rapidly changing communication technologies and formats, notions of professional authority in respect of language and meaning may need to be reconsidered.

    L’individualità del parlante nelle scienze fonetiche: applicazioni tecnologiche e forensi [Speaker individuality in the phonetic sciences: technological and forensic applications]

    Session variability compensation in automatic speaker and language recognition

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 201

    Artificial Intelligence and Pattern Evidence: A Legal Application for AI

    Artificial intelligence changes everything, and almost no jobs will be immune. The application of AI to the practice of law is well-known and well-understood. In this paper, we present some aspects of the related disciplines of forensic science and specifically the development and analysis of “pattern [and impression] evidence.” We show that pattern evidence has a great need for AI. We discuss several applications in detail but focus mostly on the application of AI-based text analysis technology to forensic linguistics. Sociedad Argentina de Informática e Investigación Operativ

    The definition of the relevant population and the collection of data for likelihood ratio-based forensic voice comparison

    Within the field of forensic speech science there is increasing acceptance of the likelihood ratio (LR) as the logically and legally correct framework for evaluating forensic voice comparison (FVC) evidence. However, only a small proportion of experts currently use the numerical LR in casework. This is due primarily to the difficulties involved in accounting for the inherent, and arguably unique, complexity of speech in a fully data-driven, numerical LR analysis. This thesis addresses two such issues: the definition of the relevant population and the amount of data required for system testing. Firstly, experiments are presented which explore the extent to which LRs are affected by different definitions of the relevant population with regard to sources of systematic sociolinguistic between-speaker variation (regional background, socio-economic class and age) using both linguistic-phonetic and ASR variables. Results show that different definitions of the relevant population can have a substantial effect on the magnitude of LRs, depending on the input variable. However, system validity results suggest that narrow controls over sociolinguistic sources of variation should be preferred to general controls. Secondly, experiments are presented which evaluate the effects of development, test and reference sample size on LRs. Consistent with general principles in statistics, more precise results are found using more data across all experiments. There is also considerable evidence of a relationship between sample size sensitivity and the dimensionality and speaker discriminatory power of the input variable. Further, there are potential trade-offs in the size of each set depending on which element of LR output the analyst is interested in. The results in this thesis will contribute towards improving the extent to which LR methods account for the linguistic-phonetic complexity of speech evidence.
In accounting for this complexity, this work will also increase the practical viability of applying the numerical LR to FVC casework.