    Glottal Source Cepstrum Coefficients Applied to NIST SRE 2010

    This paper presents a novel feature set for speaker recognition based on glottal source information. An iterative algorithm is used to derive vocal tract and glottal source estimates from the speech signal. To test the importance of glottal source information in speaker characterization, the novel feature set was evaluated in the 2010 NIST Speaker Recognition Evaluation (NIST SRE10). The proposed system uses glottal estimate parameter templates together with classical cepstral information to build a model for each speaker involved in the recognition process. The open-source ALIZE [1] software was used to create the GMM models for both background and target speakers. Compared with mel-frequency cepstral coefficients (MFCC) alone, the misclassification rate on NIST SRE 2010 fell from 29.43% to 27.15% when the glottal source features were used.
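
    As an illustration of the inverse-filtering idea behind glottal source estimation, the sketch below separates a coarse glottal (spectral-tilt) model from a vocal-tract model by iterated LPC analysis, in the spirit of IAIF, and takes cepstral coefficients of the glottal estimate. The model orders, the use of librosa.lpc, and the single-pass structure are assumptions for illustration, not the paper's exact algorithm.

```python
# Minimal IAIF-style sketch: estimate the glottal source by LPC inverse
# filtering, then take cepstral coefficients of the estimate.
# Orders and window are illustrative assumptions.
import numpy as np
import scipy.signal as sig
import librosa

def glottal_source_cepstrum(frame, vt_order=24, glottis_order=2, n_ceps=12):
    w = frame * np.hanning(len(frame))

    # 1) Coarse glottal model: low-order LPC captures the spectral tilt.
    g1 = librosa.lpc(w, order=glottis_order)
    tilt_removed = sig.lfilter(g1, [1.0], w)

    # 2) Vocal-tract model on the tilt-compensated signal.
    vt = librosa.lpc(tilt_removed, order=vt_order)

    # 3) Inverse-filter the original frame to obtain the glottal estimate.
    glottal = sig.lfilter(vt, [1.0], w)

    # Real cepstrum of the glottal estimate.
    spec = np.abs(np.fft.rfft(glottal)) + 1e-10
    ceps = np.fft.irfft(np.log(spec))
    return ceps[:n_ceps]

# Usage on a synthetic voiced-like frame:
sr = 16000
t = np.arange(400) / sr
frame = sig.sawtooth(2 * np.pi * 120 * t)  # crude glottal-like excitation
print(glottal_source_cepstrum(frame).shape)  # (12,)
```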

    Application of Automatic Speaker Recognition techniques to pathological voice assessment (dysphonia)

    This paper investigates the adaptation of Automatic Speaker Recognition (ASR) techniques to pathological voice assessment (dysphonic voices). The aim of this study is to provide a novel method, suitable for tracking the evolution of a patient's pathology, that is easy to use, fast, non-invasive for the patient, and affordable for clinicians. This method is intended to complement existing ones, namely perceptual judgment and the usual objective measurements (jitter, airflows, ...), which remain time- and labor-intensive. The system designed for this task relies on the GMM-based approach, the state of the art for speaker recognition, and is derived from the open-source ASR tools (LIA_SpkDet and ALIZE) of the LIA lab. Experiments conducted on a dysphonic corpus provide promising results, underlining the interest of such an approach and opening further research directions.
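
    A minimal sketch of the GMM likelihood-ratio scoring idea follows, using scikit-learn in place of ALIZE / LIA_SpkDet (an illustrative substitution): a background model trained on normal voices and a pathology model trained on dysphonic voices, with the average frame log-likelihood ratio serving as a severity score. Feature dimensions and mixture sizes are assumptions, and the MAP adaptation of a true GMM-UBM system is omitted.

```python
# Illustrative GMM-based voice-pathology scoring with scikit-learn.
# Synthetic stand-ins replace real MFCC frames.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for MFCC frames (rows) from training corpora.
normal_frames = rng.normal(0.0, 1.0, size=(2000, 13))
dysphonic_frames = rng.normal(0.5, 1.3, size=(2000, 13))

# "Background" model and pathology model.
ubm = GaussianMixture(n_components=16, covariance_type='diag').fit(normal_frames)
patho = GaussianMixture(n_components=16, covariance_type='diag').fit(dysphonic_frames)

def pathology_score(frames):
    # Average frame log-likelihood ratio: higher => more pathological.
    return patho.score(frames) - ubm.score(frames)

test = rng.normal(0.5, 1.3, size=(300, 13))
print(f"score = {pathology_score(test):.2f}")
```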

    The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016

    The 2016 speaker recognition evaluation (SRE'16) is the latest edition in the series of benchmarking events conducted by the National Institute of Standards and Technology (NIST). I4U is a joint entry to SRE'16 resulting from the collaboration and active exchange of information among researchers from sixteen institutes and universities across four continents. The joint submission and several of its 32 sub-systems were among the top-performing systems. Considerable effort was devoted to two major challenges: unlabeled training data and the dataset shift from Switchboard-Mixer to the new Call My Net dataset. This paper summarizes the lessons learned and presents the sixteen research groups' shared view on the recent advances, major paradigm shifts, and common tool chains in speaker recognition witnessed in SRE'16. More importantly, we look into the intriguing question of fusing a large ensemble of sub-systems and the potential benefit of large-scale collaboration.
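
    Large ensembles of sub-systems are commonly fused at the score level with a calibrated linear combination; the sketch below shows that recipe with logistic regression over per-trial scores. It is a generic illustration on synthetic data, not the actual I4U fusion.

```python
# Minimal sketch of linear score-level fusion for an ensemble of
# sub-systems, trained on a development set with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_trials, n_systems = 5000, 32  # e.g. 32 sub-systems as in the I4U entry

labels = rng.integers(0, 2, n_trials)              # 1 = target trial
# Synthetic scores: each system separates classes with varying quality.
quality = rng.uniform(0.5, 2.0, n_systems)
scores = labels[:, None] * quality + rng.normal(0, 1, (n_trials, n_systems))

fuser = LogisticRegression().fit(scores, labels)
fused = fuser.decision_function(scores)            # calibrated fused score
print("fusion weights (first 5):", np.round(fuser.coef_[0, :5], 2))
```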

    Speaker tracking system using speaker boundary detection

    This thesis describes research conducted in the area of speaker recognition, concerned with the automatic detection and tracking of target speakers in meetings, conferences, telephone conversations, and radio and television broadcasts. A speaker tracking system is developed in collaboration with the Center for Language and Speech Technologies and Applications (TALP) at UPC. The main objective of the system is to answer the question: when does the target speaker speak? The system uses training speech data for the target speaker in a pre-enrollment stage. Three main modules have been designed. In the first, an energy-based speech activity detector selects the speech parts of the audio. In the second, the audio is segmented at speaker turning points. In the last, speaker verification is applied to verify and track the target speakers. Two approaches are compared in this last module: in the first, the target speakers and the segments are modeled with the state-of-the-art Gaussian Mixture Models (GMM); in the second, the identity vector (i-vector) representation is used for the target speakers and the segments. Finally, the performance of the two approaches is compared in the evaluation of results.
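
    The first module can be illustrated with a minimal energy-based speech activity detector: frame the signal, compute log energy per frame, and keep frames a fixed margin above the recording's noise floor. Frame sizes and the percentile threshold rule are assumptions, not the thesis's configuration.

```python
# Sketch of an energy-based speech activity detector.
import numpy as np

def energy_sad(x, sr, frame_ms=25, hop_ms=10, margin_db=12.0):
    """Return a boolean speech mask per frame."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(x) - frame) // hop)
    energy_db = np.array([
        10 * np.log10(np.sum(x[i*hop:i*hop+frame] ** 2) + 1e-12)
        for i in range(n)
    ])
    # Threshold a fixed margin above the quietest frames (noise floor).
    floor = np.percentile(energy_db, 10)
    return energy_db > floor + margin_db

# Usage: one second of silence followed by one second of tone.
sr = 16000
x = np.concatenate([np.zeros(sr), 0.3 * np.sin(2*np.pi*220*np.arange(sr)/sr)])
mask = energy_sad(x, sr)
print(f"{mask.mean():.0%} of frames labeled speech")
```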

    Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition

    In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the difficult and challenging problem of sensor variability, a source of performance degradation in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in matched conditions; (2) the benefit of combining them with mel-frequency cepstral coefficients to exploit their discriminative power under uncontrolled (mismatched) conditions. The proposed invariant features yield a performance improvement, demonstrated by reductions in the equal error rate and the minimum decision cost function compared with GMM-UBM speaker recognition systems based on MFCC features.
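
    Since the comparison rests on the equal error rate, the following sketch shows a standard way to compute the EER from lists of target and non-target trial scores; the paper's scoring back-end itself is not reproduced.

```python
# Equal error rate (EER) from target and non-target score lists.
import numpy as np

def eer(target_scores, nontarget_scores):
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones_like(target_scores),
                             np.zeros_like(nontarget_scores)])
    order = np.argsort(scores)[::-1]          # descending threshold sweep
    labels = labels[order]
    far = np.cumsum(1 - labels) / max(1, len(nontarget_scores))  # false accepts
    frr = 1 - np.cumsum(labels) / max(1, len(target_scores))     # false rejects
    i = np.argmin(np.abs(far - frr))          # operating point where FAR ~= FRR
    return (far[i] + frr[i]) / 2

rng = np.random.default_rng(2)
print(f"EER = {eer(rng.normal(1, 1, 1000), rng.normal(-1, 1, 1000)):.1%}")
```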

    UPC multimodal speaker diarization system for the 2018 Albayzin challenge

    This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. The approach processes the speech and image signals separately. In the speech domain, speaker diarization is performed using identity embeddings created by a triplet-loss DNN that takes i-vectors as input. The triplet DNN is trained with an additional regularization loss that minimizes the variance of both positive and negative distances. A sliding window is then used to compare speech segments with enrolled speaker targets using the cosine distance between embeddings. To detect identities in the face modality, a face detector followed by a face tracker is applied to the videos. For each cropped face, a feature vector is obtained using a deep neural network based on the ResNet-34 architecture, trained with a metric-learning triplet loss (available from the dlib library). The feature vector for a track is obtained by averaging the features of each of its frames, and is then compared with the features extracted from the images of the enrollment identities. The proposed system is evaluated on the RTVE2018 database.
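
    The triplet loss with variance regularization described above might look like the following PyTorch sketch, where the variance of the positive and negative distance distributions within a batch is added to the usual hinge term. The margin and regularizer weight are illustrative assumptions, not the paper's settings.

```python
# Sketch of a triplet loss plus a variance penalty on positive and
# negative distances, as described in the abstract.
import torch

def triplet_loss_var_reg(anchor, positive, negative, margin=0.2, reg=0.1):
    d_pos = torch.norm(anchor - positive, dim=1)   # distances to positives
    d_neg = torch.norm(anchor - negative, dim=1)   # distances to negatives
    hinge = torch.clamp(d_pos - d_neg + margin, min=0).mean()
    var_penalty = d_pos.var() + d_neg.var()        # shrink intra-batch spread
    return hinge + reg * var_penalty

# Toy batch of i-vector-like embeddings (batch of 8, dim 64):
a, p, n = (torch.randn(8, 64, requires_grad=True) for _ in range(3))
loss = triplet_loss_var_reg(a, p, n)
loss.backward()
print(float(loss))
```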

    Evaluation of the ALIZE / LIA_RAL Speaker Verification Toolkit on an Embedded System

    Text-independent speaker verification is the task of verifying a user's claimed identity using only characteristics extracted from the voice, regardless of the spoken text. Nowadays, many speaker verification applications are implemented in software for personal computers, and using these systems on embedded platforms (smartphones, phones, integrated computers) multiplies their potential in security, automotive, and entertainment applications, among others. A theoretical understanding of speaker verification requires knowledge of voice processing and a high level of mathematics. Embedded system performance is not the same as that offered by a workstation, so in-depth knowledge of the target platform where the system will be implemented, and of the cross-compilation tools needed to adapt the software to the new platform, is also required. Execution time and memory requirements must likewise be taken into account to obtain good verification quality. In this thesis we evaluate the performance and viability of speaker verification software on an embedded system, presenting a comprehensive study of the toolkit and of the target embedded platform. The verification system used in this thesis is the ALIZE / LIA_RAL toolkit. This software recognizes the identity of a client previously enrolled in a database and works independently of the spoken text. We have tested the toolkit on a 32-bit RISC ARM architecture computer and expect that it can be ported to comparable embedded systems with reasonable effort. The findings confirm that speaker verification results on the workstation are comparable to those on the embedded system; however, time and memory requirements differ between the two platforms. Taking these results into account, we propose an optimization of the configuration parameters used in the verification test to considerably reduce resource requirements.
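
    The time and memory comparison between platforms can be illustrated with a small measurement harness like the one below, which wraps a verification run and reports wall time and the peak resident memory of the child process. The ComputeTest invocation is a hypothetical placeholder for the actual LIA_RAL binary and options used in the thesis.

```python
# Sketch of a time/memory measurement harness for comparing a
# workstation and an embedded ARM target (Unix-only: uses resource).
import resource
import subprocess
import time

cmd = ["./ComputeTest", "--config", "target.cfg"]  # hypothetical invocation
start = time.perf_counter()
subprocess.run(cmd, check=False)                   # run the verification trial
elapsed = time.perf_counter() - start
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"wall time: {elapsed:.1f}s, peak RSS of children: {peak_kb} kB")
```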