Analysis of acoustic signals for devices for people with disabilities, applying radar-processing theory
This article lays the groundwork for the implementation of a microphone array using LCMV theory and the beamforming technique, in order to evaluate the best algorithm among the different algorithm families, using the mean squared error and the properties of the LCMV algorithms as performance measures, and configuring the array in different structures to verify its versatility across different working environments and applications. The goal is a low-cost implementation for people with special needs related to hearing loss, improving their lives and fostering their inclusion in society at large. Track: Signal Processing and Real-Time Systems. Red de Universidades con Carreras en Informática (RedUNCI)
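The LCMV constraint formulation this abstract refers to can be sketched in a few lines. This is a minimal narrowband sketch, not the article's implementation; the array size, look directions and identity noise covariance are illustrative assumptions.

```python
import numpy as np

# Minimal narrowband LCMV beamformer for a uniform linear array.
# 8 microphones, half-wavelength spacing, look/interferer angles are
# illustrative assumptions, not parameters from the article.

def steering_vector(n_mics, theta_rad, spacing_wavelengths=0.5):
    """Narrowband plane-wave steering vector for a ULA."""
    k = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta_rad))

def lcmv_weights(R, C, f):
    """Classic LCMV solution: w = R^{-1} C (C^H R^{-1} C)^{-1} f."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

n_mics = 8
a_look = steering_vector(n_mics, np.deg2rad(0))   # desired source direction
a_jam = steering_vector(n_mics, np.deg2rad(40))   # interferer to null out

# Constraints: unit gain toward the source, zero gain toward the interferer.
C = np.column_stack([a_look, a_jam])
f = np.array([1.0, 0.0])

# Spatially white noise covariance keeps the sketch self-contained.
R = np.eye(n_mics)

w = lcmv_weights(R, C, f)
print(abs(w.conj() @ a_look))  # distortionless response toward the source
print(abs(w.conj() @ a_jam))   # interferer suppressed
```

The same closed form generalizes to any number of linear constraints, which is what makes the LCMV family configurable across the different array structures the article discusses.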
An investigation into variability conditions in the SRE 2004 and 2008 Corpora
In Automatic Speaker Verification, a computer must determine whether a given speech segment was spoken by a target speaker from whom speech had previously been provided. Speech segments are recorded under many conditions, such as different telephones, microphones, languages, and dialects. Differences in these conditions result in variability that can affect the performance of speaker recognition systems both negatively and positively. While the error rates are sometimes unpredictable, the large differences between the error rates of different conditions provoke interest in ways to normalize speech segments to compensate for this variability. With a compensation technique, the error rates should decrease and become more consistent across the different conditions used to record them. The majority of research in the speaker recognition community focuses on techniques to reduce the effects of variability without analyzing which factors actually affect performance the most. To show the need for a form of variability compensation in speaker recognition, as well as to determine the types of variability factors that most significantly influence performance, a speaker recognition system without any compensation techniques was built and tested on the core conditions of NIST's Speaker Recognition Evaluations (SREs) 2004 and 2008. These two datasets are from a series of datasets that organizations in the speaker recognition community most often use to demonstrate the performance of their speaker verification systems. The false alarm and missed detection rates for individual training and target conditions were analyzed at the equal error point over each dataset. The experiments show that language plays a significant role in affecting performance; dialect, however, does not appear to have any influence at all. Consistently, English was shown to provide the best results for speaker recognition with baseline systems of the form utilized in this thesis.
While there does not seem to be a single best phone and microphone for speaker recognition systems, consistent performance could be seen when the type of phone and microphone used is the same for both training and testing (matched) and when they are different (mismatched). Higher missed detection rates could be seen in mismatched conditions, and higher false alarm rates in matched conditions. Interview speech was also found to have a much larger difference between false alarm and missed detection rates than phone speech. The thesis culminates with an in-depth analysis of the error performance as a function of these and other variability factors. M.S., Electrical Engineering -- Drexel University, 201
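The equal-error-point analysis described above can be illustrated with a small sketch. The threshold sweep below is a generic equal-error-rate (EER) estimate, and the Gaussian score distributions are synthetic stand-ins, not SRE trial data.

```python
import numpy as np

# Generic EER estimate: sweep decision thresholds over the pooled scores
# and find the point where the miss rate and false-alarm rate coincide.
# The score distributions are synthetic, not NIST SRE output.

def eer(target_scores, impostor_scores):
    """Return the rate at the threshold where miss ~= false alarm."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best_gap, best_rate = 1.0, 0.0
    for t in thresholds:
        miss = np.mean(target_scores < t)     # genuine trials rejected
        fa = np.mean(impostor_scores >= t)    # impostor trials accepted
        if abs(miss - fa) < best_gap:
            best_gap, best_rate = abs(miss - fa), (miss + fa) / 2
    return best_rate

rng = np.random.default_rng(0)
targets = rng.normal(2.0, 1.0, 1000)    # genuine-trial scores
impostors = rng.normal(0.0, 1.0, 1000)  # impostor-trial scores

print(f"EER ~ {eer(targets, impostors):.3f}")
```

Comparing this single number across conditions (language, handset, interview vs. phone speech) is exactly the kind of per-condition breakdown the thesis performs.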
The Role of Phonetic Information in Speaker Recognition
In addition to the phonetic and lexical information that constitutes the content of an utterance, spoken language also carries information about the speaker. The two types of information interact, with the result that some segments contain more speaker information than others, and that knowledge about the speaker can help in processing the phonetic information and thus in understanding an utterance better. This also raises the question of how this information is integrated with respect to a model of speech perception (abstractionist vs. exemplar-based).
Starting from this state of the art, this thesis investigates the influence of segments, in particular consonants, on speaker discrimination and identification. First, several acoustic features of selected German consonants are analyzed in a speech corpus. The first four spectral moments of the sounds are measured and their speaker specificity is determined. The nasals /m/ and /n/ and the fricatives /f/ and /s/ in particular revealed many speaker-specific features.
Based on the assumption that these acoustically measured features must also manifest themselves perceptually in some form, a speaker discrimination experiment with listeners was conducted. In both experiments the speech material was an /aKa/ sequence. In the first experiment the entire stimulus contained speaker information, whereas in the second experiment only the (static part of the) consonant, but not the vowel transitions, contained speaker information. Both studies show differences in speaker specificity between the various manners and places of articulation, with the average speaker discrimination rate in the second experiment markedly lower than in the first. The results suggest that nasals and plosives carry much of their information in the vowel transitions, while fricatives carry more information in the (static portion of the) consonant.
Since phonetic and speaker information interact, the last part of the thesis investigated the temporal coordination of the processing of both information types by means of a visual-world eye-tracking experiment. The results show that listeners identified the target with high confidence, but that the difficulty of target identification increases with the number of speakers (2 vs. 4 speakers). With speakers of different sexes, the sex is recognized first and then the individual speaker. Furthermore, it is shown that speaker information tends to be processed even earlier than phonetic information, and is used even when the phonetic information alone would suffice for target identification. In phonetically ambiguous cases, speaker information is used to reduce the ambiguity. The results underline the importance of speaker information in spoken-language processing and thus favor an episodic, exemplar-based model of speech perception that integrates speaker information at an early stage of the speech-processing pipeline.
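The first four spectral moments measured in the corpus study can be sketched as follows: the magnitude spectrum is treated as a probability distribution over frequency, yielding centre of gravity, standard deviation, skewness and kurtosis. The white-noise "fricative" below is a synthetic placeholder, not corpus material.

```python
import numpy as np

# First four spectral moments of a sound, computed by treating the
# normalised magnitude spectrum as a distribution over frequency.
# The input signal is synthetic white noise, standing in for a fricative.

def spectral_moments(signal, sr):
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    p = spectrum / spectrum.sum()                    # normalise to a distribution
    cog = np.sum(freqs * p)                          # 1st: centre of gravity
    sd = np.sqrt(np.sum((freqs - cog) ** 2 * p))     # 2nd: spread
    skew = np.sum(((freqs - cog) / sd) ** 3 * p)     # 3rd: asymmetry
    kurt = np.sum(((freqs - cog) / sd) ** 4 * p) - 3  # 4th: excess kurtosis
    return cog, sd, skew, kurt

sr = 16000
noise = np.random.default_rng(1).standard_normal(sr // 10)  # 100 ms of noise
cog, sd, skew, kurt = spectral_moments(noise, sr)
print(f"CoG {cog:.0f} Hz, SD {sd:.0f} Hz, skew {skew:.2f}, kurt {kurt:.2f}")
```

Computed per consonant token and compared within vs. between speakers, these four values give the kind of speaker-specificity measure the corpus analysis reports for the nasals and fricatives.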
A spiking neural network for the recognition of spatio-temporal processes
Processing of non-stationary dynamic processes in neural networks -- Information processing in biological nervous systems -- Model of the spiking neural network -- Neuron model -- Architecture and learning -- Self-organizing activity -- Application to the recognition of noisy digits -- Network with a […] mechanism with reward -- Processing of temporal sequences and motion detection -- Processing of temporal sequences -- Motion detection -- Prototype of a speaker identification system based on the proposed network -- Speech analysis by amplitude modulation in the auditory system -- Speaker identification system -- Processing of envelopes by the proposed network -- Speaker identification based on the output parameters of the proposed network
Robust text independent closed set speaker identification systems and their evaluation
PhD Thesis. This thesis focuses upon text-independent closed-set speaker identification. The contributions relate to evaluation studies in the presence of various types of noise and handset effects. Extensive evaluations are performed on four databases.
The first contribution is in the context of the use of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) with original speech recordings from only the TIMIT database. Four main simulations for Speaker Identification Accuracy (SIA) are presented, including different fusion strategies: late fusion (score based), early fusion (feature based), early-late fusion (a combination of feature and score based), late fusion using concatenated static and dynamic features (features with temporal derivatives such as first-order derivative delta and second-order derivative delta-delta features, namely acceleration features), and finally fusion of statistically independent normalized scores.
The second contribution is again based on the GMM-UBM approach. Comprehensive evaluations of the effect of Additive White Gaussian Noise (AWGN) and Non-Stationary Noise (NSN) (with and without a G.712 type handset) upon identification performance are undertaken. In particular, three NSN types with varying Signal to Noise Ratios (SNRs) were tested, corresponding to street traffic, a bus interior and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3,600 speech utterances.
The third contribution is based on the use of the I-vector; four combinations of I-vectors with 100 and 200 dimensions were employed. Then, various fusion techniques using maximum, mean, weighted sum and cumulative fusion with the same I-vector dimension were used to improve the SIA. Similarly, both interleaving and concatenated I-vector fusion were exploited to produce 200 and 400 I-vector dimensions. The system was evaluated with four different databases using 120 speakers from each database. The TIMIT, SITW and NIST 2008 databases were evaluated for various types of NSN, namely street-traffic NSN, bus-interior NSN and crowd-talking NSN; the G.712 type handset at 16 kHz was also applied.
As recommendations from the study, in terms of the GMM-UBM approach mean fusion is found to yield the overall best performance in terms of the SIA with noisy speech, whereas linear weighted sum fusion is overall best for original database recordings. However, in the I-vector approach the best SIA was obtained from the weighted sum and the concatenated fusion.
The author thanks the Ministry of Higher Education and Scientific Research (MoHESR), the Iraqi Cultural Attaché, Al-Mustansiriya University, and the Al-Mustansiriya University College of Engineering in Iraq for supporting his PhD scholarship.
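The late score-fusion rules compared in the recommendations above (mean, maximum, linear weighted sum) can be sketched as follows; the per-trial scores and the 0.7/0.3 weights are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Late (score-level) fusion of two subsystems' per-trial scores.
# The score vectors and the 0.7/0.3 weights are illustrative only.

def mean_fusion(s1, s2):
    return (s1 + s2) / 2

def max_fusion(s1, s2):
    return np.maximum(s1, s2)

def weighted_sum_fusion(s1, s2, w1=0.7, w2=0.3):
    return w1 * s1 + w2 * s2

system_a = np.array([0.9, 0.2, 0.6])  # e.g. scores from one feature stream
system_b = np.array([0.8, 0.4, 0.1])  # e.g. scores from a second stream

print(mean_fusion(system_a, system_b))          # [0.85 0.3  0.35]
print(max_fusion(system_a, system_b))           # [0.9 0.4 0.6]
print(weighted_sum_fusion(system_a, system_b))  # [0.87 0.26 0.45]
```

Because each rule only touches final scores, the same three lines apply unchanged whether the subsystems are GMM-UBM variants or I-vector extractors, which is why the thesis can compare the rules across both approaches.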
The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems
This thesis examines the influence of acoustic variability on automatic speaker recognition systems (ASRs) with three aims. i. To measure ASR performance under 5 commonly encountered acoustic conditions; ii. To contribute towards ASR system development with the provision of new research data; iii. To assess ASR suitability for forensic speaker comparison (FSC) application and investigative/pre-forensic use. The thesis begins with a literature review and explanation of relevant technical terms. Five categories of research experiments then examine ASR performance, reflective of conditions influencing speech quantity (inhibitors) and speech quality (contaminants), acknowledging quality often influences quantity. Experiments pertain to: net speech duration, signal to noise ratio (SNR), reverberation, frequency bandwidth and transcoding (codecs). The ASR system is placed under scrutiny with examination of settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is examined in relation to baseline performance and metrics assist in informing if ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis provides discussion on issues such as the complexity and fragility of the speech signal path, speaker variability, difficulty in measuring conditions and mitigation (thresholds and settings). The application of ASRs to casework is discussed with recommendations, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations regarding presenting ASR output as evidence in criminal trials. 
In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, while acknowledging that outstanding issues endure which require governance, such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application, with many acoustic conditions causing irrecoverable speech data loss that contributes to high error rates.
WICC 2016: XVIII Workshop de Investigadores en Ciencias de la Computación
Proceedings of the XVIII Workshop de Investigadores en Ciencias de la Computación (WICC 2016), held at the Universidad Nacional de Entre Ríos on 14-15 April 2016. Red de Universidades con Carreras en Informática (RedUNCI)