Analysis of acoustic signals for devices for people with disabilities, applying radar-processing theory
This article lays the groundwork for the implementation of a microphone array using LCMV theory and the beamforming technique, in order to evaluate the best algorithm among the different algorithm families, using the mean squared error and the properties of the LCMV algorithms as performance measures, and configuring the array in different structures to verify its versatility across different working environments and applications. The goal is a low-cost implementation for people with special needs related to hearing loss, improving their lives and fostering their inclusion in society at large. Track: Signal Processing and Real-Time Systems. Red de Universidades con Carreras en Informática (RedUNCI)
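The LCMV constraint formulation this abstract refers to can be sketched in a few lines. This is a minimal narrowband sketch, not the article's implementation; the array size, look directions and identity noise covariance are illustrative assumptions.

```python
import numpy as np

# Minimal narrowband LCMV beamformer for a uniform linear array.
# 8 microphones, half-wavelength spacing, look/interferer angles are
# illustrative assumptions, not parameters from the article.

def steering_vector(n_mics, theta_rad, spacing_wavelengths=0.5):
    """Narrowband plane-wave steering vector for a ULA."""
    k = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta_rad))

def lcmv_weights(R, C, f):
    """Classic LCMV solution: w = R^{-1} C (C^H R^{-1} C)^{-1} f."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

n_mics = 8
a_look = steering_vector(n_mics, np.deg2rad(0))   # desired source direction
a_jam = steering_vector(n_mics, np.deg2rad(40))   # interferer to null out

# Constraints: unit gain toward the source, zero gain toward the interferer.
C = np.column_stack([a_look, a_jam])
f = np.array([1.0, 0.0])

# Spatially white noise covariance keeps the sketch self-contained.
R = np.eye(n_mics)

w = lcmv_weights(R, C, f)
print(abs(w.conj() @ a_look))  # distortionless response toward the source
print(abs(w.conj() @ a_jam))   # interferer suppressed
```

The same closed form generalizes to any number of linear constraints, which is what makes the LCMV family configurable across the different array structures the article discusses.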
An investigation into variability conditions in the SRE 2004 and 2008 Corpora
In Automatic Speaker Verification, a computer must determine whether a given speech segment was spoken by a target speaker from whom speech had previously been provided. Speech segments are recorded under many conditions, such as different telephones, microphones, languages, and dialects. Differences in these conditions result in variability that can affect the performance of speaker recognition systems both negatively and positively. While the error rates are sometimes unpredictable, the large differences between the error rates of different conditions provoke interest in ways to normalize speech segments to compensate for this variability. With a compensation technique, the error rates should decrease and become more consistent across the different conditions used to record them. The majority of research in the speaker recognition community focuses on techniques to reduce the effects of variability without analyzing which factors actually affect performance the most. To show the need for a form of variability compensation in speaker recognition, as well as to determine the types of variability factors that most significantly influence performance, a speaker recognition system without any compensation techniques was built and tested on the core conditions of NIST's Speaker Recognition Evaluations (SREs) 2004 and 2008. These two datasets are from a series of datasets that organizations in the speaker recognition community most often use to demonstrate the performance of their speaker verification systems. The false alarm and missed detection rates for individual training and target conditions were analyzed at the equal error point over each dataset. The experiments show that language plays a significant role in affecting performance; dialect, however, does not appear to have any influence at all. Consistently, English was shown to provide the best results for speaker recognition with baseline systems of the form utilized in this thesis.
While there does not seem to be a single best phone and microphone for speaker recognition systems, consistent performance could be seen when the type of phone and microphone used is the same for both training and testing (matched) and when they are different (mismatched). Higher missed detection rates could be seen in mismatched conditions, and higher false alarm rates in matched conditions. Interview speech was also found to have a much larger difference between false alarm and missed detection rates than phone speech. The thesis culminates with an in-depth analysis of the error performance as a function of these and other variability factors. M.S., Electrical Engineering -- Drexel University, 201
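The equal-error-point analysis described above can be illustrated with a small sketch. The threshold sweep below is a generic equal-error-rate (EER) estimate, and the Gaussian score distributions are synthetic stand-ins, not SRE trial data.

```python
import numpy as np

# Generic EER estimate: sweep decision thresholds over the pooled scores
# and find the point where the miss rate and false-alarm rate coincide.
# The score distributions are synthetic, not NIST SRE output.

def eer(target_scores, impostor_scores):
    """Return the rate at the threshold where miss ~= false alarm."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best_gap, best_rate = 1.0, 0.0
    for t in thresholds:
        miss = np.mean(target_scores < t)     # genuine trials rejected
        fa = np.mean(impostor_scores >= t)    # impostor trials accepted
        if abs(miss - fa) < best_gap:
            best_gap, best_rate = abs(miss - fa), (miss + fa) / 2
    return best_rate

rng = np.random.default_rng(0)
targets = rng.normal(2.0, 1.0, 1000)    # genuine-trial scores
impostors = rng.normal(0.0, 1.0, 1000)  # impostor-trial scores

print(f"EER ~ {eer(targets, impostors):.3f}")
```

Comparing this single number across conditions (language, handset, interview vs. phone speech) is exactly the kind of per-condition breakdown the thesis performs.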
The Role of Phonetic Information in Speaker Recognition
In addition to the phonetic and lexical information that constitutes the content of an utterance, spoken language also carries information about the speaker. The two types of information interact, with the result that some segments contain more speaker information than others, and that knowledge about the speaker can help in processing the phonetic information and thus in understanding an utterance better. This also raises the question of how this information is integrated with respect to a model of speech perception (abstractionist vs. exemplar-based).
Starting from this state of the art, this thesis investigates the influence of segments, in particular consonants, on speaker discrimination and identification. First, several acoustic features of selected German consonants are analyzed in a speech corpus. The first four spectral moments of the sounds are measured and their speaker specificity is determined. The nasals /m/ and /n/ and the fricatives /f/ and /s/ in particular revealed many speaker-specific features.
Based on the assumption that these acoustically measured features must also manifest themselves perceptually in some form, a speaker discrimination experiment with listeners was conducted. In both experiments the speech material was an /aKa/ sequence. In the first experiment the entire stimulus contained speaker information, whereas in the second experiment only the (static part of the) consonant, but not the vowel transitions, contained speaker information. Both studies show differences in speaker specificity between the various manners and places of articulation, with the average speaker discrimination rate in the second experiment markedly lower than in the first. The results suggest that nasals and plosives carry much of their information in the vowel transitions, while fricatives carry more information in the (static portion of the) consonant.
Since phonetic and speaker information interact, the last part of the thesis investigated the temporal coordination of the processing of both information types by means of a visual-world eye-tracking experiment. The results show that listeners identified the target with high confidence, but that the difficulty of target identification increases with the number of speakers (2 vs. 4 speakers). With speakers of different sexes, the sex is recognized first and then the individual speaker. Furthermore, it is shown that speaker information tends to be processed even earlier than phonetic information, and is used even when the phonetic information alone would suffice for target identification. In phonetically ambiguous cases, speaker information is used to reduce the ambiguity. The results underline the importance of speaker information in spoken-language processing and thus favor an episodic, exemplar-based model of speech perception that integrates speaker information at an early stage of the speech-processing pipeline.
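The first four spectral moments measured in the corpus study can be sketched as follows: the magnitude spectrum is treated as a probability distribution over frequency, yielding centre of gravity, standard deviation, skewness and kurtosis. The white-noise "fricative" below is a synthetic placeholder, not corpus material.

```python
import numpy as np

# First four spectral moments of a sound, computed by treating the
# normalised magnitude spectrum as a distribution over frequency.
# The input signal is synthetic white noise, standing in for a fricative.

def spectral_moments(signal, sr):
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    p = spectrum / spectrum.sum()                    # normalise to a distribution
    cog = np.sum(freqs * p)                          # 1st: centre of gravity
    sd = np.sqrt(np.sum((freqs - cog) ** 2 * p))     # 2nd: spread
    skew = np.sum(((freqs - cog) / sd) ** 3 * p)     # 3rd: asymmetry
    kurt = np.sum(((freqs - cog) / sd) ** 4 * p) - 3  # 4th: excess kurtosis
    return cog, sd, skew, kurt

sr = 16000
noise = np.random.default_rng(1).standard_normal(sr // 10)  # 100 ms of noise
cog, sd, skew, kurt = spectral_moments(noise, sr)
print(f"CoG {cog:.0f} Hz, SD {sd:.0f} Hz, skew {skew:.2f}, kurt {kurt:.2f}")
```

Computed per consonant token and compared within vs. between speakers, these four values give the kind of speaker-specificity measure the corpus analysis reports for the nasals and fricatives.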
A spiking neural network for the recognition of spatio-temporal processes
Processing of non-stationary dynamic processes in neural networks -- Information processing in biological nervous systems -- Model of the spiking neural network -- Neuron model -- Architecture and learning -- Self-organizing activity -- Application to the recognition of noisy digits -- Network with a […] mechanism with reward -- Processing of temporal sequences and motion detection -- Processing of temporal sequences -- Motion detection -- Prototype of a speaker identification system based on the proposed network -- Speech analysis by amplitude modulation in the auditory system -- Speaker identification system -- Processing of envelopes by the proposed network -- Speaker identification based on the output parameters of the proposed network
Robust text independent closed set speaker identification systems and their evaluation
PhD Thesis. This thesis focuses upon text-independent closed-set speaker identification. The contributions relate to evaluation studies in the presence of various types of noise and handset effects. Extensive evaluations are performed on four databases.
The first contribution is in the context of the use of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) with original speech recordings from only the TIMIT database. Four main simulations for Speaker Identification Accuracy (SIA) are presented, including different fusion strategies: late fusion (score based), early fusion (feature based), early-late fusion (a combination of feature and score based), late fusion using concatenated static and dynamic features (features with temporal derivatives such as first-order derivative delta and second-order derivative delta-delta features, namely acceleration features), and finally fusion of statistically independent normalized scores.
The second contribution is again based on the GMM-UBM approach. Comprehensive evaluations of the effect of Additive White Gaussian Noise (AWGN) and Non-Stationary Noise (NSN) (with and without a G.712 type handset) upon identification performance are undertaken. In particular, three NSN types with varying Signal to Noise Ratios (SNRs) were tested, corresponding to street traffic, a bus interior and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3,600 speech utterances.
The third contribution is based on the use of the I-vector; four combinations of I-vectors with 100 and 200 dimensions were employed. Then, various fusion techniques using maximum, mean, weighted sum and cumulative fusion with the same I-vector dimension were used to improve the SIA. Similarly, both interleaving and concatenated I-vector fusion were exploited to produce 200 and 400 I-vector dimensions. The system was evaluated with four different databases using 120 speakers from each database. The TIMIT, SITW and NIST 2008 databases were evaluated for various types of NSN, namely street-traffic NSN, bus-interior NSN and crowd-talking NSN; the G.712 type handset at 16 kHz was also applied.
As recommendations from the study, in terms of the GMM-UBM approach mean fusion is found to yield the overall best performance in terms of the SIA with noisy speech, whereas linear weighted sum fusion is overall best for original database recordings. However, in the I-vector approach the best SIA was obtained from the weighted sum and the concatenated fusion.
The author thanks the Ministry of Higher Education and Scientific Research (MoHESR), the Iraqi Cultural Attaché, Al-Mustansiriya University, and the Al-Mustansiriya University College of Engineering in Iraq for supporting his PhD scholarship.
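The late score-fusion rules compared in the recommendations above (mean, maximum, linear weighted sum) can be sketched as follows; the per-trial scores and the 0.7/0.3 weights are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Late (score-level) fusion of two subsystems' per-trial scores.
# The score vectors and the 0.7/0.3 weights are illustrative only.

def mean_fusion(s1, s2):
    return (s1 + s2) / 2

def max_fusion(s1, s2):
    return np.maximum(s1, s2)

def weighted_sum_fusion(s1, s2, w1=0.7, w2=0.3):
    return w1 * s1 + w2 * s2

system_a = np.array([0.9, 0.2, 0.6])  # e.g. scores from one feature stream
system_b = np.array([0.8, 0.4, 0.1])  # e.g. scores from a second stream

print(mean_fusion(system_a, system_b))          # [0.85 0.3  0.35]
print(max_fusion(system_a, system_b))           # [0.9 0.4 0.6]
print(weighted_sum_fusion(system_a, system_b))  # [0.87 0.26 0.45]
```

Because each rule only touches final scores, the same three lines apply unchanged whether the subsystems are GMM-UBM variants or I-vector extractors, which is why the thesis can compare the rules across both approaches.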
The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems
This thesis examines the influence of acoustic variability on automatic speaker recognition systems (ASRs) with three aims. i. To measure ASR performance under 5 commonly encountered acoustic conditions; ii. To contribute towards ASR system development with the provision of new research data; iii. To assess ASR suitability for forensic speaker comparison (FSC) application and investigative/pre-forensic use. The thesis begins with a literature review and explanation of relevant technical terms. Five categories of research experiments then examine ASR performance, reflective of conditions influencing speech quantity (inhibitors) and speech quality (contaminants), acknowledging quality often influences quantity. Experiments pertain to: net speech duration, signal to noise ratio (SNR), reverberation, frequency bandwidth and transcoding (codecs). The ASR system is placed under scrutiny with examination of settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is examined in relation to baseline performance and metrics assist in informing if ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis provides discussion on issues such as the complexity and fragility of the speech signal path, speaker variability, difficulty in measuring conditions and mitigation (thresholds and settings). The application of ASRs to casework is discussed with recommendations, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations regarding presenting ASR output as evidence in criminal trials. 
In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, while acknowledging that outstanding issues endure which require governance, such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application, with many acoustic conditions causing irrecoverable speech data loss that contributes to high error rates.
WICC 2016: XVIII Workshop de Investigadores en Ciencias de la Computación
Proceedings of the XVIII Workshop de Investigadores en Ciencias de la Computación (WICC 2016), held at the Universidad Nacional de Entre Ríos on 14-15 April 2016. Red de Universidades con Carreras en Informática (RedUNCI)