T-Norm and Lexical and Acoustic Mismatch in Text-Dependent Speaker Recognition
Actas de las V Jornadas en Tecnología del Habla (JTH 2008). This paper presents an extensive study of T-norm applied to text-dependent speaker recognition, also analyzing the problems of lexical and acoustic mismatch. We examine how results vary when gender dependence is taken into account and when T-norm is applied at the sentence, phoneme, and state levels with impostor cohorts of different sizes. The study shows that applying T-norm per phoneme or per state can achieve relative improvements of up to 16%, and that gender-based cohort selection can improve results further with respect to the gender-independent case.
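The T-norm technique discussed above shifts and scales a raw verification score using statistics gathered from an impostor cohort. As a minimal sketch (the function name and toy scores are illustrative, not from the paper):

```python
import numpy as np

def t_norm(raw_score, cohort_scores):
    """T-normalize a verification score against an impostor cohort.

    raw_score: score of the test utterance against the claimed speaker model.
    cohort_scores: scores of the same utterance against each impostor
    model in the cohort.
    """
    cohort = np.asarray(cohort_scores, dtype=float)
    # Center on the cohort mean and scale by the cohort standard deviation.
    return (raw_score - cohort.mean()) / cohort.std()

# Toy example: a raw score of 2.0 against a small cohort.
normed = t_norm(2.0, [0.0, 0.0, 2.0, 2.0])
```

Gender-dependent cohort selection, as studied in the paper, simply means choosing `cohort_scores` from impostor models of the claimed speaker's gender.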
Speaker verification using sequence discriminant support vector machines
This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resultant SVMs have a very high dimensionality, since the dimension of the score space is related to the number of parameters in the underlying generative model. To address the problems that arise in the resulting optimization, we introduce a technique called spherical normalization that preconditions the Hessian matrix. We have performed speaker verification experiments using the PolyVar database. The SVM system presented here reduces relative error rates by 34% compared to a GMM likelihood ratio system.
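A rough sketch of the score-space idea: each variable-length frame sequence is mapped to a fixed-length vector of gradients of its log-likelihood under a background GMM (here only the gradient with respect to the means, a simplified Fisher expansion), and an SVM is trained on those vectors. The toy data, the unit-norm step standing in for spherical normalization, and all names are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy frame sequences for two "speakers" (each row of a sequence is a frame).
seqs = [rng.normal(loc=m, scale=1.0, size=(50, 2)) for m in (0.0, 0.0, 2.0, 2.0)]
labels = [0, 0, 1, 1]

# Background GMM trained on all frames (the underlying generative model).
ubm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(np.vstack(seqs))

def fisher_vector(frames, gmm):
    """Gradient of the sequence log-likelihood w.r.t. the GMM means
    (a simplified score-space expansion)."""
    post = gmm.predict_proba(frames)                    # (T, K) responsibilities
    diff = frames[:, None, :] - gmm.means_[None, :, :]  # (T, K, D)
    grad = (post[:, :, None] * diff / gmm.covariances_[None, :, :]).sum(axis=0)
    return grad.ravel() / len(frames)

X = np.array([fisher_vector(s, ubm) for s in seqs])
# Crude stand-in for spherical normalization: project each expansion
# onto the unit sphere to condition the kernel.
X /= np.linalg.norm(X, axis=1, keepdims=True)

clf = SVC(kernel="linear").fit(X, labels)
```

The key point is that the SVM now discriminates between whole sequences, since each training vector summarizes an entire sequence rather than a single frame.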
Phoneme and Sub-Phoneme T-Normalization for Text-Dependent Speaker Recognition
Test normalization (T-Norm) is a score normalization technique that is regularly and successfully applied in the context of text-independent speaker recognition. It is less frequently applied, however, to text-dependent or text-prompted speaker recognition, mainly because its improvement in this context is more modest. In this paper we present a novel way to improve the performance of T-Norm for text-dependent systems. It consists in applying score T-normalization at the phoneme or sub-phoneme level instead of at the sentence level. Experiments on the YOHO corpus show that, while standard sentence-level T-Norm does not improve equal error rate (EER), phoneme- and sub-phoneme-level T-Norm produce relative EER reductions of 18.9% and 20.1% respectively on a state-of-the-art HMM-based text-dependent speaker recognition system. Results are even better at operating points with low false acceptance rates.
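The phoneme-level variant described above normalizes each phoneme's score with that phoneme's own impostor cohort statistics before combining them. A minimal sketch, assuming per-phoneme scores are already available and that the combined score is a simple average (the function and data names are hypothetical):

```python
import numpy as np

def phoneme_t_norm(phone_scores, cohort_phone_scores):
    """T-normalize each phoneme score with that phoneme's impostor
    cohort statistics, then average the normalized scores.

    phone_scores: dict phoneme -> raw score for the test utterance.
    cohort_phone_scores: dict phoneme -> list of impostor cohort scores
    for the same phoneme.
    """
    normed = []
    for ph, s in phone_scores.items():
        cohort = np.asarray(cohort_phone_scores[ph], dtype=float)
        normed.append((s - cohort.mean()) / cohort.std())
    return float(np.mean(normed))

score = phoneme_t_norm(
    {"ah": 1.5, "k": 0.5},
    {"ah": [0.0, 1.0, 0.5, 1.5], "k": [-1.0, 0.0, 1.0, 0.0]},
)
```

Sentence-level T-Norm would instead pool everything into a single score before normalizing; the per-phoneme version lets each phoneme be calibrated against its own impostor distribution.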
Enabling Massive Deep Neural Networks with the GraphBLAS
Deep Neural Networks (DNNs) have emerged as a core tool for machine learning.
The computations performed during DNN training and inference are dominated by
operations on the weight matrices describing the DNN. As DNNs incorporate more
stages and more nodes per stage, these weight matrices may be required to be
sparse because of memory limitations. The GraphBLAS.org math library standard
was developed to provide high performance manipulation of sparse weight
matrices and input/output vectors. For sufficiently sparse matrices, a sparse
matrix library requires significantly less memory than the corresponding dense
matrix implementation. This paper provides a brief description of the
mathematics underlying the GraphBLAS. In addition, the equations of a typical
DNN are rewritten in a form designed to use the GraphBLAS. An implementation of
the DNN is given using a preliminary GraphBLAS C library. The performance of
the GraphBLAS implementation is measured relative to a standard dense linear
algebra library implementation. For various sizes of DNN weight matrices, it is
shown that the GraphBLAS sparse implementation outperforms a BLAS dense
implementation as the weight matrix becomes sparser. Comment: 10 pages, 7 figures, to appear in the 2017 IEEE High Performance
Extreme Computing (HPEC) conference
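The DNN equations the paper rewrites for the GraphBLAS reduce, per layer, to a sparse matrix-vector product followed by a nonlinearity. A sketch of that idea using `scipy.sparse` in place of the GraphBLAS C library (the ReLU nonlinearity and the 1% density are assumptions for illustration):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

def sparse_layer(W, y, b):
    """One DNN layer, y' = ReLU(W y + b), where W is a sparse weight
    matrix, mirroring the sparse products the GraphBLAS provides."""
    z = W @ y + b
    return np.maximum(z, 0.0)

# A 1000x1000 weight matrix with ~1% nonzeros stored in CSR form;
# a dense representation would hold all 10^6 entries.
W = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
y = rng.random(1000)
b = -0.3 * np.ones(1000)  # bias term

out = sparse_layer(W, y, b)
```

The memory argument in the abstract follows directly: the CSR matrix stores only its nonzeros (plus index arrays), so for sufficiently sparse weights it is far smaller than the dense equivalent, at the cost of indexed rather than contiguous access.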
AI Enabled Maneuver Identification via the Maneuver Identification Challenge
Artificial intelligence (AI) has enormous potential to improve Air Force
pilot training by providing actionable feedback to pilot trainees on the
quality of their maneuvers and enabling instructor-less flying familiarization
for early-stage trainees in low-cost simulators. Historically, AI challenges
consisting of data, problem descriptions, and example code have been critical
to fueling AI breakthroughs. The Department of the Air Force-Massachusetts
Institute of Technology AI Accelerator (DAF-MIT AI Accelerator) developed such
an AI challenge using real-world Air Force flight simulator data. The Maneuver
ID challenge assembled thousands of virtual reality simulator flight recordings
collected by actual Air Force student pilots at Pilot Training Next (PTN). This
dataset has been publicly released at Maneuver-ID.mit.edu and represents the
first of its kind public release of USAF flight training data. Using this
dataset, we have applied a variety of AI methods to separate "good" vs "bad"
simulator data and categorize and characterize maneuvers. These data,
algorithms, and software are being released as baselines of model performance
for others to build upon to enable the AI ecosystem for flight simulator
training. Comment: 10 pages, 7 figures, 4 tables, accepted to and presented at I/ITSE