Domain Adaptation for Statistical Classifiers
The most basic assumption used in statistical learning theory is that
training data and test data are drawn from the same underlying distribution.
Unfortunately, in many applications, the "in-domain" test data is drawn from a
distribution that is related, but not identical, to the "out-of-domain"
distribution of the training data. We consider the common case in which labeled
out-of-domain data is plentiful, but labeled in-domain data is scarce. We
introduce a statistical formulation of this problem in terms of a simple
mixture model and present an instantiation of this framework for maximum entropy
classifiers and their linear-chain counterparts. We present efficient inference
algorithms for this special case based on the technique of conditional
expectation maximization. Our experimental results show that our approach leads
to improved performance on three real-world tasks on four different data sets
from the natural language processing domain.
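As a toy illustration of the mixture idea, assume (hypothetically, as one reading of the abstract) that each in-domain example is generated either by a component shared with the out-of-domain data or by an in-domain-specific component. If both component likelihoods are held fixed, EM for the mixing weight alone looks like the sketch below. The paper's conditional EM for maximum entropy models also re-estimates the components and is not reproduced here. The name `estimate_mixing_weight` is illustrative only.

```python
def estimate_mixing_weight(p_general, p_specific, lam=0.5, iters=200):
    """EM estimate of the mixing weight lam in
        p(x) = lam * p_general(x) + (1 - lam) * p_specific(x),
    where p_general[i] and p_specific[i] are the (fixed) likelihoods of example i
    under the two components. Assumes the two are never both zero."""
    n = len(p_general)
    for _ in range(iters):
        # E-step: posterior probability that each example came from the general component
        resp = [lam * g / (lam * g + (1.0 - lam) * s)
                for g, s in zip(p_general, p_specific)]
        # M-step: the new mixing weight is the average responsibility
        lam = sum(resp) / n
    return lam
```

For example, likelihoods `[1, 1, 0]` under the shared component and `[0, 0, 1]` under the specific one give a weight of 2/3. Two of the three examples can only be explained by the shared component.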
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed).
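The decoding step of such an HMM can be sketched with the Viterbi algorithm. Dialogue acts are the hidden states, and a dialogue act n-gram (simplified here to a bigram) supplies the transition probabilities. Each utterance contributes a likelihood P(utterance | act), such as the word n-gram, decision tree, or neural network models would produce. All names and numbers below are hypothetical and are not taken from the paper.

```python
import math

def viterbi_dialogue_acts(obs_loglik, bigram_logp, start_logp):
    """Most likely dialogue act sequence for a conversation.
    obs_loglik[t][a]   : log P(utterance t | act a)  (lexical/prosodic models)
    bigram_logp[(p, a)]: log P(act a | previous act p)  (dialogue act grammar)
    start_logp[a]      : log P(first act = a)"""
    acts = list(start_logp)
    delta = {a: start_logp[a] + obs_loglik[0][a] for a in acts}
    backptrs = []
    for t in range(1, len(obs_loglik)):
        new_delta, ptr = {}, {}
        for a in acts:
            # best predecessor for act a at time t
            best = max(acts, key=lambda p: delta[p] + bigram_logp[(p, a)])
            new_delta[a] = delta[best] + bigram_logp[(best, a)] + obs_loglik[t][a]
            ptr[a] = best
        backptrs.append(ptr)
        delta = new_delta
    # trace back from the best final act
    path = [max(acts, key=lambda a: delta[a])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Consider a clear Question ("Q") followed by an acoustically and lexically ambiguous utterance. If the bigram favours a Statement ("S") after a Question, that preference decides the second label, giving `["Q", "S"]`.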
CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES
Thesis by compendium of publications.
In recent years, online multimedia repositories have become key
knowledge assets thanks to the rise of the Internet, especially in education.
Educational institutions around the world have devoted considerable effort
to exploring different teaching methods, improving the transmission of knowledge,
and reaching a wider audience. As a result, online video lecture repositories
are now available and serve as complementary tools that can enrich the learning
experience and help students assimilate new concepts. To guarantee the success
of these repositories, the transcription of each lecture plays a very important
role, because it is the first step towards many other features.
Transcription makes learning materials searchable, enables their translation
into other languages, supports recommendation functions, makes content summaries
possible, and guarantees access for people with hearing disabilities. However,
transcribing these videos is expensive in terms of time and human effort.
To this end, this thesis aims to provide new tools and techniques that
ease the transcription of these repositories. In particular, we address the
development of a complete Automatic Speech Recognition toolkit with a special
focus on the Deep Learning techniques that help provide accurate
transcriptions in real-world scenarios. This toolkit was tested against many
others in different international competitions, showing comparable transcription
quality. Moreover, a new speaker adaptation technique based on Confidence Measures
is proposed to improve recognition accuracy. This in turn motivated new techniques
for estimating Confidence Measures, which helped to further improve transcription
quality. In particular, a speaker-adapted confidence measure approach based on
Recurrent Neural Networks was proposed.
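One common way to use confidence measures in unsupervised speaker adaptation is to keep only the confidently recognized parts of a first-pass transcription as adaptation targets. The sketch below shows that selection step only. The `HypWord` structure and the 0.7 threshold are hypothetical, and the thesis's actual adaptation scheme is not reproduced.

```python
from dataclasses import dataclass

@dataclass
class HypWord:
    """One word of a first-pass recognition hypothesis (hypothetical structure)."""
    word: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # confidence measure in [0, 1]

def select_adaptation_data(hyp, threshold=0.7):
    """Keep only words whose confidence reaches the threshold, so that likely
    recognition errors are not used as targets when adapting to the speaker."""
    return [w for w in hyp if w.confidence >= threshold]

def selected_duration(words):
    """Total speech time (seconds) left for adapting to this speaker."""
    return sum(w.end - w.start for w in words)
```

The threshold trades off adaptation data quantity against label noise. Raising it keeps fewer but more reliable words.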
The contributions proposed herein have been tested in real-life scenarios in
different educational repositories. In fact, the transLectures-UPV toolkit is
part of a set of tools for providing video lecture transcriptions in many
different Spanish and European universities and institutions.
Agua Teba, MÁD. (2019). CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/130198
Bernoulli HMMs for Handwritten Text Recognition
In recent years, Hidden Markov Models (HMMs) have received significant attention in the
task of off-line handwritten text recognition (HTR). As in automatic speech recognition (ASR),
HMMs are used to model the probability of an observation sequence, given its corresponding
text transcription. However, unlike in ASR, in HTR there is no standard
set of local features used by most of the proposed systems. In this thesis we propose the
use of raw binary pixels as features, in conjunction with models that deal more directly with
the binary data. In particular, we propose the use of Bernoulli HMMs (BHMMs), that is, conventional
HMMs in which Gaussian (mixture) distributions have been replaced by Bernoulli
(mixture) probability functions. The objective is twofold. On the one hand, BHMMs allow us
to better model the binary nature of text images (foreground/background).
On the other hand, this guarantees that no discriminative information is filtered out during
feature extraction (most available HTR datasets can easily be binarized without a relevant
loss of information).
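The state emissions of a BHMM can be sketched as follows, assuming pixels are conditionally independent given the state. A Bernoulli state scores a binary vector x by Σ_d [x_d log p_d + (1 − x_d) log(1 − p_d)]. A Bernoulli mixture state takes a weighted log-sum-exp over prototypes. The forward algorithm then accumulates these scores over the frame sequence. A simple isolated-word classifier picks the class with the highest prior-weighted likelihood. All function names are hypothetical; this is not the thesis's toolkit.

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(vals)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_bernoulli(x, p, eps=1e-10):
    """log P(x | p) for a binary vector x with independent pixel probabilities p."""
    total = 0.0
    for xd, pd in zip(x, p):
        pd = min(max(pd, eps), 1.0 - eps)  # keep away from log(0)
        total += math.log(pd) if xd else math.log(1.0 - pd)
    return total

def log_bernoulli_mixture(x, weights, protos):
    """log sum_k w_k P(x | p_k): emission of a Bernoulli mixture state."""
    return logsumexp([math.log(w) + log_bernoulli(x, p) for w, p in zip(weights, protos)])

def forward_loglik(seq, log_init, log_trans, emit):
    """Forward algorithm: log P(seq | model). emit[q] maps a binary vector to
    log P(x | state q); use -math.inf in log_trans for forbidden transitions."""
    n = len(log_init)
    alpha = [log_init[q] + emit[q](seq[0]) for q in range(n)]
    for x in seq[1:]:
        alpha = [logsumexp([alpha[i] + log_trans[i][j] for i in range(n)]) + emit[j](x)
                 for j in range(n)]
    return logsumexp(alpha)

def classify(seq, models):
    """models: label -> (log_prior, log_init, log_trans, emit); Bayes decision rule."""
    return max(models, key=lambda c: models[c][0] + forward_loglik(seq, *models[c][1:]))
```

A left-to-right character model would set `log_trans[i][j] = -math.inf` for `j < i`. Mixture states are used by passing, for example, `lambda x: log_bernoulli_mixture(x, w, protos)` as an entry of `emit`.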
In this thesis, all the HMM theory required to develop an HMM-based HTR toolkit is
reviewed and adapted to the case of BHMMs. Specifically, we begin by defining a simple
classifier based on BHMMs with Bernoulli probability functions at the states, and we end
with an embedded Bernoulli mixture HMM recognizer for continuous HTR. Regarding the
binary features, we propose a simple binary feature extraction process without significant
loss of information. All input images are scaled and binarized, so that they can easily be
reinterpreted as sequences of binary feature vectors. Two extensions are proposed to this basic feature
extraction method: the use of a sliding window to better capture the context,
and a repositioning method to better deal with vertical distortions. Competitive results
were obtained when BHMMs and the proposed methods were applied to well-known HTR
databases. In particular, we ranked first at the Arabic Handwriting Recognition Competition
organized during the 12th International Conference on Frontiers in Handwriting Recognition
(ICFHR 2010), and at the Arabic Recognition Competition: Multi-font Multi-size Digitally
Represented Text organized during the 11th International Conference on Document Analysis
and Recognition (ICDAR 2011).
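The feature extraction steps above can be sketched as follows. The sketch binarizes a grayscale image and reads it column by column as binary vectors. It concatenates each column with its neighbours (sliding window) and can optionally shift each window vertically so that the centre of mass of its foreground pixels sits on the middle row. Several points are assumptions rather than the thesis's definitions: the fixed threshold, the window width of 3, zero padding, and vertical-only centre-of-mass repositioning. Scaling is assumed to have been done already.

```python
def binarize(gray, threshold=128):
    """gray: rows of 0..255 intensities (dark ink on light paper). Returns 1 for ink."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def columns(binary):
    """Reinterpret a binary image as a left-to-right sequence of column vectors."""
    return [list(col) for col in zip(*binary)]

def reposition(window, height):
    """Shift a window (list of columns) vertically so that the centre of mass of its
    foreground pixels lands on the middle row; vacated rows become background."""
    rows = [r for col in window for r in range(height) if col[r]]
    if not rows:
        return window  # empty window: nothing to align
    centre = int(sum(rows) / len(rows) + 0.5)  # round half up
    shift = height // 2 - centre
    return [[col[r - shift] if 0 <= r - shift < height else 0 for r in range(height)]
            for col in window]

def window_features(binary, width=3, repos=False):
    """One feature vector per column: the column plus width // 2 neighbours on each
    side (zero-padded at the borders), optionally repositioned, then flattened."""
    height = len(binary)
    cols = columns(binary)
    half, blank = width // 2, [0] * height
    feats = []
    for c in range(len(cols)):
        win = [cols[c + d] if 0 <= c + d < len(cols) else blank
               for d in range(-half, half + 1)]
        if repos:
            win = reposition(win, height)
        feats.append([bit for col in win for bit in col])
    return feats
```

Each frame therefore has `width * height` binary components, which is the D-dimensional vector a Bernoulli state models.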
In the last part of this thesis we propose a method for training BHMM classifiers using discriminative training criteria, instead of the conventional Maximum Likelihood Estimation
(MLE). Specifically, we propose a log-linear classifier for binary data based on the BHMM
classifier. Parameter estimation of this model can be carried out using discriminative training
criteria for log-linear models. In particular, we show the formulae for several MMI-based
criteria. Finally, we prove the equivalence between both classifiers; hence, discriminative
training of a BHMM classifier can be carried out by obtaining its equivalent log-linear classifier.
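The core of this equivalence can be seen in the simplest case: a single Bernoulli state per class, with no hidden state sequence and no mixture. There, the joint log-probability is linear in the binary vector x. This is a sketch of that special case only; the thesis's proof for full BHMMs is not reproduced. The symbols p_cd, w_c, and b_c are introduced here for illustration.

```latex
\begin{aligned}
\log p(c) + \log p(\mathbf{x}\mid c)
  &= \log p(c) + \sum_{d=1}^{D}\bigl[x_d\log p_{cd} + (1-x_d)\log(1-p_{cd})\bigr]\\
  &= \underbrace{\sum_{d=1}^{D} x_d\log\frac{p_{cd}}{1-p_{cd}}}_{\mathbf{w}_c^{\top}\mathbf{x}}
   + \underbrace{\log p(c) + \sum_{d=1}^{D}\log(1-p_{cd})}_{b_c},\\[4pt]
p(c\mid\mathbf{x}) &= \frac{\exp(\mathbf{w}_c^{\top}\mathbf{x} + b_c)}
                          {\sum_{c'}\exp(\mathbf{w}_{c'}^{\top}\mathbf{x} + b_{c'})},
\qquad
F_{\mathrm{MMI}} = \sum_{n=1}^{N}\log p(c_n\mid\mathbf{x}_n).
\end{aligned}
```

Conversely, setting p_cd = 1/(1 + exp(−w_cd)) and p(c) ∝ exp(b_c − Σ_d log(1 − p_cd)) recovers a Bernoulli classifier with the same posterior. Adding the same constant to every b_c does not change p(c | x). Maximizing an MMI-type criterion over (w_c, b_c) therefore amounts to discriminative training of the generative model.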
Reported results show that discriminative BHMMs clearly outperform conventional
generative BHMMs.
Giménez Pastor, A. (2014). Bernoulli HMMs for Handwritten Text Recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37978
Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis
Background
One of the challenges of bioinformatics remains the recognition of short signal sequences in genomic DNA such as donor or acceptor splice sites, splicing enhancers or silencers, translation initiation sites, transcription start sites, transcription factor binding sites, nucleosome binding sites, miRNA binding sites, or insulator binding sites. During the last decade, a wealth of algorithms for the recognition of such DNA sequences has been developed and compared, with the goal of improving their performance and deepening our understanding of the underlying cellular processes. Most of these algorithms are based on statistical models belonging to the family of Markov random fields, such as position weight matrix models, weight array matrix models, Markov models of higher order, or moral Bayesian networks. While many comparative studies have compared different learning principles or different statistical models, the influence of choosing different prior distributions for the model parameters when using different learning principles has been overlooked and may have led to questionable conclusions.
Results
With the goal of allowing direct comparisons of different learning principles for models from the family of Markov random fields based on the same a-priori information, we derive a generalization of the commonly used product-Dirichlet prior. We find that the derived prior behaves like a Gaussian prior close to the maximum and like a Laplace prior in the far tails. In two case studies, we illustrate the utility of the derived prior for a direct comparison of different learning principles with different models for the recognition of binding sites of the transcription factor Sp1 and human donor splice sites.
Conclusions
We find that comparisons of different learning principles using the same a-priori information can lead to conclusions different from those of previous studies in which the effect resulting from different priors has been neglected. We have implemented the derived prior in the open-source library Jstacs to enable easy application to comparative studies of different learning principles in the field of sequence analysis.
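The shape described in the Results (Gaussian near the maximum, Laplace in the tails) can be checked on a simplified case. Take a single Dirichlet-distributed position weight matrix column whose probabilities are written as a softmax of unconstrained log-linear parameters λ, with the last one fixed to 0. Changing variables (the softmax Jacobian has determinant Πᵢ pᵢ) gives the log-density Σᵢ αᵢλᵢ − A·log Σⱼ exp(λⱼ), with A = Σᵢ αᵢ. This function is concave and quadratic around its mode and linear far from it. This one-column analogue is an illustration under those assumptions. It is not the generalized prior derived in the paper or its Jstacs implementation, and the function names are hypothetical.

```python
import math

def logsumexp(v):
    """Numerically stable log(sum(exp(x) for x in v))."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))

def softmax_dirichlet_logprior(lam, alpha):
    """Unnormalised log-density, in log-linear parameters lam (lam[-1] fixed to 0),
    of a Dirichlet(alpha) prior on p = softmax(lam), Jacobian included:
        sum_i alpha_i * lam_i - A * logsumexp(lam),  A = sum_i alpha_i."""
    return sum(a * l for a, l in zip(alpha, lam)) - sum(alpha) * logsumexp(lam)

def dirichlet_mode(alpha):
    """Stationary point: softmax(lam) = alpha / A, i.e. lam_i = log(alpha_i / alpha_K)."""
    return [math.log(a / alpha[-1]) for a in alpha]
```

For `alpha = [2, 1, 1, 4]`, moving λ₁ far upwards changes the log-prior by almost exactly α₁ − A = −6 per unit step. This constant slope is the signature of a Laplace tail. Small perturbations around `dirichlet_mode(alpha)` instead lower it quadratically.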
- …