3,120 research outputs found
Filler model based confidence measures for spoken dialogue systems: a case study for Turkish
Because of the inadequate performance of speech recognition systems, an accurate confidence scoring mechanism should be employed to understand user requests correctly. To determine a confidence score for a hypothesis, certain confidence features are combined. The performance of filler-model based confidence features have been investigated. Five types of filler model networks were defined: triphone-network; phone-network; phone-class network; 5-state catch-all model; 3-state catch-all model. First, all models were evaluated in a Turkish speech recognition task in terms of their ability to tag correctly (recognition-error or correct) recognition hypotheses. The best performance was obtained from the triphone recognition network. Then, the performances of reliable combinations of these models were investigated and it was observed that certain combinations of filler models could significantly improve the accuracy of the confidence annotatio
On the use of high-level information in speaker and language recognition
Actas de las IV Jornadas de TecnologĂa del Habla (JTH 2006)Automatic Speaker Recognition systems have been largely dominated by acoustic-spectral based systems, relying in proper modelling of the short-term vocal tract of speakers. However, there is scientific and intuitive evidence that speaker specific
information is embedded in the speech signal in multiple short- and long-term characteristics. In this work, a multilevel speaker recognition system combining acoustic, phonotactic and prosodic subsystems is presented and assessed using NIST 2005 Speaker Recognition Evaluation data.
For language recognition systems, the NIST 2005 Language Recognition Evaluation was selected to measure performance of a high-level language recognition systems
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
Speaker adaptation techniques provide a powerful solution to customise
automatic speech recognition (ASR) systems for individual users. Practical
application of unsupervised model-based speaker adaptation techniques to data
intensive end-to-end ASR systems is hindered by the scarcity of speaker-level
data and performance sensitivity to transcription errors. To address these
issues, a set of compact and data efficient speaker-dependent (SD) parameter
representations are used to facilitate both speaker adaptive training and
test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR
systems. The sensitivity to supervision quality is reduced using a confidence
score-based selection of the less erroneous subset of speaker-level adaptation
data. Two lightweight confidence score estimation modules are proposed to
produce more reliable confidence scores. The data sparsity issue, which is
exacerbated by data selection, is addressed by modelling the SD parameter
uncertainty using Bayesian learning. Experiments on the benchmark 300-hour
Switchboard and the 233-hour AMI datasets suggest that the proposed confidence
score-based adaptation schemes consistently outperformed the baseline
speaker-independent (SI) Conformer model and conventional non-Bayesian, point
estimate-based adaptation using no speaker data selection. Similar consistent
performance improvements were retained after external Transformer and LSTM
language model rescoring. In particular, on the 300-hour Switchboard corpus,
statistically significant WER reductions of 1.0%, 1.3%, and 1.4% absolute
(9.5%, 10.9%, and 11.3% relative) were obtained over the baseline SI Conformer
on the NIST Hub5'00, RT02, and RT03 evaluation sets respectively. Similar WER
reductions of 2.7% and 3.3% absolute (8.9% and 10.2% relative) were also
obtained on the AMI development and evaluation sets.Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processin
Genomic Methods for Studying the Post-Translational Regulation of Transcription Factors
The spatiotemporal coordination of gene expression is a fundamental process in cellular biology. Gene expression is regulated, in large part, by sequence-specific transcription factors that bind to DNA regions in the proximity of each target gene. Transcription factor activity and specificity are, in turn, regulated post-translationally by protein-modifying enzymes. High-throughput methods exist to probe specific steps of this process, such as protein-protein and protein-DNA interactions, but few computational tools exist to integrate this information in a principled, model-oriented manner. In this work, I develop several computational tools for studying the functional implications of transcription factor modification. I establish the first publicly accessible database for known and predicted regulatory circuits that encompass modifying enzymes, transcription factors, and transcriptional targets. I also develop a model-based method for integrating heterogeneous genomic and proteomic data for the inference of modification-dependent transcriptional regulatory networks. The model-based method is thoroughly validated as a reliable and accurate computational genomic tool. Additionally, I propose and demonstrate fundamental improvements to computational proteomic methods for identifying modified protein forms. In summary, this work contributes critical methodological advances to the field of regulatory network inference
Survey on Evaluation Methods for Dialogue Systems
In this paper we survey the methods and concepts developed for the evaluation
of dialogue systems. Evaluation is a crucial part during the development
process. Often, dialogue systems are evaluated by means of human evaluations
and questionnaires. However, this tends to be very cost and time intensive.
Thus, much work has been put into finding methods, which allow to reduce the
involvement of human labour. In this survey, we present the main concepts and
methods. For this, we differentiate between the various classes of dialogue
systems (task-oriented dialogue systems, conversational dialogue systems, and
question-answering dialogue systems). We cover each class by introducing the
main technologies developed for the dialogue systems and then by presenting the
evaluation methods regarding this class
- …