54,834 research outputs found
An active learning approach for statistical spoken language understanding
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-25085-9_67In general, large amount of segmented and labeled data is needed to estimate statistical language understanding systems. In recent years, different approaches have been proposed to reduce the segmentation and labeling effort by means of unsupervised o semi-supervised learning techniques. We propose an active learning approach to the estimation of statistical language understanding models that involves the transcription, labeling and segmentation of a small amount of data, along with the use of raw data. We use this approach to learn the understanding component of a Spoken Dialog System. Some experiments that show the appropriateness of our approach are also presented.Work partially supported by the Spanish MICINN under contract TIN2008-06856-C05-02, and by the Vicerrectorat d’InvestigaciĂł, Desenvolupament i InnovaciĂł of the Universitat Politècnica de València under contract 20100982.GarcĂa Granada, F.; Hurtado Oliver, LF.; SanchĂs Arnal, E.; Segarra Soriano, E. (2011). An active learning approach for statistical spoken language understanding. En Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer Verlag (Germany). 7042:565-572. https://doi.org/10.1007/978-3-642-25085-9_67S5655727042De Mori, R., Bechet, F., Hakkani-Tur, D., McTear, M., Riccardi, G., Tur, G.: Spoken language understanding: A survey. IEEE Signal Processing Magazine 25(3), 50–58 (2008)Fraser, M., Gilbert, G.: Simulating speech systems. Computer Speech and Language 5, 81–99 (1991)Gotab, P., Bechet, F., Damnati, G.: Active learning for rule-based and corpus-based spoken labguage understanding moldes. In: IEEE Workshop Automatic Speech Recognition and Understanding (ASRU 2009), pp. 444–449 (2009)Gotab, P., Damnati, G., Becher, F., Delphin-Poulat, L.: Online slu model adaptation with a partial oracle. In: Proc. of InterSpeech 2010, Makuhari, Chiba, Japan, pp. 2862–2865 (2010)He, Y., Young, S.: Spoken language understanding using the hidden vector state model. Speech Communication 48, 262–275 (2006)Ortega, L., Galiano, I., Hurtado, L.F., Sanchis, E., Segarra, E.: A statistical segment-based approach for spoken language understanding. In: Proc. of InterSpeech 2010, Makuhari, Chiba, Japan, pp. 1836–1839 (2010)Riccardi, G., Hakkani-Tur, D.: Active learning: theory and applications to automatic speech recognition. IEEE Transactions on Speech and Audio Processing 13(4), 504–511 (2005)Segarra, E., Sanchis, E., Galiano, M., GarcĂa, F., Hurtado, L.: Extracting Semantic Information Through Automatic Learning Techniques. International Journal of Pattern Recognition and Artificial Intelligence 16(3), 301–307 (2002)Tur, G., Hakkani-Tr, D., Schapire, R.E.: Combining active and semi-supervised learning for spoken language understanding. Speech Communication 45, 171–186 (2005
A multilingual SLU system based on semantic decoding of graphs of words
In this paper, we present a statistical approach to Language
Understanding that allows to avoid the effort of obtaining new semantic
models when changing the language. This way, it is not necessary to acquire
and label new training corpora in the new language. Our approach
consists of learning all the semantic models in a target language and
to do the semantic decoding of the sentences pronounced in the source
language after a translation process. In order to deal with the errors and
the lack of coverage of the translations, a mechanism to generalize the
result of several translators is proposed. The graph of words generated
in this phase is the input to the semantic decoding algorithm specifically
designed to combine statistical models and graphs of words. Some experiments
that show the good behavior of the proposed approach are also
presented.Calvo Lance, M.; Hurtado Oliver, LF.; GarcĂa Granada, F.; SanchĂs Arnal, E. (2012). A multilingual SLU system based on semantic decoding of graphs of words. En Advances in Speech and Language Technologies for Iberian Languages. Springer Verlag (Germany). 328:158-167. doi:10.1007/978-3-642-35292-8_17S158167328Hahn, S., Dinarelli, M., Raymond, C., Lefèvre, F., Lehnen, P., De Mori, R., Moschitti, A., Ney, H., Riccardi, G.: Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Transactions on Audio, Speech, and Language Processing 6(99), 1569–1583 (2010)Raymond, C., Riccardi, G.: Generative and discriminative algorithms for spoken language understanding. In: Proceedings of Interspeech 2007, pp. 1605–1608 (2007)Tur, G., Mori, R.D.: Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, 1st edn. Wiley (2011)Maynard, H.B., Lefèvre, F.: Investigating Stochastic Speech Understanding. In: Proc. of IEEE Automatic Speech Recognition and Understanding Workshop, ASRU (2001)Segarra, E., Sanchis, E., Galiano, M., GarcĂa, F., Hurtado, L.: Extracting Semantic Information Through Automatic Learning Techniques. IJPRAI 16(3), 301–307 (2002)He, Y., Young, S.: Spoken language understanding using the hidden vector state model. Speech Communication 48, 262–275 (2006)De Mori, R., Bechet, F., Hakkani-Tur, D., McTear, M., Riccardi, G., Tur, G.: Spoken language understanding: A survey. IEEE Signal Processing Magazine 25(3), 50–58 (2008)Hakkani-TĂĽr, D., BĂ©chet, F., Riccardi, G., Tur, G.: Beyond ASR 1-best: Using word confusion networks in spoken language understanding. Computer Speech & Language 20(4), 495–514 (2006)Tur, G., Wright, J., Gorin, A., Riccardi, G., Hakkani-TĂĽr, D.: Improving spoken language understanding using word confusion networks. In: Proceedings of the ICSLP. Citeseer (2002)Tur, G., Hakkani-TĂĽr, D., Schapire, R.E.: Combining active and semi-supervised learning for spoken language understanding. Speech Communication 45, 171–186 (2005)Ortega, L., Galiano, I., Hurtado, L.F., Sanchis, E., Segarra, E.: A statistical segment-based approach for spoken language understanding. In: Proc. of InterSpeech 2010, Makuhari, Chiba, Japan, pp. 1836–1839 (2010)Sim, K.C., Byrne, W.J., Gales, M.J.F., Sahbi, H., Woodland, P.C.: Consensus network decoding for statistical machine translation system combination. In: IEEE Int. Conference on Acoustics, Speech, and Signal Processing (2007)Bangalore, S., Bordel, G., Riccardi, G.: Computing Consensus Translation from Multiple Machine Translation Systems. In: Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2001, pp. 351–354 (2001)Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., Higgins, D.G.: ClustalW and ClustalX version 2.0. Bioinformatics 23(21), 2947–2948 (2007)BenedĂ, J.M., Lleida, E., Varona, A., Castro, M.J., Galiano, I., Justo, R., LĂłpez de Letona, I., Miguel, A.: Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA. In: Proceedings of LREC 2006, Genoa, Italy, pp. 1636–1639 (May 2006
A Train-on-Target Strategy for Multilingual Spoken Language Understanding
[EN] There are two main strategies to adapt a Spoken Language
Understanding system to deal with languages different from the original
(source) language: test-on-source and train-on-target. In the train-ontarget
approach, a new understanding model is trained in the target language,
which is the language in which the test utterances are pronounced.
To do this, a segmented and semantically labeled training set for each
new language is needed. In this work, we use several general-purpose
translators to obtain the translation of the training set and we apply an
alignment process to automatically segment the training sentences. We
have applied this train-on-target approach to estimate the understanding
module of a Spoken Dialog System for the DIHANA task, which consists
of an information system about train timetables and fares in Spanish.
We present an evaluation of our train-on-target multilingual approach
for two target languages, French and EnglishThis work has been partially funded by the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (MEC TIN2014-54288-C4-3-R).GarcĂa-Granada, F.; Segarra Soriano, E.; Millán, C.; SanchĂs Arnal, E.; Hurtado Oliver, LF. (2016). A Train-on-Target Strategy for Multilingual Spoken Language Understanding. Lecture Notes in Computer Science. 10077:224-233. https://doi.org/10.1007/978-3-319-49169-1_22S22423310077BenedĂ, J.M., Lleida, E., Varona, A., Castro, M.J., Galiano, I., Justo, R., LĂłpez de Letona, I., Miguel, A.: Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA. In: LREC 2006, pp. 1636–1639 (2006)Calvo, M., Hurtado, L.-F., GarcĂa, F., SanchĂs, E.: A Multilingual SLU system based on semantic decoding of graphs of words. In: Torre Toledano, D., Ortega GimĂ©nez, A., Teixeira, A., González RodrĂguez, J., Hernández GĂłmez, L., San Segundo Hernández, R., Ramos Castro, D. (eds.) IberSPEECH 2012. CCIS, vol. 328, pp. 158–167. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-35292-8_17Calvo, M., Hurtado, L.F., Garca, F., Sanchis, E., Segarra, E.: Multilingual spoken language understanding using graphs and multiple translations. Comput. Speech Lang. 38, 86–103 (2016)Dinarelli, M., Moschitti, A., Riccardi, G.: Concept segmentation and labeling for conversational speech. In: Interspeech, Brighton, UK (2009)Esteve, Y., Raymond, C., Bechet, F., Mori, R.D.: Conceptual decoding for spoken dialog systems. In: Proceedings of EuroSpeech 2003, pp. 617–620 (2003)GarcĂa, F., Hurtado, L., Segarra, E., Sanchis, E., Riccardi, G.: Combining multiple translation systems for spoken language understanding portability. In: Proceedings of IEEE Workshop on Spoken Language Technology (SLT), pp. 282–289 (2012)Hahn, S., Dinarelli, M., Raymond, C., Lefèvre, F., Lehnen, P., De Mori, R., Moschitti, A., Ney, H., Riccardi, G.: Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Trans. Audio Speech Lang. Process. 6(99), 1569–1583 (2010)He, Y., Young, S.: A data-driven spoken language understanding system. In: Proceedings of ASRU 2003, pp. 583–588 (2003)Hurtado, L., Segarra, E., GarcĂa, F., Sanchis, E.: Language understanding using n-multigram models. In: Vicedo, J.L., MartĂnez-Barco, P., MuĹ„oz, R., Saiz Noeda, M. (eds.) EsTAL 2004. LNCS (LNAI), vol. 3230, pp. 207–219. Springer, Heidelberg (2004). doi: 10.1007/978-3-540-30228-5_19Jabaian, B., Besacier, L., Lefèvre, F.: Comparison and combination of lightly supervised approaches for language portability of a spoken language understanding system. IEEE Trans. Audio Speech Lang. Process. 21(3), 636–648 (2013)Koehn, P., et al.: Moses: open source toolkit for statistical machine translation. In: Proceedings of ACL Demonstration Session, pp. 177–180 (2007)Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: International Conference on Machine Learning, pp. 282–289. Citeseer (2001)Lefèvre, F.: Dynamic Bayesian networks and discriminative classifiers for multi-stage semantic interpretation. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2007, vol. 4, pp. 13–16. IEEE (2007)Ortega, L., Galiano, I., Hurtado, L.F., Sanchis, E., Segarra, E.: A statistical segment-based approach for spoken language understanding. In: Proceedings of InterSpeech 2010, Makuhari, Chiba, Japan, pp. 1836–1839 (2010)Segarra, E., Sanchis, E., Galiano, M., GarcĂa, F., Hurtado, L.: Extracting semantic information through automatic learning techniques. IJPRAI 16(3), 301–307 (2002)Servan, C., Camelin, N., Raymond, C., Bchet, F., Mori, R.D.: On the use of machine translation for spoken language understanding portability. In: Proceedings of ICASSP 2010, pp. 5330–5333 (2010)TĂĽr, G., Mori, R.D.: Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, 1st edn. Wiley, Hoboken (2011
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to over estimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that e xploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the abs olute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%
- …