3,460 research outputs found
A multilingual SLU system based on semantic decoding of graphs of words
In this paper, we present a statistical approach to Language
Understanding that allows to avoid the effort of obtaining new semantic
models when changing the language. This way, it is not necessary to acquire
and label new training corpora in the new language. Our approach
consists of learning all the semantic models in a target language and
to do the semantic decoding of the sentences pronounced in the source
language after a translation process. In order to deal with the errors and
the lack of coverage of the translations, a mechanism to generalize the
result of several translators is proposed. The graph of words generated
in this phase is the input to the semantic decoding algorithm specifically
designed to combine statistical models and graphs of words. Some experiments
that show the good behavior of the proposed approach are also
presented.Calvo Lance, M.; Hurtado Oliver, LF.; GarcĂa Granada, F.; SanchĂs Arnal, E. (2012). A multilingual SLU system based on semantic decoding of graphs of words. En Advances in Speech and Language Technologies for Iberian Languages. Springer Verlag (Germany). 328:158-167. doi:10.1007/978-3-642-35292-8_17S158167328Hahn, S., Dinarelli, M., Raymond, C., Lefèvre, F., Lehnen, P., De Mori, R., Moschitti, A., Ney, H., Riccardi, G.: Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Transactions on Audio, Speech, and Language Processing 6(99), 1569â1583 (2010)Raymond, C., Riccardi, G.: Generative and discriminative algorithms for spoken language understanding. In: Proceedings of Interspeech 2007, pp. 1605â1608 (2007)Tur, G., Mori, R.D.: Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, 1st edn. Wiley (2011)Maynard, H.B., Lefèvre, F.: Investigating Stochastic Speech Understanding. In: Proc. of IEEE Automatic Speech Recognition and Understanding Workshop, ASRU (2001)Segarra, E., Sanchis, E., Galiano, M., GarcĂa, F., Hurtado, L.: Extracting Semantic Information Through Automatic Learning Techniques. IJPRAI 16(3), 301â307 (2002)He, Y., Young, S.: Spoken language understanding using the hidden vector state model. Speech Communication 48, 262â275 (2006)De Mori, R., Bechet, F., Hakkani-Tur, D., McTear, M., Riccardi, G., Tur, G.: Spoken language understanding: A survey. IEEE Signal Processing Magazine 25(3), 50â58 (2008)Hakkani-TĂźr, D., BĂŠchet, F., Riccardi, G., Tur, G.: Beyond ASR 1-best: Using word confusion networks in spoken language understanding. Computer Speech & Language 20(4), 495â514 (2006)Tur, G., Wright, J., Gorin, A., Riccardi, G., Hakkani-TĂźr, D.: Improving spoken language understanding using word confusion networks. In: Proceedings of the ICSLP. Citeseer (2002)Tur, G., Hakkani-TĂźr, D., Schapire, R.E.: Combining active and semi-supervised learning for spoken language understanding. Speech Communication 45, 171â186 (2005)Ortega, L., Galiano, I., Hurtado, L.F., Sanchis, E., Segarra, E.: A statistical segment-based approach for spoken language understanding. In: Proc. of InterSpeech 2010, Makuhari, Chiba, Japan, pp. 1836â1839 (2010)Sim, K.C., Byrne, W.J., Gales, M.J.F., Sahbi, H., Woodland, P.C.: Consensus network decoding for statistical machine translation system combination. In: IEEE Int. Conference on Acoustics, Speech, and Signal Processing (2007)Bangalore, S., Bordel, G., Riccardi, G.: Computing Consensus Translation from Multiple Machine Translation Systems. In: Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2001, pp. 351â354 (2001)Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., Higgins, D.G.: ClustalW and ClustalX version 2.0. Bioinformatics 23(21), 2947â2948 (2007)BenedĂ, J.M., Lleida, E., Varona, A., Castro, M.J., Galiano, I., Justo, R., LĂłpez de Letona, I., Miguel, A.: Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA. In: Proceedings of LREC 2006, Genoa, Italy, pp. 1636â1639 (May 2006
Spoken Language Intent Detection using Confusion2Vec
Decoding speaker's intent is a crucial part of spoken language understanding
(SLU). The presence of noise or errors in the text transcriptions, in real life
scenarios make the task more challenging. In this paper, we address the spoken
language intent detection under noisy conditions imposed by automatic speech
recognition (ASR) systems. We propose to employ confusion2vec word feature
representation to compensate for the errors made by ASR and to increase the
robustness of the SLU system. The confusion2vec, motivated from human speech
production and perception, models acoustic relationships between words in
addition to the semantic and syntactic relations of words in human language. We
hypothesize that ASR often makes errors relating to acoustically similar words,
and the confusion2vec with inherent model of acoustic relationships between
words is able to compensate for the errors. We demonstrate through experiments
on the ATIS benchmark dataset, the robustness of the proposed model to achieve
state-of-the-art results under noisy ASR conditions. Our system reduces
classification error rate (CER) by 20.84% and improves robustness by 37.48%
(lower CER degradation) relative to the previous state-of-the-art going from
clean to noisy transcripts. Improvements are also demonstrated when training
the intent detection models on noisy transcripts
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
- âŚ