272,165 research outputs found
A Train-on-Target Strategy for Multilingual Spoken Language Understanding
[EN] There are two main strategies to adapt a Spoken Language
Understanding system to deal with languages different from the original
(source) language: test-on-source and train-on-target. In the train-on-target
approach, a new understanding model is trained in the target language,
which is the language in which the test utterances are pronounced.
To do this, a segmented and semantically labeled training set for each
new language is needed. In this work, we use several general-purpose
translators to obtain the translation of the training set and we apply an
alignment process to automatically segment the training sentences. We
have applied this train-on-target approach to estimate the understanding
module of a Spoken Dialog System for the DIHANA task, which consists
of an information system about train timetables and fares in Spanish.
We present an evaluation of our train-on-target multilingual approach
for two target languages, French and English.

This work has been partially funded by the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (MEC TIN2014-54288-C4-3-R).

García-Granada, F.; Segarra Soriano, E.; Millán, C.; Sanchís Arnal, E.; Hurtado Oliver, L.F. (2016). A Train-on-Target Strategy for Multilingual Spoken Language Understanding. Lecture Notes in Computer Science 10077:224-233. https://doi.org/10.1007/978-3-319-49169-1_22
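The pipeline above translates a segmented, semantically labeled training set and then aligns the translations to recover the segmentation in the target language. A minimal sketch of that label-projection step, assuming a word alignment is already available from some aligner (the function name and fill-forward heuristic are hypothetical, not the authors' implementation):

```python
def project_labels(src_labels, alignment, tgt_len):
    """Transfer per-word semantic labels from a source sentence to its
    translation using a word alignment.

    src_labels: one semantic label per source word.
    alignment:  list of (src_index, tgt_index) alignment links.
    tgt_len:    number of words in the target sentence.
    """
    tgt_labels = [None] * tgt_len
    for src_i, tgt_j in alignment:
        tgt_labels[tgt_j] = src_labels[src_i]
    # Heuristic: unaligned target words inherit the nearest preceding label,
    # so labeled segments stay contiguous.
    for j in range(1, tgt_len):
        if tgt_labels[j] is None:
            tgt_labels[j] = tgt_labels[j - 1]
    return tgt_labels
```

With a toy Spanish-to-French alignment where the target inserts an extra function word, the projected labels still form contiguous segments.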
A multilingual SLU system based on semantic decoding of graphs of words
In this paper, we present a statistical approach to Language
Understanding that makes it possible to avoid the effort of obtaining new semantic
models when changing the language. This way, it is not necessary to acquire
and label new training corpora in the new language. Our approach
consists of learning all the semantic models in a target language and
performing the semantic decoding of the sentences pronounced in the source
language after a translation process. In order to deal with the errors and
the lack of coverage of the translations, a mechanism to generalize the
result of several translators is proposed. The graph of words generated
in this phase is the input to the semantic decoding algorithm specifically
designed to combine statistical models and graphs of words. Some experiments
that show the good behavior of the proposed approach are also
presented.

Calvo Lance, M.; Hurtado Oliver, L.F.; García Granada, F.; Sanchís Arnal, E. (2012). A multilingual SLU system based on semantic decoding of graphs of words. In: Advances in Speech and Language Technologies for Iberian Languages. Springer Verlag (Germany). 328:158-167. doi:10.1007/978-3-642-35292-8_17
Combining Several ASR Outputs in a Graph-Based SLU System
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-25751-8_66

In this paper, we present an approach to Spoken Language
Understanding (SLU) where we perform a combination of multiple
hypotheses from several Automatic Speech Recognizers (ASRs) in
order to reduce the impact of recognition errors in the SLU module. This
combination is performed using a Grammatical Inference algorithm that
provides a generalization of the input sentences by means of a weighted
graph of words. We have also developed a specific SLU algorithm that is
able to process these graphs of words according to a stochastic semantic
model. The results show that combining several hypotheses from the ASR module outperforms taking just the 1-best transcription.

This work is partially supported by the Spanish MEC under contract TIN2014-54288-C4-3-R and FPU Grant AP2010-4193.

Calvo Lance, M.; Hurtado Oliver, L.F.; García-Granada, F.; Sanchís Arnal, E. (2015). Combining Several ASR Outputs in a Graph-Based SLU System. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer. 551-558. https://doi.org/10.1007/978-3-319-25751-8_66
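As a rough illustration of combining several ASR hypotheses into a weighted word structure, here is a minimal ROVER-style voting sketch. It assumes the hypotheses are already aligned word by word and of equal length; the paper itself uses a Grammatical Inference algorithm over a weighted graph of words, so this is only an analogy, and the function names are invented:

```python
from collections import Counter

def confusion_bins(hypotheses):
    """Position-wise voting over equal-length, pre-aligned ASR hypotheses.
    Each bin maps a candidate word to its relative frequency at that slot."""
    bins = []
    for position in zip(*hypotheses):
        counts = Counter(position)
        total = sum(counts.values())
        bins.append({word: c / total for word, c in counts.items()})
    return bins

def best_path(bins):
    """Greedy 1-best: pick the highest-weight word in each bin."""
    return [max(b, key=b.get) for b in bins]
```

With three hypotheses that disagree on one word, the majority word wins, which is the effect (errors in a single hypothesis being outvoted) that motivates combining hypotheses instead of trusting the 1-best.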
Exploiting multiple ASR outputs for a spoken language understanding task
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-01931-4_19

In this paper, we present an approach to Spoken Language Understanding where the input to the semantic decoding process is a composition of multiple hypotheses provided by the Automatic Speech Recognition module. This way, the semantic constraints can be applied not only to a single hypothesis, but also to other hypotheses that could represent a better recognition of the utterance. To do this, we have developed an algorithm that combines multiple sentences into a weighted graph of words, which is the input to the semantic decoding process. It has also been necessary to develop a specific algorithm to process these graphs of words according to the statistical models that represent the semantics of the task. This approach has been evaluated on an SLU task in Spanish. Results, considering different configurations of ASR outputs, show the better behavior of the system when a combination of hypotheses is considered.

This work is partially supported by the Spanish MICINN under contract TIN2011-28169-C05-01, and under FPU Grant AP2010-4193.

Calvo Lance, M.; García Granada, F.; Hurtado Oliver, L.F.; Jiménez Serrano, S.; Sanchís Arnal, E. (2013). Exploiting multiple ASR outputs for a spoken language understanding task. In: Speech and Computer. Springer Verlag (Germany). 8113:138-145. https://doi.org/10.1007/978-3-319-01931-4_19
Combining multiple translation systems for spoken language understanding portability
[EN] We are interested in the problem of learning Spoken Language Understanding (SLU) models for multiple target languages. Learning such models requires annotated corpora, and porting to different languages would require corpora with parallel text translation and semantic annotations. In this paper we investigate how to learn an SLU model in a target language starting from no target text and no semantic annotation. Our proposed algorithm is based on the idea of exploiting the diversity (with regard to performance and coverage) of multiple translation systems to transfer statistically stable word-to-concept mappings for the Romance language pair French and Spanish. Each translation system performs differently at the lexical level (with respect to BLEU). The best translation system performance for the semantic task is obtained from their combination at different stages of the portability methodology. We have evaluated the portability algorithms on the French MEDIA corpus, using French as the source language and Spanish as the target language.
The experiments show the effectiveness of the proposed methods with respect to the source language SLU baseline.

This work is partially supported by the Spanish MICINN under contract TIN2011-28169-C05-01, and by the Vic. d'Investigació of the UPV under contracts PAID-00-09 and PAID-06-10. The authors' work was partially funded by the FP7 PORTDIAL project n. 296170.

García-Granada, F.; Hurtado Oliver, L.F.; Segarra Soriano, E.; Sanchís Arnal, E.; Riccardi, G. (2012). Combining multiple translation systems for spoken language understanding portability. IEEE. 194-198. https://doi.org/10.1109/SLT.2012.6424221
Finstreder: simple and fast spoken language understanding with finite state transducers using modern speech-to-text models
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands, such as the intent of what a user wants the system to do and special entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers which, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU models without any additional training. Building these models is very fast, taking only a few seconds, and the method is completely language independent. A comparison on different benchmarks shows that this method can outperform multiple other, more resource-demanding SLU approaches.
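To make the transducer idea concrete, here is a heavily simplified sketch that stores intent phrases in a trie and matches a transcription against it exactly. Finstreder builds proper finite state transducers on top of a speech-to-text model, so treat this only as an assumption-laden illustration; the grammar, function names, and `"<intent>"` marker are all invented:

```python
def build_trie(grammar):
    """Build a word-level trie from (phrase, intent) pairs.
    A terminal node stores its intent under the reserved "<intent>" key."""
    root = {}
    for phrase, intent in grammar:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
        node["<intent>"] = intent
    return root

def decode(trie, words):
    """Walk the trie word by word; return the intent on an exact match,
    or None if the transcription leaves the grammar."""
    node = trie
    for word in words:
        if word not in node:
            return None
        node = node[word]
    return node.get("<intent>")
```

Because the grammar is compiled once up front and decoding is a simple graph walk, building such a model takes effectively no training time, which mirrors the paper's "built in a few seconds" claim.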
Exploiting the semantic web for unsupervised spoken language understanding
This paper proposes an unsupervised training approach for SLU systems that leverages the structured semantic knowledge graphs of the emerging Semantic Web. The approach creates natural language surface forms of entity-relation-entity portions of knowledge graphs using a combination of web search retrieval and syntax-based dependency parsing. The new forms are used to train an SLU system in an unsupervised manner. This paper tests the approach on the problem of intent detection, and shows that the unsupervised training procedure matches the performance of supervised training over operating points important for commercial applications.

Index Terms: spoken language understanding, intent detection, structured knowledge-based search, semantic web
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
ASR error management for improving spoken language understanding
This paper addresses the problem of automatic speech recognition (ASR) error
detection and its use for improving spoken language understanding (SLU)
systems. In this study, the SLU task consists in automatically extracting, from
ASR transcriptions, semantic concepts and concept/value pairs in, for example,
a touristic information system. An approach is proposed that enriches the set
of semantic labels with error-specific labels and uses a recently proposed
neural approach based on word embeddings to compute well-calibrated ASR
confidence measures. Experimental results show that it is possible to
significantly decrease the Concept/Value Error Rate with a state-of-the-art
system, outperforming previously published results on the same experimental
data. It is also shown that, by combining an SLU approach based on conditional
random fields with a neural encoder/decoder attention-based architecture, it
is possible to effectively identify confidence islands and uncertain semantic
output segments, which are useful for deciding appropriate error handling
actions by the dialogue manager strategy.

Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden.
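The label-enrichment idea, i.e. extending the semantic label set with error-specific variants for words an ASR-confidence model has flagged, can be sketched as follows. This is a guess at the general mechanism rather than the paper's implementation, and the `-ERR` suffix is invented:

```python
def enrich_labels(concept_tags, error_flags):
    """Derive error-specific labels: each word flagged as a likely ASR
    error gets an error variant of its semantic tag, so the SLU model
    can learn to treat unreliable words differently."""
    return [tag + "-ERR" if flagged else tag
            for tag, flagged in zip(concept_tags, error_flags)]
```

A downstream sequence labeler trained on these enriched tags can then mark uncertain semantic segments explicitly, which is what lets the dialogue manager decide on error-handling actions.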