434 research outputs found

    A Train-on-Target Strategy for Multilingual Spoken Language Understanding

    Full text link
    [EN] There are two main strategies to adapt a Spoken Language Understanding system to deal with languages different from the original (source) language: test-on-source and train-on-target. In the train-ontarget approach, a new understanding model is trained in the target language, which is the language in which the test utterances are pronounced. To do this, a segmented and semantically labeled training set for each new language is needed. In this work, we use several general-purpose translators to obtain the translation of the training set and we apply an alignment process to automatically segment the training sentences. We have applied this train-on-target approach to estimate the understanding module of a Spoken Dialog System for the DIHANA task, which consists of an information system about train timetables and fares in Spanish. We present an evaluation of our train-on-target multilingual approach for two target languages, French and EnglishThis work has been partially funded by the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (MEC TIN2014-54288-C4-3-R).García-Granada, F.; Segarra Soriano, E.; Millán, C.; Sanchís Arnal, E.; Hurtado Oliver, LF. (2016). A Train-on-Target Strategy for Multilingual Spoken Language Understanding. Lecture Notes in Computer Science. 10077:224-233. https://doi.org/10.1007/978-3-319-49169-1_22S22423310077Benedí, J.M., Lleida, E., Varona, A., Castro, M.J., Galiano, I., Justo, R., López de Letona, I., Miguel, A.: Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA. In: LREC 2006, pp. 1636–1639 (2006)Calvo, M., Hurtado, L.-F., García, F., Sanchís, E.: A Multilingual SLU system based on semantic decoding of graphs of words. In: Torre Toledano, D., Ortega Giménez, A., Teixeira, A., González Rodríguez, J., Hernández Gómez, L., San Segundo Hernández, R., Ramos Castro, D. (eds.) IberSPEECH 2012. CCIS, vol. 328, pp. 158–167. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-35292-8_17Calvo, M., Hurtado, L.F., Garca, F., Sanchis, E., Segarra, E.: Multilingual spoken language understanding using graphs and multiple translations. Comput. Speech Lang. 38, 86–103 (2016)Dinarelli, M., Moschitti, A., Riccardi, G.: Concept segmentation and labeling for conversational speech. In: Interspeech, Brighton, UK (2009)Esteve, Y., Raymond, C., Bechet, F., Mori, R.D.: Conceptual decoding for spoken dialog systems. In: Proceedings of EuroSpeech 2003, pp. 617–620 (2003)García, F., Hurtado, L., Segarra, E., Sanchis, E., Riccardi, G.: Combining multiple translation systems for spoken language understanding portability. In: Proceedings of IEEE Workshop on Spoken Language Technology (SLT), pp. 282–289 (2012)Hahn, S., Dinarelli, M., Raymond, C., Lefèvre, F., Lehnen, P., De Mori, R., Moschitti, A., Ney, H., Riccardi, G.: Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Trans. Audio Speech Lang. Process. 6(99), 1569–1583 (2010)He, Y., Young, S.: A data-driven spoken language understanding system. In: Proceedings of ASRU 2003, pp. 583–588 (2003)Hurtado, L., Segarra, E., García, F., Sanchis, E.: Language understanding using n-multigram models. In: Vicedo, J.L., Martínez-Barco, P., Muńoz, R., Saiz Noeda, M. (eds.) EsTAL 2004. LNCS (LNAI), vol. 3230, pp. 207–219. Springer, Heidelberg (2004). doi: 10.1007/978-3-540-30228-5_19Jabaian, B., Besacier, L., Lefèvre, F.: Comparison and combination of lightly supervised approaches for language portability of a spoken language understanding system. IEEE Trans. Audio Speech Lang. Process. 21(3), 636–648 (2013)Koehn, P., et al.: Moses: open source toolkit for statistical machine translation. In: Proceedings of ACL Demonstration Session, pp. 177–180 (2007)Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: International Conference on Machine Learning, pp. 282–289. Citeseer (2001)Lefèvre, F.: Dynamic Bayesian networks and discriminative classifiers for multi-stage semantic interpretation. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2007, vol. 4, pp. 13–16. IEEE (2007)Ortega, L., Galiano, I., Hurtado, L.F., Sanchis, E., Segarra, E.: A statistical segment-based approach for spoken language understanding. In: Proceedings of InterSpeech 2010, Makuhari, Chiba, Japan, pp. 1836–1839 (2010)Segarra, E., Sanchis, E., Galiano, M., García, F., Hurtado, L.: Extracting semantic information through automatic learning techniques. IJPRAI 16(3), 301–307 (2002)Servan, C., Camelin, N., Raymond, C., Bchet, F., Mori, R.D.: On the use of machine translation for spoken language understanding portability. In: Proceedings of ICASSP 2010, pp. 5330–5333 (2010)Tür, G., Mori, R.D.: Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, 1st edn. Wiley, Hoboken (2011

    DNN adaptation by automatic quality estimation of ASR hypotheses

    Full text link
    In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i)automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong, most recent, CHiME-3 baseline.Comment: Computer Speech & Language December 201

    Current trends in multilingual speech processing

    Get PDF
    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processin

    A Strategy for Multilingual Spoken Language Understanding Based on Graphs of Linguistic Units

    Full text link
    [EN] In this thesis, the problem of multilingual spoken language understanding is addressed using graphs to model and combine the different knowledge sources that take part in the understanding process. As a result of this work, a full multilingual spoken language understanding system has been developed, in which statistical models and graphs of linguistic units are used. One key feature of this system is its ability to combine and process multiple inputs provided by one or more sources such as speech recognizers or machine translators. A graph-based monolingual spoken language understanding system was developed as a starting point. The input to this system is a set of sentences that is provided by one or more speech recognition systems. First, these sentences are combined by means of a grammatical inference algorithm in order to build a graph of words. Next, the graph of words is processed to construct a graph of concepts by using a dynamic programming algorithm that identifies the lexical structures that represent the different concepts of the task. Finally, the graph of concepts is used to build the best sequence of concepts. The multilingual case happens when the user speaks a language different to the one natively supported by the system. In this thesis, a test-on-source approach was followed. This means that the input sentences are translated into the system's language, and then they are processed by the monolingual system. For this purpose, two speech translation systems were developed. The output of these speech translation systems are graphs of words that are then processed by the monolingual graph-based spoken language understanding system. Both in the monolingual case and in the multilingual case, the experimental results show that a combination of several inputs allows to improve the results obtained with a single input. In fact, this approach outperforms the current state of the art in many cases when several inputs are combined.[ES] En esta tesis se aborda el problema de la comprensión multilingüe del habla utilizando grafos para modelizar y combinar las diversas fuentes de conocimiento que intervienen en el proceso. Como resultado se ha desarrollado un sistema completo de comprensión multilingüe que utiliza modelos estadísticos y grafos de unidades lingüísticas. El punto fuerte de este sistema es su capacidad para combinar y procesar múltiples entradas proporcionadas por una o varias fuentes, como reconocedores de habla o traductores automáticos. Como punto de partida se desarrolló un sistema de comprensión multilingüe basado en grafos. La entrada a este sistema es un conjunto de frases obtenido a partir de uno o varios reconocedores de habla. En primer lugar, se aplica un algoritmo de inferencia gramatical que combina estas frases y obtiene un grafo de palabras. A continuación, se analiza el grafo de palabras mediante un algoritmo de programación dinámica que identifica las estructuras léxicas correspondientes a los distintos conceptos de la tarea, de forma que se construye un grafo de conceptos. Finalmente, se procesa el grafo de conceptos para encontrar la mejo secuencia de conceptos. El caso multilingüe ocurre cuando el usuario habla una lengua distinta a la original del sistema. En este trabajo se ha utilizado una estrategia test-on-source, en la cual las frases de entrada se traducen al lenguaje del sistema y éste las trata de forma monolingüe. Para ello se han propuesto dos sistemas de traducción del habla cuya salida son grafos de palabras, los cuales son procesados por el algoritmo de comprensión basado en grafos. Tanto en la configuración monolingüe como en la multilingüe los resultados muestran que la combinación de varias entradas permite mejorar los resultados obtenidos con una sola entrada. De hecho, esta aproximación consigue en muchos casos mejores resultados que el actual estado del arte cuando se utiliza una combinación de varias entradas.[CA] Aquesta tesi tracta el problema de la comprensió multilingüe de la parla utilitzant grafs per a modelitzar i combinar les diverses fonts de coneixement que intervenen en el procés. Com a resultat s'ha desenvolupat un sistema complet de comprensió multilingüe de la parla que utilitza models estadístics i grafs d'unitats lingüístiques. El punt fort d'aquest sistema és la seua capacitat per combinar i processar múltiples entrades proporcionades per una o diverses fonts, com reconeixedors de la parla o traductors automàtics. Com a punt de partida, es va desenvolupar un sistema de comprensió monolingüe basat en grafs. L'entrada d'aquest sistema és un conjunt de frases obtingut a partir d'un o més reconeixedors de la parla. En primer lloc, s'aplica un algorisme d'inferència gramatical que combina aquestes frases i obté un graf de paraules. A continuació, s'analitza el graf de paraules mitjançant un algorisme de programació dinàmica que identifica les estructures lèxiques corresponents als distints conceptes de la tasca, de forma que es construeix un graf de conceptes. Finalment, es processa aquest graf de conceptes per trobar la millor seqüència de conceptes. El cas multilingüe ocorre quan l'usuari parla una llengua diferent a l'original del sistema. En aquest treball s'ha utilitzat una estratègia test-on-source, en la qual les frases d'entrada es tradueixen a la llengua del sistema, i aquest les tracta de forma monolingüe. Per a fer-ho es proposen dos sistemes de traducció de la parla l'eixida dels quals són grafs de paraules. Aquests grafs són posteriorment processats per l'algorisme de comprensió basat en grafs. Tant per la configuració monolingüe com per la multilingüe els resultats mostren que la combinació de diverses entrades és capaç de millorar el resultats obtinguts utilitzant una sola entrada. De fet, aquesta aproximació aconsegueix en molts casos millors resultats que l'actual estat de l'art quan s'utilitza una combinació de diverses entrades.Calvo Lance, M. (2016). A Strategy for Multilingual Spoken Language Understanding Based on Graphs of Linguistic Units [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62407TESI

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    Get PDF
    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages, which lack resources for speech and language processing. We focus on finding approaches which allow using data from multiple languages to improve the performance for those languages on different levels, such as feature extraction, acoustic modeling and language modeling. Under application aspects, this thesis also includes research work on non-native and Code-Switching speech

    Harvesting and summarizing user-generated content for advanced speech-based human-computer interaction

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 155-164).There have been many assistant applications on mobile devices, which could help people obtain rich Web content such as user-generated data (e.g., reviews, posts, blogs, and tweets). However, online communities and social networks are expanding rapidly and it is impossible for people to browse and digest all the information via simple search interface. To help users obtain information more efficiently, both the interface for data access and the information representation need to be improved. An intuitive and personalized interface, such as a dialogue system, could be an ideal assistant, which engages a user in a continuous dialogue to garner the user's interest and capture the user's intent, and assists the user via speech-navigated interactions. In addition, there is a great need for a type of application that can harvest data from the Web, summarize the information in a concise manner, and present it in an aggregated yet natural way such as direct human dialogue. This thesis, therefore, aims to conduct research on a universal framework for developing speech-based interface that can aggregate user-generated Web content and present the summarized information via speech-based human-computer interaction. To accomplish this goal, several challenges must be met. Firstly, how to interpret users' intention from their spoken input correctly? Secondly, how to interpret the semantics and sentiment of user-generated data and aggregate them into structured yet concise summaries? Lastly, how to develop a dialogue modeling mechanism to handle discourse and present the highlighted information via natural language? This thesis explores plausible approaches to tackle these challenges. We will explore a lexicon modeling approach for semantic tagging to improve spoken language understanding and query interpretation. We will investigate a parse-and-paraphrase paradigm and a sentiment scoring mechanism for information extraction from unstructured user-generated data. We will also explore sentiment-involved dialogue modeling and corpus-based language generation approaches for dialogue and discourse. Multilingual prototype systems in multiple domains have been implemented for demonstration.by Jingjing Liu.Ph.D

    Utilisation des réseaux de neurones récurrents pour la projection interlingue d'étiquettes morpho-syntaxiques à partir d'un corpus parallèle

    Get PDF
    International audienceIn this paper, we propose a method to automatically induce linguistic analysis tools for languages that have no labeled training data. This method is based on cross-language projection of linguistic annotations from parallel corpora. Our method does not assume any knowledge about foreign languages, making it applicable to a wide range of resource-poor languages. No word alignment information is needed in our approach. We use Recurrent Neural Networks (RNNs) as cross-lingual analysis tool. To illustrate the potential of our approach, we firstly investigate Part-Of-Speech (POS) tagging. Combined with a simple projection method (using word alignment information), it achieves performance comparable to the one of recently published approaches for cross-lingual projection. Mots-clés : Multilinguisme, transfert crosslingue, étiquetage morpho-syntaxique, réseaux de neurones récurrents

    Developing Deployable Spoken Language Translation Systems given Limited Resources

    Get PDF
    Approaches are presented that support the deployment of spoken language translation systems. Newly developed methods allow low cost portability to new language pairs. Proposed translation model pruning techniques achieve a high translation performance even in low memory situations. The named entity and specialty vocabulary coverage, particularly on small and mobile devices, is targeted to an individual user by translation model personalization

    Practical Natural Language Processing for Low-Resource Languages.

    Full text link
    As the Internet and World Wide Web have continued to gain widespread adoption, the linguistic diversity represented has also been growing. Simultaneously the field of Linguistics is facing a crisis of the opposite sort. Languages are becoming extinct faster than ever before and linguists now estimate that the world could lose more than half of its linguistic diversity by the year 2100. This is a special time for Computational Linguistics; this field has unprecedented access to a great number of low-resource languages, readily available to be studied, but needs to act quickly before political, social, and economic pressures cause these languages to disappear from the Web. Most work in Computational Linguistics and Natural Language Processing (NLP) focuses on English or other languages that have text corpora of hundreds of millions of words. In this work, we present methods for automatically building NLP tools for low-resource languages with minimal need for human annotation in these languages. We start first with language identification, specifically focusing on word-level language identification, an understudied variant that is necessary for processing Web text and develop highly accurate machine learning methods for this problem. From there we move onto the problems of part-of-speech tagging and dependency parsing. With both of these problems we extend the current state of the art in projected learning to make use of multiple high-resource source languages instead of just a single language. In both tasks, we are able to improve on the best current methods. All of these tools are practically realized in the "Minority Language Server," an online tool that brings these techniques together with low-resource language text on the Web. The Minority Language Server, starting with only a few words in a language can automatically collect text in a language, identify its language and tag its parts of speech. We hope that this system is able to provide a convincing proof of concept for the automatic collection and processing of low-resource language text from the Web, and one that can hopefully be realized before it is too late.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113373/1/benking_1.pd
    • …
    corecore