21 research outputs found

    Improving Unsegmented Dialogue Turns Annotation with N-gram Transducers

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Active learning for dialogue act labelling

    Active learning is a useful technique that allows for a considerable reduction in the amount of data that must be manually labelled for a statistical model to reach good performance. To apply active learning to a particular task, we first need to define an effective selection criterion that picks out the most informative samples at each iteration of the active learning process. This is still an open problem, which we address in this work for the task of dialogue annotation at the dialogue act level. We present two different criteria, weighted number of hypotheses and entropy, which we applied to the Sample Selection Algorithm for the task of dialogue act labelling and which yielded appreciable improvements in our experiments. © 2011 Springer-Verlag. Work supported by the EC (FEDER/FSE) and the Spanish MEC/MICINN under the MIPRCV “Consolider Ingenio 2010” program (CSD2007-00018), MITTRAL (TIN2009-14633-C03-01) projects and the FPI scholarship (BES-2009-028965). Also supported by the Generalitat Valenciana under grants Prometeo/2009/014 and GV/2010/067. Ghigi, F.; Tamarit Ballester, V.; Martínez-Hinarejos, C.; Benedí Ruiz, JM. (2011). Active learning for dialogue act labelling. In Lecture Notes in Computer Science. Springer Verlag (Germany). 6669:652-659. https://doi.org/10.1007/978-3-642-21257-4_81

    Hierarchical Multi-Label Dialog Act Recognition on Spanish Data

    Dialog acts reveal the intention behind the uttered words. Thus, their automatic recognition is important for a dialog system trying to understand its conversational partner. The study presented in this article approaches that task on the DIHANA corpus, whose three-level dialog act annotation scheme poses problems which have not been explored in recent studies. In addition to the hierarchical problem, the two lower levels pose multi-label classification problems. Furthermore, each level in the hierarchy refers to a different aspect concerning the intention of the speaker both in terms of the structure of the dialog and the task. Also, since its dialogs are in Spanish, it allows us to assess whether the state-of-the-art approaches on English data generalize to a different language. More specifically, we compare the performance of different segment representation approaches focusing on both sequences and patterns of words and assess the importance of the dialog history and the relations between the multiple levels of the hierarchy. Concerning the single-label classification problem posed by the top level, we show that the conclusions drawn on English data also hold on Spanish data. Furthermore, we show that the approaches can be adapted to multi-label scenarios. Finally, by hierarchically combining the best classifiers for each level, we achieve the best results reported for this corpus. Comment: 21 pages, 4 figures, 17 tables, translated version of the article published in Linguamática 11(1

    Proceedings of the ACM SIGIR Workshop ''Searching Spontaneous Conversational Speech''


    GREC: Multi-domain Speech Recognition for the Greek Language

    One of the leading challenges in Automatic Speech Recognition (ASR) is the development of robust systems that can perform well under multiple settings. In this work we construct and analyze GREC, a large, multi-domain corpus for automatic speech recognition for the Greek language. GREC is a collection of three available subcorpora over the domains of “news casts”, “crowd-sourced speech” and “audiobooks”, together with a new corpus in the domain of “public speeches”. For the creation of the latter, HParl, we collect speech data from recordings of the official proceedings of the Hellenic Parliament, yielding a dataset which consists of 120 hours of political speech segments. We describe our data collection, pre-processing and alignment setup, which are based on the Kaldi toolkit. Furthermore, we perform extensive ablations on the recognition performance of Gaussian Mixture (GMM) - Hidden Markov (HMM) models and Deep Neural Network (DNN) - HMM models over the different domains. Finally, we integrate speaker diarization features into Kaldi-gRPC-Server, a modern, pythonic tool based on PyKaldi and gRPC for streamlined deployment of Kaldi-based speech recognition

    Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

    Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions. This work has been partially supported by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231). AE was supported by BAGEP 2021 Award of the Science Academy. EE was supported in part by TUBA GEBIP 2018 Award. BP is in part funded by Independent Research Fund Denmark (DFF) grant 9063-00077B. IC has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 838188. EL is partly funded by Generalitat Valenciana and the Spanish Government through projects PROMETEU/2018/089 and RTI2018-094649-B-I00, respectively. SMI is partly funded by UNIRI project uniri-drustv-18-20. GB is partly supported by the Ministry of Innovation and the National Research, Development and Innovation Office within the framework of the Hungarian Artificial Intelligence National Laboratory Programme. COT is partially funded by the Romanian Ministry of European Investments and Projects through the Competitiveness Operational Program (POC) project “HOLOTRAIN” (grant no. 29/221 ap2/07.04.2020, SMIS code: 129077) and by the German Academic Exchange Service (DAAD) through the project “AWAKEN: content-Aware and netWork-Aware faKE News mitigation” (grant no. 91809005). ESA is partially funded by the German Academic Exchange Service (DAAD) through the project “Deep-Learning Anomaly Detection for Human and Automated Users Behavior” (grant no. 91809358).