11 research outputs found

    Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification

    Connectionist temporal classification (CTC) is a powerful approach to sequence-to-sequence learning and has been widely used in speech recognition. A central idea of CTC is to add a "blank" label during training. With this mechanism, CTC eliminates the need for segment alignment, and hence has been applied to various sequence-to-sequence learning problems. In this work, we applied CTC to abstractive summarization of spoken content. The "blank" in this case implies that the corresponding input data are less important or noisy and can therefore be ignored. This approach was shown to outperform existing methods in terms of ROUGE scores on the Chinese Gigaword and MATBN corpora. This approach also has the useful property that the ordering of words or characters in the input documents is better preserved in the generated summaries. Comment: Accepted by Interspeech 201
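The role of the "blank" label described in this abstract can be illustrated with the standard CTC collapsing rule: merge consecutive repeated labels, then drop blanks. The sketch below is a minimal illustration under stated assumptions, not the authors' model. The blank symbol `<b>`, the function name `ctc_collapse`, and the toy label sequence are all hypothetical.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = "<b>"  # hypothetical blank symbol, chosen for this sketch only


def ctc_collapse(labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Apply the standard CTC collapsing rule to a per-position label sequence.

    Step 1: merge runs of identical consecutive labels into one label.
    Step 2: remove every blank label.
    The surviving labels keep their original left-to-right order.
    """
    merged = [label for label, _ in groupby(labels)]
    return [label for label in merged if label != blank]


# Toy per-position output for a seven-token input (made-up data).
# Positions labelled blank are treated as unimportant and are dropped.
predicted = ["<b>", "<b>", "storm", "storm", "hit", "<b>", "coast"]
summary = ctc_collapse(predicted)
print(summary)
```

Because collapsing only deletes or merges positions and never reorders them, the output follows the order of the input, which is consistent with the order-preserving property the abstract mentions.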

    Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

    Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions. This work has been partially supported by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231). AE was supported by BAGEP 2021 Award of the Science Academy. EE was supported in part by TUBA GEBIP 2018 Award. BP is in part funded by Independent Research Fund Denmark (DFF) grant 9063-00077B. IC has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 838188.
EL is partly funded by Generalitat Valenciana and the Spanish Government through projects PROMETEU/2018/089 and RTI2018-094649-B-I00, respectively. SMI is partly funded by UNIRI project uniri-drustv-18-20. GB is partly supported by the Ministry of Innovation and the National Research, Development and Innovation Office within the framework of the Hungarian Artificial Intelligence National Laboratory Programme. COT is partially funded by the Romanian Ministry of European Investments and Projects through the Competitiveness Operational Program (POC) project “HOLOTRAIN” (grant no. 29/221 ap2/07.04.2020, SMIS code: 129077) and by the German Academic Exchange Service (DAAD) through the project “AWAKEN: content-Aware and netWork-Aware faKE News mitigation” (grant no. 91809005). ESA is partially funded by the German Academic Exchange Service (DAAD) through the project “Deep-Learning Anomaly Detection for Human and Automated Users Behavior” (grant no. 91809358).

    Tackling Sequence to Sequence Mapping Problems with Neural Networks

    In Natural Language Processing (NLP), it is important to detect the relationship between two sequences or to generate a sequence of tokens given another observed sequence. We refer to problems that involve modelling sequence pairs as sequence-to-sequence (seq2seq) mapping problems. A lot of research has been devoted to finding ways of tackling these problems, with traditional approaches relying on a combination of hand-crafted features, alignment models, segmentation heuristics, and external linguistic resources. Although great progress has been made, these traditional approaches suffer from various drawbacks, such as complicated pipelines, laborious feature engineering, and the difficulty of domain adaptation. Recently, neural networks have emerged as a promising solution to many problems in NLP, speech recognition, and computer vision. Neural models are powerful because they can be trained end to end, generalise well to unseen examples, and can be easily adapted to a new domain within the same framework. The aim of this thesis is to advance the state of the art in seq2seq mapping problems with neural networks. We explore solutions from three major aspects: investigating neural models for representing sequences, modelling interactions between sequences, and using unpaired data to boost the performance of neural models. For each aspect, we propose novel models and evaluate their efficacy on various seq2seq mapping tasks. Comment: PhD thesis

    Just-in-time information retrieval and summarization for personal assistance

    With the rapid development of means for producing user-generated data, opportunities for collecting such data over time and using it in various human-assistance applications are greater than ever. Wearable and mobile data capture devices, as well as many online data channels such as search engines, are all examples of means of user data collection. Such user data could be used to model user behavior, identify information relevant to a user, and retrieve it in a timely fashion for personal assistance. User data can include recordings of one's conversations, images, biophysical data, health-related data captured by wearable devices, interactions with smartphones and computers, and more. To use such data for personal assistance, summaries of previously recorded events can be presented to a user to augment the user's memory, notifications about important events can be sent to the user, and the user's near-future information needs can be predicted so that relevant content is retrieved even before the user asks. In this PhD dissertation, we design a personal assistant with a focus on two main aspects. The first aspect is that a personal assistant should be able to summarize user data and present it to a user. To achieve this goal, we build a Social Interactions Log Analysis System (SILAS) that summarizes a person's conversations into event snippets consisting of spoken topics paired with images and other modalities of data captured by the person's wearable devices. Furthermore, we design a novel discrete Dynamic Topic Model (dDTM) capable of tracking the evolution of the intermittent spoken topics over time. Additionally, we present the first neural Customizable Abstractive Topic-based Summarization (CATS) model that produces natural-language summaries of textual documents, including meeting transcripts. The second aspect is that a personal assistant should be capable of proactively addressing the user's information needs.
For this purpose, we propose a family of just-in-time information retrieval models such as an evolutionary model named Kalman combination of Recency and Establishment (K2RE) that can anticipate a user's near-future information needs. Such information needs can include information for preparing a future meeting or near-future search queries of a user

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown

    Designing coherent and engaging open-domain conversational AI systems

    Designing conversational AI systems able to engage in open-domain ‘social’ conversation is extremely challenging and a frontier of current research. Such systems are required to have extensive awareness of the dialogue context, world knowledge, and the user's intents and interests, which requires more complicated language understanding, dialogue management, and state and topic tracking mechanisms than traditional task-oriented dialogue systems. Given the wide coverage of topics in open-domain dialogue, the conversation can span multiple turns where a number of complex linguistic phenomena (e.g. ellipsis and anaphora) are present and should be resolved for the system to be contextually aware. Such systems also need to be engaging, keeping the users’ interest over long conversations. These are only some of the challenges that open-domain dialogue systems face. Therefore, this thesis focuses on designing dialogue systems able to hold extensive open-domain conversations in a coherent, engaging, and appropriate manner over multiple turns. First, different types of dialogue system architectures and design decisions are discussed for social open-domain conversations, along with relevant evaluation metrics. A modular architecture for ensemble-based conversational systems, called Alana, is presented. Alana was a finalist in the Amazon Alexa Prize Challenge in 2017 and 2018 and is able to tackle many of the challenges of open-domain social conversation. The system combines different features such as topic tracking, contextual natural language understanding, entity linking, user modelling, information retrieval, and response ranking, using a rich representation of dialogue state. The thesis next analyses the performance of the 2017 system and describes the upgrades developed for the 2018 system.
This leads to an analysis and comparison of the real-user data collected in both years with different system configurations, allowing assessment of the impact of different design decisions and modules. Finally, Alana was integrated into an embodied robotic platform and enhanced with the ability to also perform tasks. This system was deployed and evaluated in a shopping mall in Finland. Further analysis of the added embodiment is presented and discussed, as well as the challenges of translating open-domain dialogue systems into other languages. Data analysis of the collected real-user data shows the importance of a variety of features developed and decisions made in the design of the Alana system