221 research outputs found

    Automatic Construction of Discourse Corpora for Dialogue Translation

    Get PDF
    In this paper, a novel approach is proposed to automatically construct parallel discourse corpus for dialogue machine translation. Firstly, the parallel subtitle data and its corresponding monolingual movie script data are crawled and collected from Internet. Then tags such as speaker and discourse boundary from the script data are projected to its subtitle data via an information retrieval approach in order to map monolingual discourse to bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show our proposed method can achieve 81.79% and 98.64% accuracy on speaker and dialogue boundary annotation, and speaker-based language model adaptation can obtain around 0.5 BLEU points improvement in translation qualities. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotation

    Computational Argumentation for the Automatic Analysis of Argumentative Discourse and Human Persuasion

    Full text link
    Tesis por compendio[ES] La argumentación computacional es el área de investigación que estudia y analiza el uso de distintas técnicas y algoritmos que aproximan el razonamiento argumentativo humano desde un punto de vista computacional. En esta tesis doctoral se estudia el uso de distintas técnicas propuestas bajo el marco de la argumentación computacional para realizar un análisis automático del discurso argumentativo, y para desarrollar técnicas de persuasión computacional basadas en argumentos. Con estos objetivos, en primer lugar se presenta una completa revisión del estado del arte y se propone una clasificación de los trabajos existentes en el área de la argumentación computacional. Esta revisión nos permite contextualizar y entender la investigación previa de forma más clara desde la perspectiva humana del razonamiento argumentativo, así como identificar las principales limitaciones y futuras tendencias de la investigación realizada en argumentación computacional. En segundo lugar, con el objetivo de solucionar algunas de estas limitaciones, se ha creado y descrito un nuevo conjunto de datos que permite abordar nuevos retos y investigar problemas previamente inabordables (e.g., evaluación automática de debates orales). Conjuntamente con estos datos, se propone un nuevo sistema para la extracción automática de argumentos y se realiza el análisis comparativo de distintas técnicas para esta misma tarea. Además, se propone un nuevo algoritmo para la evaluación automática de debates argumentativos y se prueba con debates humanos reales. Finalmente, en tercer lugar se presentan una serie de estudios y propuestas para mejorar la capacidad persuasiva de sistemas de argumentación computacionales en la interacción con usuarios humanos. De esta forma, en esta tesis se presentan avances en cada una de las partes principales del proceso de argumentación computacional (i.e., extracción automática de argumentos, representación del conocimiento y razonamiento basados en argumentos, e interacción humano-computador basada en argumentos), así como se proponen algunos de los cimientos esenciales para el análisis automático completo de discursos argumentativos en lenguaje natural.[CA] L'argumentació computacional és l'àrea de recerca que estudia i analitza l'ús de distintes tècniques i algoritmes que aproximen el raonament argumentatiu humà des d'un punt de vista computacional. En aquesta tesi doctoral s'estudia l'ús de distintes tècniques proposades sota el marc de l'argumentació computacional per a realitzar una anàlisi automàtic del discurs argumentatiu, i per a desenvolupar tècniques de persuasió computacional basades en arguments. Amb aquestos objectius, en primer lloc es presenta una completa revisió de l'estat de l'art i es proposa una classificació dels treballs existents en l'àrea de l'argumentació computacional. Aquesta revisió permet contextualitzar i entendre la investigació previa de forma més clara des de la perspectiva humana del raonament argumentatiu, així com identificar les principals limitacions i futures tendències de la investigació realitzada en argumentació computacional. En segon lloc, amb l'objectiu de sol\cdotlucionar algunes d'aquestes limitacions, hem creat i descrit un nou conjunt de dades que ens permet abordar nous reptes i investigar problemes prèviament inabordables (e.g., avaluació automàtica de debats orals). Conjuntament amb aquestes dades, es proposa un nou sistema per a l'extracció d'arguments i es realitza l'anàlisi comparativa de distintes tècniques per a aquesta mateixa tasca. A més a més, es proposa un nou algoritme per a l'avaluació automàtica de debats argumentatius i es prova amb debats humans reals. Finalment, en tercer lloc es presenten una sèrie d'estudis i propostes per a millorar la capacitat persuasiva de sistemes d'argumentació computacionals en la interacció amb usuaris humans. D'aquesta forma, en aquesta tesi es presenten avanços en cada una de les parts principals del procés d'argumentació computacional (i.e., l'extracció automàtica d'arguments, la representació del coneixement i raonament basats en arguments, i la interacció humà-computador basada en arguments), així com es proposen alguns dels fonaments essencials per a l'anàlisi automàtica completa de discursos argumentatius en llenguatge natural.[EN] Computational argumentation is the area of research that studies and analyses the use of different techniques and algorithms that approximate human argumentative reasoning from a computational viewpoint. In this doctoral thesis we study the use of different techniques proposed under the framework of computational argumentation to perform an automatic analysis of argumentative discourse, and to develop argument-based computational persuasion techniques. With these objectives in mind, we first present a complete review of the state of the art and propose a classification of existing works in the area of computational argumentation. This review allows us to contextualise and understand the previous research more clearly from the human perspective of argumentative reasoning, and to identify the main limitations and future trends of the research done in computational argumentation. Secondly, to overcome some of these limitations, we create and describe a new corpus that allows us to address new challenges and investigate on previously unexplored problems (e.g., automatic evaluation of spoken debates). In conjunction with this data, a new system for argument mining is proposed and a comparative analysis of different techniques for this same task is carried out. In addition, we propose a new algorithm for the automatic evaluation of argumentative debates and we evaluate it with real human debates. Thirdly, a series of studies and proposals are presented to improve the persuasiveness of computational argumentation systems in the interaction with human users. In this way, this thesis presents advances in each of the main parts of the computational argumentation process (i.e., argument mining, argument-based knowledge representation and reasoning, and argument-based human-computer interaction), and proposes some of the essential foundations for the complete automatic analysis of natural language argumentative discourses.This thesis has been partially supported by the Generalitat Valenciana project PROME- TEO/2018/002 and by the Spanish Government projects TIN2017-89156-R and PID2020- 113416RB-I00.Ruiz Dolz, R. (2023). Computational Argumentation for the Automatic Analysis of Argumentative Discourse and Human Persuasion [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/194806Compendi

    A statistical simulation technique to develop and evaluate conversational agents

    Get PDF
    In this paper, we present a technique for developing user simulators which are able to interact and evaluate conversational agents. Our technique is based on a statistical model that is automatically learned from a dialog corpus. This model is used by the user simulator to provide the next answer taking into account the complete history of the interaction. The main objective of our proposal is not only to evaluate the conversational agent, but also to improve this agent by employing the simulated dialogs to learn a better dialog model. We have applied this technique to design and evaluate a conversational agent which provides academic information in a multi-agent system. The results of the evaluation show that the proposed user simulation methodology can be used not only to evaluate conversational agents but also to explore new enhanced dialog strategies, thereby allowing the conversational agent to reduce the time needed to complete the dialogs and automatically detect new valid paths to achieve each of the required objectives defined for the task.This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC 2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485).Publicad

    Sarcasm Detection in a Disaster Context

    Full text link
    During natural disasters, people often use social media platforms such as Twitter to ask for help, to provide information about the disaster situation, or to express contempt about the unfolding event or public policies and guidelines. This contempt is in some cases expressed as sarcasm or irony. Understanding this form of speech in a disaster-centric context is essential to improving natural language understanding of disaster-related tweets. In this paper, we introduce HurricaneSARC, a dataset of 15,000 tweets annotated for intended sarcasm, and provide a comprehensive investigation of sarcasm detection using pre-trained language models. Our best model is able to obtain as much as 0.70 F1 on our dataset. We also demonstrate that the performance on HurricaneSARC can be improved by leveraging intermediate task transfer learning. We release our data and code at https://github.com/tsosea2/HurricaneSarc

    A multilingual neural coaching model with enhanced long-term dialogue structure

    Get PDF
    In this work we develop a fully data-driven conversational agent capable of carrying out motivational coach- ing sessions in Spanish, French, Norwegian, and English. Unlike the majority of coaching, and in general well-being related conversational agents that can be found in the literature, ours is not designed by hand- crafted rules. Instead, we directly model the coaching strategy of professionals with end users. To this end, we gather a set of virtual coaching sessions through a Wizard of Oz platform, and apply state of the art Natural Language Processing techniques. We employ a transfer learning approach, pretraining GPT2 neural language models and fine-tuning them on our corpus. However, since these only take as input a local dialogue history, a simple fine-tuning procedure is not capable of modeling the long-term dialogue strategies that appear in coaching sessions. To alleviate this issue, we first propose to learn dialogue phase and scenario embeddings in the fine-tuning stage. These indicate to the model at which part of the dialogue it is and which kind of coaching session it is carrying out. Second, we develop a global deep learning system which controls the long-term structure of the dialogue. We also show that this global module can be used to visualize and interpret the decisions taken by the the conversational agent, and that the learnt representations are comparable to dialogue acts. Automatic and human evaluation show that our proposals serve to improve the baseline models. Finally, interaction experiments with coaching experts indicate that the system is usable and gives rise to positive emotions in Spanish, French and English, while the results in Norwegian point out that there is still work to be done in fully data driven approaches with very low resource languages.This work has been partially funded by the Basque Government under grant PRE_2017_1_0357 and by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 769872

    Adapting Language Models When Training on Privacy-Transformed Data

    Get PDF
    International audienceIn recent years, voice-controlled personal assistants have revolutionized the interaction with smart devices and mobileapplications. The collected data are then used by system providers to train language models (LMs). Each spoken message reveals personal information, hence removing private information from the input sentences is necessary. Our data sanitization process relies on recognizing and replacing named entities by other words from the same class. However, this may harm LM training because privacy-transformed data is unlikely to match the test distribution. This paper aims to fill the gap by focusing on the adaptation of LMs initially trained on privacy-transformed sentences using a small amount of original untransformed data. To do so, we combine class-based LMs, which provide an effective approach to overcome data sparsity in the context of n-gram LMs, and neural LMs, which handle longer contexts and can yield better predictions. Our experiments show that training an LM on privacy-transformed data result in a relative 11% word error rate (WER) increase compared to training on the original untransformed data, and adapting that model on a limited amount of original untransformed data leads to a relative 8% WER improvement over the model trained solely on privacy-transformed data

    Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

    Get PDF

    Neural models of language use:Studies of language comprehension and production in context

    Get PDF
    Artificial neural network models of language are mostly known and appreciated today for providing a backbone for formidable AI technologies. This thesis takes a different perspective. Through a series of studies on language comprehension and production, it investigates whether artificial neural networks—beyond being useful in countless AI applications—can serve as accurate computational simulations of human language use, and thus as a new core methodology for the language sciences

    Towards a Neural Era in Dialogue Management for Collaboration: A Literature Survey

    Full text link
    Dialogue-based human-AI collaboration can revolutionize collaborative problem-solving, creative exploration, and social support. To realize this goal, the development of automated agents proficient in skills such as negotiating, following instructions, establishing common ground, and progressing shared tasks is essential. This survey begins by reviewing the evolution of dialogue management paradigms in collaborative dialogue systems, from traditional handcrafted and information-state based methods to AI planning-inspired approaches. It then shifts focus to contemporary data-driven dialogue management techniques, which seek to transfer deep learning successes from form-filling and open-domain settings to collaborative contexts. The paper proceeds to analyze a selected set of recent works that apply neural approaches to collaborative dialogue management, spotlighting prevailing trends in the field. This survey hopes to provide foundational background for future advancements in collaborative dialogue management, particularly as the dialogue systems community continues to embrace the potential of large language models
    corecore