Search CORE

16 research outputs found

Dialogue-Based Relation Extraction

Author: Cardie Claire
Sun Kai
Yu Dian
Yu Dong
Publication venue
Publication date: 01/01/2020
Field of study

We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE, aiming to support the prediction of relation(s) between two arguments that appear in a dialogue. We further offer DialogRE as a platform for studying cross-sentence RE as most facts span multiple sentences. We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks. Considering the timeliness of communication in a dialogue, we design a new metric to evaluate the performance of RE methods in a conversational setting and investigate the performance of several representative RE methods on DialogRE. Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings. DialogRE is available at https://dataset.org/dialogre/.Comment: To appear in ACL 202

arXiv.org e-Print Archive

Crossref

Multilingual Coreference Resolution in Multiparty Dialogue

Author: Van Durme Benjamin
Xia Patrick
Yarmohammadi Mahsa
Zheng Boyuan
Publication venue
Publication date: 02/08/2022
Field of study

Existing multiparty dialogue datasets for coreference resolution are nascent, and many challenges are still unaddressed. We create a large-scale dataset, Multilingual Multiparty Coref (MMC), for this task based on TV transcripts. Due to the availability of gold-quality subtitles in multiple languages, we propose reusing the annotations to create silver coreference data in other languages (Chinese and Farsi) via annotation projection. On the gold (English) data, off-the-shelf models perform relatively poorly on MMC, suggesting that MMC has broader coverage of multiparty coreference than prior datasets. On the silver data, we find success both using it for data augmentation and training from scratch, which effectively simulates the zero-shot cross-lingual setting

arXiv.org e-Print Archive

A Survey on Semantic Processing Techniques

Author: Cambria Erik
Chen Guanyi
He Kai
Mao Rui
Ni Jinjie
Yang Zonglin
Zhang Xulang
Publication venue
Publication date: 22/10/2023
Field of study

Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyzed five semantic processing tasks, e.g., word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missed in the published version due to the publication policies. Please contact Prof. Erik Cambria for detail

arXiv.org e-Print Archive

Efficient Neural Methods for Coreference Resolution

Author: Xia Patrick
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 30/01/2023
Field of study

Coreference resolution is a core task in natural language processing and in creating language technologies. Neural methods and models for automatically resolving references have emerged and developed over the last several years. This progress is largely marked by continuous improvements on a single dataset and metric. In this thesis, the assumptions that underlie these improvements are shown to be unrealistic for real-world use due to the computational and data tradeoffs made to achieve apparently high performance. The thesis outlines and proposes solutions to three issues. First, to address the growing memory requirements and restrictions on input document length, a novel, constant memory neural model for coreference resolution is proposed and shown to attain performance comparable to contemporary models. Second, to address the failure of these models to generalize across datasets, continued training is evaluated and shown to be successful for transferring coreference resolution models between domains and languages. Finally, to combat the gains obtained via the use of increasingly large pretrained language models, multitask model pruning can be applied to maintain a single (small) model for multiple datasets. These methods reduce the computational cost of running a model and the annotation cost of creating a model for any arbitrary dataset. As real-world applications continue to demand resolution of coreference, methods that reduce the technical cost of training new models and making predictions are greatly desired, which this thesis addresses

JScholarship

Recommended from our members

Understanding and Improving Language Models Through a Data-Centric Lens

Author: Albalak Alon
Publication venue: eScholarship, University of California
Publication date: 01/01/2024
Field of study

Training data has played a major role in the rise of large deep learning models. In particular, the scale and diversity of training data has led to incredible new capabilities in large language models. However, despite the success of such models, a notable gap persists in understanding the important role that data plays in their performance, and how to use this understanding to further improve models. In this work, we advocate for, and demonstrate the effectiveness of, data-centric AI.In the first part of this dissertation, we aim to better understand language models through their data. First, we design a relation extraction system that outputs human-interpretable intermediate outputs, allowing us to better understand why the system makes its predictions. Next, we delve into the intricate relationship between data and models by studying zero-shot and few-shot transfer learning settings, giving us insights into the interactions that training data has on model performance across diverse tasks.Based on the lessons from the first part of this dissertation, we next aim to improve the data used to train models. We first demonstrate that data selection can be formulated as a multi-armed bandit problem, where the goal is to optimize a model's training data. We apply the multi-armed bandit formulation first to the few-shot fine-tuning setting, and then to language model pretraining, designing algorithms and rewards that are unique for each problem setting. Finally, we show that for cross-lingual question answering, data augmentation is a strong approach to improving the diversity of training data, leading to improved performance.Overall, this work aims to improve our understanding of how deep learning models work, using data as the viewpoint. Further, we take this understanding and use it to develop data-efficient and performant models. We conclude the dissertation with discussions of future research in data-centric AI and propose avenues for extending these concepts into new research directions

eScholarship - University of California

Representation learning for dialogue systems

Author: Serban Iulian Vlad
Publication venue
Publication date: 01/05/2019
Field of study

Cette thèse présente une série de mesures prises pour étudier l’apprentissage de représentations (par exemple, l’apprentissage profond) afin de mettre en place des systèmes de dialogue et des agents de conversation virtuels. La thèse est divisée en deux parties générales. La première partie de la thèse examine l’apprentissage des représentations pour les modèles de dialogue génératifs. Conditionnés sur une séquence de tours à partir d’un dialogue textuel, ces modèles ont la tâche de générer la prochaine réponse appropriée dans le dialogue. Cette partie de la thèse porte sur les modèles séquence-à-séquence, qui est une classe de réseaux de neurones profonds génératifs. Premièrement, nous proposons un modèle d’encodeur-décodeur récurrent hiérarchique ("Hierarchical Recurrent Encoder-Decoder"), qui est une extension du modèle séquence-à-séquence traditionnel incorporant la structure des tours de dialogue. Deuxièmement, nous proposons un modèle de réseau de neurones récurrents multi-résolution ("Multiresolution Recurrent Neural Network"), qui est un modèle empilé séquence-à-séquence avec une représentation stochastique intermédiaire (une "représentation grossière") capturant le contenu sémantique abstrait communiqué entre les locuteurs. Troisièmement, nous proposons le modèle d’encodeur-décodeur récurrent avec variables latentes ("Latent Variable Recurrent Encoder-Decoder"), qui suivent une distribution normale. Les variables latentes sont destinées à la modélisation de l’ambiguïté et l’incertitude qui apparaissent naturellement dans la communication humaine. Les trois modèles sont évalués et comparés sur deux tâches de génération de réponse de dialogue: une tâche de génération de réponses sur la plateforme Twitter et une tâche de génération de réponses de l’assistance technique ("Ubuntu technical response generation task"). La deuxième partie de la thèse étudie l’apprentissage de représentations pour un système de dialogue utilisant l’apprentissage par renforcement dans un contexte réel. Cette partie porte plus particulièrement sur le système "Milabot" construit par l’Institut québécois d’intelligence artificielle (Mila) pour le concours "Amazon Alexa Prize 2017". Le Milabot est un système capable de bavarder avec des humains sur des sujets populaires à la fois par la parole et par le texte. Le système consiste d’un ensemble de modèles de récupération et de génération en langage naturel, comprenant des modèles basés sur des références, des modèles de sac de mots et des variantes des modèles décrits ci-dessus. Cette partie de la thèse se concentre sur la tâche de sélection de réponse. À partir d’une séquence de tours de dialogues et d’un ensemble des réponses possibles, le système doit sélectionner une réponse appropriée à fournir à l’utilisateur. Une approche d’apprentissage par renforcement basée sur un modèle appelée "Bottleneck Simulator" est proposée pour sélectionner le candidat approprié pour la réponse. Le "Bottleneck Simulator" apprend un modèle approximatif de l’environnement en se basant sur les trajectoires de dialogue observées et le "crowdsourcing", tout en utilisant un état abstrait représentant la sémantique du discours. Le modèle d’environnement est ensuite utilisé pour apprendre une stratégie d’apprentissage du renforcement par le biais de simulations. La stratégie apprise a été évaluée et comparée à des approches concurrentes via des tests A / B avec des utilisateurs réel, où elle démontre d’excellente performance.This thesis presents a series of steps taken towards investigating representation learning (e.g. deep learning) for building dialogue systems and conversational agents. The thesis is split into two general parts. The first part of the thesis investigates representation learning for generative dialogue models. Conditioned on a sequence of turns from a text-based dialogue, these models are tasked with generating the next, appropriate response in the dialogue. This part of the thesis focuses on sequence-to-sequence models, a class of generative deep neural networks. First, we propose the Hierarchical Recurrent Encoder-Decoder model, which is an extension of the vanilla sequence-to sequence model incorporating the turn-taking structure of dialogues. Second, we propose the Multiresolution Recurrent Neural Network model, which is a stacked sequence-to-sequence model with an intermediate, stochastic representation (a "coarse representation") capturing the abstract semantic content communicated between the dialogue speakers. Third, we propose the Latent Variable Recurrent Encoder-Decoder model, which is a variant of the Hierarchical Recurrent Encoder-Decoder model with latent, stochastic normally-distributed variables. The latent, stochastic variables are intended for modelling the ambiguity and uncertainty occurring naturally in human language communication. The three models are evaluated and compared on two dialogue response generation tasks: a Twitter response generation task and the Ubuntu technical response generation task. The second part of the thesis investigates representation learning for a real-world reinforcement learning dialogue system. Specifically, this part focuses on the Milabot system built by the Quebec Artificial Intelligence Institute (Mila) for the Amazon Alexa Prize 2017 competition. Milabot is a system capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language retrieval and generation models, including template-based models, bag-of-words models, and variants of the models discussed in the first part of the thesis. This part of the thesis focuses on the response selection task. Given a sequence of turns from a dialogue and a set of candidate responses, the system must select an appropriate response to give the user. A model-based reinforcement learning approach, called the Bottleneck Simulator, is proposed for selecting the appropriate candidate response. The Bottleneck Simulator learns an approximate model of the environment based on observed dialogue trajectories and human crowdsourcing, while utilizing an abstract (bottleneck) state representing high-level discourse semantics. The learned environment model is then employed to learn a reinforcement learning policy through rollout simulations. The learned policy has been evaluated and compared to competing approaches through A/B testing with real-world users, where it was found to yield excellent performance

Dépôt Institutionnel Numérique

Representation and learning schemes for argument stance mining.

Author: Clos Jérémie
Publication venue
Publication date: 30/06/2019
Field of study

Argumentation is a key part of human interaction. Used introspectively, it searches for the truth, by laying down argument for and against positions. As a mediation tool, it can be used to search for compromise between multiple human agents. For this purpose, theories of argumentation have been in development since the Ancient Greeks in order to formalise the process and therefore remove the human imprecision from it. From this practice the process of argument mining has emerged. As human interaction has moved from the small scale of one-to-one (or few-to-few) debates to large scale discussions where tens of thousands of participants can express their opinion in real time, the importance of argument mining has grown while its feasibility in a manual annotation setting has diminished and relied mainly on a human-defined heuristics to process the data. This underlines the importance of a new generation of computational tools that can automate this process on a larger scale. In this thesis we study argument stance detection, one of the steps involved in the argument mining workflow. We demonstrate how we can use data of varying reliability in order to mine argument stance in social media data. We investigate a spectrum of techniques, from completely unsupervised classification of stance using a sentiment lexicon, automated computation of a regularised stance lexicon, automated computation of a lexicon with modifiers, and the use of a lexicon with modifiers as a temporal feature model for more complex classification algorithms. We find that the addition of contextual information enhances unsupervised stance classification, within reason, and that multi-strategy algorithms that combine multiple heuristics by ordering them from the precise to the general tend to outperform other approaches by a large margin. Focusing then on building a stance lexicon, we find that optimising such lexicons using an empirical risk minimisation framework allows us to regularise them to a higher degree than competing probabilistic techniques, which helps us learn better lexicons from noisy data. We also conclude that adding local context (neighbouring words) information during the learning phase of the lexicons tends to produce more accurate results at the cost of robustness, since part of the weights is distributed from the words with a class valence to the contextual words. Finally, when investigating the use of lexicons to build feature models for traditional machine learning techniques, simple lexicons (without context) seem to perform overall as well as more complex ones, and better than purely semantic representations. We also find that word-level feature models tend to outperform sentence and instance-level representations, but that they do not benefit as much from being augmented by lexicon knowledge.This research programme was carried out in collaboration with the University of Glasgow, Department of Computer Science

Open Access Institutional Repository at Robert Gordon University

Explainable Argument Mining

Author: Lawrence John
Publication venue
Publication date: 01/01/2021
Field of study

University of Dundee Online Publications