
89 research outputs found

Understanding Events: A Diversity-driven Human-Machine Approach

Author: Inel Oana
Publication date: 09/03/2022

VU Research Portal

Argument Mining: A Survey

Author: Abbott Rob
Adam Wyner
Ailomaa Marita
Anand Pranav
Aristotle
Athar Awais
Bex Floris
Bosc Tom
Budzynska Katarzyna
Bunt Harry
Cabrio Elena
Carletta Jean
Carstens Lucas
Chris Reed
Cialdini Robert B.
Das Sanjiv
Delmonte Roldolfo
Dubremetz Marie
Duthie Rory
Egawa Ryo
Fahnestock J.
Feng Vanessa Wei
Gawryjolek Jakub
Grennan Wayne
Groarke Leo
Grosse Kathrin
Grosz Barbara J.
Hamblin C. L.
Hidey Christopher
Hirschberg Julia
Hobbs Jerry R
Hoeken Hans
Hua Xinyu
Ide Nancy
Janier Mathilde
John Lawrence
Kienpointner Manfred
Krauthoff Tobias
Krauwer Steven
Lasnik Howard
Lawrence John
Levy Ran
Liu Bing
Madnani Nitin
Mann William C.
Metzinger Thomas
Musi Elena
Pallotta Vincenzo
Pang Bo
Park Joonsuk
Peldszus Andreas
Perelman Chaïm
Piao Scott
Pollock John
Pollock John L
Reed Chris
Rienks Rutger
Robertson David
Snaith Mark
Stab Christian
Stede Manfred
Toulmin Stephen E
van Eemeren Frans H.
van Rijsbergen Cornelis Joost
Villalba Maria Paz G.
Visser Jacky
Wachsmuth Henning
Walker Marilyn A.
Walton Douglas
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2020

Crossref

University of Dundee Online Publications

Explainable Argument Mining

Author: Lawrence John
Publication date: 01/01/2021

University of Dundee Online Publications

Recommended from our members

PROBABILISTIC MODELS FOR IDENTIFYING AND EXPLAINING CONTROVERSY

Author: Jang Myungha
Publication venue: ScholarWorks@UMass Amherst
Publication date: 02/07/2019

Navigating controversial topics on the Web encourages social awareness, supports civil discourse, and promotes critical literacy. While searching for controversial topics requires users to apply their critical literacy skills to the content, educating people to be more critical readers is known to be a complex and long-term process. Therefore, we need search engines equipped with techniques that help users understand controversial topics by identifying them and explaining why they are controversial. A few approaches for identifying controversy have worked reasonably well in practice, but they are narrow in scope and exhibit limited performance. In this thesis, we first focus on understanding the theoretical grounding of the state-of-the-art algorithm. We derive an underlying probabilistic model that explains the state-of-the-art controversy detection algorithm. We revisit the properties and assumptions of the derived model and propose new methods to identify controversy on Web pages. We then point out that current approaches to controversy detection do not consider time, although controversy is a dynamically changing phenomenon. As a result, current methods are slow to recognize emerging controversial topics and exaggerate outdated controversies. We address time-adaptable controversy detection by estimating the dynamically changing controversy trend of a topic, interpolating the observed level of contention and the public interest in the topic over time. Finally, we offer a method that explains controversy by generating a summary of each stance. Our method ranks social media postings using a score of how likely it is that the given post can be a representative summary of controversy.

ScholarWorks@UMass Amherst

Development of a language model and opinion extraction for text analysis of online platforms

Author: Abdul Qudar Mohiuddin Md
Publication date: 01/01/2021

Language models are one of the fundamental components in a wide variety of natural language processing tasks. The proliferation of text data over the last two decades and developments in deep learning have encouraged researchers to explore ways to build language models that achieve results on par with human intelligence. An extensive survey is presented in Chapter 2 exploring the types of language models, with a focus on transformer-based language models owing to the state-of-the-art results achieved and the popularity gained by these models. This survey helped to identify existing shortcomings and research needs. With the advancements of deep learning in natural language processing, extracting meaningful information from social media platforms, especially Twitter, has become a growing interest among natural language researchers. However, applying existing language representation models to extract information from Twitter often does not produce good results. To address this issue, Chapter 3 introduces two TweetBERT models, which are domain-specific language representation models pre-trained on millions of tweets. TweetBERT models significantly outperform the traditional BERT models in Twitter text mining tasks. Moreover, a comprehensive analysis is presented by evaluating 12 BERT models on 31 different datasets. The results validate our hypothesis that continuously training language models on a Twitter corpus helps to achieve better performance on Twitter datasets. Finally, in Chapter 4, a novel opinion mining system called ONSET is presented. ONSET is mainly proposed to address the need for large amounts of quality data to fine-tune state-of-the-art pre-trained language models. Fine-tuning language models can only produce good results if trained with a large amount of relevant data. ONSET is a technique that can fine-tune language models for opinion extraction using unlabelled training data. This system is developed through a fine-tuned language model using an unsupervised learning approach to label aspects using topic modeling and then using semi-supervised learning with data augmentation. Extensive experiments performed during this research show that the proposed model can achieve results similar to those that some state-of-the-art models produce with a high quantity of labelled training data.

Lakehead University Knowledge Commons

Similarity measures and diversity rankings for query-focused sentence extraction

Author: Achananuparp Palakorn
Publication venue: Drexel University

Query-focused sentence extraction generally refers to an extractive approach to select a set of sentences that responds to a specific information need. It is one of the major approaches employed in multi-document summarization, focused summarization, and complex question answering. The major advantage of most extractive methods over the natural language processing (NLP) intensive methods is that they are relatively simple, theoretically sound (drawing upon several supervised and unsupervised learning techniques), and often produce equally strong empirical performance. Many research areas, including information retrieval and text mining, have recently moved toward extractive query-focused sentence generation, as its outputs have great potential to support everyday information-seeking activities. In particular, as more information has been created and stored online, extraction-based summarization systems may quickly utilize several ubiquitous resources, such as Google search results and social media, to extract summaries that answer users' queries. This thesis explores how the performance of sentence extraction tasks can be improved to create higher quality outputs. Specifically, two major areas are investigated. First, we examine the issue of natural language variation, which affects the similarity judgment of sentences. As sentences are much shorter than documents, they generally contain fewer occurring words. Moreover, the similarity notions of sentences are different from those of documents, as they tend to be very specific in meaning. Thus many document-level similarity measures are unlikely to perform well at this level. In this work, we address these issues in two application domains. First, we present a hybrid method, utilizing both unsupervised and supervised techniques, to compute the similarity of interrogative sentences for factoid question reuse. Next, we propose a novel structural similarity measure based on sentence semantics for paraphrase identification and textual entailment recognition tasks. The empirical evaluations suggest the effectiveness of the proposed methods in improving the accuracy of sentence similarity judgments. Furthermore, we examine the effects of the proposed similarity measure in two specific sentence extraction tasks, focused summarization and complex question answering. In conjunction with the proposed similarity measure, we also explore the issues of novelty, redundancy, and diversity in sentence extraction. To that end, we present a novel approach to promote diversity of extracted sets of sentences based on the negative endorsement principle. Negative-signed edges are employed to represent a redundancy relation between sentence nodes in graphs. Then, sentences are reranked according to the long-term negative endorsements from a random walk. Additionally, we propose a unified centrality ranking and diversity ranking based on the aforementioned principle. The results from a comprehensive evaluation confirm that the proposed methods perform competitively compared to many state-of-the-art methods. Ph.D., Information Science -- Drexel University, 201

Drexel Libraries E-Repository and Archives

Representation learning for dialogue systems

Author: Serban Iulian Vlad
Publication date: 01/05/2019

This thesis presents a series of steps taken towards investigating representation learning (e.g. deep learning) for building dialogue systems and conversational agents. The thesis is split into two general parts. The first part of the thesis investigates representation learning for generative dialogue models. Conditioned on a sequence of turns from a text-based dialogue, these models are tasked with generating the next appropriate response in the dialogue. This part of the thesis focuses on sequence-to-sequence models, a class of generative deep neural networks. First, we propose the Hierarchical Recurrent Encoder-Decoder model, which is an extension of the vanilla sequence-to-sequence model incorporating the turn-taking structure of dialogues. Second, we propose the Multiresolution Recurrent Neural Network model, which is a stacked sequence-to-sequence model with an intermediate, stochastic representation (a "coarse representation") capturing the abstract semantic content communicated between the dialogue speakers. Third, we propose the Latent Variable Recurrent Encoder-Decoder model, which is a variant of the Hierarchical Recurrent Encoder-Decoder model with latent, stochastic, normally distributed variables. The latent, stochastic variables are intended for modelling the ambiguity and uncertainty occurring naturally in human language communication. The three models are evaluated and compared on two dialogue response generation tasks: a Twitter response generation task and the Ubuntu technical response generation task. The second part of the thesis investigates representation learning for a real-world reinforcement learning dialogue system. Specifically, this part focuses on the Milabot system built by the Quebec Artificial Intelligence Institute (Mila) for the Amazon Alexa Prize 2017 competition. Milabot is a system capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language retrieval and generation models, including template-based models, bag-of-words models, and variants of the models discussed in the first part of the thesis. This part of the thesis focuses on the response selection task. Given a sequence of turns from a dialogue and a set of candidate responses, the system must select an appropriate response to give the user. A model-based reinforcement learning approach, called the Bottleneck Simulator, is proposed for selecting the appropriate candidate response. The Bottleneck Simulator learns an approximate model of the environment based on observed dialogue trajectories and human crowdsourcing, while utilizing an abstract (bottleneck) state representing high-level discourse semantics. The learned environment model is then employed to learn a reinforcement learning policy through rollout simulations. The learned policy has been evaluated and compared to competing approaches through A/B testing with real-world users, where it was found to yield excellent performance.

Dépôt Institutionnel Numérique

Harnessing Rhetorical Figures for Argument Mining: A Pilot Study in Relating Figures of Speech to Argument Structure

Author: Chesñevar
Chien
Gladkova
Grasso
Harris
Liu
Pallotta
Pang
Peldszus
Reed
Steen
Webber
Publication venue: 'IOS Press'
Publication date: 01/01/2017

Crossref

University of Dundee Online Publications