
    TOWARDS BUILDING AN INTELLIGENT REVISION ASSISTANT FOR ARGUMENTATIVE WRITINGS

    Current intelligent writing assistance tools (e.g., Grammarly, Turnitin) typically work by locating problems in a user's essay (grammar, spelling, argumentation) and suggesting possible solutions. These tools focus on providing feedback on a single draft, while ignoring feedback on an author's changes between drafts (revisions). This thesis argues that it is also important to provide feedback on authors' revisions, as such information can improve not only the quality of the writing but also the rewriting skills of the authors. It is therefore desirable to build an intelligent assistant that focuses on providing feedback on revisions. This thesis presents work toward such an assistant from two perspectives: 1) a study of the impact of revision on writing, which includes the development of a sentence-level revision schema, the annotation of corpora based on the schema, and data analysis of the created corpora; a prototype revision assistant was built to provide revision feedback based on the schema, and a user study was conducted to investigate whether the assistant could influence users' rewriting behaviors; and 2) the development of algorithms for automatic revision identification, which includes the automatic extraction of revised content and the automatic classification of revision types; we first investigated the two problems separately in a pipeline manner and then explored a joint approach that solves both problems at the same time.
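The extraction subtask mentioned above, locating revised content between two drafts, can be illustrated by aligning sentences on surface similarity. The greedy sketch below uses Python's difflib and is only an assumption for illustration, not the thesis's actual alignment method.

```python
from difflib import SequenceMatcher

def align_sentences(draft1, draft2, threshold=0.6):
    """Greedily align sentences across two drafts by surface similarity.

    Unaligned sentences in draft2 are treated as additions; unaligned
    sentences in draft1 as deletions. Illustrative sketch only.
    """
    pairs, used = [], set()
    for s1 in draft1:
        best_j, best_score = None, threshold
        for j, s2 in enumerate(draft2):
            if j in used:
                continue
            score = SequenceMatcher(None, s1, s2).ratio()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            label = "unchanged" if s1 == draft2[best_j] else "modified"
            pairs.append((s1, draft2[best_j], label))
        else:
            pairs.append((s1, None, "deleted"))
    for j, s2 in enumerate(draft2):
        if j not in used:
            pairs.append((None, s2, "added"))
    return pairs
```

A classifier for revision types would then operate on each aligned pair.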

    Domains and functions: A two-dimensional account of discourse markers

    Discourse markers and their functions have been modeled through a large number of very diverse frameworks. Most of these models target written language and the discourse relations which hold between sentences. In this paper, we present, assess and apply a new annotation taxonomy, which targets discourse markers (instead of discourse relations) in spoken language, addressing their polyfunctionality in an alternative way. In particular, its main innovative feature is to distinguish between two independent layers of semantic-pragmatic information (i.e. domains and functions) which, once combined, provide a fine-grained disambiguation of discourse markers. We compare the affordances of this model to existing proposals, and illustrate them with a corpus study. A sample of conversational French containing 423 discourse marker tokens was fully analyzed by two independent annotators. We report on inter-annotator agreement scores, as well as quantitative analyses of the distribution of domains and functions in the sample. Both powerful and economical, this proposal advocates for a flexible and modular approach to discourse analysis, and paves the way for further corpus-based studies on the challenging category of discourse markers.
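The two-layer design can be made concrete with a small sketch: each marker token receives one domain tag and one independent function tag, and their cross-product defines the disambiguation space. The label sets below are illustrative placeholders, not the paper's exact inventories.

```python
from itertools import product

# Illustrative label sets; the paper's actual inventories differ.
DOMAINS = ["ideational", "rhetorical", "sequential", "interpersonal"]
FUNCTIONS = ["cause", "contrast", "addition", "topic-shift"]

def tag_space(domains=DOMAINS, functions=FUNCTIONS):
    """Enumerate all domain-function combinations available to annotators."""
    return [f"{d}:{f}" for d, f in product(domains, functions)]

def annotate(token, domain, function):
    """Attach independent domain and function labels to a marker token."""
    assert domain in DOMAINS and function in FUNCTIONS
    return {"token": token, "domain": domain, "function": function}
```

Because the two layers are annotated independently, agreement can be measured per layer before combining them into fine-grained senses.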

    Investigating and Modelling Rationale Style Arguments


    Argumentative Writing Support by means of Natural Language Processing

    Persuasive essay writing is a powerful pedagogical tool for teaching argumentation skills. So far, the provision of feedback about argumentation has been considered a manual task, since automated writing evaluation systems are not yet capable of analyzing written arguments. Computational argumentation, a recent research field in natural language processing, has the potential to bridge this gap and to enable novel argumentative writing support systems that automatically provide feedback about the merits and defects of written arguments. The automatic analysis of natural language arguments is, however, subject to several challenges. First, creating annotated corpora is a major impediment for novel tasks in natural language processing. At the beginning of this research, it was largely unknown whether humans agree on the identification of argumentation structures and the assessment of arguments in persuasive essays. Second, the automatic identification of argumentation structures involves several interdependent and challenging subtasks, so considering each task independently is not sufficient for identifying consistent argumentation structures. Third, ordinary arguments are rarely based on logical inference rules and are hardly ever in a standardized form, which poses additional challenges to human annotators and computational methods. To approach these challenges, we start by investigating existing argumentation theories and comparing their suitability for argumentative writing support. We derive an annotation scheme that models arguments as tree structures. For the first time, we investigate whether human annotators agree on the identification of argumentation structures in persuasive essays. We show that human annotators can reliably apply our annotation scheme to persuasive essays with substantial agreement. As a result of this annotation study, we introduce a unique corpus annotated with fine-grained argumentation structures at the discourse level. Moreover, we present a novel end-to-end approach for parsing argumentation structures. We identify the boundaries of argument components using sequence labeling at the token level and propose a novel joint model that globally optimizes argument component types and argumentative relations for identifying consistent argumentation structures. We show that our model considerably improves the performance of local base classifiers and significantly outperforms challenging heuristic baselines. In addition, we introduce two approaches for assessing the quality of natural language arguments. First, we introduce an approach for identifying myside bias, the well-known tendency to ignore opposing arguments when formulating one's own. Our experimental results show that myside bias can be recognized with promising accuracy using a combination of lexical features, syntactic features and features based on adversative transitional phrases. Second, we investigate for the first time the characteristics of insufficiently supported arguments. We show that insufficiently supported arguments frequently exhibit specific lexical indicators. Moreover, our experimental results indicate that convolutional neural networks significantly outperform several challenging baselines.
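The token-level sequence-labeling step can be illustrated with a generic BIO decoder that recovers argument-component spans from per-token labels. This is a standard decoding routine, not the authors' parser, and the label names are illustrative.

```python
def decode_spans(tokens, labels):
    """Recover argument-component spans from token-level BIO labels.

    Labels look like "B-Claim", "I-Claim", "B-Premise", or "O".
    Returns (component_type, span_text) pairs in document order.
    """
    spans, start, ctype = [], None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel flushes last span
        if lab.startswith("B-") or lab == "O":
            if start is not None:  # close the currently open span
                spans.append((ctype, " ".join(tokens[start:i])))
                start, ctype = None, None
            if lab.startswith("B-"):
                start, ctype = i, lab[2:]
        # an "I-" label simply continues the open span
    return spans
```

A joint model, as described above, would then score component types and relations over these candidate spans globally rather than in isolation.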

    Unsupervised extraction of semantic relations using discourse information

    Natural language understanding often relies on common-sense reasoning, for which knowledge about semantic relations, especially between verbal predicates, may be required. This thesis addresses the challenge of using a distributional method to automatically extract the semantic information necessary for common-sense inference. Typical associations between pairs of predicates and a targeted set of semantic relations (causal, temporal, similarity, opposition, part/whole) are extracted from large corpora, by exploiting the presence of discourse connectives which typically signal these semantic relations. In order to appraise these associations, we provide several significance measures inspired by the literature as well as a novel measure specifically designed to evaluate the strength of the link between the two predicates and the relation. The relevance of these measures is evaluated by computing their correlations with human judgments, based on a sample of verb pairs annotated in context. The application of this methodology to French and English corpora leads to the construction of a freely available resource, Lecsie (Linked Events Collection for Semantic Information Extraction), which consists of triples: pairs of event predicates associated with a relation; each triple is assigned significance scores based on our measures. From this resource, vector-based representations of pairs of predicates can be induced and used as lexical semantic features to build models for external applications. We assess the potential of these representations for several applications. Regarding discourse analysis, the tasks of predicting attachment of discourse units, as well as predicting the specific discourse relation linking them, are investigated. Using only features from our resource, we obtain significant improvements for both tasks in comparison to several baselines, including ones using other representations of the pairs of predicates. We also propose to define optimal sets of connectives better suited for large-corpus applications by performing a dimension reduction in the space of the connectives, instead of using manually composed groups of connectives corresponding to predefined relations. Another promising application pursued in this thesis concerns relations between semantic frames (e.g. FrameNet): the resource can be used to enrich this sparse structure by providing candidate relations between verbal frames, based on associations between their verbs. These diverse applications aim to demonstrate the promising contributions provided by our approach, namely allowing the unsupervised extraction of typed semantic relations.
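The family of association measures described above can be illustrated with plain pointwise mutual information (PMI) between a predicate pair and a relation, estimated from connective-based co-occurrence counts. PMI is a stand-in for the measures inspired by the literature; the thesis additionally introduces its own novel measure.

```python
import math

def pmi(pair_relation_count, pair_count, relation_count, total):
    """PMI between a predicate pair and a semantic relation.

    Counts are how often the pair co-occurs with connectives signaling
    the relation (joint), how often the pair occurs at all, how often
    the relation occurs, and the total number of observations.
    """
    p_joint = pair_relation_count / total
    p_pair = pair_count / total
    p_rel = relation_count / total
    # Positive values mean the pair and relation co-occur more often
    # than chance; 0 means independence.
    return math.log2(p_joint / (p_pair * p_rel))
```

Each (predicate pair, relation) triple in a Lecsie-style resource would carry one score per measure; vectors of such scores then serve as lexical semantic features.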

    A Study on the Automatic Analysis of Argumentation Structures in Korean Texts

    Master's thesis, Department of Linguistics, Seoul National University, February 2016 (advisor: Hyopil Shin). These days, there is an increased need to analyze mass opinions using online text data. Such tasks need to recognize the argumentation schemes and main contents of subjective, argumentative writing, and the automation of the required procedures is becoming indispensable. This thesis constructed a text dataset from Korean debates on political issues and defined the types of discourse relations between the basic units of text segments.
The discourse relations are classified into two levels and four subclasses, according to criteria which determine whether two segments are related to each other in context, whether the relation is coordinating or subordinating, and which of the two units in a pair is supported by the other as the more important part. The relations between basic text units are predicted using machine learning and rule-based methods. The features for predicting discourse relations include what the author of a text wants to claim and the argumentative strategies comprising the grounds for that claim, together with linguistic properties observed in the texts. The strategies for argument are observed and subcategorized into Providing Examples, Cause-and-Effect, Explanations in Detail, Restatements, Contrasts, Background Knowledge, and more. These subclasses compose the broader classes of discourse relations and became the basis for the features used during the classification of the relations. Some linguistic features are adapted from previous studies and reconstituted in a revised form more appropriate for Korean data. This study thus constructed a Korean debate corpus and a list of connectives specialized for Korean texts to include among the experiment features. The automated prediction of discourse relations based on those features is proposed in this study as a unique argument-mining model. According to the results of the prediction experiments, the features defined and used in this study improve the performance of the prediction tasks through positive interactions with each other. In particular, some explicit connectives, dependent sentence structures based on the absence of certain components, and whether the same meanings are restated clearly contributed to the classification tasks.
The discourse relations between basic text units are related and combined with each other to form a tree-shaped argumentation structure for the overall document. In this structure, the topic sentence of the document is located at the root node of the tree, and it is assumed that the sentence or clause nodes directly below the root contain the most important content as grounds for the topic unit. Therefore, extracting the text segments that directly support the topic sentence may help in obtaining the important contents of each document. This can be a useful method in text summarization. Applications to various other fields may also be possible, including stance classification of debate texts and extraction of grounds for certain topics.
Contents:
1 Introduction
  1.1 Purposes
    1.1.1 A Study of Korean Texts with Linguistic Cues
    1.1.2 Detection of Argumentation Schemes in Debate Texts
    1.1.3 Extraction of Important Content in Argumentation Schemes of Texts
  1.2 Structure
2 Previous Work
  2.1 Argumentation Mining Tasks
    2.1.1 Argument Elements
    2.1.2 Argumentation Schemes
  2.2 Argumentation Schemes in Various Texts
    2.2.1 Dialogic vs. Monologic Texts
    2.2.2 Debate Texts vs. Other Texts
    2.2.3 Studies in Other Languages
  2.3 Theoretical Basis
    2.3.1 Argumentation Theory
    2.3.2 Discourse Theory
3 Identifying Argumentation Schemes in Debate Texts
  3.1 Data Description
  3.2 Basic Units
  3.3 Discourse Relations
    3.3.1 Strategies for Proving a Claim
    3.3.2 Definition
4 Automatic Identification of Argumentation Schemes
  4.1 Annotation
  4.2 Baseline
  4.3 Proposed Model
    4.3.1 O vs. X Classification
    4.3.2 Convergent Relation Rule
    4.3.3 NN vs. NS vs. SN Classification
  4.4 Evaluation
    4.4.1 Measures
    4.4.2 Results
  4.5 Discussion
  4.6 A Pilot Study on English Texts
5 Detecting Important Units
6 Conclusion
Bibliography
Abstract in Korean (초록)
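The extraction idea, taking the units that directly support the root topic sentence as the important content, can be sketched as follows. The edge representation and names here are assumptions for illustration, not taken from the thesis.

```python
def directly_supporting_units(edges, root):
    """Given subordinating support edges (child, parent) of an
    argumentation tree, return the units one level below the root,
    i.e. the segments that directly support the topic sentence.
    """
    return [child for child, parent in edges if parent == root]
```

In a summarization setting, the returned segments would be concatenated (in document order) to form an extract of the debate text.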

    On the Promotion of the Social Web Intelligence

    Given the ever-growing information generated through various online social outlets, analytical research on social media has intensified in the past few years across many fields. In particular, work on social Web intelligence fosters and benefits from the wisdom of the crowds and attempts to derive actionable information from such data. In the form of collective intelligence, crowds gather together and contribute to solving problems that may be difficult or impossible to solve by individuals and single computers. In addition, the consumer insight revealed from social footprints can be leveraged to build powerful business intelligence tools, enabling efficient and effective decision-making processes. This dissertation is broadly concerned with the intelligence that can emerge from social Web platforms. In particular, the two phenomena of social privacy and online persuasion are identified as the two pillars of social Web intelligence, whose study is essential to the promotion and advancement of both collective and business intelligence. The first part of the dissertation focuses on the phenomenon of social privacy. This work is mainly motivated by the privacy dichotomy problem: users often face difficulties specifying privacy policies that are consistent with their actual privacy concerns and attitudes. As such, before making use of social data, it is imperative to employ multiple safeguards beyond the current privacy settings of users. As a possible solution, we utilize users' social footprints to detect their privacy preferences automatically. An unsupervised collaborative filtering approach is proposed to characterize the attributes of publicly available accounts that are intended to be private. Unlike the majority of earlier studies, a variety of social data types is taken into account, including the social context, the published content, as well as the profile attributes of users. Our approach can support an informed decision on whether to exploit one's publicly available data to draw intelligence. With the aim of gaining insight into the strategies behind online persuasion, the second part of the dissertation studies written comments in online deliberations. Specifically, we explore different dimensions of the language, the temporal aspects of the communication, as well as the attributes of the participating users to understand what makes people change their beliefs. In addition, we investigate the factors that are perceived by users to be the reasons behind persuasion. We link our findings to traditional persuasion research, hoping to uncover when and how it applies to online persuasion. A set of rhetorical relations is known to be of importance in persuasive discourse. We further study the automatic identification and disambiguation of such rhetorical relations, aiming to take a step closer towards the automatic analysis of online persuasion. Finally, a small proof-of-concept tool is presented, showing the value of our persuasion and rhetoric studies.

    Enhancing extractive summarization with automatic post-processing

    Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2015. Any solution or device that may help people optimize their time in doing productive work is of great help. The steadily increasing amount of information that each person must handle every day, either in professional tasks or in personal life, is becoming harder to process. By reducing the texts to be handled, automatic text summarization is a very useful procedure that can significantly reduce the amount of time people spend on many of their reading tasks. In the context of handling several texts, dealing with redundancy and focusing on relevant information are the major problems to be addressed in automatic multi-document summarization. The most common approach to this task is to build a summary with sentences retrieved from the input texts. This approach is named extractive summarization. The main focus of current research on extractive summarization has been algorithm optimization, striving to enhance the selection of content. However, gains related to increased algorithm complexity have not yet been proven, as the summaries remain difficult for humans to process in a satisfactory way. A text built from different documents by extracting sentences from them tends to form a textually fragile sequence of sentences whose elements are weakly related. In the present work, tasks that modify and relate the summary sentences are combined in a post-processing procedure. These tasks include sentence reduction, paragraph creation and insertion of discourse connectives, seeking to improve the textual quality of the final summary to be delivered to human users. Thus, this dissertation addresses automatic text summarization from a different perspective, exploring the impact of post-processing extraction-based summaries in order to build fluent, cohesive texts and improved summaries for human usage. Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/45133/200
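Two of the post-processing tasks named above, paragraph creation and connective insertion, can be sketched on top of an extract. The document-boundary heuristic and the fixed connective below are my assumptions for illustration, and sentence reduction is omitted entirely.

```python
def post_process(extracts, connective="In addition,"):
    """Assemble an extractive summary from (doc_id, sentence) pairs:
    start a new paragraph at each source-document boundary and open it
    with a discourse connective. A simplified sketch of the idea.
    """
    paragraphs, current, prev_doc = [], [], None
    for doc_id, sentence in extracts:
        if prev_doc is not None and doc_id != prev_doc:
            # Document boundary: close the paragraph, bridge with a connective.
            paragraphs.append(" ".join(current))
            current = [connective + " " + sentence[0].lower() + sentence[1:]]
        else:
            current.append(sentence)
        prev_doc = doc_id
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

A fuller implementation would pick the connective from the discourse relation holding between the adjacent sentences rather than using a fixed one.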