170 research outputs found

    Are You Being Rhetorical? A Description of Rhetorical Move Annotation Tools and Open Corpus of Sample Machine-Annotated Rhetorical Moves

    Writing analytics has emerged as a sub-field of learning analytics, with applications including the provision of formative feedback to students in developing their writing capacities. Rhetorical markers in writing have become a key feature in this feedback, with a number of tools being developed across research and teaching contexts. However, there is no shared corpus of texts annotated by these tools, nor is it clear how the tool annotations compare. Thus, resources are scarce for comparing tools for both tool development and pedagogic purposes. In this paper, we conduct such a comparison and introduce a sample corpus of texts representative of the particular genres, a subset of which has been annotated using three rhetorical analysis tools (one of which has two versions). This paper aims to provide both a description of the tools and a shared dataset in order to support extensions of existing analyses and tool design in support of writing skill development. We intend the description of these tools, which share a focus on rhetorical structures, alongside the corpus, to be a preliminary step to enable further research, with regard to both tool development and tool interaction.

    Mining arguments in scientific abstracts: Application to argumentative quality assessment

    Argument mining consists in the automatic identification of argumentative structures in natural language, a task that has been recognized as particularly challenging in the scientific domain. In this work we propose SciARG, a new annotation scheme, and apply it to the identification of argumentative units and relations in abstracts in two scientific disciplines: computational linguistics and biomedicine, which allows us to assess the applicability of our scheme to different knowledge fields. We use our annotated corpus to train and evaluate argument mining models in various experimental settings, including single and multi-task learning. We investigate the possibility of leveraging existing annotations, including discourse relations and rhetorical roles of sentences, to improve the performance of argument mining models. In particular, we explore the potential offered by a sequential transfer-learning approach in which supplementary training tasks are used to fine-tune pre-trained parameter-rich language models. Finally, we analyze the practical usability of the automatically-extracted components and relations for the prediction of argumentative quality dimensions of scientific abstracts. Agencia Nacional de Investigación e Innovación; Ministerio de Economía, Industria y Competitividad (España)

    Are You Being Rhetorical? An Open Corpus of Machine Annotated Rhetorical Moves

    Writing analytics has emerged as a sub-field of learning analytics, with applications including the provision of formative feedback to students in developing their writing capacities. Rhetorical markers in writing have become a key feature in this feedback, with a number of tools being developed across research and teaching contexts. However, there is no shared corpus of texts annotated by these tools, nor is it clear how the tool annotations compare. Thus, resources are scarce for comparing tools for both tool development and pedagogic purposes. In this paper, we conduct such a comparison and introduce a sample corpus of texts representative of the particular genres, a subset of which has been annotated using three rhetorical analysis tools (one of which has two versions). This paper aims to provide both a description of the tools and a shared dataset in order to support extensions of existing analyses and tool design in support of writing skill development. We intend the description of these tools, which share a focus on rhetorical structures, alongside the corpus, to be a preliminary step to enable further research, with regard to both tool development and tool interaction

    Developing resources for sentiment analysis of informal Arabic text in social media

    Natural Language Processing (NLP) applications such as text categorization, machine translation, sentiment analysis, etc., need annotated corpora and lexicons to check quality and performance. This paper describes the development of resources for sentiment analysis specifically for Arabic text in social media. A distinctive feature of the corpora and lexicons developed is that they are determined from informal Arabic that does not conform to grammatical or spelling standards. We refer to Arabic social media content of this sort as Dialectal Arabic (DA): informal Arabic originating from and potentially mixing a range of different individual dialects. The paper describes the process adopted for developing corpora and sentiment lexicons for sentiment analysis within different social media and their resulting characteristics. In addition to providing useful NLP data sets for Dialectal Arabic, the work also contributes to understanding the approach to developing corpora and lexicons.

    Argument mining: A machine learning perspective

    Argument mining has recently become a hot topic, attracting the interest of several diverse research communities, ranging from artificial intelligence to computational linguistics, natural language processing, and the social and philosophical sciences. In this paper, we attempt to describe the problems and challenges of argument mining from a machine learning angle. In particular, we argue that machine learning techniques have so far been under-exploited, and that a more proper standardization of the problem, also with regard to the underlying argument model, could provide a crucial element to develop better systems.

    Corpora for sentiment analysis of Arabic text in social media

    Different Natural Language Processing (NLP) applications such as text categorization, machine translation, etc., need annotated corpora to check quality and performance. Similarly, sentiment analysis requires annotated corpora to test the performance of classifiers. Manual annotation performed by native speakers is used as a benchmark test to measure how accurate a classifier is. In this paper we summarise currently available Arabic corpora and describe work in progress to build, annotate, and use Arabic corpora consisting of Facebook (FB) posts. The distinctive nature of these corpora is that they are based on posts written in Dialectal Arabic (DA) not following specific grammatical or spelling standards. The corpora are annotated with five labels (positive, negative, dual, neutral, and spam). In addition to building the corpus, the paper illustrates how manual tagging can be used to extract opinionated words and phrases to be used in a lexicon-based classifier.