194 research outputs found

    Explainable Argument Mining

    Multilingual sentiment analysis in social media.

    252 p. This thesis addresses the task of analysing sentiment in messages from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language, a tool providing analysis only for Basque would not be enough for a real-world application. Thus, we set out to develop a multilingual system covering Basque, English, French and Spanish. The thesis addresses the following challenges in building such a system:
    - Analysing methods for creating sentiment lexicons suitable for less-resourced languages.
    - Analysis of social media (specifically Twitter): tweets pose several challenges for understanding and extracting opinions. Language identification and microtext normalization are addressed.
    - Researching the state of the art in polarity classification, and developing a supervised classifier that is tested against well-known social media benchmarks.
    - Developing a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.

    Argumentation models and their use in corpus annotation: practice, prospects, and challenges

    The study of argumentation cuts across several research domains, from philosophy to linguistics, and from law to computer science and artificial intelligence. In discourse analysis, several distinct models have been proposed to capture argumentation, each with a different focus or aim. To analyze the use of argumentation in natural language, several corpus annotation efforts have been carried out, with a more or less explicit grounding in one of these theoretical argumentation models. Given the recent growing interest in argument mining applications, argument-annotated corpora are crucial for training machine learning models in a supervised way. However, the proliferation of such corpora has led to a wide disparity in the granularity of the argument annotations employed. In this paper, we review the most relevant theoretical argumentation models, after which we survey argument annotation projects that closely follow those theoretical models. We also highlight the main simplifications that are often introduced in practice. Furthermore, we briefly look at other annotation efforts that are not so theoretically grounded but instead follow a shallower approach. It turns out that most argument annotation projects make their own assumptions and simplifications, both in the textual genre they focus on and in how they adapt the adopted theoretical argumentation model to their own agenda. Issues of compatibility among argument-annotated corpora are discussed from a syntactical, semantic, and practical perspective. Finally, we discuss current and prospective applications of models that take advantage of argument-annotated corpora.

    A mathematics rendering model to support chat-based tutoring

    Dr Math is a math tutoring service implemented on the chat application Mxit. The service allows school learners to use their mobile phones to discuss mathematics-related topics with human tutors. Using the broad user base provided by Mxit, the Dr Math service has grown to tens of thousands of registered school learners. The tutors on the service are all volunteers, and the learners far outnumber the available tutors at any given time. School learners on the service phrase their queries in a shorthand language form called microtext. Microtext is an informal form of language consisting of a variety of misspellings and symbolic representations, which emerge spontaneously from the idiosyncrasies of each learner. The specific form of microtext found on the Dr Math service contains mathematical questions and example equations pertaining to the tutoring process. Deciphering the queries to discover their embedded mathematical content slows down the tutoring process and wastes time that could have been spent addressing more learner queries. The microtext language thus places an unnecessary burden on the tutors. This study describes the development of an automated process for translating Dr Math microtext queries into mathematical equations. Using the design science research paradigm as a guide, three artefacts are developed: a construct, a model and an instantiation. The construct represents the creation of new knowledge, as it provides greater insight into the contents and structure of the language found on a mobile mathematics tutoring service. The construct serves as the basis for a model for translating microtext queries into mathematical equations, formatted for display in an electronic medium. No such technique currently exists, and therefore the model contributes new knowledge. To validate the model, an instantiation was created to serve as a proof of concept.
    The instantiation applies various concepts and techniques, such as those related to natural language processing, to the learner queries on the Dr Math service. These techniques are employed to translate an input microtext statement into a mathematical equation, structured using a mark-up language. The creation of the instantiation thus constitutes a knowledge contribution, as most of these techniques have never been applied to the problem of translating microtext into mathematical equations. For the automated process to have utility, it should perform at a level comparable to that of a human performing a similar translation task. To determine how closely the results of the automated process match those of a human, three human participants were asked to perform coding and translation tasks. Their results served as baseline values and were compared with the results of the automated process across a variety of metrics, including agreement, correlation, precision, recall and others. Krippendorff’s α was used to determine the level of agreement and Pearson’s correlation coefficient to determine the level of correlation between the results. The agreement between the human participants and the automated process was calculated at a level deemed satisfactory for exploratory research, and the level of correlation was calculated as moderate. These values correspond with those calculated for the human baseline. Furthermore, the automated process was able to meet or improve on all of the human baseline metrics. These results serve to validate that the automated process is able to perform the translation at a level comparable to that of a human. The automated process is available for integration into any requesting application by means of a publicly accessible web service.
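
The two statistics named in the abstract above, Krippendorff’s α for agreement and Pearson’s correlation coefficient, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the thesis: the function names are hypothetical, the α computation assumes nominal labels, and the sample labels and scores are invented for the example.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (assumed measurement level).

    `units` is a list of lists; each inner list holds the labels that the
    coders assigned to one unit. Units with fewer than two labels are skipped.
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # unit by different coders contributes 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Invented data: one human coder versus a hypothetical automated process.
    human = ["eq", "eq", "text", "eq", "text", "text"]
    auto = ["eq", "text", "text", "eq", "text", "eq"]
    print(krippendorff_alpha_nominal([list(p) for p in zip(human, auto)]))
    print(pearson_r([1.0, 2.0, 3.0, 4.0], [1.5, 1.9, 3.2, 3.8]))
```

An α of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance. Pearson’s coefficient ranges from -1 to 1.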

    A Survey of Location Prediction on Twitter

    Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the worldwide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research effort has gone into dealing with the new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim to offer an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing the Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure