601 research outputs found

    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011

    Get PDF

    Neural Approaches to Relational Aspect-Based Sentiment Analysis. Exploring generalizations across words and languages

    Get PDF
    Jebbara S. Neural Approaches to Relational Aspect-Based Sentiment Analysis. Exploring generalizations across words and languages. Bielefeld: Universität Bielefeld; 2020.Everyday, vast amounts of unstructured, textual data are shared online in digital form. Websites such as forums, social media sites, review sites, blogs, and comment sections offer platforms to express and discuss opinions and experiences. Understanding the opinions in these resources is valuable for e.g. businesses to support market research and customer service but also individuals, who can benefit from the experiences and expertise of others. In this thesis, we approach the topic of opinion extraction and classification with neural network models. We regard this area of sentiment analysis as a relation extraction problem in which the sentiment of some opinion holder towards a certain aspect of a product, theme, or event needs to be extracted. In accordance with this framework, our main contributions are the following: 1. We propose a full system addressing all subtasks of relational sentiment analysis. 2. We investigate how semantic web resources can be leveraged in a neural-network-based model for the extraction of opinion targets and the classification of sentiment labels. Specifically, we experiment with enhancing pretrained word embeddings using the lexical resource WordNet. Furthermore, we enrich a purely text-based model with SenticNet concepts and observe an improvement for sentiment classification. 3. We examine how opinion targets can be automatically identified in noisy texts. Customer reviews, for instance, are prone to contain misspelled words and are difficult to process due to their domain-specific language. We integrate information about the character structure of a word into a sequence labeling system using character-level word embeddings and show their positive impact on the system's performance. We reveal encoded character patterns of the learned embeddings and give a nuanced view of the obtained performance differences. 4. Opinion target extraction usually relies on supervised learning approaches. We address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. We leverage multilingual word embeddings that share a common vector space across various languages and incorporate these into a convolutional neural network architecture. Our experiments with 5 languages give promising results: We can successfully train a model on annotated data of a source language and perform accurate prediction on a target language without ever using any annotated samples in that target language

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    Get PDF
    Translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data but using external morphological resources. A set of new phrase associations is added to translation and reordering models; each of them corresponds to a morphological variation of the source/target/both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations and results showed improvements of the performance in terms of automatic scores (BLEU and Meteor) and reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.JRC.G.2-Global security and crisis managemen

    Leveraging Social Discourse to Measure Check-worthiness of Claims for Fact-checking

    Full text link
    The expansion of online social media platforms has led to a surge in online content consumption. However, this has also paved the way for disseminating false claims and misinformation. As a result, there is an escalating demand for a substantial workforce to sift through and validate such unverified claims. Currently, these claims are manually verified by fact-checkers. Still, the volume of online content often outweighs their potency, making it difficult for them to validate every single claim in a timely manner. Thus, it is critical to determine which assertions are worth fact-checking and prioritize claims that require immediate attention. Multiple factors contribute to determining whether a claim necessitates fact-checking, encompassing factors such as its factual correctness, potential impact on the public, the probability of inciting hatred, and more. Despite several efforts to address claim check-worthiness, a systematic approach to identify these factors remains an open challenge. To this end, we introduce a new task of fine-grained claim check-worthiness, which underpins all of these factors and provides probable human grounds for identifying a claim as check-worthy. We present CheckIt, a manually annotated large Twitter dataset for fine-grained claim check-worthiness. We benchmark our dataset against a unified approach, CheckMate, that jointly determines whether a claim is check-worthy and the factors that led to that conclusion. We compare our suggested system with several baseline systems. Finally, we report a thorough analysis of results and human assessment, validating the efficacy of integrating check-worthiness factors in detecting claims worth fact-checking.Comment: 28 pages, 2 figures, 8 table

    One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis

    Get PDF
    When learning a new skill, you take advantage of your preexisting skills and knowledge. For instance, if you are a skilled violinist, you will likely have an easier time learning to play cello. Similarly, when learning a new language you take advantage of the languages you already speak. For instance, if your native language is Norwegian and you decide to learn Dutch, the lexical overlap between these two languages will likely benefit your rate of language acquisition. This thesis deals with the intersection of learning multiple tasks and learning multiple languages in the context of Natural Language Processing (NLP), which can be defined as the study of computational processing of human language. Although these two types of learning may seem different on the surface, we will see that they share many similarities. The traditional approach in NLP is to consider a single task for a single language at a time. However, recent advances allow for broadening this approach, by considering data for multiple tasks and languages simultaneously. This is an important approach to explore further as the key to improving the reliability of NLP, especially for low-resource languages, is to take advantage of all relevant data whenever possible. In doing so, the hope is that in the long term, low-resource languages can benefit from the advances made in NLP which are currently to a large extent reserved for high-resource languages. This, in turn, may then have positive consequences for, e.g., language preservation, as speakers of minority languages will have a lower degree of pressure to using high-resource languages. In the short term, answering the specific research questions posed should be of use to NLP researchers working towards the same goal.Comment: PhD thesis, University of Groninge
    corecore