
    BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer

    Retrieval-based language models are increasingly employed in question-answering tasks. These models search a corpus of documents for relevant information instead of storing all factual knowledge in their parameters, thereby enhancing efficiency, transparency, and adaptability. We develop the first Norwegian retrieval-based model by adapting the REALM framework and evaluating it on various tasks. After training, we also separate the language model, which we call the reader, from the retriever components, and show that the reader can be fine-tuned on a range of downstream tasks. Results show that retrieval-augmented language modeling improves the reader's performance on extractive question-answering, suggesting that this type of training improves language models' general ability to use context and that this does not happen at the expense of other abilities such as part-of-speech tagging, dependency parsing, named entity recognition, and lemmatization. Code, trained models, and data are made publicly available. Comment: Accepted for NoDaLiDa 2023, main conference.

    Fine-grained sentiment analysis for measuring customer satisfaction using an extended set of fuzzy linguistic hedges

    © 2020 The Authors. Published by Atlantis Press SARL. In recent years, the boom in social media sites such as Facebook and Twitter has brought people together to share opinions, sentiments, emotions, and experiences about products, events, politics, and other topics. In particular, sentiment-based applications are growing in popularity among individuals and businesses for making purchase decisions. Fuzzy-based sentiment analysis aims at classifying customer sentiment at a fine-grained level. This study develops a fuzzy-based sentiment analysis that extends fuzzy hedges and rule-sets for a more efficient classification of customer sentiment and satisfaction. Prior studies have used a limited number of linguistic hedges and polarity classes in their rule-sets, which degraded the efficiency of their fuzzy-based sentiment analysis systems. The proposed analysis classifies customer reviews using fuzzy linguistic hedges and an extended rule-set with seven sentiment classes, namely extremely positive, very positive, positive, neutral, negative, very negative, and extremely negative. A fuzzy logic system is then applied to measure customer satisfaction at a fine-grained level. The experimental results demonstrate that the proposed analysis improves on the performance of the baseline works.

    A Deep Network Model for Paraphrase Detection in Short Text Messages

    This paper is concerned with paraphrase detection. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication, and question answering. Given two sentences, the objective is to detect whether they are semantically identical. An important insight from this work is that existing paraphrase systems perform well when applied to clean texts, but they do not necessarily deliver good performance on noisy texts. Challenges with paraphrase detection on user-generated short texts, such as Twitter, include language irregularity and noise. To cope with these challenges, we propose a novel deep neural network-based approach that relies on coarse-grained sentence modeling using a convolutional neural network and a long short-term memory model, combined with a specific fine-grained word-level similarity matching model. Our experimental results show that the proposed approach outperforms existing state-of-the-art approaches on user-generated noisy social media data, such as Twitter texts, and achieves highly competitive performance on a cleaner corpus.

    XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

    We introduce XED, a multilingual fine-grained human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 43 additional languages, providing new resources to many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral. The dataset is carefully evaluated using language-specific BERT to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection. Peer reviewed.

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)
