475 research outputs found

    Detection of Sarcasm and Nastiness: New Resources for Spanish Language

    The main goal of this work is to provide the cognitive computing community with valuable resources to analyze and simulate the intentionality and/or emotions embedded in the language employed in social media. Specifically, it focuses on the Spanish language and online dialogues, leading to the creation of SOFOCO (Spanish Online Forums Corpus). It is the first Spanish corpus consisting of dialogic debates extracted from social media, and it is annotated by means of crowdsourcing in order to carry out automatic analysis of subjective language forms, like sarcasm or nastiness. Furthermore, the annotators were also asked whether they needed context to make their decision. In this way, the users’ intentions and their behavior inside social networks can be better understood and more accurate text analysis is possible. An analysis of the annotation results is carried out and the reliability of the annotations is also explored. Additionally, sarcasm and nastiness detection results (around 0.76 F-Measure in both cases) are also reported. The obtained results show the presented corpus to be a valuable resource that might be used in very diverse future work. This study was partially funded by the Spanish Government (TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R), by the European Union’s H2020 program under grant 769872, and by the National Science Foundation of USA (NSF CISE R1 #1202668)

    Challenges of Sarcasm Detection for Social Network : A Literature Review

    Nowadays, sarcasm recognition and detection are made simpler by knowledge from various domains, including computer science, social science, psychology, mathematics, and many more. This article aims to explain trends in sentiment analysis, especially sarcasm detection, over the last ten years and its direction in the future. We review journal articles with the keyword “sarcasm” in the title, published from 2008 to 2018. The articles were classified based on the most frequently discussed topics, among others the dataset, pre-processing, annotations, approaches, features, context, and methods used. The significant increase in the number of articles on “sarcasm” in recent years indicates that research in this area still has enormous opportunities. Research on “sarcasm” has also become very interesting because only a few researchers offer solutions for unstructured language. Some hybrid approaches combining classification and feature extraction are used to identify sarcastic sentences with deep learning models. This article further explains the most widely used algorithms for sarcasm detection on social media. The article concludes that a critical direction for future research on sarcastic sentences is the use of datasets in various languages that address the unstructured-data problem with contextual information, which should detect sarcastic sentences more effectively and improve on existing performance

    HindiPersonalityNet: Personality Detection in Hindi Conversational Data using Deep Learning with Static Embedding

    Personality detection, along with other behavioural and cognitive assessment, can essentially explain why people act the way they do and can be useful to various online applications such as recommender systems, job screening, matchmaking, and counselling. Additionally, psychometric NLP relying on textual cues and distinctive markers in writing style within conversational utterances reveals signs of individual personalities. This work demonstrates a text-based deep neural model, HindiPersonalityNet, that classifies conversations into three personality categories {ambivert, extrovert, introvert} for detecting personality in Hindi conversational data. The model utilizes GRU with BioWordVec embeddings for text classification and is trained/tested on a novel dataset, शख्सियत (pronounced as Shakhsiyat), curated using dialogues from an Indian crime-thriller drama series, Aarya. The model achieves an F1-score of 0.701 and shows the potential for leveraging conversational data from various sources to understand and predict a person's personality traits. It exhibits the ability to capture semantic as well as long-distance dependencies in conversations and establishes the effectiveness of our dataset as a benchmark for personality detection in Hindi dialogue data. Further, a comprehensive comparison of various static and dynamic word embeddings is carried out on our standardized dataset to ascertain the most suitable embedding method for personality detection

    BLM-17m: A Large-Scale Dataset for Black Lives Matter Topic Detection on Twitter

    Protection of human rights is one of the most important problems of our world. In this paper, our aim is to provide a dataset which covers one of the most significant human rights controversies of recent months to affect the whole world, the George Floyd incident. We propose a labeled dataset for topic detection that contains 17 million tweets. These tweets were collected from 25 May 2020 to 21 August 2020, covering 89 days from the start of this incident. We labeled the dataset by monitoring the most trending news topics from global and local newspapers. Apart from that, we present two baselines, TF-IDF and LDA. We evaluated the results of these two methods with three different k values using the metrics of precision, recall and F1-score. The collected dataset is available at https://github.com/MeysamAsgariC/BLMT
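The evaluation at different k values mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how k is defined, so this assumes k is the number of top-ranked predicted topics compared against a gold set of labeled topics. The function name `precision_recall_f1_at_k` and the example topics are hypothetical and are not taken from the BLM-17m dataset.

```python
def precision_recall_f1_at_k(ranked, relevant, k):
    """Score the top-k items of a ranked prediction list against a gold set.

    Assumption: `ranked` is ordered from most to least confident, and
    `relevant` is the set of gold (labeled) topics.
    """
    top_k = ranked[:k]
    # Count how many of the top-k predictions appear in the gold set.
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ranked output of a topic detector and hypothetical gold labels.
ranked_topics = ["protest", "police", "election", "justice", "weather"]
gold_topics = {"protest", "justice", "police"}

for k in (1, 3, 5):
    p, r, f1 = precision_recall_f1_at_k(ranked_topics, gold_topics, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this sketch, precision typically falls and recall rises as k grows, which is one plausible reason for reporting all three metrics at several k values.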

    Deep Emotion Recognition in Textual Conversations: A Survey

    While Emotion Recognition in Conversations (ERC) has seen tremendous advancement in the last few years, new applications and implementation scenarios present novel challenges and opportunities. These range from leveraging the conversational context and modelling speaker and emotion dynamics, to interpreting common sense expressions, informal language and sarcasm, addressing the challenges of real-time ERC, recognizing emotion causes, handling different taxonomies across datasets, and supporting multilingual ERC and interpretability. This survey starts by introducing ERC, elaborating on the challenges and opportunities pertaining to this task. It proceeds with a description of the emotion taxonomies and a variety of ERC benchmark datasets employing such taxonomies. This is followed by descriptions of the most prominent works in ERC with explanations of the Deep Learning architectures employed. Then, it provides advisable ERC practices towards better frameworks, elaborating on methods to deal with subjectivity in annotations and modelling and methods to deal with the typically unbalanced ERC datasets. Finally, it presents systematic review tables comparing several works regarding the methods used and their performance. The survey highlights the advantage of leveraging techniques to address unbalanced data, the exploration of mixed emotions and the benefits of incorporating annotation subjectivity in the learning phase

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges