67 research outputs found

    UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu

    This study reports on UrduFake@FIRE2021, the second shared task on fake news detection in the Urdu language. The task is a binary classification problem: classify a given news article as either (i) real news or (ii) fake news. In total, 34 teams from 7 countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE) registered, 18 teams submitted experimental results, and 11 teams submitted technical reports. The proposed systems were based on various count-based features and used different classifiers as well as neural network architectures. The stochastic gradient descent (SGD) algorithm outperformed the other classifiers and achieved an F-score of 0.679.
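As a rough illustration of the kind of system this abstract mentions (count-based features fed to a classifier trained with SGD), here is a minimal sketch in plain Python. It is not any participant's system: the logistic loss, the learning rate, the fixed example order (real SGD usually shuffles), and the English toy documents are all assumptions made for the example.

```python
import math
from collections import Counter

# Invented toy data (English placeholders, not the shared-task corpus).
DOCS = [
    "shocking secret cure revealed",
    "government publishes budget report",
    "miracle cure shocking claim",
    "city council approves budget",
]
LABELS = [1, 0, 1, 0]  # 1 = fake, 0 = real


def bag_of_words(text):
    """Count-based features: map each lowercase token to its frequency."""
    return Counter(text.lower().split())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(feats, weights, bias):
    """Linear score; tokens never seen in training contribute nothing."""
    return bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items())


def train_sgd(docs, labels, lr=0.5, epochs=20):
    """Logistic regression trained with one gradient step per example."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            feats = bag_of_words(text)
            # Gradient of the log-likelihood w.r.t. the score is (y - p).
            step = lr * (y - sigmoid(score(feats, weights, bias)))
            for tok, cnt in feats.items():
                weights[tok] = weights.get(tok, 0.0) + step * cnt
            bias += step
    return weights, bias


def predict(text, weights, bias):
    """Return 1 (fake) when the linear score is non-negative, else 0 (real)."""
    return 1 if score(bag_of_words(text), weights, bias) >= 0 else 0
```

In practice one would more likely use an existing library implementation and a held-out test set; the sketch only shows how word counts and per-example gradient updates fit together.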

    PoliTo at MULTI-Fake-DetectiVE: Improving FND-CLIP for Multimodal Italian Fake News Detection

    The MULTI-Fake-DetectiVE challenge addresses the automatic detection of Italian fake news in a multimodal setting, where both textual and visual components contribute as potential sources of fake content. This paper describes the PoliTO approach to the tasks of fake news detection and analysis of the modality contributions. Our solution turns out to be the best performer on both tasks. It leverages the established FND-CLIP multimodal architecture and proposes ad hoc extensions including sentiment-based text encoding, image transformation in the frequency domain, and data augmentation via back-translation. Thanks to its effectiveness in combining visual and textual content, our solution contributes to fighting the spread of disinformation in the Italian news flow.

    UATTA-EB: Uncertainty-Aware Test-Time Augmented Ensemble of BERTs for Classifying Common Mental Illnesses on Social Media Posts

    Because of current circumstances around the world, millions of people suffering from mental illnesses feel isolated and unable to receive help in person. Psychological studies have shown that our state of mind can manifest itself in the linguistic features we use to communicate. People have increasingly turned to online platforms to express themselves and seek help with their conditions. Deep learning methods have been commonly used to identify and analyze mental health conditions from various sources of information, including social media. Still, they face challenges, including a lack of reliability and overconfidence in predictions, which result in poorly calibrated models. To address these issues, we propose UATTA-EB: Uncertainty-Aware Test-Time Augmented Ensembling of BERTs, which produces reliable and well-calibrated predictions for classifying six categories (None, Depression, Anxiety, Bipolar Disorder, ADHD, and PTSD) by analyzing unstructured user data on Reddit.
    Comment: Accepted at Tiny Papers @ ICLR 202
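The abstract does not spell out how the ensemble and the test-time augmentations are combined. One common approach, shown here purely as an assumed sketch, is to average the class probabilities from all members and augmented views, then report the entropy of the averaged distribution as an uncertainty score. The function names are hypothetical; only the six class labels come from the abstract.

```python
import math

# The six classes named in the abstract.
CLASSES = ["None", "Depression", "Anxiety", "Bipolar Disorder", "ADHD", "PTSD"]


def mean_probs(member_probs):
    """Average per-class probabilities over ensemble members / augmented views."""
    n = len(member_probs)
    k = len(member_probs[0])
    return [sum(p[j] for p in member_probs) / n for j in range(k)]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a probability vector; higher = less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_with_uncertainty(member_probs, classes=CLASSES):
    """Return (label, averaged confidence, entropy) for one input post."""
    avg = mean_probs(member_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return classes[best], avg[best], predictive_entropy(avg)
```

When the members disagree, the averaged distribution flattens and the entropy rises, which is the signal a downstream system could use to defer a prediction.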

    A Comprehensive Review of Sentiment Analysis on Indian Regional Languages: Techniques, Challenges, and Trends

    Sentiment analysis (SA) is the process of understanding emotion within a text. It helps identify the opinion, attitude, and tone of a text, categorizing it as positive, negative, or neutral. SA is frequently used today as more and more people get a chance to share their thoughts thanks to the advent of social media. Sentiment analysis benefits industries around the globe, such as finance, advertising, marketing, travel, and hospitality. Although the majority of work in this field is on global languages like English, in recent years the importance of SA in local languages has also been widely recognized. This has led to considerable research on the analysis of Indian regional languages. This paper comprehensively reviews SA in the following major Indian regional languages: Marathi, Hindi, Tamil, Telugu, Malayalam, Bengali, Gujarati, and Urdu. Furthermore, this paper presents techniques, challenges, findings, recent research trends, and future scope for improving the accuracy of results.

    Transfer Learning for Low-Resource Sentiment Analysis

    Sentiment analysis is the process of identifying and extracting subjective information from text. Despite advances in automatic cross-lingual approaches, the implementation and evaluation of sentiment analysis systems require language-specific data that reflects various sociocultural and linguistic peculiarities. This paper describes the collection and annotation of a dataset for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ a transfer learning approach that leverages pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F1 score and accuracy despite the difficulty of the task.
    Comment: 14 pages - under review at ACM TALLI

    Focused Crawling and Model Evaluation in the field of Conversational Agents and Motivational Interviewing

    Applying Motivational Interviewing concepts to the analysis of individuals’ speech helps gain valuable insights into their perspectives and attitudes towards behaviour change. The scarcity of labelled user data poses a persistent challenge and impedes technical advancement in research on non-English language scenarios. To address the limitations of manual data labelling, we propose a semi-supervised learning method to augment an existing training corpus. Our approach leverages machine-translated user-generated data sourced from social media communities and employs self-training techniques for annotation. We evaluate multiple classifiers trained on various augmented datasets, considering diverse source contexts and different effectiveness metrics. The results indicate that this weak labelling approach does not yield significant improvements in the overall classification capabilities of the models. However, notable enhancements were observed for the minority classes. As part of future work, we propose to enlarge the datasets only with new examples from the minority classes. We conclude that several factors, including the quality of machine translation, can potentially bias the pseudo-labelling models. The imbalanced nature of the data and the impact of a strict pre-filtering threshold are other important aspects that need to be taken into account.
    Universidade de Santiago de Compostela. Escola Técnica Superior de Enxeñarí
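The self-training step with a confidence threshold can be sketched roughly as follows. This is a generic illustration, not the authors' pipeline: `self_training_round`, the `predict_proba` callable, the 0.9 threshold, and the `change` / `sustain` labels are all hypothetical names chosen for the example.

```python
def self_training_round(predict_proba, labelled, unlabelled, threshold=0.9):
    """One pseudo-labelling pass.

    predict_proba: callable mapping an item to a dict {label: probability}
                   (assumed to come from a model trained on `labelled`).
    Returns (augmented labelled set, items still left unlabelled).
    """
    accepted, remaining = [], []
    for item in unlabelled:
        probs = predict_proba(item)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            # Confident prediction: keep it as a pseudo-label.
            accepted.append((item, label))
        else:
            # Below the pre-filtering threshold: leave it unlabelled.
            remaining.append(item)
    return labelled + accepted, remaining


def toy_model(text):
    """Stand-in for a trained classifier (hypothetical)."""
    if "quit" in text:
        return {"change": 0.95, "sustain": 0.05}
    return {"change": 0.6, "sustain": 0.4}


seed = [("I want to stop", "change")]
pool = ["I will quit today", "maybe later"]
augmented, leftover = self_training_round(toy_model, seed, pool)
```

A stricter threshold admits fewer but cleaner pseudo-labels, which is one way the pre-filtering threshold noted in the abstract can shape the augmented corpus.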

    Explainable Misinformation Detection Across Multiple Social Media Platforms

    In this work, the integration of two machine learning approaches, namely domain adaptation and explainable AI, is proposed to address the two issues of generalized detection and explainability. First, a Domain Adversarial Neural Network (DANN) is used to develop a generalized misinformation detector across multiple social media platforms. The DANN generates the classification results for test domains with relevant but unseen data. The DANN-based model, a traditional black-box model, cannot justify its outcome, i.e., the labels for the target domain. Hence, the Local Interpretable Model-Agnostic Explanations (LIME) explainable AI method is applied to explain the outcome of the DANN model. To demonstrate these two approaches and their integration for effective explainable generalized detection, COVID-19 misinformation is considered as a case study. We experimented with two datasets, namely CoAID and MiSoVac, and compared results with and without the DANN implementation. DANN significantly improves the F1 classification score and increases accuracy and AUC. The results show that the proposed framework performs well under domain shift and can learn domain-invariant features while explaining the target labels with LIME, enabling trustworthy information processing and extraction to combat misinformation effectively.
    Comment: 28 pages,4 figure