Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinion from two French politicians over time. We
therefore review state of the art NLP-based ML algorithms to automatically
annotate tweets using a manual initiation step as bootstrap. This paper focuses
on key issues in active learning when building a large annotated dataset
from noisy data. The noise is introduced by human annotators, the abundance of data, and
the label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can be considered as the
bearing point to not only improve automatic systems for Opinion Mining (OM) and
Topic Classification but also to reduce noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
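The bootstrap-then-propagate loop described in this abstract can be illustrated with a minimal uncertainty-sampling sketch. The abstract does not state which query strategy the authors use; least-confidence selection, the function name `select_for_annotation`, and the toy probabilities below are assumptions made purely for illustration.

```python
from typing import Dict, List


def select_for_annotation(probs: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the ids of the `budget` items the model is least sure about.

    `probs` maps a tweet id to the model's predicted class probabilities.
    Least-confidence sampling (an assumed strategy, not taken from the paper)
    ranks items by the probability of their most likely class, lowest first.
    """
    # Sort by top-class probability ascending; break ties by id for determinism.
    ranked = sorted(probs, key=lambda tid: (max(probs[tid]), tid))
    return ranked[:budget]


# Toy predictions over three hypothetical opinion classes (neg, neutral, pos).
predictions = {
    "t1": [0.90, 0.05, 0.05],
    "t2": [0.40, 0.35, 0.25],
    "t3": [0.55, 0.30, 0.15],
    "t4": [0.34, 0.33, 0.33],
}

# The two most ambiguous tweets are routed to human annotators.
to_label = select_for_annotation(predictions, budget=2)
```

In such a loop, the selected tweets would be labeled by experts, the confident predictions could be kept as automatic annotations, and the classifier would be retrained before the next round.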
Level Playing Field for Million Scale Face Recognition
Face recognition is often perceived as a solved problem; however, when tested
at the million scale, it exhibits dramatic variation in accuracy across
different algorithms. Are the algorithms very different? Is access to good/big
training data their secret weapon? Where should face recognition improve? To
address those questions, we created a benchmark, MF2, that requires all
algorithms to be trained on the same data and tested at the million scale. MF2 is
a public large-scale set with 672K identities and 4.7M photos created with the
goal of leveling the playing field for large-scale face recognition. We contrast our
results with findings from the other two large-scale benchmarks MegaFace
Challenge and MS-Celebs-1M where groups were allowed to train on any
private/public/big/small set. Some key discoveries: 1) algorithms, trained on
MF2, were able to achieve state of the art and comparable results to algorithms
trained on massive private sets, 2) some outperformed themselves once trained
on MF2, 3) invariance to aging suffers from low accuracies as in MegaFace,
identifying the need for larger age variations possibly within identities or
adjustment of algorithms in future testing.
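To make concrete why accuracy can drop at the million scale, here is a minimal sketch of a rank-1 identification check against distractors. The abstract does not spell out the MF2 evaluation protocol; cosine similarity, the function names, and the toy 2-D embeddings are assumptions for illustration only.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_hit(probe, mate, distractors) -> bool:
    """True if the mate is more similar to the probe than every distractor."""
    mate_score = cosine(probe, mate)
    return all(cosine(probe, d) < mate_score for d in distractors)


probe = [1.0, 0.0]   # hypothetical embedding of the query photo
mate = [0.9, 0.1]    # another photo of the same identity
few_distractors = [[0.0, 1.0], [0.5, 0.5]]
# A larger gallery is more likely to contain a close impostor.
many_distractors = few_distractors + [[1.0, 0.05]]

hit_small = rank1_hit(probe, mate, few_distractors)
hit_large = rank1_hit(probe, mate, many_distractors)
```

With the small gallery the mate ranks first; adding a single near-identical distractor is enough to break the match, which is the effect that large distractor sets amplify.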
Explainable Misinformation Detection Across Multiple Social Media Platforms
In this work, the integration of two machine learning approaches, namely
domain adaptation and explainable AI, is proposed to address the two issues
of generalized detection and explainability. First, a Domain Adversarial
Neural Network (DANN) is used to develop a generalized misinformation detector across
multiple social media platforms. The DANN is employed to generate the classification
results for test domains with relevant but unseen data. The DANN-based model, a
traditional black-box model, cannot justify its outcome, i.e., the labels for
the target domain. Hence a Local Interpretable Model-Agnostic Explanations
(LIME) explainable AI model is applied to explain the outcome of the DANN model.
To demonstrate these two approaches and their integration for effective
explainable generalized detection, COVID-19 misinformation is considered as a case
study. We experimented with two datasets, namely CoAID and MiSoVac, and
compared results with and without DANN implementation. DANN significantly
improves the F1 classification score and increases the
accuracy and AUC performance. The results obtained show that the proposed
framework performs well in the case of domain shift and can learn
domain-invariant features while explaining the target labels with a LIME
implementation, enabling trustworthy information processing and extraction to
combat misinformation effectively.
Comment: 28 pages, 4 figure
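The core mechanism that lets a DANN learn domain-invariant features is a gradient reversal layer placed between the feature extractor and the domain classifier. The sketch below shows that mechanism in plain Python, without any deep learning framework. The class and function names are hypothetical, and the ramp-up schedule with `gamma = 10` is a commonly used choice that this abstract does not confirm for the authors' model.

```python
import math
from typing import List


def grl_lambda(progress: float, gamma: float = 10.0) -> float:
    """Ramp the reversal strength from 0 towards 1 as progress goes 0 -> 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


class GradientReversal:
    """Identity in the forward pass; negates and scales gradients going back."""

    def __init__(self, lam: float) -> None:
        self.lam = lam

    def forward(self, features: List[float]) -> List[float]:
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_head: List[float]) -> List[float]:
        # The feature extractor receives the opposite gradient, so it is
        # pushed to make source and target platforms indistinguishable.
        return [-self.lam * g for g in grad_from_domain_head]


layer = GradientReversal(lam=0.5)
passed = layer.forward([1.0, 2.0])
reversed_grad = layer.backward([2.0, -4.0])
```

The label classifier's gradient is left untouched, so the shared features stay useful for detecting misinformation while carrying less platform-specific signal.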