Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinions about two French politicians over time. We
therefore review state of the art NLP-based ML algorithms to automatically
annotate tweets, using a manual initiation step as a bootstrap. This paper focuses
on key issues in active learning when building a large annotated data set
from noisy data, where the noise is introduced by human annotators, the abundance
of data, and the label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can serve as a
reference point not only for improving automatic systems for Opinion Mining (OM) and
Topic Classification but also for reducing noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
Multilingual Cross-domain Perspectives on Online Hate Speech
In this report, we present a study of eight corpora of online hate speech, by
demonstrating the NLP techniques that we used to collect and analyze the
jihadist, extremist, racist, and sexist content. Analysis of the multilingual
corpora shows that the different contexts share certain characteristics in
their hateful rhetoric. To expose the main features, we have focused on text
classification, text profiling, keyword and collocation extraction, along with
manual annotation and qualitative study.
Comment: 24 page
Deep recommender engine based on efficient product embeddings neural pipeline
Predictive analytics systems are currently one of the most important areas of
research and development within the Artificial Intelligence domain and
particularly in Machine Learning. One of the "holy grails" of predictive
analytics is the research and development of the "perfect" recommendation
system. In our paper, we propose an advanced pipeline model for the multi-task
objective of determining product complementarity, similarity and sales
prediction using deep neural models applied to big-data sequential transaction
systems. Our highly parallelized hybrid model pipeline consists of both
unsupervised and supervised models, used for the objectives of generating
semantic product embeddings and predicting sales, respectively. Our
experimentation and benchmarking were carried out on real-life transactional
big-data streams from pharmaceutical retail.
Comment: 2018 17th RoEduNet Conference: Networking in Education and Research
(RoEduNet
Detecting Real-World Influence Through Twitter
In this paper, we investigate the issue of detecting the real-life influence
of people based on their Twitter account. We present an overview of common
Twitter features used to characterize such accounts and their activity, and
show that these are ineffective in this context. In particular, retweet and
follower counts and the Klout score are not relevant to our analysis. We thus
propose several Machine Learning approaches based on Natural Language
Processing and Social Network Analysis to label Twitter users as Influencers or
not. We also rank them according to a predicted influence level. Our proposals
are evaluated on the CLEF RepLab 2014 dataset and outperform state-of-the-art
ranking methods.
Comment: 2nd European Network Intelligence Conference (ENIC), Sep 2015,
Karlskrona, Swede