SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
We present the results and main findings of SemEval-2020 Task 12 on
Multilingual Offensive Language Identification in Social Media (OffensEval
2020). The task involves three subtasks corresponding to the hierarchical
taxonomy of the OLID schema (Zampieri et al., 2019a) from OffensEval 2019. The
task featured five languages: English, Arabic, Danish, Greek, and Turkish for
Subtask A. In addition, English also featured Subtasks B and C. OffensEval 2020
was one of the most popular tasks at SemEval-2020 attracting a large number of
participants across all subtasks and also across all languages. A total of 528
teams signed up to participate in the task, 145 teams submitted systems during
the evaluation period, and 70 submitted system description papers.
Comment: Proceedings of the International Workshop on Semantic Evaluation (SemEval-2020)
Offensive language classification in social media: using deep learning
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
As social media usage becomes more integrated into our daily lives, the impact of
online abuse also becomes more prevalent. Research in the area of Offensive Language
Classification is extensive and often occurs in parallel. The Offensive Language Identification
Dataset (OLID) schema was introduced with the aim of consolidating related
tasks by categorising offense into a three-level hierarchy: detection of offensive posts
(Level A), distinguishing between targeted and untargeted offenses (Level B), and
identifying the target of the offense (Level C).
This thesis presents our contribution to the Offensive Language Classification Task
(English Subtask A) of OffensEval 2020, and a follow-up study of Offense Type Classification
(Subtask B) and Offense Target Identification (Subtask C) of OffensEval 2019.
These tasks follow the OLID schema, where each level corresponds to an individual
subtask.
For Subtask A, the dataset is examined in detail and the most uncertain partitions
are removed from the training set by an under-sampling technique. We improved
model performance by increasing data quality and by taking advantage of additional
offensive language classification datasets. We fine-tuned separate BERT models on individual
datasets and experimented with different ensemble approaches, including SVMs,
gradient boosting, AdaBoost, and logistic regression, to arrive at a final ensemble
classifier that improved the macro-F1 score. Our best model, an average ensemble
of four different BERT models, achieved 11th place out of 82 participants with
a macro-F1 score of 0.91344 in English Subtask A.
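The averaging step of such an ensemble can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the per-model probabilities, the `average_ensemble` helper, and the 0.5 threshold are all assumptions, and each score stands in for one fine-tuned BERT model's P(offensive) for a post.

```python
# Minimal sketch of a soft-voting (average) ensemble over four models.
# Each entry in `probs` is one model's estimated probability that the
# post is offensive; labels follow the OLID convention (OFF / NOT).

def average_ensemble(probs, threshold=0.5):
    """Average per-model offensive probabilities, then apply a decision threshold."""
    avg = sum(probs) / len(probs)
    label = "OFF" if avg >= threshold else "NOT"
    return label, avg

# Hypothetical scores from four models for a single tweet:
label, avg = average_ensemble([0.62, 0.48, 0.71, 0.55])  # avg = 0.59 -> "OFF"
```

Averaging probabilities before thresholding lets a confident model outvote two borderline ones, which a hard majority vote over labels would not.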
The datasets for Subtasks B and C are highly imbalanced, and modifying the classification
thresholds improved classifier performance on the minority classes, which
in turn improved overall performance. Again using the BERT architecture, the
models achieved macro-F1 scores of 0.71367 for Subtask B and 0.643352 for Subtask
C, equivalent to 5th and 2nd place in the respective tasks.
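Threshold modification of the kind described above can be sketched as a simple sweep: because macro-F1 weights every class equally, lowering the decision threshold on an imbalanced binary task can rescue the minority class. The labels, scores, and `best_threshold` helper below are illustrative, not the thesis's data or code.

```python
# Sketch: pick the decision threshold that maximizes macro-F1 on held-out data.

def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    # Macro-F1 is the unweighted mean of per-class F1, so minority classes
    # count as much as the majority class.
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        scores.append(_f1(tp, fp, fn))
    return sum(scores) / len(scores)

def best_threshold(y_true, probs, grid=None):
    # Sweep candidate thresholds; keep the one with the highest macro-F1.
    grid = grid or [i / 100 for i in range(5, 96)]
    return max(grid, key=lambda th: macro_f1(
        y_true, [1 if p >= th else 0 for p in probs]))
```

With a skewed validation set, the default 0.5 threshold often predicts the majority class everywhere (minority F1 of zero), while the swept threshold recovers it.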
We showed that BERT is an effective architecture for offensive language classification
and propose that further performance gains are possible by improving data quality.
Misogyny Detection in Social Media on the Twitter Platform
The thesis is devoted to the problem of misogyny detection in social media. In this work we analyse the difference between offensive language in general and misogynistic language in social media, and review the best existing approaches to detecting offensive and misogynistic language, which are based on classical machine learning and neural networks. We also review recent shared tasks aimed at detecting misogyny in social media, several of which we have participated in. We propose an approach to the detection and classification of misogyny in texts based on an ensemble of classical machine learning models: Logistic Regression, Naive Bayes, and Support Vector Machines. At the preprocessing stage we also used linguistic features and novel approaches that allow us to improve classification quality. We tested the model on real datasets, both English and multilingual corpora. The results we achieved with our model are highly competitive in this area and demonstrate the capability for future improvement.
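The combination step of such an ensemble can be sketched with a hard majority vote. This is a hedged illustration, not the thesis's actual pipeline: it assumes the three trained classifiers (Logistic Regression, Naive Bayes, SVM) each emit a label for a text, and the labels shown are made up. (scikit-learn's `VotingClassifier` implements the same idea end to end.)

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: return the label most base classifiers agree on.

    `predictions` holds one label per base classifier for a single text.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-classifier labels for one tweet:
majority_vote(["misogynous", "not", "misogynous"])  # -> "misogynous"
```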
An Empirical Study of Offensive Language in Online Interactions
In the past decade, usage of social media platforms has increased significantly. People use these platforms to connect with friends and family and to share information, news, and opinions. Platforms such as Facebook and Twitter are often used to propagate offensive and hateful content online. The open nature and anonymity of the internet fuel aggressive and inflamed conversations. Companies and federal institutions are striving to make social media cleaner, more welcoming, and unbiased. In this study, we first explore the underlying topics in popular offensive language datasets using statistical and neural topic modeling. The current state-of-the-art models for aggression detection only present a toxicity score based on the entire post. Content moderators often have to deal with lengthy texts without any word-level indicators. We propose a neural transformer approach for detecting the tokens that make a particular post aggressive. The pre-trained BERT model has achieved state-of-the-art results in various natural language processing tasks. However, the model is trained on general-purpose corpora and lacks aggressive social media linguistic features. We propose fBERT, a retrained BERT model with over a million offensive tweets from the SOLID dataset. We demonstrate the effectiveness and portability of fBERT over BERT in various shared offensive language detection tasks. We further propose a new multi-task aggression detection (MAD) framework for post- and token-level aggression detection using neural transformers. The experiments confirm the effectiveness of the multi-task learning model over individual models, particularly when training data is limited.
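The token-level output described above can be sketched as a final filtering step: given per-token toxicity scores, as a transformer token-classification head would produce, surface the tokens that make the post aggressive. The tokens, scores, threshold, and `flag_tokens` helper are illustrative assumptions, not the paper's model.

```python
# Sketch: turn per-token toxicity scores into word-level indicators
# for a content moderator.

def flag_tokens(tokens, scores, threshold=0.5):
    """Return the tokens whose toxicity score meets the threshold."""
    return [t for t, s in zip(tokens, scores) if s >= threshold]

# Hypothetical scores for a short post:
flag_tokens(["you", "absolute", "idiot"], [0.1, 0.4, 0.9])  # -> ["idiot"]
```

Highlighting only the offending spans, rather than a single post-level score, is what spares moderators from rereading lengthy texts.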
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it)
Challenging Social Media Threats using Collective Well-being Aware Recommendation Algorithms and an Educational Virtual Companion
Social media (SM) have become an integral part of our lives, expanding our
inter-linking capabilities to new levels. There is plenty to be said about
their positive effects. However, some serious negative
implications of SM have repeatedly been highlighted in recent years, pointing
at various SM threats for society, and its teenagers in particular: from common
issues (e.g. digital addiction and polarization) and manipulative influences of
algorithms to teenager-specific issues (e.g. body stereotyping). The full
impact of current SM platform design -- both at an individual and a societal
level -- calls for comprehensive evaluation and conceptual improvement. We
extend measures of Collective Well-Being (CWB) to SM communities. As users'
relationships and interactions are a central component of CWB, education is
crucial to improve CWB. We thus propose a framework based on an adaptive
"social media virtual companion" for educating and supporting the entire
students' community to interact with SM. The virtual companion will be powered
by a Recommender System (CWB-RS) that will optimize a CWB metric instead of
engagement or platform profit, the objectives that currently drive recommender
systems while disregarding societal collateral effects. CWB-RS will
optimize CWB both in the short term, by balancing the level of SM threat the
students are exposed to, and in the long term, by adopting an
Intelligent Tutoring System role and enabling adaptive and personalized sequencing
of playful learning activities. This framework offers an initial step toward
understanding how to design SM systems and embedded educational interventions
that favor a healthier and more positive society.
Geographic information extraction from texts
A large volume of unstructured texts containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although great progress has been made in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. This workshop therefore provides a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.
Tune your brown clustering, please
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration, and the appropriateness of this configuration has gone largely unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering, in the form of a theoretical model of Brown clustering utility, to assist hyper-parameter tuning. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamics between input corpus size, the chosen number of classes, and the quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
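The quantity Brown clustering optimizes can be made concrete: the algorithm greedily merges word classes so as to maximize the mutual information of adjacent class bigrams in the corpus. The sketch below computes that objective for a fixed hard clustering of a toy corpus; the corpus, the `cluster_of` mapping, and the helper are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def class_bigram_mi(tokens, cluster_of):
    """Average mutual information of adjacent class bigrams.

    `cluster_of` maps each word type to its class id; this is the
    objective that Brown clustering's merges try to keep high.
    """
    classes = [cluster_of[t] for t in tokens]
    unigrams = Counter(classes)
    bigrams = Counter(zip(classes, classes[1:]))
    n_uni = len(classes)
    n_bi = len(classes) - 1
    mi = 0.0
    for (c1, c2), n in bigrams.items():
        p12 = n / n_bi                 # joint probability of the class bigram
        p1 = unigrams[c1] / n_uni      # marginal of the left class
        p2 = unigrams[c2] / n_uni      # marginal of the right class
        mi += p12 * math.log2(p12 / (p1 * p2))
    return mi

# A clustering that captures the alternating structure scores higher
# than collapsing everything into one class (which scores exactly 0):
corpus = ["a", "x", "a", "y", "a", "x"]
class_bigram_mi(corpus, {"a": 0, "x": 1, "y": 1})
```

The number of classes enters through `cluster_of`: too few classes and distinct distributional behaviours are conflated, too many and the bigram counts become sparse, which is exactly the corpus-size/class-count trade-off the paper investigates.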