756 research outputs found
Aggressive language identification using word embeddings and sentiment features
This paper describes our participation in the First Shared Task on Aggression Identification. The
method proposed relies on machine learning to identify social media texts which contain aggression.
The main features employed by our method are information extracted from word embeddings
and the output of a sentiment analyser. Several machine learning methods and different
combinations of features were tried. The official submissions used Support Vector Machines and
Random Forests. The official evaluation showed that, for texts similar to the ones in the training
dataset, Random Forests work best, whilst for texts which are different, SVMs are a better choice.
The evaluation also showed that, despite its simplicity, the method performs well when compared
with more elaborate methods.
Do Multi-Sense Embeddings Improve Natural Language Understanding?
Learning a distinct representation for each sense of an ambiguous word could
lead to more powerful and fine-grained models of vector-space representations.
Yet while 'multi-sense' methods have been proposed and tested on artificial
word-similarity tasks, we don't know if they improve real natural language
understanding tasks. In this paper we introduce a multi-sense embedding model
based on Chinese Restaurant Processes that achieves state-of-the-art
performance on matching human word similarity judgments, and propose a
pipelined architecture for incorporating multi-sense embeddings into language
understanding.
We then test the performance of our model on part-of-speech tagging, named
entity recognition, sentiment analysis, semantic relation identification and
semantic relatedness, controlling for embedding dimensionality. We find that
multi-sense embeddings do improve performance on some tasks (part-of-speech
tagging, semantic relation identification, semantic relatedness) but not on
others (named entity recognition, various forms of sentiment analysis). We
discuss how these differences may be caused by the different role of word sense
information in each of the tasks. The results highlight the importance of
testing embedding models in real applications.
PoliTeam @ AMI: Improving Sentence Embedding Similarity with Misogyny Lexicons for Automatic Misogyny Identification in Italian Tweets
We present a multi-agent classification solution for identifying misogynous and aggressive content in Italian tweets. A first agent uses modern Sentence Embedding techniques to encode tweets and an SVM classifier to produce initial labels. A second agent, based on TF-IDF and Italian misogyny lexicons, is jointly adopted to improve the first agent on uncertain predictions. We evaluate our approach in the Automatic Misogyny Identification Shared Task of the EVALITA 2020 campaign. Results show that TF-IDF and lexicons effectively improve the supervised agent trained on sentence embeddings.
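The two-agent idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the confidence threshold, the label names, the `primary` callable, and the `lexicon_agent` helper are all hypothetical, and the second agent here uses plain lexicon lookup only, leaving out the TF-IDF component the abstract mentions.

```python
from typing import Callable, Iterable, Tuple

# (label, confidence in [0, 1]) as returned by the hypothetical first agent
Prediction = Tuple[str, float]


def lexicon_agent(text: str, lexicon: Iterable[str]) -> str:
    """Hypothetical second agent: flag a text that contains any lexicon term.

    Lexicon terms are assumed to be lowercase single tokens.
    """
    tokens = set(text.lower().split())
    return "misogynous" if tokens & set(lexicon) else "not_misogynous"


def classify(
    text: str,
    primary: Callable[[str], Prediction],
    lexicon: Iterable[str],
    threshold: float = 0.7,  # assumed cut-off for an "uncertain" prediction
) -> str:
    """Keep the first agent's label when it is confident, else defer."""
    label, confidence = primary(text)
    if confidence >= threshold:
        return label
    return lexicon_agent(text, lexicon)
```

In this sketch the first agent would wrap a sentence encoder and an SVM; any callable returning a label and a confidence score fits the same interface.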
A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts
Wide usage of social media platforms has increased the risk of aggression,
which causes mental stress and negatively affects people's lives through
psychological agony, fighting behavior, and disrespect to others. The majority
of such conversations contain code-mixed languages [28]. Additionally, the
manner of expressing thoughts, or communication style, changes from one social
media platform to another (e.g., communication styles differ between Twitter
and Facebook). All of these have increased the complexity of the problem. To
solve these problems, we have introduced a unified and robust multi-modal deep
learning architecture that works for both an English code-mixed dataset and a
uni-lingual English dataset. The devised system uses psycho-linguistic
features and very basic linguistic features. Our multi-modal deep learning
architecture contains Deep Pyramid CNN, Pooled BiLSTM, and Disconnected RNN
(each with both GloVe and FastText embeddings). Finally, the system makes its
decision based on model averaging. We evaluated our system on the English
code-mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from
Kaggle. Experimental results show that our proposed system outperforms all
previous approaches on both the English code-mixed dataset and the uni-lingual
English dataset.
Comment: 10 pages, 5 Figures, 6 Tables, accepted at CoDS-COMAD 202
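As a rough illustration of the model-averaging step mentioned in this abstract, the sketch below averages per-class probabilities from several models and picks the highest-scoring class. It assumes unweighted averaging and that each model emits a probability per class; the abstract does not state either detail, and the function names and label names are illustrative only.

```python
from typing import List, Sequence


def average_probabilities(model_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of per-class probabilities across models."""
    if not model_outputs:
        raise ValueError("need at least one model output")
    n_classes = len(model_outputs[0])
    if any(len(probs) != n_classes for probs in model_outputs):
        raise ValueError("all models must score the same number of classes")
    n_models = len(model_outputs)
    return [sum(probs[c] for probs in model_outputs) / n_models
            for c in range(n_classes)]


def predict(model_outputs: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Return the label whose averaged probability is highest."""
    averaged = average_probabilities(model_outputs)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best]


# Illustrative scores from three hypothetical models over three classes.
example_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.5, 0.2],
]
example_labels = ["class_a", "class_b", "class_c"]  # placeholder label names
```

Averaging probabilities rather than taking a majority vote lets a model that is strongly confident outweigh two that are only marginally in favor of another class.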
Crisis Event Extraction Service (CREES) - Automatic Detection and Classification of Crisis-related Content on Social Media
Social media posts tend to provide valuable reports during crises. However, this information can be hidden in large amounts of unrelated documents. Providing tools that automatically identify relevant posts, event types (e.g., hurricanes, floods) and information categories (e.g., reports on affected individuals, donations and volunteering) in social media posts is vital for their efficient handling and consumption. We introduce the Crisis Event Extraction Service (CREES), an open-source web API that automatically classifies posts during crisis situations. The API provides annotations for crisis-related documents, event types and information categories, is easy to deploy and access, and can be integrated into multiple platforms and tools. The annotation service is backed by Convolutional Neural Networks (CNNs) and validated against traditional machine learning models. Results show that the CNN-based API results can be relied upon when dealing with specific crises, with the benefits associated with the use of word embeddings.