
    A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts

    Wide use of social media platforms has increased the risk of aggression, which causes mental stress and affects people's lives negatively through psychological agony, fighting behavior, and disrespect toward others. The majority of such conversations contain code-mixed languages [28]. Additionally, the style used to express thoughts also changes from one social media platform to another (e.g., communication styles differ between Twitter and Facebook). All of this increases the complexity of the problem. To solve these problems, we introduce a unified and robust multi-modal deep learning architecture that works for both an English code-mixed dataset and a uni-lingual English dataset. The devised system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains Deep Pyramid CNN, Pooled BiLSTM, and Disconnected RNN (with both GloVe and FastText embeddings). Finally, the system takes its decision by model averaging. We evaluated our system on the English code-mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from Kaggle. Experimental results show that our proposed system outperforms all previous approaches on both the English code-mixed dataset and the uni-lingual English dataset.
    Comment: 10 pages, 5 Figures, 6 Tables, accepted at CoDS-COMAD 202
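    The final decision step named in the abstract, model averaging, combines the class probabilities of the three sub-models. The sketch below is an illustrative NumPy version, not the authors' implementation; the probability values and variable names are invented.

```python
import numpy as np

def model_averaging(prob_outputs):
    """Average class-probability outputs from several models and
    pick the highest-probability class for each example."""
    stacked = np.stack(prob_outputs)   # (n_models, n_examples, n_classes)
    mean_probs = stacked.mean(axis=0)  # (n_examples, n_classes)
    return mean_probs.argmax(axis=1)   # (n_examples,)

# Hypothetical probabilities from three sub-models, two examples, three classes.
dpcnn  = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
bilstm = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
drnn   = np.array([[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]])

preds = model_averaging([dpcnn, bilstm, drnn])  # one class label per example
```

    Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones, which is one common motivation for this ensemble style.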

    Explaining and Applying Graph Neural Networks on Text

    Text classification is an essential task in natural language processing. While graph neural networks (GNNs) have been successfully applied to this problem through both graph classification and node classification approaches, their typical applications suffer from several issues. In the graph classification case, common graph construction techniques tend to leave out syntactic information. In the node classification case, the most widespread datasets and applications tend to encode relatively little information in the chosen node features. Finally, there are great benefits to be gained from combining the two GNN approaches. To tackle these concerns, we propose DepNet, a two-stage framework for text classification using GNN models. In the first stage, we replace current graph construction methods with syntactic dependency parsing in order to include as much syntactic information in the GNN input as possible. In the second stage, we combine graph classification and node classification methods by using the former to produce node embeddings for the latter, maximizing the potential of GNNs for text classification. We find that this technique significantly improves the performance of both graph classification and node classification approaches to text classification.
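    The first-stage idea, turning a dependency parse into GNN input edges, can be sketched as below. This assumes token head indices are already available from some parser; the helper name and toy parse are illustrative, not taken from DepNet.

```python
def dependency_edges(heads):
    """Build an undirected edge list for a GNN from dependency heads.
    heads[i] is the index of token i's syntactic head; the root points
    to itself and contributes no edge."""
    edges = []
    for i, h in enumerate(heads):
        if h != i:                 # skip the root's self-reference
            edges.append((h, i))   # head -> dependent
            edges.append((i, h))   # dependent -> head (undirected graph)
    return edges

# Toy parse of "dogs chase cats": "chase" (index 1) is the root,
# with "dogs" (0) and "cats" (2) as its dependents.
heads = [1, 1, 1]
edges = dependency_edges(heads)
```

    Compared with window-based co-occurrence graphs, edges derived this way connect syntactically related words even when they are far apart in the sentence, which is the syntactic information the abstract refers to.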

    Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification

    CNNs, RNNs, GCNs, and CapsNets have yielded significant insights in representation learning and are widely used in various text mining tasks such as large-scale multi-label text classification. Most existing deep models for multi-label text classification consider either non-consecutive, long-distance semantics or sequential semantics; how to take both into account coherently is still far from being studied. In addition, most existing methods treat output labels as independent medoids, ignoring the hierarchical relationships among them, which leads to a substantial loss of useful semantic information. In this paper, we propose a novel hierarchical taxonomy-aware and attentional graph capsule recurrent CNNs framework for large-scale multi-label text classification. Specifically, we first propose to model each document as a word-order-preserving graph-of-words and normalize it into a corresponding word matrix that preserves both non-consecutive, long-distance semantics and local sequential semantics. The word matrix is then input to the proposed attentional graph capsule recurrent CNNs to effectively learn the semantic features. To leverage the hierarchical relations among the class labels, we propose a hierarchical taxonomy embedding method and define a novel weighted margin loss that incorporates label representation similarity. Extensive evaluations on three datasets show that our model significantly improves performance compared with state-of-the-art approaches.
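    A word-order-preserving graph-of-words is commonly built with a fixed sliding window over the token sequence. The sketch below shows that generic construction as an assumed simplification; it is not the paper's normalization procedure, and the function name and window size are invented.

```python
from collections import defaultdict

def graph_of_words(tokens, window=2):
    """Connect each token to the tokens that co-occur with it within
    `window` positions; window=2 links each token to its immediate
    neighbor, preserving local word order in the graph structure."""
    adj = defaultdict(set)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            adj[tokens[i]].add(tokens[j])
            adj[tokens[j]].add(tokens[i])
    return adj

adj = graph_of_words(["deep", "models", "classify", "text"], window=2)
```

    Normalizing such a graph into a fixed-size word matrix (each row a word plus features of its graph neighbors) is what lets a convolutional architecture consume it, which is the role the word matrix plays in the framework described above.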