7 research outputs found

Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages

Author: Das Amitava
Gambäck Björn
Jamatia Anupam
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015

The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-of-speech tag set. We compare the performance of a combination of language specific taggers to that of applying four machine learning algorithms to the task (Conditional Random Fields, Sequential Minimal Optimization, Naïve Bayes and Random Forests), using a range of different features based on word context and word-internal information

CiteSeerX

NORA - Norwegian Open Research Archives

Studying Generalisability across Abusive Language Detection Datasets

Author: Gambäck Björn
Jamatia Anupam
Swamy Steve Durairaj
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result of this, there exists a great deal of redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence a hierarchical annotation model is utilised here to reveal redundancies in existing datasets and to help reduce redundancy in future efforts

NORA - Norwegian Open Research Archives

NIT_Agartala_NLP_Team at SemEval-2019 Task 6: An Ensemble Approach to Identifying and Categorizing Offensive Language in Twitter Social Media Corpora

Author: Das Amitava
Gambäck Björn
Jamatia Anupam
Swamy Steve Durairaj
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

The paper describes the systems submitted to OffensEval (SemEval 2019, Task 6) on ‘Identifying and Categorizing Offensive Language in Social Media’ by the ‘NIT_Agartala_NLP_Team’. A Twitter annotated dataset of 13,240 English tweets was provided by the task organizers to train the individual models, with the best results obtained using an ensemble model composed of six different classifiers. The ensemble model produced macro-averaged F1-scores of 0.7434, 0.7078 and 0.4853 on Subtasks A, B, and C, respectively. The paper highlights the overall low predictive nature of various linguistic features and surface level count features, as well as the limitations of a traditional machine learning approach when compared to a Deep Learning counterpart

Crossref

NORA - Norwegian Open Research Archives

Sentence Boundary Detection for Social Media Text

Author: Chakma Kunal
Das Amitava
Gambäck Björn
Jamatia Anupam
Rudrapal Dwijen
Publication venue: International Institute of Information Technology Trivandrum, India
Publication date: 01/01/2015

The paper presents a study on automatic sentence boundary detection in social media texts such as Facebook messages and Twitter micro-blogs (tweets). We explore the limitations of using existing rule-based sentence boundary detection systems on social media text, and as an alternative investigate applying three machine learning algorithms (Conditional Random Fields, Naïve Bayes, and Sequential Minimal Optimization) to the task. The systems were tested on three corpora annotated with sentence boundaries, one containing more formal English text, one consisting of tweets and Facebook posts in English, and one with tweets in code-mixed English-Hindi. The results show that Naïve Bayes and Sequential Minimal Optimization were clearly more successful than the other approaches.

CiteSeerX

NORA - Norwegian Open Research Archives

Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus

Author: Amitava Das
Anupam Jamatia
Björn Gambäck
Steve Durairaj Swamy
Swapan Debbarma
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date

Crossref