8 research outputs found

Data augmentation and semi-supervised learning for deep neural networks-based text classifier

Author: Devlin Jacob
Lee Dong-Hyun
Liu Yinhan
Sorower Mohammad S
Wei Jason W
Wu Yuxiang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

User feedback is essential for understanding user needs. In this paper, we use free-text obtained from a survey on sleep-related issues to build a deep neural networks-based text classifier. However, to train the deep neural networks model, a lot of labelled data is needed. To reduce manual data labelling, we propose a method which is a combination of data augmentation and pseudo-labelling: data augmentation is applied to labelled data to increase the size of the initial train set and then the trained model is used to annotate unlabelled data with pseudo-labels. The result shows that the model with the data augmentation achieves a macro-averaged F1 score of 65.2% while using 4,300 training data, whereas the model without data augmentation achieves a macro-averaged F1 score of 68.2% with around 14,000 training data. Furthermore, with the combination of pseudo-labelling, the model achieves a macro-averaged F1 score of 62.7% while using only 1,400 labelled training data. In other words, with the proposed method we can reduce the amount of labelled data for training while achieving relatively good performance.

Crossref

Ghent University Academic Bibliography

Robust classification of dialog acts from the transcription of utterances

Author: Sorower Mohammad S.
Yeasin Mohammed
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2007

This paper presents a robust classification of dialog acts from text utterances. Two different types of features, namely bag-of-words and syntactic relationships among words, were used to extract the discourse-level features from the transcript of utterances. Subsequently, a number of feature mining methods were used to identify the most relevant features and their roles in classifying dialog acts. The selected features are used to learn the underlying models of dialog acts using a number of existing machine learning algorithms from the WEKA toolbox. Empirical analyses using the HCRC Map Task Corpus dialog data were conducted to evaluate the performance of the proposed approach. © 2007 IEEE

University of Memphis Digital Commons

What Speech Tells Us About Discourse: The Role of Prosodic and Discourse Features in Speech Act Classification

Author: Hoque Mohammed E.
Louwerse Max M.
Sorower Mohammad S.
Yeasin Mohammed
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2007

This paper explores the relative importance of discourse features, prosodic features and their fusion in robust classification of speech acts. Five different feature selection algorithms were used to select a set of features to improve the robustness of the classification. The results showed that the ensemble-based classifiers performed best in the classification of 12 speech acts using subsets of both prosodic and discourse features. © 2007 IEEE

University of Memphis Digital Commons

Crossref

Deciding among fake, satirical, objective and legitimate news: A multi-label classification system

Author: Bell Allan
Conroy Niall J
González-Ibánez Roberto
Kress Gunther
Mikolov Tomas
Olson David L
Pariser Eli
Ron Kohavi
Saif Hassan
Sorower Mohammad S
Tsoumakas Grigorios
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Currently, the wide spread of fake news has raised, among the political class and society members in general, increasing concerns about the potential of misinformation that can be propagated, appearing at the center of the debate about election results around the world. On the other hand, satirical news has an entertaining purpose and is mistakenly put in the same boat as objective fake news. In this work, we address the differences between objectivity and legitimacy of news documents, treating each article as having two conceptual classes: objective/satirical and legitimate/fake. Thus, we propose a Decision Support System (DSS) based on a text mining pipeline and a set of novel textual features that uses multi-label methods for classifying news articles on those two domains. For validating the approach, a set of multi-label methods was evaluated with a combination of different base classifiers and then compared to a multi-class approach. Results reported our DSS as proper (0.80 F1-score) in addressing the scenario of misleading news from the challenging perspective of multi-label modeling, outperforming the multi-class methods (0.71 F1-score) over a real-life news dataset collected from several portals of news.

Crossref

AIR Universita degli studi di Milano