
    Data augmentation and semi-supervised learning for deep neural networks-based text classifier

    User feedback is essential for understanding user needs. In this paper, we use free text obtained from a survey on sleep-related issues to build a deep neural network-based text classifier. However, training a deep neural network model requires a large amount of labelled data. To reduce manual labelling, we propose a method that combines data augmentation and pseudo-labelling: data augmentation is applied to the labelled data to enlarge the initial training set, and the trained model is then used to annotate unlabelled data with pseudo-labels. The results show that the model with data augmentation achieves a macro-averaged F1 score of 65.2% using 4,300 training samples, whereas the model without data augmentation achieves a macro-averaged F1 score of 68.2% with around 14,000 training samples. Furthermore, when combined with pseudo-labelling, the model achieves a macro-averaged F1 score of 62.7% using only 1,400 labelled training samples. In other words, the proposed method reduces the amount of labelled data needed for training while achieving relatively good performance.
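    The two-stage scheme in this abstract (augment the labelled set, train, then pseudo-label unlabelled text and retrain) can be sketched as follows. This is a minimal sketch under loudly stated assumptions: a tiny multinomial Naive Bayes stands in for the deep neural network, random word deletion stands in for the unspecified augmentation, and a confidence `threshold` decides which pseudo-labels are accepted. None of these choices come from the paper, and `augment`, `NaiveBayes` and `augment_and_pseudo_label` are hypothetical names.

```python
import math
import random
from collections import Counter, defaultdict


def augment(text, rng, p_delete=0.1):
    """Hypothetical augmentation: random word deletion (the paper's method is not specified)."""
    kept = [w for w in text.split() if rng.random() >= p_delete]
    return " ".join(kept) if kept else text  # never return an empty sample


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing; an assumed stand-in for the deep model."""

    def fit(self, texts, labels):
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict_proba(self, text):
        n, v = sum(self.prior.values()), len(self.vocab)
        log_p = {}
        for c in self.prior:
            total = sum(self.counts[c].values())
            log_p[c] = math.log(self.prior[c] / n) + sum(
                math.log((self.counts[c][w] + 1) / (total + v)) for w in text.split()
            )
        top = max(log_p.values())  # subtract the max log-probability for numerical stability
        z = sum(math.exp(lp - top) for lp in log_p.values())
        return {c: math.exp(lp - top) / z for c, lp in log_p.items()}


def augment_and_pseudo_label(texts, labels, unlabelled, threshold=0.9, seed=0):
    rng = random.Random(seed)
    # Step 1: enlarge the initial training set with one augmented copy per labelled sample.
    train_x = list(texts) + [augment(t, rng) for t in texts]
    train_y = list(labels) * 2
    model = NaiveBayes().fit(train_x, train_y)
    # Step 2: annotate unlabelled texts, keeping only confident predictions (assumed rule).
    pseudo = []
    for text in unlabelled:
        probs = model.predict_proba(text)
        best = max(probs, key=probs.get)
        if probs[best] >= threshold:
            pseudo.append((text, best))
    # Step 3: retrain on labelled, augmented and pseudo-labelled data together.
    model = NaiveBayes().fit(
        train_x + [t for t, _ in pseudo], train_y + [y for _, y in pseudo]
    )
    return model, pseudo
```

    Raising `threshold` admits fewer but cleaner pseudo-labels; lowering it grows the training set faster at the cost of more label noise. In the paper's setting the stand-in classifier would be replaced by the deep model.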

    Robust classification of dialog acts from the transcription of utterances

    This paper presents a robust classification of dialog acts from text utterances. Two different types of discourse-level features, namely bag-of-words and syntactic relationships among words, were extracted from the transcripts of utterances. Subsequently, a number of feature mining methods were used to identify the most relevant features and their roles in classifying dialog acts. The selected features were used to learn the underlying models of dialog acts with a number of existing machine learning algorithms from the WEKA toolbox. Empirical analyses using dialog data from the HCRC Map Task Corpus were conducted to evaluate the performance of the proposed approach. © 2007 IEEE
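    Ranking bag-of-words features before training is one way to carry out a feature mining step like the one described above. This is a minimal sketch, assuming information gain of binary word-presence features as the ranking criterion and plain whitespace tokenisation; the abstract does not say which feature mining methods were used, and the helper names, utterances and dialog-act labels here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in the utterance'."""
    with_w = [y for d, y in zip(docs, labels) if word in d]
    without_w = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    # Weighted entropy after splitting on the feature; empty partitions contribute nothing.
    cond = sum(len(part) / n * entropy(part) for part in (with_w, without_w) if part)
    return entropy(labels) - cond


def select_features(utterances, labels, k):
    """Return the k words with the highest information gain (ties broken alphabetically)."""
    docs = [set(u.lower().split()) for u in utterances]
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda w: (-info_gain(docs, labels, w), w))
    return ranked[:k]
```

    The selected words would then form the feature vectors handed to whichever classifier is trained; a word such as "the", which occurs across dialog acts, scores lower than one that separates them cleanly.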

    What Speech Tells Us About Discourse: The Role of Prosodic and Discourse Features in Speech Act Classification

    This paper explores the relative importance of discourse features, prosodic features and their fusion in the robust classification of speech acts. Five different feature selection algorithms were used to select sets of features that improve the robustness of the classification. The results showed that ensemble-based classifiers performed best in classifying 12 speech acts using subsets of both prosodic and discourse features. © 2007 IEEE
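    Feature fusion and ensemble classification, as mentioned in this abstract, can be illustrated with a toy two-class example. This is a minimal sketch, assuming feature-level fusion by merging the prosodic and discourse feature sets and a hard (majority) voting ensemble, which is only one simple kind of ensemble; the abstract does not say which ensemble methods were used. The feature names, speech-act labels and rule-based base classifiers are all hypothetical.

```python
from collections import Counter


def fuse(prosodic, discourse):
    """Feature-level fusion: merge both feature sets, prefixing names so they cannot collide."""
    fused = {f"prosodic_{k}": v for k, v in prosodic.items()}
    fused.update({f"discourse_{k}": v for k, v in discourse.items()})
    return fused


def majority_vote(classifiers, features):
    """Hard-voting ensemble; Counter.most_common breaks ties by the first label voted for."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base classifiers, each looking at a different subset of the fused features.
def by_pitch(f):
    return "question" if f["prosodic_final_rise"] > 0 else "statement"


def by_duration(f):
    return "question" if f["prosodic_duration_s"] < 1.0 else "statement"


def by_syntax(f):
    return "question" if f["discourse_starts_with_aux"] else "statement"
```

    For an utterance with a rising final pitch, a 1.4 s duration and an initial auxiliary verb, two of the three base classifiers vote "question", so the ensemble outputs "question" even though the duration-based classifier disagrees.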

    Deciding among fake, satirical, objective and legitimate news: A multi-label classification system

    Currently, the spread of fake news has raised concern among the political class and members of society in general about the potential for misinformation to propagate, and it has moved to the center of the debate about election results around the world. On the other hand, satirical news has an entertaining purpose and is mistakenly lumped together with objective fake news. In this work, we address the differences between the objectivity and the legitimacy of news documents, treating each article as having two conceptual classes: objective/satirical and legitimate/fake. We therefore propose a Decision Support System (DSS) based on a text mining pipeline and a set of novel textual features that uses multi-label methods to classify news articles in those two domains. To validate the approach, a set of multi-label methods was evaluated with a combination of different base classifiers and then compared to a multi-class approach. The results show our DSS to be well suited (0.80 F1-score) to addressing misleading news from the challenging perspective of multi-label modeling, outperforming the multi-class methods (0.71 F1-score) on a real-life news dataset collected from several news portals.
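    The difference between the multi-label formulation and a multi-class baseline can be made concrete with the two label dimensions named in this abstract. This is a minimal sketch, assuming the multi-class baseline is a label powerset over the four combinations (the abstract does not describe it) and that F1 is macro-averaged over the two labels (the abstract does not state the averaging); all names are hypothetical.

```python
from itertools import product

# The two label dimensions named in the abstract.
AXES = (("objective", "satirical"), ("legitimate", "fake"))

# Multi-class view (label powerset): every combination becomes one class.
CLASSES = ["-".join(combo) for combo in product(*AXES)]


def to_multilabel(labels):
    """Multi-label view: (style, veracity) -> [is_satirical, is_fake]."""
    style, veracity = labels
    return [int(style == "satirical"), int(veracity == "fake")]


def to_multiclass(labels):
    """Multi-class view: (style, veracity) -> one of the four CLASSES."""
    return "-".join(labels)


def f1(y_true, y_pred):
    """Binary F1 for one label; returns 0.0 when there are no true positives."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def macro_f1(Y_true, Y_pred):
    """Unweighted mean of the per-label F1 scores (macro averaging)."""
    n_labels = len(Y_true[0])
    scores = [f1([row[j] for row in Y_true], [row[j] for row in Y_pred])
              for j in range(n_labels)]
    return sum(scores) / n_labels
```

    In the multi-label view, a binary-relevance approach can train one classifier per label column, whereas the multi-class view needs a single classifier over the four combined classes, each of which may have fewer training examples.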