3,589 research outputs found

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys

    Topic Classification for Short Texts

    In the context of TV and social media surveillance, constructing models to automate topic identification of short texts is a key task. This paper formalizes topic classification as a top-K multinomial classification problem and constructs models worth considering for practical usage. We describe the full data processing pipeline, discussing dataset selection, text preprocessing, feature extraction, and model selection and learning, including hyperparameter optimization. When computing time and resources are limited, we show that a classical model like SVM performs as well as an advanced deep neural network, but with shorter model training time.
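    The abstract does not spell out its top-K formulation, so the following is only a minimal sketch of one common reading: the classifier scores every topic, and a prediction counts as correct when the true topic is among the K highest-scoring ones. The topic names, scores, and function names are hypothetical and are not taken from the paper.

```python
def top_k_labels(scores, k):
    """Return the k labels with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def top_k_accuracy(all_scores, true_labels, k):
    """Fraction of documents whose true label is among the top-k predictions."""
    hits = sum(
        1
        for scores, truth in zip(all_scores, true_labels)
        if truth in top_k_labels(scores, k)
    )
    return hits / len(true_labels)


# Hypothetical per-topic scores for three short texts (illustrative only).
all_scores = [
    {"sports": 0.60, "politics": 0.30, "weather": 0.10},
    {"sports": 0.20, "politics": 0.35, "weather": 0.45},
    {"sports": 0.50, "politics": 0.10, "weather": 0.40},
]
true_labels = ["sports", "politics", "politics"]

top1 = top_k_accuracy(all_scores, true_labels, k=1)
top2 = top_k_accuracy(all_scores, true_labels, k=2)
```

    Raising K can only keep or increase the measured accuracy, which is why top-K results should be reported together with the value of K used.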

    Two Text Classifiers in Online Discussion: Support Vector Machine vs Back-Propagation Neural Network

    The purpose of this research is to compare the performance of two text classifiers, support vector machine (SVM) and back-propagation neural network (BPNN), in categorizing messages from an online discussion. SVM has been recognized as one of the best algorithms for text categorization. BPNN is also a popular categorization method that can handle linear and nonlinear problems and can achieve good results. However, using SVM and BPNN in online discussion is rare. In this research, several SVMs are trained for multi-class categorization to classify the same data set as the BPNN. The effectiveness of these two text classifiers is measured and then statistically compared based on error rate, precision, recall and F-measure. The experimental results show that, for text message categorization in online discussion, SVM outperforms BPNN in terms of error rate and precision but falls behind BPNN in terms of recall and F-measure.
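    The four measures named in the abstract can be computed from paired true and predicted labels. The sketch below is illustrative only: the message categories and predictions are made up, and it assumes the balanced F-measure (F1), since the abstract does not say which variant was used.

```python
def error_rate(y_true, y_pred):
    """Fraction of messages assigned to the wrong category."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one category treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for six discussion messages (not data from the study).
y_true = ["question", "answer", "answer", "other", "question", "answer"]
y_pred = ["question", "answer", "other", "other", "answer", "answer"]

err = error_rate(y_true, y_pred)
p, r, f1 = precision_recall_f1(y_true, y_pred, "answer")
```

    Because precision penalizes false positives and recall penalizes false negatives, one classifier can lead on error rate and precision while trailing on recall and F-measure, as the abstract reports.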

    Classifiers and text mining: application to a specific context

    [Abstract]: The constant growth of social networks has not only brought us new ways of interacting with each other, but has also given rise to a sharp increase in negative behaviors: hate speech, racism, gender harassment, cyberbullying, etc. Manually trying to detect this kind of behavior in millions of daily social media posts is out of the question. The solution lies in developing intelligent systems to automate such detection tasks. As the nature of these texts is completely subjective, this problem falls under the field of sentiment analysis, which aims to systematically identify and study affective states and subjective information in textual data using natural language processing techniques. In particular, this project focuses on researching different machine learning techniques related to natural language processing, in order to automate reliable detection and classification of sexist behaviors in social media texts. We will tackle the task of adequately processing the data extracted from social media, as well as researching various text classification techniques and models that we will use to develop and evaluate a variety of classifiers. Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2021/202

    Aspect-Based Sentiment Analysis using Machine Learning and Deep Learning Approaches

    Sentiment analysis (SA), also known as opinion mining, is the process of gathering and analyzing people's opinions about a particular service, good, or company on websites such as Twitter, Facebook, Instagram, LinkedIn, and blogs. This article covers a thorough analysis of SA and its levels. Its main focus is aspect-based sentiment analysis (ABSA), which helps manufacturing organizations make better decisions by examining consumers' viewpoints and opinions of their products. This review covers the many approaches and methods used in ABSA. In traditional methods, the features associated with the aspects were extracted manually, which made this a time-consuming and error-prone operation. These limitations may be overcome as artificial intelligence develops, so researchers are increasingly using AI-based machine learning (ML) and deep learning (DL) techniques to increase the effectiveness of ABSA. Additionally, certain recently released ABSA approaches based on ML and DL are examined and contrasted, and gaps in both methodologies are identified on the basis of this comparison. The study concludes by highlighting the difficulties that current ABSA models encounter, along with suggestions for improving the efficacy and precision of ABSA systems.