Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization

Collada Pérez, Sonia; González Cristóbal, José Carlos; Lana Serrano, Sara; Villena Román, Julio

Publication date: 1 January 2011
Publisher: E.U.I.T. Telecomunicación (UPM)

Abstract

This paper discusses a novel hybrid approach to text categorization that combines a machine learning algorithm, which provides a base model trained on a labeled corpus, with a rule-based expert system, which improves the results of that base classifier by filtering false positives and handling false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for noisy or conflicting categories that have not been trained successfully. We also describe an implementation based on k-Nearest Neighbor (k-NN) and a simple rule language that expresses lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus, for comparison with other approaches, and categorization using IPTC metadata, the EUROVOC thesaurus and others. Results show that this approach achieves a precision comparable to that of top-ranked methods, with the added value that it does not require a demanding human expert workload to train.
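The abstract describes a two-stage pipeline: a k-NN base model proposes categories, and per-category rules built from positive, negative and relevant (multiword) term lists then correct that proposal. The following is a minimal Python sketch under our own assumptions, not the authors' implementation. Documents are plain term-frequency vectors compared by cosine similarity, and a category is proposed when its share of the neighbours' similarity mass reaches a threshold. The rule semantics are also hypothetical: negative terms veto a category, positive terms force it in, and relevant terms must confirm a k-NN guess. All names (`knn_categories`, `CategoryRule`, `apply_rules`, `classify`) and the toy data are illustrative only.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_categories(text, training, k=3, threshold=0.5):
    """Base model (assumed variant): multi-label k-NN over term-frequency vectors.

    A category is proposed when the neighbours carrying it account for at
    least `threshold` of the total similarity of the k nearest neighbours.
    """
    query = Counter(tokenize(text))
    neighbours = sorted(
        ((cosine(query, Counter(tokenize(doc))), labels) for doc, labels in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return set()
    votes = Counter()
    for sim, labels in neighbours:
        for label in labels:
            votes[label] += sim
    return {label for label, mass in votes.items() if mass / total >= threshold}


@dataclass
class CategoryRule:
    """Hypothetical per-category rule with three (multiword) term lists."""
    positive: list[str] = field(default_factory=list)  # a match adds the category
    negative: list[str] = field(default_factory=list)  # a match removes it (veto)
    relevant: list[str] = field(default_factory=list)  # if set, a k-NN guess needs one match


def contains(tokens: list[str], term: str) -> bool:
    """True if `term` (possibly multiword) occurs as consecutive tokens."""
    words = tokenize(term)
    n = len(words)
    return n > 0 and any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def any_match(tokens: list[str], terms: list[str]) -> bool:
    """True if at least one term from the list occurs in the token sequence."""
    return any(contains(tokens, term) for term in terms)


def apply_rules(text, proposed, rules):
    """Expert-system stage (assumed semantics): correct the k-NN proposal per category."""
    tokens = tokenize(text)
    result = set(proposed)
    for category, rule in rules.items():
        if any_match(tokens, rule.negative):
            result.discard(category)  # filter a false positive
        elif any_match(tokens, rule.positive):
            result.add(category)  # recover a false negative
        elif category in result and rule.relevant and not any_match(tokens, rule.relevant):
            result.discard(category)  # unconfirmed k-NN guess is dropped
    return result


def classify(text, training, rules, k=3, threshold=0.5):
    """Hybrid pipeline: k-NN proposal followed by rule-based correction."""
    return apply_rules(text, knn_categories(text, training, k, threshold), rules)


# Toy data, invented purely for illustration.
TRAINING = [
    ("oil prices rose as crude supply fell", {"crude"}),
    ("crude oil output cut by opec", {"crude"}),
    ("central bank raised interest rates", {"interest"}),
    ("interest rates fall as bank eases policy", {"interest"}),
]
RULES = {
    "crude": CategoryRule(negative=["olive oil"]),
    "interest": CategoryRule(positive=["federal funds rate"]),
}

if __name__ == "__main__":
    print(classify("crude oil exports rose", TRAINING, RULES))  # {'crude'}
    print(classify("olive oil prices rose in spain", TRAINING, RULES))  # set()
    print(classify("the federal funds rate target was unchanged", TRAINING, RULES))  # {'interest'}
```

In the second call, k-NN proposes `crude` because of the shared words "oil", "prices" and "rose", and the negative multiword term "olive oil" filters that false positive. In the third call, k-NN finds no similar training text, and the positive term "federal funds rate" recovers `interest`. Giving negative terms precedence over positive ones is one possible design choice: a single veto term is then enough to suppress a noisy category.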

Available Versions

Archivo Digital UPM, oai:oa.upm.es:13310 (last updated 17/07/2013)

Servicio de Coordinación de Bibliotecas de la Universidad Politécnica de Madrid, oai:oa.upm.es:13310 (last updated 10/02/2018)