Text classification method review

Mahinovs, Aigars; Tiwari, Ashutosh; Roy, Rajkumar (series editor); Baxter, David (series editor)

research

oai:dspace.lib.cranfield.ac.uk:1826/1860

Authors: Aigars Mahinovs, Ashutosh Tiwari
Series editors: Rajkumar Roy, David Baxter
Publication date: 1 April 2007
Abstract

With the explosion of information fuelled by the growth of the World Wide Web, it is no longer feasible for a human observer to understand all the incoming data, or even to classify it into categories. As the volume of information and the available computing power grow together, automatic classification of data, particularly textual data, becomes increasingly important. This paper reviews the generic text classification process, the phases of that process and the methods used at each phase. Examples from web page classification and spam classification are provided throughout the text. The principles of operation of four main text classification engines are described: Naïve Bayesian, k Nearest Neighbours, Support Vector Machines and Perceptron Neural Networks. The paper surveys the state of the art in all these phases, noting the methods and algorithms used, the ways in which researchers are trying to reduce computational complexity and improve the precision of the text classification process, and how text classification is used in practice. The paper avoids extensive use of mathematical formulae so that it is better suited to readers with little or no background in theoretical mathematics.

Last time updated on 07/02/2012

This paper was published in Cranfield CERES.
