28 research outputs found

Evolving rules for document classification

Author: A. Bergström
C. Apté
C.M. Tan
D. Montana
D.R. Tauritz
F. Sebastiani
G. Salton
H. Lodhi
J.R. Koza
K. Bennet
M. Damashek
T. Joachims
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters-21578 dataset. We also suggest that, because the induced rules are meaningful to a human analyst, they may have a number of other uses beyond classification and may provide a basis for text mining applications.
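To illustrate how a rule could earn fitness from precision and recall, here is a minimal sketch that scores a boolean rule over character N-grams against labelled training documents. The nested-tuple rule representation, the operator names (`and`, `or`, `not`, `ngram`), the helper names `matches` and `fitness`, and the use of F1 to combine precision and recall are all assumptions for illustration; they are not the function and terminal set described in the paper.

```python
# Hypothetical rule representation: a nested tuple tree whose leaves,
# ("ngram", s), test whether the character string s occurs in a document.
Rule = tuple


def matches(rule: Rule, text: str) -> bool:
    """Evaluate a boolean N-gram rule against one document's text."""
    op = rule[0]
    if op == "ngram":
        return rule[1] in text
    if op == "and":
        return matches(rule[1], text) and matches(rule[2], text)
    if op == "or":
        return matches(rule[1], text) or matches(rule[2], text)
    if op == "not":
        return not matches(rule[1], text)
    raise ValueError(f"unknown operator: {op!r}")


def fitness(rule: Rule, docs: list[str], labels: list[bool]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of the rule on labelled training docs.

    Assumption: F1 is used here to merge precision and recall into one score.
    Documents are lowercased, so N-gram leaves should be lowercase too."""
    tp = fp = fn = 0
    for text, is_positive in zip(docs, labels):
        predicted = matches(rule, text.lower())
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1
        elif not predicted and is_positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In a Genetic Programming loop, `fitness` would be called on every candidate rule in the population, and selection would favour rules with higher scores; crossover and mutation would then operate on the tuple trees.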

CiteSeerX

Crossref

Sheffield Hallam University Research Archive

UCL Discovery

Using IR techniques to improve Automated Text Classification

Author: B. Schölkopf
C. Apté
G. Salton
I. Witten
K. Nigam
P. Quaresma
T. Gonçalves
T. Joachims
V. Vapnik
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

This paper studies the pre-processing phase of the automated text classification problem. We apply the linear Support Vector Machine paradigm to datasets written in English and European Portuguese: the Reuters and the Portuguese Attorney General’s Office datasets, respectively. The study can be seen as a search for the best document representation along three different axes: feature reduction (using linguistic information), feature selection (using word frequencies) and term weighting (using information retrieval measures).
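The abstract does not name the specific information retrieval measures used for term weighting. As a hedged illustration, this sketch applies one common choice, TF-IDF with L2 normalization, after a document-frequency cutoff that stands in for frequency-based feature selection. The function name `tfidf_vectors` and the `min_df` parameter are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]], min_df: int = 1) -> list[dict[str, float]]:
    """Return one L2-normalized TF-IDF vector (term -> weight) per tokenized document.

    Assumption: terms occurring in fewer than `min_df` documents are dropped,
    a simple stand-in for frequency-based feature selection."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vocab = {t for t, count in df.items() if count >= min_df}

    vectors = []
    for tokens in docs:
        tf = Counter(t for t in tokens if t in vocab)
        # Raw term frequency times log inverse document frequency.
        weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Scale to unit length so long documents do not dominate.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors
```

The resulting sparse vectors are the kind of input a linear SVM consumes; a term that appears in every document receives weight zero, since its inverse document frequency is log(1).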

Crossref

Repositório Científico da Universidade de Évora

XML Documents Within a Legal Domain: Standards and Tools for the Italian Legislative Environment

Author: C. Apté
F. Sebastiani
T. Mitchell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Crossref

Uncertainty-based Noise Reduction and Term Selection in Text Categorization

Author: A. Grove
C. Apté
J.J. Rocchio
L. D. Landau
W.W. Cohen
Publication date: 01/01/2002

This paper introduces a new criterion for term selection, based on the notion of Uncertainty. Term selection according to this criterion is performed by eliminating noisy terms on a class-by-class basis, rather than by selecting the most significant ones. Uncertainty-base

CiteSeerX

Crossref

Using Negation and Phrases in Inducing Rules for Text Classification

Author: C. Apté
D. E. Johnson
M. Chang
M. Hall
P. Rullo
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Parallel-Sequential Texture Analysis

Author: C. Apté
P. Perner
M. Singh
S. Singh
E.L. van den Broek
E.M. van Rikxoort
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2005

Color-induced texture analysis is explored using two texture analysis techniques, the co-occurrence matrix and the color correlogram, as well as color histograms. Several quantization schemes for six color spaces and the human-based 11-color quantization scheme have been applied. The VisTex texture database was used as a test bed. A new color-induced texture analysis approach is introduced: the parallel-sequential approach, i.e., the color correlogram combined with the color histogram. This new approach was found to be highly successful (up to 96% correct classification). Moreover, the 11-color quantization scheme performed excellently (94% correct classification) and should, therefore, be incorporated for real-time image analysis. In general, the results emphasize the importance of color for texture analysis and of color as a global image feature. They also illustrate the complementary character of both features.
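A minimal sketch of the two color features that the parallel-sequential approach combines, assuming pixels have already been quantized to integer color indices (for example by an 11-color scheme). The function names are hypothetical, and both the four-offset neighborhood used for the autocorrelogram and the plain concatenation of the two feature vectors are simplifying assumptions rather than the paper's exact formulation.

```python
def color_histogram(img: list[list[int]], n_colors: int) -> list[float]:
    """Normalized histogram of quantized color indices."""
    counts = [0] * n_colors
    for row in img:
        for c in row:
            counts[c] += 1
    total = sum(counts)
    return [x / total for x in counts]


def autocorrelogram(img: list[list[int]], n_colors: int, d: int = 1) -> list[float]:
    """For each color c, the fraction of neighbors at distance d that also have color c.

    Assumption: only the four axis-aligned offsets are checked, a simplification
    of the full correlogram, which considers every pixel at distance d."""
    h, w = len(img), len(img[0])
    same = [0] * n_colors
    seen = [0] * n_colors
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            for dy, dx in ((d, 0), (-d, 0), (0, d), (0, -d)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    seen[c] += 1
                    if img[ny][nx] == c:
                        same[c] += 1
    return [s / n if n else 0.0 for s, n in zip(same, seen)]


def combined_features(img: list[list[int]], n_colors: int, d: int = 1) -> list[float]:
    """Concatenate histogram and autocorrelogram into one feature vector (assumed combination)."""
    return color_histogram(img, n_colors) + autocorrelogram(img, n_colors, d)
```

The histogram captures how much of each color is present, while the autocorrelogram captures how that color is spatially arranged; a texture classifier could then be trained on the combined vector.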

Towards Language Independent Automated Learning of Text Categorization Models

Author: C. Apté
D. Lewis
J.R. Quinlan
L. Breiman
P. Hayes
P.J. Hayes
S. Weiss
S.M. Weiss
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1994

Crossref