3 research outputs found

Evolving text classification rules with genetic programming

Author: Laurence Hirsch
Masoud Saeedi
Robin Hirsch
Publication venue: Informa UK Limited
Publication date: 07/09/2005

We describe a novel method for using genetic programming to create compact classification rules from combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that classify effectively, in terms of precision and recall, when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters-21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and may provide a basis for text mining applications.
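To make the fitness idea above concrete, here is a minimal Python sketch. It assumes rules are AND/OR trees whose leaves are character N-grams and that precision and recall are combined with the F1 measure; both choices are illustrative assumptions, not the paper's stated function set or fitness formula. The names `NGram`, `And`, `Or`, `matches` and `f1_fitness` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical rule representation (an assumption for illustration):
# leaves are character N-grams, internal nodes combine sub-rules.
@dataclass(frozen=True)
class NGram:
    text: str  # expected in lower case

@dataclass(frozen=True)
class And:
    left: "Rule"
    right: "Rule"

@dataclass(frozen=True)
class Or:
    left: "Rule"
    right: "Rule"

Rule = Union[NGram, And, Or]

def matches(rule: Rule, doc: str) -> bool:
    """Return True if the rule fires on the lower-cased document text."""
    if isinstance(rule, NGram):
        return rule.text in doc.lower()
    if isinstance(rule, And):
        return matches(rule.left, doc) and matches(rule.right, doc)
    return matches(rule.left, doc) or matches(rule.right, doc)

def f1_fitness(rule: Rule, docs: List[str], labels: List[bool]) -> float:
    """Score a rule as a binary classifier; F1 is an assumed fitness choice."""
    tp = fp = fn = 0
    for doc, positive in zip(docs, labels):
        predicted = matches(rule, doc)
        if predicted and positive:
            tp += 1
        elif predicted and not positive:
            fp += 1
        elif not predicted and positive:
            fn += 1
    if tp == 0:
        return 0.0  # avoids division by zero; such a rule is useless anyway
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, on the toy training set `["Crude oil prices rose today", "Oil refinery output fell", "Wheat harvest beats forecast", "Grain exports to rise"]` with labels `[True, True, False, False]`, the rule `And(NGram("oil"), Or(NGram("crude"), NGram("refin")))` scores 1.0, while the overly general rule `NGram("r")` matches every document and scores about 0.67. In a genetic program, this score would drive selection of which rule trees survive and recombine.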

Sources: Crossref; Sheffield Hallam University Research Archive

Evolving Lucene search queries for text classification

Author: Laurence Hirsch
R. Hirsch
M. Saeedi
Publication date: 01/01/2007

We describe a method for generating accurate, compact, human-understandable text classifiers. Text datasets are indexed using Apache Lucene, and genetic programs are used to construct Lucene search queries. Genetic programs acquire fitness by producing queries that are effective binary classifiers for a particular category when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from classification tasks.
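The sketch below illustrates the idea of evolving a search query that acts as a binary classifier. It is a simplified stand-in, not the paper's system: instead of genetic-programming trees and a real Lucene index, it evolves a fixed-length list of clauses with a plain genetic algorithm and evaluates them against in-memory token sets. The `+`/`-` prefixes follow Lucene's classic query-parser syntax (required and prohibited terms), and `matches` approximates Lucene's Boolean semantics; the genome layout, F1 fitness, truncation selection, and every function name here are assumptions for illustration.

```python
import random

# Hypothetical genome: a fixed-length list of (term, occur) clauses, where
# occur is "+" (MUST), "-" (MUST_NOT) or "" (SHOULD), mirroring the
# prefixes of Lucene's classic query-parser syntax.
OCCURS = ("+", "-", "")

def to_lucene(genome):
    """Render a genome as a Lucene classic query-parser string."""
    return " ".join(f"{occur}{term}" for term, occur in genome)

def matches(genome, tokens):
    """Approximate Lucene Boolean matching on a set of document tokens."""
    must = [t for t, o in genome if o == "+"]
    must_not = [t for t, o in genome if o == "-"]
    should = [t for t, o in genome if o == ""]
    if any(t in tokens for t in must_not):
        return False
    if must:
        # With MUST clauses present, SHOULD clauses only affect scoring.
        return all(t in tokens for t in must)
    # A query with no MUST clauses needs at least one SHOULD match,
    # so a purely negative query matches nothing.
    return any(t in tokens for t in should)

def f1(genome, docs, labels):
    """F1 = 2TP / (2TP + FP + FN) of the query used as a binary classifier."""
    hits = [matches(genome, d) for d in docs]
    tp = sum(1 for h, y in zip(hits, labels) if h and y)
    fp = sum(1 for h, y in zip(hits, labels) if h and not y)
    fn = sum(1 for h, y in zip(hits, labels) if not h and y)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def random_genome(vocab, length, rng):
    return [(rng.choice(vocab), rng.choice(OCCURS)) for _ in range(length)]

def mutate(genome, vocab, rng):
    """Replace one randomly chosen clause with a new random clause."""
    child = list(genome)
    child[rng.randrange(len(child))] = (rng.choice(vocab), rng.choice(OCCURS))
    return child

def crossover(a, b, rng):
    """One-point crossover; requires genomes of length >= 2."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(docs, labels, vocab, length=3, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(vocab, length, rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: f1(g, docs, labels), reverse=True)
        history.append(f1(scored[0], docs, labels))
        nxt = [scored[0]]  # elitism: the best query always survives
        parents = scored[: pop_size // 2]  # truncation selection
        while len(nxt) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            nxt.append(mutate(child, vocab, rng))
        pop = nxt
    best = max(pop, key=lambda g: f1(g, docs, labels))
    return best, f1(best, docs, labels), history
```

A possible usage, with documents given as token sets: `docs = [{"crude", "oil", "price"}, {"oil", "refinery"}, {"wheat", "grain"}, {"grain", "export"}]`, `labels = [True, True, False, False]`, then `query, fitness, _ = evolve(docs, labels, sorted(set().union(*docs)))` and `print(to_lucene(query), fitness)`. The rendered string is what could be handed to a real Lucene query parser; because of elitism, the best fitness never decreases from one generation to the next. Keeping the evolved artefact as a readable query string is what makes the resulting classifier human-understandable.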

Sources: CiteSeerX; Crossref; Sheffield Hallam University Research Archive

An analysis of search query evolution in document classification and clustering

Author: Haddela Kankanamalage Prasanna Sumathipala

With the increasing use of data analytics in decision-making, the analysis of document collections has become a widely accepted area of research. Document classification and clustering are two intensely investigated and active research areas because of the complexity of the problem and its impact on society. However, many popular methods that classify and cluster documents with high accuracy cannot explain their results to end users, which undermines trust in some applications. It is therefore important to improve explainable classification and clustering methods. One promising approach is the evolved search query (eSQ), a genetic algorithm (GA)-based approach to classification and clustering. GA-based methods are well suited to optimizing solutions to complex problems, and eSQ uses this capability to produce classification and clustering methods that are also human-interpretable. The primary focus of this study is to analyse the eSQ approach to document classification and clustering, with an emphasis on explainability. The investigation covers three perspectives on eSQ-based methods: explainability, document classification, and document clustering. This thesis presents a taxonomy of classification methods based on human-friendliness, empirical observations on the performance of eSQ classifiers with different feature selection methods, the effectiveness of eSQ classifiers for Sinhala documents, and the performance of eSQ clustering for Sinhala documents. The research contributes by categorizing popular classification methods using the new taxonomy, integrating feature selection methods into eSQ classifiers, extending Apache Lucene to support the Sinhala language with basic pre-processing tools, and improving eSQ hybrid single-word clustering methods. Notably, the eSQ-based classification and clustering methods demonstrate superior performance when document categories overlap.

Source: Sheffield Hallam University Research Archive