48,739 research outputs found

    Evolving text classification rules with genetic programming

    Get PDF
    We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications

    Evolving rules for document classification

    Get PDF
    We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that because the induced rules are meaningful to a human analyst they may have a number of other uses beyond classification and provide a basis for text mining applications

    Evolving Lucene search queries for text classification

    Get PDF
    We describe a method for generating accurate, compact, human understandable text classifiers. Text datasets are indexed using Apache Lucene and Genetic Programs are used to construct Lucene search queries. Genetic programs acquire fitness by producing queries that are effective binary classifiers for a particular category when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from classification tasks

    Low Size-Complexity Inductive Logic Programming: The East-West Challenge Considered as a Problem in Cost-Sensitive Classification

    Get PDF
    The Inductive Logic Programming community has considered proof-complexity and model-complexity, but, until recently, size-complexity has received little attention. Recently a challenge was issued "to the international computing community" to discover low size-complexity Prolog programs for classifying trains. The challenge was based on a problem first proposed by Ryszard Michalski, 20 years ago. We interpreted the challenge as a problem in cost-sensitive classification and we applied a recently developed cost-sensitive classifier to the competition. Our algorithm was relatively successful (we won a prize). This paper presents our algorithm and analyzes the results of the competition
    corecore