
Discovering interesting knowledge from a science & technology database with a genetic algorithm

By Wesley Romao, Alex A. Freitas and Itana M. de S. Gimenes

Abstract

Data mining consists of extracting interesting knowledge from data. This paper addresses the discovery of knowledge in the form of IF-THEN prediction rules, which are a popular form of knowledge representation in data mining. In this context, we propose a genetic algorithm (GA) designed specifically to discover interesting fuzzy prediction rules. The GA searches for prediction rules that are interesting in the sense of being new and surprising for the user. This is done by adapting a technique little exploited in the literature, which is based on user-defined general impressions (subjective knowledge). More precisely, a prediction rule is considered interesting (or surprising) to the extent that it represents knowledge that not only was previously unknown to the user but also contradicts the user's original beliefs. In addition, the use of fuzzy logic helps to improve the comprehensibility of the rules discovered by the GA, because the rules are expressed in linguistic terms that are natural for the user. A prototype was implemented and applied to a real-world science & technology database containing data about the scientific production of researchers. The GA implemented in this prototype was evaluated by comparing it with the J4.8 algorithm, a variant of the well-known C4.5 algorithm. Experiments were carried out to evaluate both the predictive accuracy and the degree of interestingness (or surprisingness) of the rules discovered by both algorithms. The predictive accuracy obtained by the proposed GA was similar to that obtained by J4.8, but the GA, in general, discovered rules with fewer conditions. In addition, it works with natural linguistic terms, which leads to the discovery of more comprehensible knowledge. The rules discovered by the proposed GA and the best rules discovered by J4.8 were shown in an interview to a user (a University Director), who evaluated the degree of interestingness (surprisingness) of the rules to him. In general, the user considered the rules discovered by the GA much more interesting than the rules discovered by J4.8.
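
The two ideas in the abstract, fuzzy conditions expressed as linguistic terms and surprisingness measured against user-defined general impressions, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the attribute `papers_per_year`, the triangular membership functions and their breakpoints, the class labels, and the scoring rule (the fraction of a rule's conditions that match an impression while predicting a different class). The paper's actual rule encoding and fitness function may differ.

```python
from dataclasses import dataclass


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for a hypothetical attribute (not from the paper).
TERMS = {
    "papers_per_year": {
        "low": (-4.0, 0.0, 4.0),
        "medium": (2.0, 6.0, 10.0),
        "high": (8.0, 12.0, 16.0),
    }
}


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # tuple of (attribute, linguistic term) pairs
    predicted: str     # class predicted by the THEN part


@dataclass(frozen=True)
class Impression:
    attribute: str
    term: str
    expected: str      # class the user expects for this attribute/term pair


def match_degree(rule: Rule, record: dict) -> float:
    """Degree to which a record satisfies the rule antecedent (fuzzy AND as minimum)."""
    return min(
        triangular(record[attr], *TERMS[attr][term])
        for attr, term in rule.conditions
    )


def surprisingness(rule: Rule, impressions: list) -> float:
    """Assumed score: fraction of conditions that contradict some general impression."""
    if not rule.conditions:
        return 0.0
    contradicted = sum(
        1
        for cond in rule.conditions
        if any(
            (imp.attribute, imp.term) == cond and imp.expected != rule.predicted
            for imp in impressions
        )
    )
    return contradicted / len(rule.conditions)


# Usage: the user believes high productivity implies many citations,
# while the rule predicts the opposite.
rule = Rule(conditions=(("papers_per_year", "high"),), predicted="few_citations")
impression = Impression("papers_per_year", "high", "many_citations")
degree = match_degree(rule, {"papers_per_year": 10.0})
score = surprisingness(rule, [impression])
```

In this sketch a rule that agrees with every impression scores 0, and a rule whose conditions all contradict an impression scores 1. A GA could combine such a score with predictive accuracy in its fitness function, although how the paper weighs the two is not stated in the abstract.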

Topics: QA76
Publisher: Elsevier
Year: 2004
OAI identifier: oai:kar.kent.ac.uk:14173
