38,834 research outputs found

    Relational methodology for data mining and knowledge discovery

    Get PDF
    Knowledge discovery and data mining methods have been successful in many domains. However, their abilities to build or discover a domain theory remain unclear. This is largely due to the fact that many fundamental KDD&DM methodological questions are still unexplored such as (1) the nature of the information contained in input data relative to the domain theory, and (2) the nature of the knowledge that these methods discover. The goal of this paper is to clarify methodological questions of KDD&DM methods. This is done by using the concept of Relational Data Mining (RDM), representative measurement theory, an ontology of a subject domain, a many-sorted empirical system (algebraic structure in the first-order logic), and an ontology of a KDD&DM method. The paper concludes with a review of our RDM approach and \u27Discovery\u27 system built on this methodology that can analyze any hypotheses represented in the first-order logic and use any input by representing it in many-sorted empirical system

    Symbolic methodology for numeric data mining

    Get PDF
    Currently statistical and artificial neural network methods dominate in data mining applications. Alternative relational (symbolic) data mining methods have shown their effectiveness in robotics, drug design, and other areas. Neural networks and decision tree methods have serious limitations in capturing relations that may have a variety of forms. Learning systems based on symbolic first-order logic (FOL) representations capture relations naturally. The learned regularities are understandable directly in domain terms that help to build a domain theory. This paper describes relational data mining methodology and develops it further for numeric data such as financial and spatial data. This includes (1) comparing the attribute-value representation with the relational representation, (2) defining a new concept of joint relational representations, (3) a process of their use, and the Discovery algorithm. This methodology handles uniformly the numerical and interval forecasting tasks as well as classification tasks. It is shown that Relational Data Mining (RDM) can handle multiple constrains, initial rules and background knowledge very naturally to reduce the search space in contrast with attribute-based data mining. Theoretical concepts are illustrated with examples from financial and image processing domains

    A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

    Get PDF
    This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization
    • …
    corecore