3 research outputs found

    A Survey of Parallel Data Mining

    Get PDF
    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been a considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining with a broader perspective. More precisely, we discuss the parallelization of data mining algorithms of four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms

    GAMoN: Discovering M-of-N{¬,∨} hypotheses for text classification by a lattice-based Genetic Algorithm

    Get PDF
    AbstractWhile there has been a long history of rule-based text classifiers, to the best of our knowledge no M-of-N-based approach for text categorization has so far been proposed. In this paper we argue that M-of-N hypotheses are particularly suitable to model the text classification task because of the so-called “family resemblance” metaphor: “the members (i.e., documents) of a family (i.e., category) share some small number of features, yet there is no common feature among all of them. Nevertheless, they resemble each other”. Starting from this conjecture, we provide a sound extension of the M-of-N approach with negation and disjunction, called M-of-N{¬,∨}, which enables to best fit the true structure of the data. Based on a thorough theoretical study, we show that the M-of-N{¬,∨} hypothesis space has two partial orders that form complete lattices.GAMoN is the task-specific Genetic Algorithm (GA) which, by exploiting the lattice-based structure of the hypothesis space, efficiently induces accurate M-of-N{¬,∨} hypotheses.Benchmarking was performed over 13 real-world text data sets, by using four rule induction algorithms: two GAs, namely, BioHEL and OlexGA, and two non-evolutionary algorithms, namely, C4.5 and Ripper. Further, we included in our study linear SVM, as it is reported to be among the best methods for text categorization. Experimental results demonstrate that GAMoN delivers state-of-the-art classification performance, providing a good balance between accuracy and model complexity. Further, they show that GAMoN can scale up to large and realistic real-world domains better than both C4.5 and Ripper

    A Network Genetic Algorithm for Concept Learning

    No full text
    This paper presents a highly parallel genetic algorithm, designed for concept induction in propositional and first order logics. The system exploits niches and species for learning multimodal concepts; it deeply differs from other systems because of the distributed architecture, which totally eliminates the concept of common memory. A first implementation of the system, designed for checking the possibility of exploiting parallel processing in network computer, is evaluated on standard benchmarks. The experimental results show that the system reaches good performances both with respect to the quality of the learned knowledge and with respect to the speed up on a workstation cluster. 1 INTRODUCTION In the recent literature, Genetic Algorithms (GAs) emerged as valuable search tools in the field of concept induction (De Jong et al., 1993; Janikow, 1993; Greene & Smith, 1993; Giordana & Neri, 1996). Specifically, two of their features look particularly attractive, namely the exploration p..
    corecore