Computationally efficient induction of classification rules with the PMCRI and J-PMCRI frameworks
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale, and thus databases are growing at a fast pace. This has led to the use of parallel computing technologies to cope with large amounts of data. In the area of classification rule induction, parallelisation has so far focused on the divide and conquer approach, also known as Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer, which has only recently become a focus of parallelisation. This work introduces and empirically evaluates a framework for the parallel induction of classification rules generated by members of the Prism family of algorithms. All members of the Prism family follow the separate and conquer approach.
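To make "separate and conquer" concrete, here is a minimal, sequential sketch of a basic Prism-style rule inducer for categorical data. It is not the PMCRI or J-PMCRI framework and does no parallelisation. For each class, it grows one rule at a time by adding the attribute-value term with the highest conditional probability of that class until the rule covers only that class. It then separates (removes) the covered instances and conquers the rest with further rules. The names `learn_prism`, `classify` and `_term_score`, the dict-per-row data layout and the tie-breaking order are assumptions made for illustration.

```python
def _term_score(covered, target, cls, term):
    """Return (p(cls | term), hits) for an attribute-value term on the covered rows."""
    attr, value = term
    subset = [r for r in covered if r[attr] == value]
    hits = sum(r[target] == cls for r in subset)
    return (hits / len(subset), hits)


def learn_prism(rows, target):
    """Induce modular rules with a basic Prism-style separate-and-conquer loop.

    rows   -- list of dicts mapping attribute name -> categorical value
    target -- name of the class attribute
    Returns a list of (conditions, label) pairs; conditions is a dict
    attribute -> value that must all hold for the rule to fire.
    """
    rules = []
    for cls in sorted({r[target] for r in rows}):
        remaining = list(rows)  # each class starts again from the full training set
        while any(r[target] == cls for r in remaining):
            conditions, covered = {}, remaining
            # Specialise the rule until it covers only instances of cls.
            while any(r[target] != cls for r in covered):
                candidates = sorted(
                    {(a, r[a]) for r in covered for a in r
                     if a != target and a not in conditions},
                    key=repr,  # deterministic tie-breaking: max() keeps the first best term
                )
                if not candidates:
                    break  # clash: identical attribute values, different classes
                attr, value = max(
                    candidates, key=lambda t: _term_score(covered, target, cls, t)
                )
                conditions[attr] = value
                covered = [r for r in covered if r[attr] == value]
            rules.append((conditions, cls))
            # Separate the covered instances, then conquer the rest with further rules.
            remaining = [
                r for r in remaining
                if not all(r[a] == v for a, v in conditions.items())
            ]
    return rules


def classify(rules, row, default=None):
    """Return the label of the first rule whose conditions all match row."""
    for conditions, label in rules:
        if all(row.get(a) == v for a, v in conditions.items()):
            return label
    return default
```

The resulting rules are modular. For example, a rule such as `{"windy": "no"} -> "yes"` need not share a root attribute with the other rules, which is the property that distinguishes Prism output from a decision tree.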
A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction
This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration of a GP algorithm with relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. From a data mining viewpoint, some advantages of this integration are scalability, data-privacy control and automatic parallelization.
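As an illustration of computing fitness by submitting SQL queries, here is a sketch that evaluates one candidate rule of the form "IF conditions THEN class = label" with a single aggregate query. It uses Python's built-in `sqlite3` module rather than a parallel database server. Only the four confusion-matrix counts leave the database, not the rows themselves. The fitness measure (sensitivity × specificity), the function names and the rule representation are assumptions, not necessarily the paper's.

```python
import sqlite3


def confusion_counts(conn, table, conditions, target, label):
    """Return (TP, FP, FN, TN) for 'IF conditions THEN target = label'.

    All four counts come from one aggregate query, so the scan runs inside
    the database engine. Table and column names are interpolated into the
    SQL and must come from trusted code; values are bound parameters.
    """
    covered = "(" + (" AND ".join(f'"{a}" = ?' for a in conditions) or "1") + ")"
    positive = f'("{target}" = ?)'
    cells = [
        f"{covered} AND {positive}",          # true positives
        f"{covered} AND NOT {positive}",      # false positives
        f"NOT {covered} AND {positive}",      # false negatives
        f"NOT {covered} AND NOT {positive}",  # true negatives
    ]
    sql = "SELECT " + ", ".join(
        f"COALESCE(SUM(CASE WHEN {c} THEN 1 ELSE 0 END), 0)" for c in cells
    ) + f' FROM "{table}"'
    # Each cell uses the condition values first, then the class label.
    params = (list(conditions.values()) + [label]) * len(cells)
    return conn.execute(sql, params).fetchone()


def rule_fitness(conn, table, conditions, target, label):
    """Assumed fitness measure: sensitivity * specificity."""
    tp, fp, fn, tn = confusion_counts(conn, table, conditions, target, label)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity
```

A GP loop would call `rule_fitness` once per individual. Because each call is an independent read-only query, a database server that parallelizes query execution can also spread this evaluation across processors.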
Scaling up classification rule induction through parallel processing
The rapid increase in the size and number of databases demands data mining approaches that scale to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
A Survey of Parallel Data Mining
With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm or paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.
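A common building block when parallelizing rule induction is data parallelism. Inducers such as TDIDT and Prism rely on counts of how often each attribute-value pair co-occurs with each class, and counts over disjoint partitions of the data can be computed independently and then summed. The sketch below illustrates this partition, count and merge pattern under stated assumptions. It uses horizontal (row-wise) partitioning, and Python threads stand in for processors. In CPython, threads will not speed up this pure-Python counting because of the global interpreter lock, so a process pool or a cluster would be needed for real speed-up. The names `local_counts` and `parallel_counts` are hypothetical, and this is a generic illustration, not a specific algorithm from the survey.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_counts(partition, target):
    """Count (attribute, value, class) occurrences in one data partition."""
    counts = Counter()
    for row in partition:
        for attr, value in row.items():
            if attr != target:
                counts[(attr, value, row[target])] += 1
    return counts


def parallel_counts(rows, target, n_workers=4):
    """Split rows into disjoint partitions, count each concurrently, merge by summing."""
    chunk = max(1, -(-len(rows) // n_workers))  # ceiling division
    partitions = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_counts, partitions, [target] * len(partitions)))
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return total
```

Because addition is associative and commutative, the merged result equals a sequential count over the whole data set regardless of how the rows are partitioned. This is what makes the pattern safe to parallelize.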
A Calculus of Looping Sequences with Local Rules
In this paper we present a variant of the Calculus of Looping Sequences (CLS for short) with global and local rewrite rules. While global rules, as in CLS, are applied anywhere in a given term, local rules can only be applied in the compartment in which they are defined. Local rules are dynamic: they can be added, moved and erased. We enrich the new calculus with a parallel semantics in which a reduction step is led by any number of global and local rules that can be performed in parallel. A type system is developed to enforce the property that a compartment must contain only local rules with specific features. As a running example, we model some interactions happening in a cell, starting from its nucleus and moving towards its mitochondria. Comment: In Proceedings DCM 2011, arXiv:1207.682
Random Prism: An Alternative to Random Forests.
Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used to increase the overall classification accuracy. In most ensemble classifiers, the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve comparable and sometimes higher classification accuracy than decision tree classifiers if the data are noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms, aimed at enhancing Prism’s classification accuracy by reducing overfitting.
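The abstract does not say how Random Prism draws the training data for its base classifiers. To illustrate the general ensemble idea it describes (several base classifiers whose predictions are combined), here is a sketch of bagging with unweighted majority voting, the resampling scheme that Random Forests combine with random feature selection. The base learner is a pluggable function. `bagging`, `vote` and the trivial `majority_learner` used for testing are hypothetical names, and a Prism rule inducer could be passed as `learn` instead.

```python
import random
from collections import Counter


def bagging(rows, learn, n_models=10, seed=0):
    """Train n_models base classifiers, each on a bootstrap sample
    (len(rows) draws with replacement from the training set)."""
    rng = random.Random(seed)
    return [learn([rng.choice(rows) for _ in rows]) for _ in range(n_models)]


def vote(models, row):
    """Combine base classifiers by unweighted majority vote.

    A model may return None to abstain (e.g. a rule set with no matching rule).
    """
    votes = Counter(m(row) for m in models)
    votes.pop(None, None)
    return votes.most_common(1)[0][0] if votes else None


def majority_learner(target):
    """Hypothetical stand-in base learner: always predicts the sample's majority class."""
    def learn(sample):
        label = Counter(r[target] for r in sample).most_common(1)[0][0]
        return lambda row: label
    return learn
```

The abstention rule in `vote` matters for modular rule sets. Unlike a decision tree, a set of Prism rules may leave an instance uncovered, so each base classifier should be allowed to return "no prediction" rather than being forced to guess.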
On the confluence of lambda-calculus with conditional rewriting
The confluence of untyped λ-calculus with unconditional rewriting is now well understood. In this paper, we investigate the confluence of λ-calculus with conditional rewriting and provide general results in two directions. First, we consider conditional rules that are algebraic, which extends results of Müller and Dougherty for unconditional rewriting. Two cases are distinguished, depending on whether β-reduction is allowed in the evaluation of conditions. Moreover, Dougherty's result is improved from the assumption of strongly normalizing β-reduction to weakly normalizing β-reduction. We also provide examples showing that, outside these conditions, modularity of confluence is difficult to achieve. Second, we go beyond the algebraic framework and obtain new confluence results using a restricted notion of orthogonality that takes advantage of the conditional part of rewrite rules
- …