On the Suitability of Genetic-Based Algorithms for Data Mining

Author: A Freitas
R Elmasri
Publication venue: Springer Verlag
Publication date: 01/01/1998

The goal of data mining is to extract knowledge from large databases. A database may be considered a search space consisting of an enormous number of elements, and a mining algorithm a search strategy. In general, an exhaustive search of the space is infeasible, so efficient search strategies are of vital importance. Search strategies based on genetic algorithms have been applied successfully in a wide range of applications. We focus on the suitability of genetic-based algorithms for data mining. We discuss the design and implementation of a genetic-based algorithm for data mining and illustrate its potential.

CiteSeerX

Crossref

NLR Reports Repository

University of Twente Research Information

A Survey of Parallel Data Mining

Author: Freitas Alex A.

With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackling the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm or paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.

Kent Academic Repository

Mining Interesting Positive and Negative Association Rule Based on Improved Genetic Algorithm (MIPNAR_GA)

Author: Jain Anurag
Jain Susheel
Rai Nikky
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 30/12/2013

Association rule mining is an efficient technique for finding strong relationships between correlated data items. The correlation of data makes the extraction process meaningful. A variety of algorithms are used to mine positive and negative rules, such as the Apriori algorithm and tree-based algorithms. A number of these algorithms perform well but produce a large number of negative association rules and also suffer from the multi-scan problem. The idea of this paper is to eliminate these problems and reduce the large number of negative rules. We therefore propose an improved approach to mining interesting positive and negative rules, based on a genetic algorithm and the MLMS algorithm. In this method we use a multi-level multiple support representation of the data table as 0 and 1. The divided process reduces the time spent scanning the database. The proposed algorithm, MIPNAR_GA, combines MLMS with a genetic algorithm to mine interesting positive and negative rules from frequent and infrequent pattern sets. The algorithm proceeds in three phases:

a) Extract frequent and infrequent pattern sets using the Apriori method.
b) Efficiently generate positive and negative rules.
c) Prune redundant rules by applying interestingness measures.

Rule optimization is performed by the genetic algorithm, and the algorithm is evaluated on real-world datasets such as heart disease data and standard datasets from the UCI Machine Learning Repository.

Keywords: association rule mining, negative and positive rules, frequent and infrequent pattern sets, genetic algorithm
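The positive and negative rules mentioned in this abstract are usually scored with support and confidence. The following is a minimal sketch of those textbook measures, not the MIPNAR_GA algorithm itself; the function names and the toy transactions are assumptions made for illustration.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence_positive(transactions: List[Transaction],
                        antecedent: FrozenSet[str],
                        consequent: FrozenSet[str]) -> float:
    """conf(A -> B) = supp(A and B) / supp(A)."""
    supp_a = support(transactions, antecedent)
    if supp_a == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / supp_a


def confidence_negative(transactions: List[Transaction],
                        antecedent: FrozenSet[str],
                        consequent: FrozenSet[str]) -> float:
    """conf(A -> not B) = 1 - conf(A -> B), defined only when supp(A) > 0."""
    if support(transactions, antecedent) == 0:
        return 0.0
    return 1.0 - confidence_positive(transactions, antecedent, consequent)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    data = [
        frozenset({"bread", "milk"}),
        frozenset({"bread"}),
        frozenset({"bread", "milk", "eggs"}),
        frozenset({"milk"}),
    ]
    a, b = frozenset({"bread"}), frozenset({"milk"})
    print("conf(bread -> milk):", confidence_positive(data, a, b))
    print("conf(bread -> not milk):", confidence_negative(data, a, b))
```

A pruning step such as phase c) would then keep only rules whose scores clear chosen thresholds; the thresholds and interestingness measures actually used in the paper are not given in the abstract.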

International Institute for Science, Technology and Education (IISTE): E-Journals

Towards a framework for designing full model selection and optimization systems

Author: Mayo Michael
Pfahringer Bernhard
Sun Quan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques in their data mining projects could bring great benefits. End-users now face the new problem of how to choose a combination of data processing tools and algorithms for a given dataset. This problem is usually termed the Full Model Selection (FMS) problem. Extending our previous work [10], in this paper we introduce a framework for designing FMS algorithms. Under this framework, we propose a novel algorithm combining genetic algorithms (GA) and particle swarm optimization (PSO), named GPS (which stands for GA-PSO-FMS), in which a GA is used to search for the optimal structure of a data mining solution, and PSO is used to search for the optimal parameters of a particular structure instance. Given a classification dataset, GPS outputs an FMS solution as a directed acyclic graph consisting of diverse data mining operators that are available to the problem. Experimental results demonstrate the benefit of the algorithm. We also present, with detailed analysis, two model-tree-based variants for speeding up the GPS algorithm.

Research Commons@Waikato

Modelling epistasis in genetic disease using Petri nets, evolutionary computation and frequent itemset mining

Author: Beretta Lorenzo
Mayo Michael
Publication venue: Elsevier BV
Publication date: 01/01/2010

Petri nets are useful for mathematically modelling disease-causing genetic epistasis. A Petri net model of an interaction has the potential to lead to biological insight into the cause of a genetic disease. However, defining a Petri net by hand for a particular interaction is extremely difficult because of the sheer complexity of the problem and degrees of freedom inherent in a Petri net’s architecture. We propose therefore a novel method, based on evolutionary computation and data mining, for automatically constructing Petri net models of non-linear gene interactions. The method comprises two main steps. Firstly, an initial partial Petri net is set up with several repeated sub-nets that model individual genes and a set of constraints, comprising relevant common sense and biological knowledge, is also defined. These constraints characterise the class of Petri nets that are desired. Secondly, this initial Petri net structure and the constraints are used as the input to a genetic algorithm. The genetic algorithm searches for a Petri net architecture that is both a superset of the initial net, and also conforms to all of the given constraints. The genetic algorithm evaluation function that we employ gives equal weighting to both the accuracy of the net and also its parsimony. We demonstrate our method using an epistatic model related to the presence of digital ulcers in systemic sclerosis patients that was recently reported in the literature. Our results show that although individual “perfect” Petri nets can frequently be discovered for this interaction, the true value of this approach lies in generating many different perfect nets, and applying data mining techniques to them in order to elucidate common and statistically significant patterns of interaction
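The abstract says the evaluation function weights accuracy and parsimony equally but does not give its form. One way such a function could look is sketched below, purely as an assumption: parsimony is modelled here as one minus the net's size relative to a hypothetical maximum size, and the name `combined_fitness` is invented for illustration.

```python
def combined_fitness(accuracy: float, size: int, max_size: int) -> float:
    """Equal-weight blend of accuracy and parsimony, both scaled to [0, 1].

    Assumption: parsimony = 1 - size / max_size, so smaller nets score higher.
    The paper's actual definition is not stated in the abstract.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if max_size <= 0 or not 0 <= size <= max_size:
        raise ValueError("size must lie in [0, max_size] with max_size > 0")
    parsimony = 1.0 - size / max_size
    return 0.5 * accuracy + 0.5 * parsimony


if __name__ == "__main__":
    # A fully accurate net using half of the allowed size.
    print(combined_fitness(accuracy=1.0, size=5, max_size=10))
```

With this form, two nets of equal accuracy are ranked by size alone, which matches the stated aim of preferring parsimonious models.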

Research Commons@Waikato

A Modified Stacking Ensemble Machine Learning Algorithm Using Genetic Algorithms

Author: Al-laymoun O'la Hmoud
Sikora Riyaz
Publication venue: CSUSB ScholarWorks
Publication date: 01/01/2014

With the massive increase in the data being collected as a result of ubiquitous information gathering devices, and the increased need for doing data mining and analyses, there is a need for scaling up and improving the performance of traditional data mining and learning algorithms. Two related fields of distributed data mining and ensemble learning aim to address this scaling issue. Distributed data mining looks at how data that is distributed can be effectively mined without having to collect the data at one central location. Ensemble learning techniques aim to create a meta-classifier by combining several classifiers created on the same data and improve their performance. In this paper we use concepts from both of these fields to create a modified and improved version of the standard stacking ensemble learning technique by using a genetic algorithm (GA) for creating the meta-classifier. We use concepts from distributed data mining to study different ways of distributing the data and use the concept of stacking ensemble learning to use different learning algorithms on each sub-set and create a meta-classifier using a genetic algorithm. We test the GA-based stacking algorithm on ten data sets from the UCI Data Repository and show the improvement in performance over the individual learning algorithms as well as over the standard stacking algorithm

CSUSB ScholarWorks

Study of Genetic Algorithm, an Evolutionary Approach

Author: Mrs. K. Jayavani, Dr. G. M. Kadhar Nawaz
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/08/2014

Data mining is the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant structures, from large amounts of data stored in databases, data warehouses, or other information repositories. To do this, data mining uses a variety of algorithms chosen according to specified measures and thresholds. The results of this analysis are then used to build models based on real-world behavior, which are in turn used to analyze incoming data and make predictions about future behavior. Here, we focus on one efficient evolutionary algorithm, the genetic algorithm. It is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover. They are numerical optimization algorithms inspired by both natural selection and natural genetics. The method is a general one, capable of being applied to an extremely wide range of problems. In this paper we discuss genetic algorithm techniques and their application in data mining in detail.
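The operators named in this abstract (selection, crossover, mutation, and inheritance through crossover) can be illustrated with a small generic genetic algorithm. The sketch below is not taken from the paper: it assumes a toy "OneMax" problem (maximize the number of 1 bits), tournament selection, one-point crossover, bit-flip mutation, and elitism, and all names and parameter values are illustrative.

```python
import random
from typing import List, Tuple

Genome = List[int]


def fitness(genome: Genome) -> int:
    """Toy objective (OneMax): count the 1 bits."""
    return sum(genome)


def tournament(pop: List[Genome], rng: random.Random, k: int = 3) -> Genome:
    """Selection: pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """One-point crossover: the child inherits a prefix of a and a suffix of b."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(genome: Genome, rng: random.Random, rate: float) -> Genome:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]


def run_ga(n_bits: int = 20, pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.02, seed: int = 0) -> Tuple[Genome, List[int]]:
    """Return the best genome found and the best fitness seen per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng, mutation_rate))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = run_ga()
    print("best fitness:", fitness(best), "of", len(best))
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness never decreases. In a data mining setting the bit string would instead encode a candidate rule or model, and `fitness` would score it against the data.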

International Journal on Recent and Innovation Trends in Computing and Communication