
A niching memetic algorithm for simultaneous clustering and feature selection

Authors: Fairhurst M; Liu X; Sheng W
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2008

Clustering is inherently a difficult task, and it becomes even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to avoid getting trapped in less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both the feature selection and the cluster centers, allowing different numbers of clusters. Further, local search operations are introduced to refine the feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve population diversity and prevent premature convergence. In an experimental evaluation, we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data.

Brunel University Research Archive
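
The variable composite representation and the local search step described in the NMA_CFS abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Chromosome` class, the helper names, and the choice of a single k-means-style update as the local search operator are assumptions made for illustration. Each chromosome carries a binary feature mask plus its own list of cluster centers, so different chromosomes can encode different numbers of clusters, and distances are computed over the selected features only.

```python
from dataclasses import dataclass
from typing import List, Sequence

Point = Sequence[float]


@dataclass
class Chromosome:
    # Hypothetical composite encoding: one bit per feature, plus a
    # variable-length list of cluster centres (one centre per cluster).
    feature_mask: List[int]
    centers: List[List[float]]


def masked_sq_dist(a: Point, b: Point, mask: List[int]) -> float:
    # Squared Euclidean distance over the selected features only.
    return sum((x - y) ** 2 for x, y, keep in zip(a, b, mask) if keep)


def assign(points: List[Point], chrom: Chromosome) -> List[int]:
    # Label each point with the index of its nearest centre in the selected subspace.
    labels = []
    for p in points:
        dists = [masked_sq_dist(p, c, chrom.feature_mask) for c in chrom.centers]
        labels.append(dists.index(min(dists)))
    return labels


def refine_centers(points: List[Point], chrom: Chromosome) -> Chromosome:
    # Assumed local search step (one k-means-style update): move each centre to
    # the mean of its members. All features are averaged so the centre stays
    # usable if the mask later changes; a centre with no members is kept as is.
    labels = assign(points, chrom)
    new_centers = []
    for k, center in enumerate(chrom.centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(center))
    return Chromosome(list(chrom.feature_mask), new_centers)


points = [[0, 0, 9], [0, 1, -9], [10, 10, 9], [10, 11, -9]]
chrom = Chromosome(feature_mask=[1, 1, 0], centers=[[1, 1, 0], [9, 9, 0]])
print(assign(points, chrom))                  # [0, 0, 1, 1]
print(refine_centers(points, chrom).centers)  # [[0.0, 0.5, 0.0], [10.0, 10.5, 0.0]]
```

In this toy call the mask drops the third feature, so its large ±9 values have no influence on cluster membership; a full memetic algorithm would additionally apply such refinements inside a population with crossover, mutation, and niching.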

A systematic review of data quality issues in knowledge discovery tasks

Authors: Corrales David Camilo; Corrales Juan Carlos; Ledezma Agapito Ismael
Publication venue: Universidad de Medellín
Publication date: 07/11/2015

The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad de Medellín: Revistas Científicas

Repositorio Institucional Universidad de Medellín

DIALNET

HYEI: A New Hybrid Evolutionary Imperialist Competitive Algorithm for Fuzzy Knowledge Discovery

Authors: D. Jalal Nouri; F. Ghareh Mohammadi; M. Saniee Abadeh
Publication venue: Hindawi Limited
Publication date: 01/01/2014

In recent years, the imperialist competitive algorithm (ICA), the genetic algorithm (GA), and hybrid fuzzy classification systems have been successfully and effectively employed for data mining classification tasks. To overcome the ineffectiveness of current algorithms at analysing high-dimensional independent datasets, this paper presents a new hybrid approach, named HYEI, to discover generic rule-based systems. The proposed approach consists of three stages and combines an evolutionary-based fuzzy system with two ICA procedures to generate high-quality fuzzy-classification rules. First, the best feature subset is selected using embedded ICA feature selection, and these features are then used to generate basic fuzzy-classification rules. Finally, all rules are optimized by an ICA procedure that reduces their length or eliminates some of them. The performance of HYEI has been evaluated on several benchmark datasets from the UCI machine learning repository. Compared with the best previously published results, the proposed algorithm attains the highest classification accuracy in 6 out of the 7 dataset problems and comparable classification accuracy on the 5 other test problems.

Crossref

Directory of Open Access Journals
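
The HYEI abstract does not detail its ICA procedures, so the sketch below only illustrates the general mechanics of an imperialist competitive algorithm (assimilation, revolution, position exchange, and imperialistic competition) on a toy real-valued objective. It is a simplified sketch, not HYEI's procedure, which operates on feature subsets and fuzzy rules. The parameter names and defaults (`beta`, `revolution_rate`, `xi`), the round-robin dealing of colonies, the per-coordinate assimilation, and the rule that the weakest empire's worst colony moves to the strongest empire are all assumptions made for brevity.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def assimilate(colony: Vector, imperialist: Vector, beta: float, rng) -> Vector:
    # Move each coordinate toward the imperialist by a random fraction of the gap;
    # beta > 1 lets the colony overshoot and explore around its imperialist.
    return [c + beta * rng.random() * (i - c) for c, i in zip(colony, imperialist)]


def ica_minimize(cost: Callable[[Vector], float], dim: int, n_countries: int = 30,
                 n_empires: int = 3, iterations: int = 100, beta: float = 2.0,
                 revolution_rate: float = 0.1, xi: float = 0.1,
                 bounds: Tuple[float, float] = (-5.0, 5.0), seed: int = 0) -> Tuple[Vector, float]:
    rng = random.Random(seed)
    lo, hi = bounds
    countries = sorted(([rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_countries)),
                       key=cost)
    # The best countries become imperialists; the rest are dealt round-robin as colonies.
    imperialists = countries[:n_empires]
    empires = [countries[n_empires + k::n_empires] for k in range(n_empires)]
    for _ in range(iterations):
        for e, colonies in enumerate(empires):
            for j, colony in enumerate(colonies):
                new = assimilate(colony, imperialists[e], beta, rng)
                if rng.random() < revolution_rate:
                    # Revolution: re-sample one random coordinate.
                    new[rng.randrange(dim)] = rng.uniform(lo, hi)
                new = [min(max(x, lo), hi) for x in new]
                # Position exchange: a colony that beats its imperialist takes its place.
                if cost(new) < cost(imperialists[e]):
                    new, imperialists[e] = imperialists[e], new
                colonies[j] = new
        # Imperialistic competition (simplified): the empire with the highest total
        # cost loses its worst colony to the empire with the lowest total cost.
        total = [cost(imp) + (xi * sum(map(cost, cols)) / len(cols) if cols else 0.0)
                 for imp, cols in zip(imperialists, empires)]
        weak = max(range(n_empires), key=total.__getitem__)
        strong = min(range(n_empires), key=total.__getitem__)
        if weak != strong and empires[weak]:
            worst = max(range(len(empires[weak])), key=lambda j: cost(empires[weak][j]))
            empires[strong].append(empires[weak].pop(worst))
    best = min(imperialists, key=cost)
    return best, cost(best)


best, value = ica_minimize(lambda v: sum(x * x for x in v), dim=2)
print(best, value)
```

The usage line minimizes the sphere function on [-5, 5]². Empires that lose all their colonies are kept rather than eliminated, which is a further simplification of the usual algorithm.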

An Intelligent Genetic Algorithm for Mining Classification Rules in Large Datasets

Authors: Nedunchezhian R.; Rajalakshmi M.; Vivekanandan P.
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 22/03/2013

The genetic algorithm (GA) is a popular classification algorithm that creates a random population of candidate solutions and evolves them toward an accurate solution for a given problem by processing them iteratively over several generations. In each generation, the GA accesses the training data set only to calculate the fitness of population members; no other knowledge about the problem domain is extracted from the training data. Even the domain knowledge stored in the chromosomes of the population may be lost in later generations through genetic operations. All genetic operations, such as crossover and mutation, are probability based and do not depend on domain knowledge. This causes the GA to converge slowly. This paper proposes a genetic algorithm that tries to gain as much knowledge as possible between generations and stores it in the form of knowledge chromosomes. The gained knowledge is used to make predictions about the search space and to guide the search toward areas with potential solutions in subsequent generations. This makes the GA converge faster, which in turn reduces the learning cost. The experiments show that the run time is reduced considerably compared with the state-of-the-art evolutionary algorithm.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)
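
The abstract's description of a conventional GA (data touched only in the fitness function, blind probabilistic crossover and mutation) can be made concrete with a small Python sketch of that baseline. It does not implement the paper's knowledge chromosomes, whose details are not given here. The rule encoding (a bit mask meaning "predict class 1 iff every selected binary feature is 1"), the toy dataset, the function names, and the parameter defaults are hypothetical choices for illustration; elitism is added so the best rule is never lost.

```python
import random
from itertools import product
from typing import List, Sequence, Tuple

Mask = List[int]


def rule_accuracy(mask: Mask, X: Sequence[Sequence[int]], y: Sequence[int]) -> float:
    # Fitness of the hypothetical rule "class 1 iff every masked feature is 1".
    # This is the only place the training data is read.
    correct = sum(
        int(all(x[i] for i, keep in enumerate(mask) if keep)) == label
        for x, label in zip(X, y)
    )
    return correct / len(y)


def tournament(pop: List[Mask], fits: List[float], rng: random.Random) -> Mask:
    # Binary tournament selection: the fitter of two random individuals wins.
    i, j = rng.sample(range(len(pop)), 2)
    return pop[i] if fits[i] >= fits[j] else pop[j]


def evolve_rule(X, y, n_features: int, pop_size: int = 20, generations: int = 60,
                p_cross: float = 0.9, p_mut: float = 0.1, seed: int = 1) -> Tuple[Mask, float]:
    rng = random.Random(seed)
    # Random initial population of candidate rules.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [rule_accuracy(ind, X, y) for ind in pop]
        best = max(range(pop_size), key=fits.__getitem__)
        children = [list(pop[best])]  # elitism: carry the best rule over unchanged
        while len(children) < pop_size:
            a, b = tournament(pop, fits, rng), tournament(pop, fits, rng)
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = list(a)
            # Bit-flip mutation: purely probabilistic, blind to the data.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    fits = [rule_accuracy(ind, X, y) for ind in pop]
    best = max(range(pop_size), key=fits.__getitem__)
    return pop[best], fits[best]


# Toy data: all 4-bit inputs, labelled 1 exactly when features 0 and 2 are both 1.
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [x[0] & x[2] for x in X]
best_rule, accuracy = evolve_rule(X, y, n_features=4)
print(best_rule, accuracy)
```

Every generation re-reads the training set only through `rule_accuracy`; nothing learned about which features matter survives except what happens to be encoded in the surviving chromosomes, which is the inefficiency the paper's knowledge chromosomes are meant to address.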

Target detection with morphological shared-weight neural network: different update approaches

Author: Ye Yixuan
Publication venue: University of Missouri--Columbia

Neural networks are widely used for image processing, and the convolutional neural network (CNN) is one of the most popular. However, a CNN needs a large amount of training data to reach good accuracy. If training data is limited, a morphological shared-weight neural network (MSNN) can be a better choice. In this thesis, two different update approaches based on an evolutionary algorithm are proposed and compared with each other for target detection based on the MSNN. For comparison, the thesis also uses a backpropagation-based training method, which was proposed by Yongwan Won and applied by my colleagues and fellow graduate students Shuxian Shen and Anes Ouadou. Both single-layer and multiple-layer MSNNs are presented with the different approaches. For data, the author created part of a dataset for this thesis and used another dataset created by Shen to compare against her network. Results of the MSNN are compared with CNN results to show its performance. Experiments show that for a single-layer MSNN, an evolutionary algorithm with partial backpropagation performs best. For a multiple-layer MSNN, backpropagation performs better, although the MSNN still outperforms the CNN.

Includes bibliographical references.

University of Missouri: MOspace
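
The abstract does not define the MSNN's feature-extraction stage. In the MSNN literature it is commonly described as a grayscale hit-miss transform: erosion of the image by a "hit" structuring element minus dilation by the reflected "miss" structuring element, with the same (shared) weights applied at every image position. The Python sketch below assumes that form, evaluates only positions where the structuring elements fit entirely inside the image (no padding), and omits the classification stage as well as the evolutionary and backpropagation updates compared in the thesis; the function name `hit_miss` is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def hit_miss(f: Grid, hit: Grid, miss: Grid) -> Grid:
    # Assumed grayscale hit-miss transform:
    #   (f erode hit)(x) - (f dilate reflected(miss))(x)
    # = min_z [f(x+z) - hit(z)] - max_z [f(x+z) + miss(z)]
    # The same hit/miss weights are shared across all positions x.
    sh, sw = len(hit), len(hit[0])
    out_h, out_w = len(f) - sh + 1, len(f[0]) - sw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            window = [(f[r + a][c + b], hit[a][b], miss[a][b])
                      for a in range(sh) for b in range(sw)]
            erosion = min(v - h for v, h, _ in window)
            dilation = max(v + m for v, _, m in window)
            row.append(erosion - dilation)
        out.append(row)
    return out


spot = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
flat_hit = [[0] * 3 for _ in range(3)]
low_miss = [[-10] * 3 for _ in range(3)]
print(hit_miss(spot, flat_hit, low_miss))  # [[5]]
```

In a trained MSNN the hit and miss weights would be learned (by backpropagation or, as in this thesis, by evolutionary updates) so that the output peaks at target locations.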