Global Entropy Based Greedy Algorithm for discretization

Author: Jonnalagadda Sai Jyothsna
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/05/2016

Discretization is a crucial step, both for summarizing continuous attributes and for improving the performance of classifiers that require discrete values as input. In this thesis, I propose a supervised discretization method, the Global Entropy Based Greedy algorithm, which is based on Information Entropy Minimization. Experimental results on well-known benchmark datasets show that the proposed method outperforms state-of-the-art methods. To further improve the proposed method, a new stopping criterion based on the rate of change of entropy was also explored. The experimental analysis indicates that a threshold on the rate of entropy decrease can be more effective than a fixed number of intervals for classifiers such as C5.0.
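The idea of greedy, entropy-driven cut-point selection with a rate-based stopping rule can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function names, the midpoint candidate cuts, and the `min_rate` threshold of 0.1 are all assumptions chosen for illustration. The sketch repeatedly adds the cut that most lowers the weighted class entropy over the whole attribute range and stops once the relative decrease falls below the threshold.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def partition_entropy(pairs, cuts):
    """Weighted class entropy of the intervals induced by the cut points."""
    bins = [[] for _ in range(len(cuts) + 1)]
    for value, label in pairs:
        idx = sum(value > c for c in cuts)  # index of the interval holding value
        bins[idx].append(label)
    n = len(pairs)
    return sum(len(b) / n * entropy(b) for b in bins)

def greedy_discretize(values, labels, min_rate=0.1):
    """Greedily add cut points until the relative entropy decrease is small.

    min_rate is a hypothetical threshold on (old - new) / old entropy.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    # Candidate cuts: midpoints between consecutive distinct values (assumption).
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts = []
    current = partition_entropy(pairs, cuts)
    while candidates and current > 0:
        best = min(candidates,
                   key=lambda c: partition_entropy(pairs, cuts + [c]))
        new = partition_entropy(pairs, cuts + [best])
        if (current - new) / current < min_rate:
            break  # entropy is no longer decreasing fast enough
        cuts = sorted(cuts + [best])
        candidates.remove(best)
        current = new
    return cuts
```

For example, `greedy_discretize([1, 2, 3, 10, 11, 12], list("aaabbb"))` separates the two classes with a single cut between 3 and 10. A fixed-interval-count variant would replace the rate test with a check on `len(cuts)`.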

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Using a unified measure function for heuristics, discretization, and rule quality evaluation in Ant-Miner

Authors: Otero Fernando E.B.; Salama Khalid M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2013

Ant-Miner is a classification rule discovery algorithm based on the Ant Colony Optimization (ACO) meta-heuristic. cAnt-Miner is an extended version of the algorithm that handles continuous attributes on the fly during the rule construction process, while μAnt-Miner is an extension that selects the rule class prior to rule construction and uses multiple pheromone types, one for each permitted rule class. In this paper, we combine these two algorithms to derive a new approach for learning classification rules using ACO. The proposed approach uses a unified measure function for 1) computing the heuristics for rule term selection, 2) providing a criterion for discretizing continuous attributes, and 3) evaluating the quality of the constructed rule for the pheromone update. We explore the effect of using different measure functions on the output model in terms of predictive accuracy and model size. Empirical evaluations found that the hypothesis that different functions produce different results is accepted according to Friedman’s statistical test.

Crossref

Kent Academic Repository

Using entropy-based local weighting to improve similarity assessment

Authors: Comas Joaquim; Cortés García Claudio Ulises; Núñez Héctor; Poch Manel; Rodriguez-Roda Ignasi; Sànchez-Marrè Miquel
Publication date: 01/01/2002

This paper analyses and enhances the power of locally weighted similarity measures. It proposes a new entropy-based local weighting algorithm for similarity assessment, intended to improve the performance of the CBR retrieval task. A comparative analysis of the performance of unweighted, globally weighted, and locally weighted similarity measures has been carried out. Testing used several similarity measures and data sets from the UCI Machine Learning Database Repository and other environmental databases. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Improved Heterogeneous Distance Functions

Authors: Martinez T. R.; Wilson D. R.
Publication date: 31/12/1996

Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article
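A heterogeneous distance of the kind described can be sketched as follows. The abstract gives no formulas, so the details here are assumptions based on the commonly cited form of HVDM: continuous attributes contribute an absolute difference divided by four standard deviations, nominal attributes contribute a Euclidean-style difference of class-conditional value frequencies, and missing values contribute a distance of 1. The function names `fit_hvdm` and `hvdm`, the use of the population standard deviation, and the `None` marker for missing values are illustrative choices, not the authors' code.

```python
import math
from collections import defaultdict
from statistics import pstdev

def fit_hvdm(rows, labels, nominal):
    """Collect per-attribute statistics from training data.

    nominal is the set of attribute indices treated as nominal;
    all other attributes are treated as continuous.
    """
    sigma, counts = {}, {}
    for a in range(len(rows[0])):
        if a in nominal:
            # counts[a][value][cls] = number of rows with that value and class
            table = defaultdict(lambda: defaultdict(int))
            for row, cls in zip(rows, labels):
                table[row[a]][cls] += 1
            counts[a] = table
        else:
            sigma[a] = pstdev([row[a] for row in rows])
    return {"classes": sorted(set(labels)), "sigma": sigma,
            "counts": counts, "nominal": set(nominal)}

def hvdm(model, x, y):
    """Distance between two instances under the assumed HVDM form."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # missing value: maximal per-attribute distance (assumption)
        elif a in model["nominal"]:
            cx = model["counts"][a].get(xa, {})
            cy = model["counts"][a].get(ya, {})
            nx, ny = sum(cx.values()), sum(cy.values())
            d = math.sqrt(sum(
                ((cx.get(c, 0) / nx if nx else 0.0)
                 - (cy.get(c, 0) / ny if ny else 0.0)) ** 2
                for c in model["classes"]))
        else:
            s = model["sigma"][a]
            d = abs(xa - ya) / (4 * s) if s > 0 else 0.0
        total += d * d
    return math.sqrt(total)
```

Because the nominal term is computed from class frequencies rather than from value identity, two different nominal values that predict the same class distribution are treated as close, which is what lets the metric avoid discretizing the continuous attributes.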

arXiv.org e-Print Archive

CiteSeerX

Improving the Evolutionary Coding for Machine Learning Tasks

Authors: Aguilar Ruiz Jesús Salvador; Riquelme Santos José Cristóbal; Valle Sevillano Carmelo del
Publication venue: IOS Press
Publication date: 01/01/2002

The factors that most influence the quality of the solutions found by an evolutionary algorithm are a correct coding of the search space and an appropriate evaluation function for the potential solutions. This work addresses the coding of the search space for obtaining decision rules, i.e., the representation of the individuals of the genetic population. Two new methods for encoding discrete and continuous attributes are presented. Our “natural coding” uses one gene per attribute (continuous or discrete), leading to a reduction in the search space. Genetic operators for this natural coding are formally described, and the reduction in the size of the search space is analysed for several databases from the UCI machine learning repository. Comisión Interministerial de Ciencia y Tecnología TIC1143–C03–0

idUS. Depósito de Investigación Universidad de Sevilla