Discretization of Continuous Attributes

Muhlenbach, Fabrice; Rakotomalala, Ricco

Authors: Fabrice Muhlenbach, Ricco Rakotomalala
Publication date: 1 April 2005
Publisher: Idea Group Reference

Abstract

7 pages

In the data mining field, many learning methods, such as association rules, Bayesian networks, and induction rules (Grzymala-Busse & Stefanowski, 2001), can handle only discrete attributes. Therefore, before the machine learning process, each continuous attribute must be re-encoded as a discrete attribute made up of a set of intervals. For example, the age attribute can be transformed into two discrete values representing two intervals: less than 18 (a minor) and 18 or more (of age). This process, known as discretization, is an essential data preprocessing task, not only because some learning methods do not handle continuous attributes, but also for other important reasons: data transformed into a set of intervals are more cognitively relevant for human interpretation (Liu, Hussain, Tan & Dash, 2002); computation goes faster with a reduced level of data, particularly when some attributes are removed from the representation space of the learning problem because no relevant cut can be found for them (Mittal & Cheong, 2002); and discretization can capture non-linear relations, e.g., infants and elderly people are more sensitive to illness
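The age example in the abstract can be illustrated with a minimal Python sketch. It assumes the cut points are already known (here the single cut at 18 mentioned in the abstract) and does not show how a discretization method would find them. The function name `discretize` and the label strings are hypothetical choices made for this illustration.

```python
from bisect import bisect_right


def discretize(values, cut_points, labels):
    """Map each continuous value to the label of the interval it falls in.

    cut_points must be sorted in ascending order. A value equal to a cut
    point is assigned to the interval on the right, so with a cut at 18
    the value 18 is labeled "of age".
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(cut_points, v)] for v in values]


# Hypothetical sample data; only the cut at 18 comes from the abstract.
ages = [3, 17, 17.9, 18, 42]
print(discretize(ages, [18], ["minor", "of age"]))
```

With more cut points, the same function yields more intervals, for instance `discretize(ages, [3, 65], ["infant", "adult", "elderly"])`, where the cuts 3 and 65 are again purely illustrative.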

Available Versions

HAL-UJM

oai:HAL:hal-00383757v2

Last time updated on 12/11/2016

HAL

oai:HAL:hal-00383757v2

Last time updated on 01/11/2023