    Rule-based Machine Learning Methods for Functional Prediction

    We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance. Comment: See http://www.jair.org/ for any accompanying file
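
    The ordered-rule representation described in this abstract can be illustrated with a minimal sketch. The abstract does not say how rules are induced or how each rule's output is chosen, so the sketch assumes hand-written rules that each return a constant; the attribute names (x1, x2), thresholds, and output values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, float]


@dataclass
class Rule:
    # One rule is a conjunction of conditions. Several rules with the same
    # output together form a disjunction, i.e. a DNF expression.
    conditions: List[Callable[[Sample], bool]]
    value: float  # assumed: each rule predicts a constant value

    def matches(self, x: Sample) -> bool:
        return all(cond(x) for cond in self.conditions)


def predict(rules: List[Rule], default: float, x: Sample) -> float:
    # Rules are ordered: the first rule that matches decides the prediction.
    for rule in rules:
        if rule.matches(x):
            return rule.value
    return default  # no rule fired


# Hypothetical rule list, for illustration only.
rules = [
    Rule([lambda s: s["x1"] > 5.0, lambda s: s["x2"] <= 2.0], 10.0),
    Rule([lambda s: s["x1"] > 5.0], 7.5),
    Rule([lambda s: s["x2"] > 8.0], 3.0),
]

print(predict(rules, 1.0, {"x1": 6.0, "x2": 1.0}))
print(predict(rules, 1.0, {"x1": 0.0, "x2": 0.0}))
```

    Because the list is ordered, the second rule only applies to samples that the first rule rejected, which keeps each individual rule short and readable.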

    Attribute Selection for Classification

    The selection of attributes used to construct a classification model is crucial in machine learning, particularly for instance similarity methods. We present a new algorithm that selects and ranks attributes by weighting features according to how much they help class prediction. The algorithm uses the same structure that holds the training records for classification. Attribute values and their classes are projected into a one-dimensional space to account for various degrees of the relationship between them. With the user deciding on the degree of this relation, any of several potential solutions can be used as the criterion to determine attribute relevance. This low-complexity algorithm increases classification predictive accuracy and also helps to reduce the feature dimension problem.

    Improved Heterogeneous Distance Functions

    Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article
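
    To make the idea of a heterogeneous distance concrete, the following is a minimal sketch of an HVDM-style function. The abstract gives no formulas, so this follows the commonly cited formulation as an assumption: nominal attributes use a normalized value difference over class-conditional probabilities, continuous attributes use the absolute difference divided by four standard deviations, and missing values (None here) contribute a distance of 1. The function names fit_hvdm and hvdm, the data layout, and the use of the population standard deviation are choices made for this illustration.

```python
import math
from collections import Counter


def fit_hvdm(X, y, nominal):
    """Collect per-attribute statistics from training rows X with labels y.

    nominal is the set of column indices holding nominal attributes.
    """
    classes = sorted(set(y))
    stats = {}
    for a in range(len(X[0])):
        pairs = [(row[a], c) for row, c in zip(X, y) if row[a] is not None]
        if a in nominal:
            # Class counts for every observed nominal value.
            counts = {}
            for value, c in pairs:
                counts.setdefault(value, Counter())[c] += 1
            stats[a] = counts
        else:
            # Population standard deviation of the continuous attribute.
            col = [value for value, _ in pairs]
            mean = sum(col) / len(col)
            stats[a] = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return stats, classes


def hvdm(u, v, stats, classes, nominal):
    total = 0.0
    for a, (p, q) in enumerate(zip(u, v)):
        if p is None or q is None:
            d = 1.0  # missing value: assumed maximal per-attribute distance
        elif a in nominal:
            cp = stats[a].get(p, Counter())
            cq = stats[a].get(q, Counter())
            n_p, n_q = sum(cp.values()), sum(cq.values())
            # Unseen values are treated as having probability 0 for every class.
            d = math.sqrt(sum(
                ((cp[c] / n_p if n_p else 0.0) - (cq[c] / n_q if n_q else 0.0)) ** 2
                for c in classes
            ))
        else:
            sd = stats[a]
            d = abs(p - q) / (4 * sd) if sd > 0 else 0.0
        total += d * d
    return math.sqrt(total)


# Tiny hypothetical dataset: one nominal and one continuous attribute.
X = [("red", 1.0), ("red", 2.0), ("blue", 3.0), ("blue", 4.0)]
y = ["a", "a", "b", "b"]
stats, classes = fit_hvdm(X, y, nominal={0})
print(hvdm(X[0], X[2], stats, classes, nominal={0}))
```

    Dividing by four standard deviations puts most continuous differences on roughly the same 0 to 1 scale as the nominal term, so neither attribute type dominates the sum.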

    Models of incremental concept formation

    Given a set of observations, humans acquire concepts that organize those observations and use them in classifying future experiences. This type of concept formation can occur in the absence of a tutor, and it can take place despite irrelevant and incomplete information. A reasonable model of such human concept learning should be both incremental and capable of handling the kinds of complex experiences that people encounter in the real world. In this paper, we review three previous models of incremental concept formation and then present CLASSIT, a model that extends these earlier systems. All of the models integrate the processes of recognition and learning, and all can be viewed as carrying out search through the space of possible concept hierarchies. In an attempt to show that CLASSIT is a robust concept formation system, we also present some empirical studies of its behavior under a variety of conditions.

    Automatic Classification Based on Spectral Analysis (Clasificación automática basada en análisis espectral)

    This thesis addresses the definition of an invariant-based numerical method for the automatic classification of objects from the information in their characters. It focuses on finding the invariants through an original methodological application of the principles of superposition and interference in spectral analysis, in analogical congruence with numerical taxonomy, on account of their logical relationship and with methodological strength. Facultad de Informática

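    A Framework Integrating the Soft-Computing and Knowledge Discovery Capabilities of Self-Organizing Maps in Case-Based Reasoning (Marc integrador de les capacitats de Soft-Computing i de Knowledge Discovery dels Mapes Autoorganitzatius en el Raonament Basat en Casos)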

    Case-Based Reasoning (CBR) is a machine learning paradigm that solves new problems by establishing analogies with previously solved ones. The organization, access, and use of this prior knowledge are therefore key to achieving successful results. However, most real problems involve large volumes of complex, uncertain data with approximate and partial knowledge, and CBR performance can consequently suffer from the complexity of managing this kind of knowledge. In recent years this has given rise to a new line of research called Soft-Computing and Intelligent Information Retrieval, which focuses on mitigating these effects. This is the context of this thesis. Within the broad range of Soft-Computing techniques for dealing with complex knowledge, Self-Organizing Maps (SOM) stand out for their ability to group data into patterns, which make it possible to detect hidden relationships among the data. On the one hand, the thesis addresses the definition of specific similarity functions that determine how to compare a solved case with a new one, using a variant of Evolutionary Computation called Grammatical Evolution (GE). On the other hand, it studies how to define cooperation schemes between heterogeneous systems, also by means of GE, in order to improve the reliability of their joint response. Both lines are integrated into two platforms, BRAIN and MGE, which are also evaluated on the previous datasets.