7 research outputs found

    Rule-based Machine Learning Methods for Functional Prediction

    We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance. Comment: See http://www.jair.org/ for any accompanying file
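As a rough illustration of how an ordered list of decision rules can produce a real-valued prediction, the sketch below applies the first rule whose conditions all hold and falls back to a default value otherwise. It does not show rule induction, and it is not the authors' implementation: the `Rule` class, the feature names `x1` and `x2`, the thresholds, and the output values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]


@dataclass
class Rule:
    # A rule is a conjunction of conditions; rules that share an output
    # value together form a disjunction of conjunctions (DNF).
    conditions: List[Callable[[Case], bool]]
    value: float  # hypothetical predicted value for cases the rule covers

    def matches(self, case: Case) -> bool:
        return all(cond(case) for cond in self.conditions)


def predict(rules: List[Rule], default: float, case: Case) -> float:
    """Return the value of the first matching rule, in list order."""
    for rule in rules:
        if rule.matches(case):
            return rule.value
    return default  # no rule fired


# Hypothetical rule list; order matters because the first match wins.
rules = [
    Rule([lambda c: c["x1"] > 5.0, lambda c: c["x2"] <= 2.0], 10.0),
    Rule([lambda c: c["x1"] > 5.0], 7.5),
    Rule([lambda c: c["x2"] > 8.0], 3.0),
]
```

Because the list is ordered, a specific rule placed early can override a more general rule placed later, which is one way such a model stays compact.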

    Attribute Selection for Classification

    The selection of attributes used to construct a classification model is crucial in machine learning, in particular with instance similarity methods. We present a new algorithm to select and rank attributes by weighting features according to their ability to help class prediction. The algorithm uses the same structure that holds training records for classification. Attribute values and their classes are projected into a one-dimensional space, to account for various degrees of the relationship between them. With the user deciding on the degree of this relation, any of several potential solutions can be used as the criterion to determine attribute relevance. This low-complexity algorithm increases classification predictive accuracy and also helps to reduce the feature dimension problem.

    Improved Heterogeneous Distance Functions

    Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
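To make the idea of a heterogeneous distance concrete, here is a minimal sketch in the spirit of HVDM: nominal attributes are compared through their class-conditional frequencies (a VDM-style term), and continuous attributes through a difference scaled by four standard deviations. The abstract does not give the formulas, so the normalization, the use of the population standard deviation, the distance of 1 for missing or unseen values, and the function names `fit_hvdm` and `hvdm` are assumptions to be checked against the paper.

```python
import math


def fit_hvdm(X, y, nominal):
    """Collect per-attribute statistics. `nominal` is a set of column indices."""
    model = {"nominal": set(nominal), "classes": sorted(set(y)),
             "sigma": {}, "counts": {}}
    for a in range(len(X[0])):
        if a in model["nominal"]:
            table = {}  # attribute value -> class label -> count
            for row, label in zip(X, y):
                if row[a] is not None:
                    by_class = table.setdefault(row[a], {})
                    by_class[label] = by_class.get(label, 0) + 1
            model["counts"][a] = table
        else:
            col = [row[a] for row in X if row[a] is not None]
            mean = sum(col) / len(col)
            # Assumption: population standard deviation.
            model["sigma"][a] = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return model


def _vdm(model, a, u, v):
    """Euclidean difference between the class distributions of two nominal values."""
    cu = model["counts"][a].get(u, {})
    cv = model["counts"][a].get(v, {})
    nu, nv = sum(cu.values()), sum(cv.values())
    if nu == 0 or nv == 0:
        return 1.0  # assumption: value never seen in training
    return math.sqrt(sum((cu.get(c, 0) / nu - cv.get(c, 0) / nv) ** 2
                         for c in model["classes"]))


def hvdm(model, x, z):
    """Heterogeneous distance between two instances (None marks a missing value)."""
    total = 0.0
    for a, (u, v) in enumerate(zip(x, z)):
        if u is None or v is None:
            d = 1.0
        elif a in model["nominal"]:
            d = _vdm(model, a, u, v)
        else:
            s = model["sigma"][a]
            d = abs(u - v) / (4 * s) if s > 0 else 0.0
        total += d * d
    return math.sqrt(total)


# Hypothetical toy data: one nominal column (index 0), one continuous column.
X = [["red", 1.0], ["red", 2.0], ["blue", 3.0], ["blue", 4.0]]
y = ["a", "a", "b", "b"]
model = fit_hvdm(X, y, nominal={0})
```

Scaling the continuous term by four standard deviations keeps most continuous differences within roughly the same range as the nominal term, so neither kind of attribute dominates the sum.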

    Models of incremental concept formation

    Given a set of observations, humans acquire concepts that organize those observations and use them in classifying future experiences. This type of concept formation can occur in the absence of a tutor and it can take place despite irrelevant and incomplete information. A reasonable model of such human concept learning should be both incremental and capable of handling the kind of complex experiences that people encounter in the real world. In this paper, we review three previous models of incremental concept formation and then present CLASSIT, a model that extends these earlier systems. All of the models integrate the processes of recognition and learning, and all can be viewed as carrying out search through the space of possible concept hierarchies. In an attempt to show that CLASSIT is a robust concept formation system, we also present some empirical studies of its behavior under a variety of conditions.

    Clasificación automática basada en análisis espectral (Automatic classification based on spectral analysis)

    This thesis addresses the definition of an invariant-based numerical method for the automatic classification of objects from information about their characters. It focuses on finding the invariants through an original methodological application of the principles of superposition and interference in spectral analysis, in analogical congruence with numerical taxonomy, given their logical relationship and methodological strength. Facultad de Informática

    Marc integrador de les capacitats de Soft-Computing i de Knowledge Discovery dels Mapes Autoorganitzatius en el Raonament Basat en Casos

    Case-Based Reasoning (CBR) is a machine learning paradigm that solves new problems by drawing analogies with previously solved ones. The organization, access, and use of this prior knowledge are therefore key to its success. However, most real-world problems involve large volumes of complex, uncertain data with approximate knowledge, and CBR performance can suffer from the complexity of managing it. In recent years this has given rise to a new line of research, Soft-Computing and Intelligent Information Retrieval, which aims to mitigate these effects and forms the context of this thesis. Among the wide range of Soft-Computing techniques for handling complex knowledge, Self-Organizing Maps (SOM) stand out for their ability to group data into patterns that reveal hidden relations in the data. Previous work by other researchers has exploited this ability by organizing the CBR case memory with SOM to improve case retrieval. The goal of this thesis is to go a step beyond the simple combination of CBR and SOM by introducing the Soft-Computing and Knowledge Discovery capabilities of SOM into every phase of CBR, so that each phase benefits from the newly discovered knowledge. In this context, complexity measures serve as a precise instrument for modeling the behavior of SOM according to the type of data. The integration is achieved through four milestones: (1) the definition of a methodology for determining the best way to retrieve cases, taking into account the data complexity and the user requirements; (2) the improvement of the reliability of the proposed solutions through the relations between clusters and cases; (3) the strengthening of explanatory capabilities through the generation of symbolic explanations; (4) the incremental and semi-supervised maintenance of the case memory organized by SOM. All of these are integrated in the SOMCBR framework, which is extensively evaluated on datasets from the UCI Repository and from medical and telematic domains. The thesis also addresses, as secondary topics, two lines of research arising from the requirements of the projects in which it was carried out. First, it studies the definition of domain-specific similarity functions for comparing a solved case with a new one, using a variant of Evolutionary Computation called Grammar Evolution (GE). Second, it studies how to define cooperation schemes between heterogeneous systems, also by means of GE, to improve the reliability of their joint response. Both lines are implemented in two frameworks, BRAIN and MGE respectively, which are evaluated on the same datasets.
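The idea of organizing a CBR case memory with SOM so that retrieval only searches one cluster can be sketched as follows. This is not the SOMCBR framework: the map is assumed to be already trained and is reduced here to a list of prototype vectors, and the names `organize` and `retrieve`, the Euclidean distance, and the toy data are all hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def organize(cases, prototypes):
    """Assign each (features, solution) case to its closest prototype."""
    clusters = {i: [] for i in range(len(prototypes))}
    for features, solution in cases:
        best = min(range(len(prototypes)),
                   key=lambda i: euclidean(features, prototypes[i]))
        clusters[best].append((features, solution))
    return clusters


def retrieve(query, prototypes, clusters, k=1):
    """Return the k nearest cases from the closest non-empty cluster."""
    order = sorted(range(len(prototypes)),
                   key=lambda i: euclidean(query, prototypes[i]))
    for i in order:
        if clusters[i]:
            return sorted(clusters[i], key=lambda c: euclidean(query, c[0]))[:k]
    return []  # empty case memory


# Hypothetical prototypes standing in for trained map units, and a toy case memory.
prototypes = [(0.0, 0.0), (10.0, 10.0)]
cases = [((0.0, 1.0), "A"), ((1.0, 0.0), "A"), ((9.0, 10.0), "B")]
clusters = organize(cases, prototypes)
```

Searching a single cluster trades some retrieval accuracy for speed, which is the kind of trade-off the first milestone in the abstract is concerned with.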