5,593 research outputs found

    An algorithm for discretization of real value attributes based on interval similarity

    Get PDF
    Extent: 8p.Discretization algorithm for real value attributes is of very important uses in many areas such as intelligence and machine learning. The algorithms related to Chi2 algorithm (includes modified Chi2 algorithm and extended Chi2 algorithm) are famous discretization algorithm exploiting the technique of probability and statistics. In this paper the algorithms are analyzed, and their drawback is pointed. Based on the analysis a new modified algorithm based on interval similarity is proposed. The new algorithm defines an interval similarity function which is regarded as a new merging standard in the process of discretization. At the same time, two important parameters (condition parameterαand tiny move parameterc) in the process of discretization and discrepancy extent of a number of adjacent two intervals are given in the form of function. The related theory analysis and the experiment results show that the presented algorithm is effective.Li Zou, Deqin Yan, Hamid Reza Karimi, and Peng Sh

    Using entropy-based local weighting to improve similarity assessment

    Get PDF
    This paper enhances and analyses the power of local weighted similarity measures. The paper proposes a new entropy-based local weighting algorithm to be used in similarity assessment to improve the performance of the CBR retrieval task. It has been carried out a comparative analysis of the performance of unweighted similarity measures, global weighted similarity measures, and local weighting similarity measures. The testing has been done using several similarity measures, and some data sets from the UCI Machine Learning Database Repository and other environmental databases.Postprint (published version

    Improving the Evolutionary Coding for Machine Learning Tasks

    Get PDF
    The most influential factors in the quality of the solutions found by an evolutionary algorithm are a correct coding of the search space and an appropriate evaluation function of the potential solutions. The coding of the search space for the obtaining of decision rules is approached, i.e., the representation of the individuals of the genetic population. Two new methods for encoding discrete and continuous attributes are presented. Our “natural coding” uses one gene per attribute (continuous or discrete) leading to a reduction in the search space. Genetic operators for this approached natural coding are formally described and the reduction of the size of the search space is analysed for several databases from the UCI machine learning repository.Comisión Interministerial de Ciencia y Tecnología TIC1143–C03–0

    Effective retrieval and new indexing method for case based reasoning: Application in chemical process design

    Get PDF
    In this paper we try to improve the retrieval step for case based reasoning for preliminary design. This improvement deals with three major parts of our CBR system. First, in the preliminary design step, some uncertainties like imprecise or unknown values remain in the description of the problem, because they need a deeper analysis to be withdrawn. To deal with this issue, the faced problem description is soften with the fuzzy sets theory. Features are described with a central value, a percentage of imprecision and a relation with respect to the central value. These additional data allow us to build a domain of possible values for each attributes. With this representation, the calculation of the similarity function is impacted, thus the characteristic function is used to calculate the local similarity between two features. Second, we focus our attention on the main goal of the retrieve step in CBR to find relevant cases for adaptation. In this second part, we discuss the assumption of similarity to find the more appropriated case. We put in highlight that in some situations this classical similarity must be improved with further knowledge to facilitate case adaptation. To avoid failure during the adaptation step, we implement a method that couples similarity measurement with adaptability one, in order to approximate the cases utility more accurately. The latter gives deeper information for the reusing of cases. In a last part, we present a generic indexing technique for the base, and a new algorithm for the research of relevant cases in the memory. The sphere indexing algorithm is a domain independent index that has performances equivalent to the decision tree ones. But its main strength is that it puts the current problem in the center of the research area avoiding boundaries issues. All these points are discussed and exemplified through the preliminary design of a chemical engineering unit operation

    Using rule extraction to improve the comprehensibility of predictive models.

    Get PDF
    Whereas newer machine learning techniques, like artifficial neural net-works and support vector machines, have shown superior performance in various benchmarking studies, the application of these techniques remains largely restricted to research environments. A more widespread adoption of these techniques is foiled by their lack of explanation capability which is required in some application areas, like medical diagnosis or credit scoring. To overcome this restriction, various algorithms have been proposed to extract a meaningful description of the underlying `blackbox' models. These algorithms' dual goal is to mimic the behavior of the black box as closely as possible while at the same time they have to ensure that the extracted description is maximally comprehensible. In this research report, we first develop a formal definition of`rule extraction and comment on the inherent trade-off between accuracy and comprehensibility. Afterwards, we develop a taxonomy by which rule extraction algorithms can be classiffied and discuss some criteria by which these algorithms can be evaluated. Finally, an in-depth review of the most important algorithms is given.This report is concluded by pointing out some general shortcomings of existing techniques and opportunities for future research.Models; Model; Algorithms; Criteria; Opportunities; Research; Learning; Neural networks; Networks; Performance; Benchmarking; Studies; Area; Credit; Credit scoring; Behavior; Time;

    Improved Heterogeneous Distance Functions

    Full text link
    Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.Comment: See http://www.jair.org/ for an online appendix and other files accompanying this articl
    corecore