16,332 research outputs found

    Learning Expressive Linkage Rules for Entity Matching using Genetic Programming

    Get PDF
    A central problem in data integration and data cleansing is to identify pairs of entities in data sets that describe the same real-world object. Many existing methods for matching entities rely on explicit linkage rules, which specify how two entities are compared for equivalence. Unfortunately, writing accurate linkage rules by hand is a non-trivial problem that requires detailed knowledge of the involved data sets. Another important issue is the efficient execution of linkage rules. In this thesis, we propose a set of novel methods that cover the complete entity matching workflow from the generation of linkage rules using genetic programming algorithms to their efficient execution on distributed systems. First, we propose a supervised learning algorithm that is capable of generating linkage rules from a gold standard consisting of set of entity pairs that have been labeled as duplicates or non-duplicates. We show that the introduced algorithm outperforms previously proposed entity matching approaches including the state-of-the-art genetic programming approach by de Carvalho et al. and is capable of learning linkage rules that achieve a similar accuracy than the human written rule for the same problem. In order to also cover use cases for which no gold standard is available, we propose a complementary active learning algorithm that generates a gold standard interactively by asking the user to confirm or decline the equivalence of a small number of entity pairs. In the experimental evaluation, labeling at most 50 link candidates was necessary in order to match the performance that is achieved by the supervised GenLink algorithm on the entire gold standard. Finally, we propose an efficient execution workflow that can be run on cluster of multiple machines. The execution workflow employs a novel multidimensional indexing method that allows the efficient execution of learned linkage rules by reducing the number of required comparisons significantly

    An incremental approach to genetic algorithms based classification

    Get PDF
    Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multi-agent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an “integration” operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed

    Incremental multiple objective genetic algorithms

    Get PDF
    This paper presents a new genetic algorithm approach to multi-objective optimization problemsIncremental Multiple Objective Genetic Algorithms (IMOGA). Different from conventional MOGA methods, it takes each objective into consideration incrementally. The whole evolution is divided into as many phases as the number of objectives, and one more objective is considered in each phase. Each phase is composed of two stages: first, an independent population is evolved to optimize one specific objective; second, the better-performing individuals from the evolved single-objective population and the multi-objective population evolved in the last phase are joined together by the operation of integration. The resulting population then becomes an initial multi-objective population, to which a multi-objective evolution based on the incremented objective set is applied. The experiment results show that, in most problems, the performance of IMOGA is better than that of three other MOGAs, NSGA-II, SPEA and PAES. IMOGA can find more solutions during the same time span, and the quality of solutions is better

    Cooperative co-evolution of GA-based classifiers based on input increments

    Get PDF
    Genetic algorithms (GAs) have been widely used as soft computing techniques in various applications, while cooperative co-evolution algorithms were proposed in the literature to improve the performance of basic GAs. In this paper, a new cooperative co-evolution algorithm, namely ECCGA, is proposed in the application domain of pattern classification. Concurrent local and global evolution and conclusive global evolution are proposed to improve further the classification performance. Different approaches of ECCGA are evaluated on benchmark classification data sets, and the results show that ECCGA can achieve better performance than the cooperative co-evolution genetic algorithm and normal GA. Some analysis and discussions on ECCGA and possible improvement are also presented

    Rule Extraction by Genetic Programming with Clustered Terminal Symbols

    Get PDF
    When Genetic Programming (GP) is applied to rule extraction from databases, the attributes of the data are often used for the terminal symbols. However, in the case of the database with a large number of attributes, the search space becomes vast because the size of the terminal set increases. As a result, the search performance declines. For improving the search performance, we propose new methods for dealing with the large-scale terminal set. In the methods, the terminal symbols are clustered based on the similarities of the attributes. In the beginning of search, by reducing the number of terminal symbols, the rough and rapid search is performed. In the latter stage of search, by using the original attributes for terminal symbols, the local search is performed. By comparison with the conventional GP, the proposed methods showed the faster evolutional speed and extracted more accurate classification rules

    Gray Image extraction using Fuzzy Logic

    Full text link
    Fuzzy systems concern fundamental methodology to represent and process uncertainty and imprecision in the linguistic information. The fuzzy systems that use fuzzy rules to represent the domain knowledge of the problem are known as Fuzzy Rule Base Systems (FRBS). On the other hand image segmentation and subsequent extraction from a noise-affected background, with the help of various soft computing methods, are relatively new and quite popular due to various reasons. These methods include various Artificial Neural Network (ANN) models (primarily supervised in nature), Genetic Algorithm (GA) based techniques, intensity histogram based methods etc. providing an extraction solution working in unsupervised mode happens to be even more interesting problem. Literature suggests that effort in this respect appears to be quite rudimentary. In the present article, we propose a fuzzy rule guided novel technique that is functional devoid of any external intervention during execution. Experimental results suggest that this approach is an efficient one in comparison to different other techniques extensively addressed in literature. In order to justify the supremacy of performance of our proposed technique in respect of its competitors, we take recourse to effective metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), Peak Signal to Noise Ratio (PSNR).Comment: 8 pages, 5 figures, Fuzzy Rule Base, Image Extraction, Fuzzy Inference System (FIS), Membership Functions, Membership values,Image coding and Processing, Soft Computing, Computer Vision Accepted and published in IEEE. arXiv admin note: text overlap with arXiv:1206.363

    Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system

    Full text link
    A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to neural networks. This paper presents results from an investigation into using discrete and fuzzy dynamical system representations within the XCSF learning classifier system. In particular, asynchronous random Boolean networks are used to represent the traditional condition-action production system rules in the discrete case and asynchronous fuzzy logic networks in the continuous-valued case. It is shown possible to use self-adaptive, open-ended evolution to design an ensemble of such dynamical systems within XCSF to solve a number of well-known test problems
    corecore