16,332 research outputs found
Learning Expressive Linkage Rules for Entity Matching using Genetic Programming
A central problem in data integration and data cleansing is to identify
pairs of entities in data sets that describe the same real-world object.
Many existing methods for matching entities rely on explicit linkage rules,
which specify how two entities are compared for equivalence. Unfortunately,
writing accurate linkage rules by hand is a non-trivial problem that
requires detailed knowledge of the involved data sets. Another important
issue is the efficient execution of linkage rules.
In this thesis, we propose a set of novel methods that cover the complete
entity matching workflow from the generation of linkage rules using genetic
programming algorithms to their efficient execution on distributed systems.
First, we propose a supervised learning algorithm that is capable of
generating linkage rules from a gold standard consisting of set of entity
pairs that have been labeled as duplicates or non-duplicates. We show that
the introduced algorithm outperforms previously proposed entity matching
approaches including the state-of-the-art genetic programming approach by
de Carvalho et al. and is capable of learning linkage rules that achieve a
similar accuracy than the human written rule for the same problem.
In order to also cover use cases for which no gold standard is available,
we propose a complementary active learning algorithm that generates a gold
standard interactively by asking the user to confirm or decline the
equivalence of a small number of entity pairs. In the experimental
evaluation, labeling at most 50 link candidates was necessary in order to
match the performance that is achieved by the supervised GenLink algorithm
on the entire gold standard.
Finally, we propose an efficient execution workflow that can be run on
cluster of multiple machines. The execution workflow employs a novel
multidimensional indexing method that allows the efficient execution of
learned linkage rules by reducing the number of required comparisons
significantly
An incremental approach to genetic algorithms based classification
Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multi-agent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an “integration” operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed
Incremental multiple objective genetic algorithms
This paper presents a new genetic algorithm approach to multi-objective optimization problemsIncremental Multiple Objective Genetic Algorithms (IMOGA). Different from conventional MOGA methods, it takes each objective into consideration incrementally. The whole evolution is divided into as many phases as the number of objectives, and one more objective is considered in each phase. Each phase is composed of two stages: first, an independent population is evolved to optimize one specific objective; second, the better-performing individuals from the evolved single-objective population and the multi-objective population evolved in the last phase are joined together by the operation of integration. The resulting population then becomes an initial multi-objective population, to which a multi-objective evolution based on the incremented objective set is applied. The experiment results show that, in most problems, the performance of IMOGA is better than that of three other MOGAs, NSGA-II, SPEA and PAES. IMOGA can find more solutions during the same time span, and the quality of solutions is better
Cooperative co-evolution of GA-based classifiers based on input increments
Genetic algorithms (GAs) have been widely used as soft computing techniques in various
applications, while cooperative co-evolution algorithms were proposed in the literature to improve the
performance of basic GAs. In this paper, a new cooperative co-evolution algorithm, namely ECCGA, is
proposed in the application domain of pattern classification. Concurrent local and global evolution and
conclusive global evolution are proposed to improve further the classification performance. Different
approaches of ECCGA are evaluated on benchmark classification data sets, and the results show that
ECCGA can achieve better performance than the cooperative co-evolution genetic algorithm and normal GA.
Some analysis and discussions on ECCGA and possible improvement are also presented
Rule Extraction by Genetic Programming with Clustered Terminal Symbols
When Genetic Programming (GP) is applied to rule extraction from databases, the attributes of the data are often used for the terminal symbols. However, in the case of the database with a large number of attributes, the search space becomes vast because the size of the terminal set increases. As a result, the search performance declines. For improving the search performance, we propose new methods for dealing with the large-scale terminal set. In the methods, the terminal symbols are clustered based on the similarities of the attributes. In the beginning of search, by reducing the number of terminal symbols, the rough and rapid search is performed. In the latter stage of
search, by using the original attributes for terminal symbols, the local search is performed. By comparison with the conventional GP, the proposed methods showed the faster evolutional speed and extracted more accurate classification rules
Gray Image extraction using Fuzzy Logic
Fuzzy systems concern fundamental methodology to represent and process
uncertainty and imprecision in the linguistic information. The fuzzy systems
that use fuzzy rules to represent the domain knowledge of the problem are known
as Fuzzy Rule Base Systems (FRBS). On the other hand image segmentation and
subsequent extraction from a noise-affected background, with the help of
various soft computing methods, are relatively new and quite popular due to
various reasons. These methods include various Artificial Neural Network (ANN)
models (primarily supervised in nature), Genetic Algorithm (GA) based
techniques, intensity histogram based methods etc. providing an extraction
solution working in unsupervised mode happens to be even more interesting
problem. Literature suggests that effort in this respect appears to be quite
rudimentary. In the present article, we propose a fuzzy rule guided novel
technique that is functional devoid of any external intervention during
execution. Experimental results suggest that this approach is an efficient one
in comparison to different other techniques extensively addressed in
literature. In order to justify the supremacy of performance of our proposed
technique in respect of its competitors, we take recourse to effective metrics
like Mean Squared Error (MSE), Mean Absolute Error (MAE), Peak Signal to Noise
Ratio (PSNR).Comment: 8 pages, 5 figures, Fuzzy Rule Base, Image Extraction, Fuzzy
Inference System (FIS), Membership Functions, Membership values,Image coding
and Processing, Soft Computing, Computer Vision Accepted and published in
IEEE. arXiv admin note: text overlap with arXiv:1206.363
Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system
A number of representation schemes have been presented for use within
learning classifier systems, ranging from binary encodings to neural networks.
This paper presents results from an investigation into using discrete and fuzzy
dynamical system representations within the XCSF learning classifier system. In
particular, asynchronous random Boolean networks are used to represent the
traditional condition-action production system rules in the discrete case and
asynchronous fuzzy logic networks in the continuous-valued case. It is shown
possible to use self-adaptive, open-ended evolution to design an ensemble of
such dynamical systems within XCSF to solve a number of well-known test
problems
- …