An Efficient Genetic Algorithm for Discovering Diverse-Frequent Patterns
Exhaustive search on large datasets is infeasible for several
reasons. Recently developed techniques have made pattern set mining feasible
through a general solver that supports heuristic search, but they suffer from
long execution times and are limited to small datasets. In this paper, we
investigate an approach that uses a genetic algorithm to mine a diverse set of
frequent patterns. We propose a fast heuristic search algorithm that
outperforms state-of-the-art methods on a standard set of benchmarks and
is capable of producing satisfactory results within a short period of time. Our
proposed algorithm uses a relative encoding scheme for the patterns and an
effective twin removal technique to ensure diversity throughout the search.
Comment: 2015 International Conference on Electrical Engineering and
Information Communication Technology (ICEEICT)
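To illustrate what twin removal in a genetic algorithm might look like, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the encoding or the replacement rule, so the representation of patterns as frozensets of items, the single-item flip mutation, and the names `support` and `remove_twins` are all assumptions made for illustration.

```python
import random

def support(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    return sum(pattern <= t for t in transactions) / len(transactions)

def remove_twins(population, items, rng):
    """Keep the first copy of each pattern; replace later copies (twins)
    with a mutated variant that is not yet in the population.

    Assumption: a twin is mutated by flipping one random item in or out.
    """
    seen = set()
    result = []
    for pattern in population:
        candidate = pattern
        attempts = 0
        # Mutate until the candidate is new, giving up after 100 tries.
        while candidate in seen and attempts < 100:
            candidate = candidate ^ frozenset([rng.choice(items)])
            attempts += 1
        seen.add(candidate)
        result.append(candidate)
    return result

rng = random.Random(0)
items = ["a", "b", "c", "d"]
population = [frozenset("ab"), frozenset("ab"), frozenset("ab"), frozenset("cd")]
cleaned = remove_twins(population, items, rng)
```

In this sketch the population size stays constant, so duplicates are replaced rather than dropped. That is one plausible way to keep diversity without shrinking the search.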
Optimizing feature sets for structured data
Choosing a suitable feature representation for structured data is a non-trivial task due to the vast number of potential candidates. Ideally, one would like to pick a small, but informative set of structural features, each providing complementary information about the instances. We frame the search for a suitable feature set as a combinatorial optimization problem. For this purpose, we define a scoring function that favors features that are as dissimilar as possible to all other features. The score is used in a stochastic local search (SLS) procedure to maximize the diversity of a feature set. In experiments on small molecule data, we investigate the effectiveness of a forward selection approach with two different linear classification schemes.
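The idea of scoring a feature set by dissimilarity and improving it with stochastic local search can be sketched roughly as follows. The abstract does not give the actual scoring function or the search moves, so this sketch makes assumptions: features are binary occurrence vectors over instances, dissimilarity is Hamming distance, and each search step is a random swap. The function names are hypothetical.

```python
import random
from itertools import combinations

def hamming(u, v):
    """Number of instances on which two binary features disagree."""
    return sum(a != b for a, b in zip(u, v))

def diversity(selected, features):
    """Sum of pairwise Hamming distances among the selected features
    (an assumed stand-in for the paper's scoring function)."""
    return sum(hamming(features[i], features[j])
               for i, j in combinations(selected, 2))

def local_search(features, k, steps, rng):
    """Pick k feature indices, then repeatedly swap one selected feature
    for an unselected one, keeping swaps that do not lower the score."""
    selected = rng.sample(range(len(features)), k)
    best = diversity(selected, features)
    for _ in range(steps):
        outside = [i for i in range(len(features)) if i not in selected]
        if not outside:
            break
        candidate = selected.copy()
        candidate[rng.randrange(k)] = rng.choice(outside)
        score = diversity(candidate, features)
        if score >= best:
            selected, best = candidate, score
    return sorted(selected), best

# Toy data: each tuple is one feature's occurrence over four instances.
features = [(0, 0, 0, 0), (0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 0, 1)]
chosen, score = local_search(features, k=2, steps=200, rng=random.Random(1))
```

Accepting swaps that leave the score unchanged lets the search drift across plateaus, which is a common design choice in SLS. Whether the paper does the same is not stated in the abstract.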