Separate and conquer heuristic allows robust mining of contrast sets from various types of data
Identifying differences between groups is one of the most important knowledge
discovery problems. The procedure, also known as contrast set mining, is
applied in a wide range of areas such as medicine, industry, and economics. In
this paper we present RuleKit-CS, an algorithm for contrast set mining based on
sequential covering, a well-established heuristic for decision rule induction.
Multiple passes accompanied by an attribute penalization scheme allow
generating contrast sets that describe the same examples with different
attributes, unlike standard sequential covering. The ability to identify
contrast sets in regression and survival data sets, a feature not provided by
existing algorithms, further extends the usability of RuleKit-CS. Experiments
on a wide range of data sets confirmed RuleKit-CS to be a useful tool for discovering
differences between defined groups. The algorithm is part of the RuleKit
suite, available on GitHub under the GNU AGPL 3 licence
(https://github.com/adaa-polsl/RuleKit).
Keywords: Contrast sets, Sequential covering, Rule induction, Regression,
Survival, Knowledge discovery
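
The idea of sequential covering repeated over several passes with an attribute penalty can be illustrated with a toy sketch. This is not the RuleKit-CS implementation and does not use the RuleKit API: the function names, the precision-based quality measure, the `weight` parameter, and the choice to update penalties at the end of each pass are all assumptions made for illustration, and only categorical attributes in a classification setting are handled.

```python
from collections import Counter


def covers(rule, row):
    """A rule is a list of (attribute, value) conditions joined by AND."""
    return all(row[a] == v for a, v in rule)


def precision(rule, rows, labels, target):
    """Share of covered examples that belong to the target group."""
    hit = [y for r, y in zip(rows, labels) if covers(rule, r)]
    return sum(y == target for y in hit) / len(hit) if hit else 0.0


def grow_rule(rows, labels, target, uncovered, penalty, weight):
    """Greedily add conditions that raise precision (assumed quality measure).

    Candidates are ranked by precision minus a penalty for attributes
    that earlier passes have already used.
    """
    rule = []
    current = precision(rule, rows, labels, target)  # empty rule covers everything
    while current < 1.0:
        used = {a for a, _ in rule}
        seeds = [i for i in uncovered if covers(rule, rows[i])]
        candidates = {(a, rows[i][a]) for i in seeds for a in rows[i] if a not in used}
        scored = []
        for cond in candidates:
            p = precision(rule + [cond], rows, labels, target)
            if p > current:
                scored.append((p - weight * penalty[cond[0]], p, cond))
        if not scored:
            break
        _, current, cond = max(scored)
        rule.append(cond)
    return rule


def mine_contrast_sets(rows, labels, target, passes=2, weight=0.5):
    """Run sequential covering several times, penalizing reused attributes."""
    penalty = Counter()
    found = []
    positives = {i for i, y in enumerate(labels) if y == target}
    for _ in range(passes):
        uncovered = set(positives)  # every pass starts from the full group
        pass_rules = []
        while uncovered:
            rule = grow_rule(rows, labels, target, uncovered, penalty, weight)
            newly = {i for i in uncovered if covers(rule, rows[i])}
            if not rule or not newly:
                break
            pass_rules.append(rule)
            uncovered -= newly  # the "separate" step of separate and conquer
        for rule in pass_rules:
            for attribute, _ in rule:
                penalty[attribute] += 1
        found.extend(r for r in pass_rules if r not in found)
    return found


# Hypothetical toy data: both attributes separate the two groups perfectly.
rows = [
    {"smoker": "yes", "age": "old"},
    {"smoker": "yes", "age": "old"},
    {"smoker": "no", "age": "young"},
    {"smoker": "no", "age": "young"},
]
labels = ["sick", "sick", "healthy", "healthy"]
result = mine_contrast_sets(rows, labels, target="sick", passes=2)
print(result)
```

In this toy data a single covering pass stops after one rule, because the first rule already covers the whole target group. The second pass, run with the penalty in place, describes the same examples through the other attribute, which is the behaviour the abstract attributes to the multi-pass scheme.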