77,750 research outputs found
Robust and cost-effective approach for discovering action rules
The main goal of Knowledge Discovery in
Databases is to find interesting and usable patterns, meaningful
in their domain. Actionable Knowledge Discovery came to
existence as a direct respond to the need of finding more usable
patterns called actionable patterns. Traditional data mining
and algorithms are often confined to deliver frequent patterns
and come short for suggesting how to make these patterns
actionable. In this scenario the users are expected to act.
However, the users are not advised about what to do with
delivered patterns in order to make them usable. In this paper,
we present an automated approach to focus on not only creating
rules but also making the discovered rules actionable.
Up to now few works have been reported in this field which
lacking incomprehensibility to the user, overlooking the cost
and not providing rule generality. Here we attempt to present a
method to resolving these issues. In this paper CEARDM
method is proposed to discover cost-effective action rules from
data. These rules offer some cost-effective changes to
transferring low profitable instances to higher profitable ones.
We also propose an idea for improving in CEARDM method
Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Recently, because of increasing amount of data in the society, data stream mining targeting large scale data has
attracted attention. The data mining is a technology of discovery new knowledge and patterns from the massive amounts of data, and what the data correspond to data stream is data stream mining. In this paper, we propose the feature selection with online decision tree. At first, we construct online type decision tree to regard credit card transaction data as data stream on data stream
mining. At second, we select attributes thought to be important for detection of illegal use. We apply VFDT (Very Fast Decision Tree learner) algorithm to online type decision tree construction
A System for Induction of Oblique Decision Trees
This article describes a new system for induction of oblique decision trees.
This system, OC1, combines deterministic hill-climbing with two forms of
randomization to find a good oblique split (in the form of a hyperplane) at
each node of a decision tree. Oblique decision tree methods are tuned
especially for domains in which the attributes are numeric, although they can
be adapted to symbolic or mixed symbolic/numeric attributes. We present
extensive empirical studies, using both real and artificial data, that analyze
OC1's ability to construct oblique trees that are smaller and more accurate
than their axis-parallel counterparts. We also examine the benefits of
randomization for the construction of oblique decision trees.Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this articl
Achieving non-discrimination in prediction
Discrimination-aware classification is receiving an increasing attention in
data science fields. The pre-process methods for constructing a
discrimination-free classifier first remove discrimination from the training
data, and then learn the classifier from the cleaned data. However, they lack a
theoretical guarantee for the potential discrimination when the classifier is
deployed for prediction. In this paper, we fill this gap by mathematically
bounding the probability of the discrimination in prediction being within a
given interval in terms of the training data and classifier. We adopt the
causal model for modeling the data generation mechanism, and formally defining
discrimination in population, in a dataset, and in prediction. We obtain two
important theoretical results: (1) the discrimination in prediction can still
exist even if the discrimination in the training data is completely removed;
and (2) not all pre-process methods can ensure non-discrimination in prediction
even though they can achieve non-discrimination in the modified training data.
Based on the results, we develop a two-phase framework for constructing a
discrimination-free classifier with a theoretical guarantee. The experiments
demonstrate the theoretical results and show the effectiveness of our two-phase
framework
Alternating model trees
Model tree induction is a popular method for tackling regression problems requiring interpretable models. Model trees are decision trees with multiple linear regression models at the leaf nodes. In this paper, we propose a method for growing alternating model trees, a form of option tree for regression problems. The motivation is that alternating decision trees achieve high accuracy in classification problems because they represent an ensemble classifier as a single tree structure. As in alternating decision trees for classifi-cation, our alternating model trees for regression contain splitter and prediction nodes, but we use simple linear regression functions as opposed to constant predictors at the prediction nodes. Moreover, additive regression using forward stagewise modeling is applied to grow the tree rather than a boosting algorithm. The size of the tree is determined using cross-validation. Our empirical results show that alternating model trees achieve significantly lower squared error than standard model trees on several regression datasets
Relatedness Measures to Aid the Transfer of Building Blocks among Multiple Tasks
Multitask Learning is a learning paradigm that deals with multiple different
tasks in parallel and transfers knowledge among them. XOF, a Learning
Classifier System using tree-based programs to encode building blocks
(meta-features), constructs and collects features with rich discriminative
information for classification tasks in an observed list. This paper seeks to
facilitate the automation of feature transferring in between tasks by utilising
the observed list. We hypothesise that the best discriminative features of a
classification task carry its characteristics. Therefore, the relatedness
between any two tasks can be estimated by comparing their most appropriate
patterns. We propose a multiple-XOF system, called mXOF, that can dynamically
adapt feature transfer among XOFs. This system utilises the observed list to
estimate the task relatedness. This method enables the automation of
transferring features. In terms of knowledge discovery, the resemblance
estimation provides insightful relations among multiple data. We experimented
mXOF on various scenarios, e.g. representative Hierarchical Boolean problems,
classification of distinct classes in the UCI Zoo dataset, and unrelated tasks,
to validate its abilities of automatic knowledge-transfer and estimating task
relatedness. Results show that mXOF can estimate the relatedness reasonably
between multiple tasks to aid the learning performance with the dynamic feature
transferring.Comment: accepted by The Genetic and Evolutionary Computation Conference
(GECCO 2020
- …