2,901 research outputs found
Greedy feature construction
We present an effective method for supervised feature construction. The main goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity -- large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. We achieve this goal with a greedy procedure that constructs features by empirically fitting squared error residuals. The proposed constructive procedure is consistent and can output a rich set of features. The effectiveness of the approach is evaluated empirically by fitting a linear ridge regression model in the constructed feature space and our empirical results indicate a superior performance of our approach over competing methods
Feature construction using explanations of individual predictions
Feature construction can contribute to comprehensibility and performance of
machine learning models. Unfortunately, it usually requires exhaustive search
in the attribute space or time-consuming human involvement to generate
meaningful features. We propose a novel heuristic approach for reducing the
search space based on aggregation of instance-based explanations of predictive
models. The proposed Explainable Feature Construction (EFC) methodology
identifies groups of co-occurring attributes exposed by popular explanation
methods, such as IME and SHAP. We empirically show that reducing the search to
these groups significantly reduces the time of feature construction using
logical, relational, Cartesian, numerical, and threshold num-of-N and X-of-N
constructive operators. An analysis on 10 transparent synthetic datasets shows
that EFC effectively identifies informative groups of attributes and constructs
relevant features. Using 30 real-world classification datasets, we show
significant improvements in classification accuracy for several classifiers and
demonstrate the feasibility of the proposed feature construction even for large
datasets. Finally, EFC generated interpretable features on a real-world problem
from the financial industry, which were confirmed by a domain expert.Comment: 54 pages, 10 figures, 22 table
- …