Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings
© 2016 IEEE. Proper feature selection for unsupervised outlier detection can improve detection performance but is very challenging due to complex feature interactions, the mixture of relevant features with noisy/redundant features in imbalanced data, and the unavailability of class labels. Little work has been done on this challenge. This paper proposes a novel Coupled Unsupervised Feature Selection framework (CUFS for short) to filter out noisy or redundant features for subsequent outlier detection in categorical data. CUFS quantifies the outlierness (or relevance) of features by learning and integrating both the feature value couplings and feature couplings. Such value-to-feature couplings capture intrinsic data characteristics and distinguish relevant features from noisy/redundant ones. CUFS is further instantiated into a parameter-free Dense Subgraph-based Feature Selection method, called DSFS. We prove that DSFS retains a 2-approximation feature subset relative to the optimal subset. Extensive evaluation results on 15 real-world data sets show that DSFS obtains an average 48% feature reduction rate, and enables three different types of pattern-based outlier detection methods to achieve substantially better AUC and/or run orders of magnitude faster than on the original feature set. Compared to its feature selection contender, on average, all three DSFS-based detectors achieve more than 20% AUC improvement.
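To make the abstract's idea of value-level outlierness concrete, here is a minimal sketch of frequency-based feature relevance scoring for categorical data. All names, the rarity-based value score, the aggregation, and the threshold are illustrative assumptions; the actual CUFS/DSFS method learns value-to-feature couplings and extracts a dense subgraph, and is parameter-free.

```python
from collections import Counter

def value_outlierness(column):
    """Score each categorical value by its rarity (1 - frequency),
    a simple stand-in for the value couplings that CUFS learns."""
    counts = Counter(column)
    n = len(column)
    return {v: 1.0 - c / n for v, c in counts.items()}

def feature_relevance(column):
    """Aggregate value scores into one feature score (hypothetical
    aggregation; the paper integrates value-to-feature couplings)."""
    counts = Counter(column)
    n = len(column)
    scores = value_outlierness(column)
    # Expected outlierness of a value drawn from this column.
    return sum(counts[v] / n * s for v, s in scores.items())

def select_features(data, threshold=0.3):
    """Keep features whose relevance exceeds a threshold. Illustrative
    only: DSFS instead keeps a dense subgraph, with no threshold."""
    return [j for j, col in enumerate(data) if feature_relevance(col) > threshold]

# Toy categorical data set: data[j] is the value list of feature j.
data = [
    ["a", "a", "a", "a", "b"],   # one rare value -> informative feature
    ["x", "x", "x", "x", "x"],   # constant feature -> relevance 0
]
print(select_features(data))  # -> [0]
```

The constant feature carries no outlierness signal and is filtered out, which mirrors the abstract's goal of removing redundant features before detection.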
Outlier Detection Ensemble with Embedded Feature Selection
Feature selection plays an important role in improving the performance of
outlier detection, especially for noisy data. Existing methods usually perform
feature selection and outlier scoring separately, which may select feature
subsets that do not optimally serve outlier detection, leading to
unsatisfactory performance. In this paper, we propose an outlier detection
ensemble framework with embedded feature selection (ODEFS), to address this
issue. Specifically, for each random sub-sampling based learning component,
ODEFS unifies feature selection and outlier detection into a pairwise ranking
formulation to learn feature subsets that are tailored for the outlier
detection method. Moreover, we adopt the thresholded self-paced learning to
simultaneously optimize feature selection and example selection, which is
helpful for improving the reliability of the training set. We then design
an alternating algorithm with proven convergence to solve the resulting
optimization problem. In addition, we analyze the generalization error bound of
the proposed framework, which provides a theoretical guarantee for the method and
insightful practical guidance. Comprehensive experimental results on 12
real-world datasets from diverse domains validate the superiority of the
proposed ODEFS.

Comment: 10 pages, AAAI 2020
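The two ingredients the abstract names, a pairwise ranking objective and thresholded self-paced example selection, can be sketched as follows. The linear scoring model, the hinge margin, and the fixed threshold are all assumptions for illustration; ODEFS's actual formulation embeds feature selection in the ranking objective and optimizes both jointly.

```python
import numpy as np

def outlier_score(X, w):
    """Linear score with feature weights w; sparsity in w plays the
    role of embedded feature selection (illustrative, not ODEFS's model)."""
    return X @ w

def pairwise_ranking_loss(s_out, s_in):
    """Hinge loss over (outlier, inlier) pairs: candidate outliers
    should score at least a margin of 1 above inliers."""
    margins = 1.0 - (s_out[:, None] - s_in[None, :])
    return np.maximum(margins, 0.0).mean()

def self_paced_mask(losses, threshold):
    """Thresholded self-paced selection: keep only easy (low-loss)
    examples for training, a rough stand-in for ODEFS's example
    selection that improves training-set reliability."""
    return losses < threshold

# Toy check: well-separated scores incur zero ranking loss ...
well_separated = pairwise_ranking_loss(np.array([2.0, 3.0]), np.array([0.0]))
print(well_separated)  # -> 0.0

# ... while a small margin is penalized.
close = pairwise_ranking_loss(np.array([0.5]), np.array([0.0]))
print(close)  # -> 0.5

# Hard (high-loss) examples are excluded from the next training round.
print(self_paced_mask(np.array([0.1, 2.0]), threshold=1.0))  # -> [ True False]
```

Alternating between updating the weights against the ranking loss and re-selecting examples with the mask is one simple way to realize the joint optimization the abstract describes.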