XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning
A new semi-supervised ensemble algorithm called XGBOD (Extreme Gradient
Boosting Outlier Detection) is proposed, described and demonstrated for the
enhanced detection of outliers from normal observations in various practical
datasets. The proposed framework combines the strengths of both supervised and
unsupervised machine learning methods by creating a hybrid approach that
exploits each of their individual performance capabilities in outlier
detection. XGBOD uses multiple unsupervised outlier mining algorithms to
extract useful representations from the underlying data that augment the
predictive capabilities of an embedded supervised classifier on an improved
feature space. The novel approach is shown to provide superior performance in
comparison to competing individual detectors, the full ensemble and two
existing representation learning based algorithms across seven outlier
datasets. Comment: Proceedings of the 2018 International Joint Conference on Neural
Networks (IJCNN)
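The feature-augmentation idea in this abstract can be illustrated with a minimal sketch. It assumes k-nearest-neighbour distance as the unsupervised outlier score and uses three values of k as the "multiple detectors"; the abstract does not say which detectors XGBOD uses, so this choice, the function names, and the toy data are all illustrative. The supervised stage is omitted: a classifier would be trained on the augmented rows together with their labels.

```python
import math

def knn_distance_scores(points, k):
    """Unsupervised outlier score: distance to the k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def augment_features(points, ks=(1, 3, 5)):
    """Append one outlier-score column per detector to the original features."""
    score_columns = [knn_distance_scores(points, k) for k in ks]
    return [list(p) + [col[i] for col in score_columns]
            for i, p in enumerate(points)]

# Toy data (hypothetical): a 5x5 grid of inliers plus two distant outliers.
inliers = [(x * 0.5, y * 0.5) for x in range(5) for y in range(5)]
outliers = [(8.0, 8.0), (-9.0, 7.5)]
X = inliers + outliers

# Each row now holds 2 original features followed by 3 outlier scores.
X_aug = augment_features(X)
```

In this sketch the appended columns are much larger for the two outliers than for any grid point, which is the extra signal a downstream supervised classifier could exploit.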
Ensemble learning of model hyperparameters and spatiotemporal data for calibration of low-cost PM2.5 sensors.
The PM2.5 air quality index (AQI) measurements from government-built supersites are accurate but cannot provide dense coverage of the monitored areas. Low-cost PM2.5 sensors can be deployed as a fine-grained internet-of-things (IoT) network to complement government facilities, so calibrating the low-cost sensors against high-accuracy supersites is essential. Three factors complicate calibration: the method used to impute missing values in the training data may affect the result, the calibration model needs hyperparameter optimization to perform at its best, and the factors that influence PM2.5 concentrations, such as climate, geographical landscape, and anthropogenic activities, are uncertain in both space and time. This paper proposes an ensemble learning approach that covers imputation method selection, calibration model hyperparameterization, and spatiotemporal training data composition. Low-cost sensors are deployed at three government supersites in central Taiwan, and hourly PM2.5 measurements are collected for 60 days for the experiments. Three optimizers, the Sobol sequence, Nelder-Mead, and particle swarm optimization (PSO), are compared across various versions of the ensembles. PSO gives the best calibration results, with improvement ratios in R², RMSE, and NME of 4.92%, 52.96%, and 56.85%, respectively.
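The three reported metrics can be sketched as follows. The abstract does not define them, so the formulas here are assumptions: R² is taken as the coefficient of determination, NME as the sum of absolute errors divided by the sum of reference values, and the improvement ratio as the relative change against the uncalibrated sensor. The readings are made-up numbers for illustration, not data from the study.

```python
import math

def rmse(obs, pred):
    """Root mean squared error between reference and sensor values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nme(obs, pred):
    """Normalized mean error (assumed definition): sum|error| / sum(reference)."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / sum(obs)

def r2(obs, pred):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def improvement_ratio(before, after, lower_is_better=True):
    """Relative improvement of a metric (assumed definition)."""
    change = before - after if lower_is_better else after - before
    return change / abs(before)

# Hypothetical hourly PM2.5 readings: supersite reference vs. low-cost sensor.
reference = [10.0, 20.0, 30.0, 40.0]
raw_sensor = [14.0, 26.0, 33.0, 47.0]
calibrated = [11.0, 19.0, 31.0, 39.0]

rmse_gain = improvement_ratio(rmse(reference, raw_sensor),
                              rmse(reference, calibrated))
r2_gain = improvement_ratio(r2(reference, raw_sensor),
                            r2(reference, calibrated),
                            lower_is_better=False)
```

RMSE and NME improve when they fall, while R² improves when it rises, which is why the helper takes a direction flag.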
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
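Two of the transformation tasks named above, constructing node features and predicting link existence, can be sketched on a toy graph. The techniques here are generic stand-ins chosen for illustration, not methods taken from the article: node degree serves as a constructed feature, and a common-neighbour count scores candidate links. The graph and function names are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_features(adj):
    """Node feature construction: one feature per node, its degree."""
    return {node: len(neighbours) for node, neighbours in adj.items()}

def common_neighbour_scores(adj):
    """Link prediction: score each missing link by shared neighbours."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores

# Toy graph (hypothetical): a triangle a-b-c with a pendant node d on c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_adjacency(edges)

node_features = degree_features(adj)
link_scores = common_neighbour_scores(adj)
```

The same shared-neighbour count could also act as a weight on existing links, which hints at how the node and link tasks mirror each other.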