4,965 research outputs found
Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees
This paper investigates an important problem in stream mining, i.e.,
classification under streaming emerging new classes or SENC. The common
approach is to treat it as a classification problem and solve it using either a
supervised learner or a semi-supervised learner. We propose an alternative
approach by using unsupervised learning as the basis to solve this problem. The
SENC problem can be decomposed into three sub problems: detecting emerging new
classes, classifying for known classes, and updating models to enable
classification of instances of the new class and detection of more emerging new
classes. The proposed method employs completely random trees which have been
shown to work well in unsupervised learning and supervised learning
independently in the literature. This is the first time, as far as we know,
that completely random trees are used as a single common core to solve all
three sub problems: unsupervised learning, supervised learning and model update
in data streams. We show that the proposed unsupervised-learning-focused method
often achieves significantly better outcomes than existing
classification-focused methods
Semi-supervised Embedding in Attributed Networks with Outliers
In this paper, we propose a novel framework, called Semi-supervised Embedding
in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector
representation that systematically captures the topological proximity,
attribute affinity and label similarity of vertices in a partially labeled
attributed network (PLAN). Our method is designed to work in both transductive
and inductive settings while explicitly alleviating noise effects from
outliers. Experimental results on various datasets drawn from the web, text and
image domains demonstrate the advantages of SEANO over state-of-the-art methods
in semi-supervised classification under transductive as well as inductive
settings. We also show that a subset of parameters in SEANO is interpretable as
outlier score and can significantly outperform baseline methods when applied
for detecting network outliers. Finally, we present the use of SEANO in a
challenging real-world setting -- flood mapping of satellite images and show
that it is able to outperform modern remote sensing algorithms for this task.Comment: in Proceedings of SIAM International Conference on Data Mining
(SDM'18
Automated novelty detection in the WISE survey with one-class support vector machines
Wide-angle photometric surveys of previously uncharted sky areas or
wavelength regimes will always bring in unexpected sources whose existence and
properties cannot be easily predicted from earlier observations: novelties or
even anomalies. Such objects can be efficiently sought for with novelty
detection algorithms. Here we present an application of such a method, called
one-class support vector machines (OCSVM), to search for anomalous patterns
among sources preselected from the mid-infrared AllWISE catalogue covering the
whole sky. To create a model of expected data we train the algorithm on a set
of objects with spectroscopic identifications from the SDSS DR13 database,
present also in AllWISE. OCSVM detects as anomalous those sources whose
patterns - WISE photometric measurements in this case - are inconsistent with
the model. Among the detected anomalies we find artefacts, such as objects with
spurious photometry due to blending, but most importantly also real sources of
genuine astrophysical interest. Among the latter, OCSVM has identified a sample
of heavily reddened AGN/quasar candidates distributed uniformly over the sky
and in a large part absent from other WISE-based AGN catalogues. It also
allowed us to find a specific group of sources of mixed types, mostly stars and
compact galaxies. By combining the semi-supervised OCSVM algorithm with
standard classification methods it will be possible to improve the latter by
accounting for sources which are not present in the training sample but are
otherwise well-represented in the target set. Anomaly detection adds
flexibility to automated source separation procedures and helps verify the
reliability and representativeness of the training samples. It should be thus
considered as an essential step in supervised classification schemes to ensure
completeness and purity of produced catalogues.Comment: 14 pages, 15 figure
- …