18,920 research outputs found
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers, since
the class boundary must be defined using knowledge of the positive class alone. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data, the
algorithms used and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
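To make the setting in the abstract above concrete, here is a minimal sketch of a one-class model that learns a boundary from positive examples only. It is not a method from the surveyed paper: the centroid-plus-radius rule, the class name `CentroidOneClass` and the `quantile` parameter are all assumptions chosen for illustration, and the data are synthetic.

```python
import math
import random


class CentroidOneClass:
    """Toy one-class model: accept points that lie close to the positive-class centroid."""

    def __init__(self, quantile=0.95):
        # Fraction of training positives the boundary should enclose
        # (hypothetical parameter, chosen for illustration).
        self.quantile = quantile

    def fit(self, positives):
        n = len(positives)
        dims = len(positives[0])
        # Centroid of the positive class; no negative examples are used.
        self.center = [sum(p[d] for p in positives) / n for d in range(dims)]
        # Radius = empirical quantile of the training distances to the centroid.
        dists = sorted(math.dist(p, self.center) for p in positives)
        self.radius = dists[min(n - 1, int(self.quantile * n))]
        return self

    def predict(self, point):
        """Return True if the point is accepted as a member of the positive class."""
        return math.dist(point, self.center) <= self.radius


# Synthetic positive-only training data (2-D Gaussian blob).
rng = random.Random(0)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]

model = CentroidOneClass().fit(train)
near = model.predict((0.1, -0.2))   # a point inside the blob
far = model.predict((8.0, 8.0))     # a point far from all training data
accepted = sum(model.predict(p) for p in train) / len(train)
```

A spherical boundary is the simplest possible choice; it only illustrates that the decision rule is fitted without any negative samples.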
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hilbert spaces, the natural extension of the set of
functions $\{k(x,\cdot), x \in \mathcal{X}\}$ associated with a kernel $k$ defined
on a space $\mathcal{X}$. We discuss at length the construction of kernel
functions that take advantage of well-known statistical models. We provide an
overview of numerous data-analysis methods which take advantage of reproducing
kernel Hilbert spaces and discuss the idea of combining several kernels to
improve the performance on certain tasks. We also provide a short cookbook of
different kernels which are particularly useful for certain data-types such as
images, graphs or speech segments. Comment: draft. corrected a typo in figure
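As a small illustration of the ideas in the abstract above, the sketch below builds the Gram matrix of a Gaussian kernel and spot-checks that the quadratic form $c^\top K c$ is non-negative, which is the defining property of a positive definite kernel. It also combines two kernels by a weighted sum, which preserves positive definiteness. The function names, the weights and the random test points are assumptions for illustration and are not taken from the survey; the random spot check is a sanity test, not a proof.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def linear_kernel(x, y):
    """k(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def combined_kernel(x, y):
    # A non-negative weighted sum of positive definite kernels is again
    # positive definite; the 0.5/0.5 weights are arbitrary.
    return 0.5 * gaussian_kernel(x, y) + 0.5 * linear_kernel(x, y)


def gram_matrix(points, kernel):
    """Matrix of all pairwise kernel evaluations K[i][j] = k(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, c):
    """Compute c^T K c."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))


rng = random.Random(0)
points = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(15)]

min_forms = {}
for name, kernel in [("gaussian", gaussian_kernel), ("combined", combined_kernel)]:
    K = gram_matrix(points, kernel)
    # Smallest value of c^T K c over 200 random coefficient vectors.
    min_forms[name] = min(
        quadratic_form(K, [rng.gauss(0, 1) for _ in points]) for _ in range(200)
    )
```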
Automated novelty detection in the WISE survey with one-class support vector machines
Wide-angle photometric surveys of previously uncharted sky areas or
wavelength regimes will always bring in unexpected sources whose existence and
properties cannot be easily predicted from earlier observations: novelties or
even anomalies. Such objects can be efficiently sought for with novelty
detection algorithms. Here we present an application of such a method, called
one-class support vector machines (OCSVM), to search for anomalous patterns
among sources preselected from the mid-infrared AllWISE catalogue covering the
whole sky. To create a model of expected data we train the algorithm on a set
of objects with spectroscopic identifications from the SDSS DR13 database,
present also in AllWISE. OCSVM detects as anomalous those sources whose
patterns - WISE photometric measurements in this case - are inconsistent with
the model. Among the detected anomalies we find artefacts, such as objects with
spurious photometry due to blending, but most importantly also real sources of
genuine astrophysical interest. Among the latter, OCSVM has identified a sample
of heavily reddened AGN/quasar candidates distributed uniformly over the sky
and in a large part absent from other WISE-based AGN catalogues. It also
allowed us to find a specific group of sources of mixed types, mostly stars and
compact galaxies. By combining the semi-supervised OCSVM algorithm with
standard classification methods it will be possible to improve the latter by
accounting for sources which are not present in the training sample but are
otherwise well-represented in the target set. Anomaly detection adds
flexibility to automated source separation procedures and helps verify the
reliability and representativeness of the training samples. It should thus be
considered as an essential step in supervised classification schemes to ensure
completeness and purity of produced catalogues. Comment: 14 pages, 15 figures
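The workflow described in the abstract above (fit a model of expected data on a labelled training set, then flag target sources that are inconsistent with it) can be sketched with scikit-learn's `OneClassSVM`. This is only an illustration under stated assumptions: the two-dimensional synthetic features stand in for photometric measurements and are not WISE or SDSS data, and the `nu` and `gamma` settings are arbitrary rather than the values used by the authors.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set of sources with known identifications.
train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))

# Synthetic target set: mostly sources like the training data, plus a small
# group far from anything seen in training (the "novelties").
target_known = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
target_odd = rng.normal(loc=6.0, scale=0.3, size=(20, 2))
target = np.vstack([target_known, target_odd])

# nu bounds the fraction of training points treated as outliers;
# 0.05 is an arbitrary illustrative choice.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(train)

# predict() returns +1 for points consistent with the model and -1 for anomalies.
labels = model.predict(target)
anomalies = target[labels == -1]
```

The flagged `anomalies` would then be inspected separately, while the remaining sources can be passed on to a standard supervised classifier.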
Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes
I argue that data becomes temporarily interesting by itself to some
self-improving, but computationally limited, subjective observer once he learns
to predict or compress the data in a better way, thus making it subjectively
simpler and more beautiful. Curiosity is the desire to create or discover more
non-random, non-arbitrary, regular data that is novel and surprising not in the
traditional sense of Boltzmann and Shannon but in the sense that it allows for
compression progress because its regularity was not yet known. This drive
maximizes interestingness, the first derivative of subjective beauty or
compressibility, that is, the steepness of the learning curve. It motivates
exploring infants, pure mathematicians, composers, artists, dancers, comedians,
yourself, and (since 1990) artificial systems. Comment: 35 pages, 3 figures, based on KES 2008 keynote and ALT 2007 / DS 2007
joint invited lecture
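One hedged way to write down the "first derivative" statement in the abstract above. The symbols are assumptions introduced here for illustration: $D$ is the data, $O(t)$ the observer's state at time $t$, $C$ the number of bits the observer needs to encode $D$, $B$ the subjective beauty (higher when fewer bits are needed) and $I$ the interestingness.

```latex
B\bigl(D, O(t)\bigr) = -\,C\bigl(D, O(t)\bigr),
\qquad
I\bigl(D, O(t)\bigr) \;\propto\; \frac{\partial B\bigl(D, O(t)\bigr)}{\partial t}
  \;=\; -\,\frac{\partial C\bigl(D, O(t)\bigr)}{\partial t}
```

Under this reading, data whose encoding shrinks from 1000 to 800 bits during one learning step yields a compression progress of 200 bits, whereas data that is already fully compressed, or that cannot be compressed at all, yields none.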
Low-Default Portfolio/One-Class Classification: A Literature Review
Consider a bank which wishes to decide whether a credit applicant will obtain credit or not. The bank has to assess whether the applicant will be able to redeem the credit. This is done by estimating the probability that the applicant will default prior to the maturity of the credit. To estimate this probability of default it is first necessary to identify criteria which separate the good from the bad creditors, such as loan amount and age or factors concerning the income of the applicant. The question then arises of how a bank identifies a sufficient number of selective criteria that possess the necessary discriminatory power. As a solution, many traditional binary classification methods have been proposed with varying degrees of success. However, a particular problem with credit scoring is that defaults are only observed for a small subsample of applicants. An imbalance exists between the numbers of non-defaulters and defaulters. This has an adverse effect on the aforementioned binary classification methods. Recently, one-class classification approaches have been proposed to address the imbalance problem. The purpose of this literature review is threefold: (i) present the reader with an overview of credit scoring; (ii) review existing binary classification approaches; and (iii) introduce and examine one-class classification approaches.
What's on your mind? Recent advances in memory detection using the concealed information test
Lie detectors can be applied in a wide variety of settings. But this advantage comes with a considerable cost: false positives. The applicability of the Concealed Information Test (CIT) is more limited, yet when it can be applied, the risk of false accusations can be set a priori at a very low level. The CIT assesses the recognition of critical information that is known only by the examiners and the culprit, for example, the face of an accomplice. Large effects are obtained with the CIT, whether combined with peripheral, brain, or motor responses. We see three important challenges for the CIT. First, the false negative rate of the CIT can be substantial, particularly under realistic circumstances. A possible solution seems to be to restrict the CIT to highly salient details. Second, there exist effective faking strategies. Future research will tell whether faking can be detected or even prevented (e.g., using overt measures). Third, recognition of critical crime details does not necessarily result from criminal activity. It is therefore important to properly embed the CIT in the investigative process, while taking care when drawing conclusions from the test outcome (recognition, not guilt).