Bayesian outlier detection in Capital Asset Pricing Model
We propose a novel Bayesian optimisation procedure for outlier detection in
the Capital Asset Pricing Model. We use a parametric product partition model to
robustly estimate the systematic risk of an asset. We assume that the returns
follow independent normal distributions and we impose a partition structure on
the parameters of interest. The partition structure imposed on the parameters
induces a corresponding clustering of the returns. We identify via an
optimisation procedure the partition that best separates standard observations
from the atypical ones. The methodology is illustrated with reference to a real
data set, for which we also provide a microeconomic interpretation of the
detected outliers.
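To make the idea of robustly estimating systematic risk concrete, here is a minimal sketch. It is not the Bayesian product partition model described in the abstract; it swaps in a much simpler frequentist stand-in (an ordinary least-squares fit followed by a median-absolute-deviation residual screen). The function names, the 3.5 threshold, and the toy returns are all assumptions made for illustration.

```python
from statistics import median


def ols_beta(market, asset):
    """Least-squares fit of asset = alpha + beta * market; returns (alpha, beta)."""
    n = len(market)
    mean_x = sum(market) / n
    mean_y = sum(asset) / n
    sxx = sum((x - mean_x) ** 2 for x in market)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(market, asset))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def flag_outliers(market, asset, threshold=3.5):
    """Indices whose OLS residual is far from the median residual in MAD units.

    Illustrative stand-in only: the abstract's method searches over
    partitions of the observations instead of thresholding residuals.
    """
    alpha, beta = ols_beta(market, asset)
    residuals = [y - alpha - beta * x for x, y in zip(market, asset)]
    centre = median(residuals)
    mad = median(abs(r - centre) for r in residuals)
    if mad == 0:
        return []
    scale = 1.4826 * mad  # consistency factor for normally distributed data
    return [i for i, r in enumerate(residuals)
            if abs(r - centre) / scale > threshold]


# Toy returns (hypothetical): the asset follows 1.2 * market, except at index 3.
market = [-0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04]
asset = [1.2 * x for x in market]
asset[3] = 0.10

flagged = flag_outliers(market, asset)
kept = [i for i in range(len(market)) if i not in flagged]
_, beta_all = ols_beta(market, asset)
_, beta_clean = ols_beta([market[i] for i in kept], [asset[i] for i in kept])
print(flagged, beta_all, beta_clean)
```

In this toy example the single atypical return pulls the least-squares beta away from 1.2, and refitting on the unflagged observations recovers it. A partition-based approach addresses the same problem without relying on a fixed residual threshold.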
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques drawn from the full gamut of Computer Science and Statistics are now used. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Preprocessing Among the Infalling Galaxy Population of EDisCS Clusters
We present results from a low-resolution spectroscopic survey for 21 galaxy
clusters at selected from the ESO Distant Cluster Survey. We
measured spectra using the low-dispersion prism in IMACS on the Magellan Baade
telescope and calculate redshifts with an accuracy of . We
find 1763 galaxies that are brighter than in the large-scale cluster
environs. We identify the galaxies expected to be accreted by the clusters as
they evolve to using spherical infall models and find that
to of the cluster population lies outside the virial radius
at . For analogous clusters at , we calculate that the ratio
of galaxies that have fallen into the clusters since to those that
were already in the core at that redshift is typically between and
. This wide range of ratios is due to intrinsic scatter and is not a
function of velocity dispersion, so a variety of infall histories is to be
expected for clusters with current velocity dispersions of km s. Within the infall regions of clusters, we find a larger red fraction of galaxies than in the field and
greater clustering among red galaxies than blue. We interpret these findings as
evidence of "preprocessing", where galaxies in denser local environments have
their star formation rates affected prior to their aggregation into massive
clusters, although the possibility of backsplash galaxies complicates the
interpretation. Comment: Accepted for publication in Ap
Distributed Low-rank Subspace Segmentation
Vision problems ranging from image clustering to motion segmentation to
semi-supervised learning can naturally be framed as subspace segmentation
problems, in which one aims to recover multiple low-dimensional subspaces from
noisy and corrupted input data. Low-Rank Representation (LRR), a convex
formulation of the subspace segmentation problem, is provably and empirically
accurate on small problems but does not scale to the massive sizes of modern
vision datasets. Moreover, past work aimed at scaling up low-rank matrix
factorization is not applicable to LRR given its non-decomposable constraints.
In this work, we propose a novel divide-and-conquer algorithm for large-scale
subspace segmentation that can cope with LRR's non-decomposable constraints and
maintains LRR's strong recovery guarantees. This has immediate implications for
the scalability of subspace segmentation, which we demonstrate on a benchmark
face recognition dataset and in simulations. We then introduce novel
applications of LRR-based subspace segmentation to large-scale semi-supervised
learning for multimedia event detection, concept detection, and image tagging.
In each case, we obtain state-of-the-art results and order-of-magnitude
speedups.
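For context, the LRR program that the abstract refers to is usually written in the literature as the convex problem below. The symbols are notation chosen here, not taken from the abstract: X is the data matrix with one point per column, Z is the low-rank representation, E is a column-sparse corruption term, and \lambda > 0 is a trade-off weight.

```latex
\min_{Z,\,E} \; \|Z\|_{*} + \lambda \, \|E\|_{2,1}
\quad \text{subject to} \quad X = XZ + E
```

Here \|Z\|_{*} is the nuclear norm (sum of singular values) and \|E\|_{2,1} is the sum of the column-wise Euclidean norms of E. The equality constraint ties every column of X to every other column through Z, which is one way to read the "non-decomposable constraints" mentioned above: the problem does not split into independent per-block subproblems without further work.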
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
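As a small illustration of the definition above (an observation significantly different from the other values in a data set), the sketch below applies Tukey's fences, one of the simplest unsupervised rules for univariate numeric data. It is not drawn from the paper's taxonomy; the function name, the conventional factor of 1.5, and the sample readings are assumptions for illustration.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
print(tukey_outliers(readings))  # -> [95]
```

The rule needs no labels and no distributional fit, which is why it suits an unsupervised setting, but it only applies to a single numeric attribute. Other data types call for different classes of technique.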