32,742 research outputs found

    Bayesian outlier detection in Capital Asset Pricing Model

    We propose a novel Bayesian optimisation procedure for outlier detection in the Capital Asset Pricing Model. We use a parametric product partition model to robustly estimate the systematic risk of an asset. We assume that the returns follow independent normal distributions and we impose a partition structure on the parameters of interest. The partition structure imposed on the parameters induces a corresponding clustering of the returns. We identify, via an optimisation procedure, the partition that best separates standard observations from the atypical ones. The methodology is illustrated with reference to a real data set, for which we also provide a microeconomic interpretation of the detected outliers.
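    A minimal Python sketch of the basic setup, assuming synthetic data and a simple residual-based rule rather than the authors' Bayesian product partition search: fit the CAPM regression r_asset = alpha + beta * r_market + eps by least squares and flag observations with large standardized residuals as candidate outliers. The 3-sigma cutoff and all variable names are illustrative assumptions.

```python
# Sketch only: least-squares CAPM fit plus a residual-based outlier flag,
# standing in for the Bayesian product partition optimisation in the abstract.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic market and asset returns; a few contaminated points act as outliers.
n = 250
r_market = rng.normal(0.0005, 0.01, n)
r_asset = 0.0002 + 1.2 * r_market + rng.normal(0.0, 0.005, n)
r_asset[[30, 120, 200]] += np.array([0.06, -0.05, 0.07])  # atypical returns

# Ordinary least squares fit of r_asset = alpha + beta * r_market.
X = np.column_stack([np.ones(n), r_market])
coef, *_ = np.linalg.lstsq(X, r_asset, rcond=None)
alpha_hat, beta_hat = coef

# Standardized residuals; unusually large values suggest atypical observations.
resid = r_asset - X @ coef
z = (resid - resid.mean()) / resid.std(ddof=2)
outliers = np.flatnonzero(np.abs(z) > 3.0)

print(f"alpha={alpha_hat:.5f}, beta={beta_hat:.3f}, flagged={outliers.tolist()}")
```

    In the paper itself, the separation of standard from atypical returns is instead obtained by optimising over partitions of the observations, not by thresholding residuals.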

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, and as such purify the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

    Preprocessing Among the Infalling Galaxy Population of EDisCS Clusters

    We present results from a low-resolution spectroscopic survey for 21 galaxy clusters at $0.4 < z < 0.8$ selected from the ESO Distant Cluster Survey. We measured spectra using the low-dispersion prism in IMACS on the Magellan Baade telescope and calculate redshifts with an accuracy of $\sigma_z = 0.007$. We find 1763 galaxies that are brighter than $R = 22.9$ in the large-scale cluster environs. We identify the galaxies expected to be accreted by the clusters as they evolve to $z = 0$ using spherical infall models and find that $\sim 30\%$ to $\sim 70\%$ of the $z = 0$ cluster population lies outside the virial radius at $z \sim 0.6$. For analogous clusters at $z = 0$, we calculate that the ratio of galaxies that have fallen into the clusters since $z \sim 0.6$ to those that were already in the core at that redshift is typically between $\sim 0.3$ and $1.5$. This wide range of ratios is due to intrinsic scatter and is not a function of velocity dispersion, so a variety of infall histories is to be expected for clusters with current velocity dispersions of $300 \lesssim \sigma \lesssim 1200$ km s$^{-1}$. Within the infall regions of $z \sim 0.6$ clusters, we find a larger red fraction of galaxies than in the field and greater clustering among red galaxies than blue. We interpret these findings as evidence of "preprocessing", where galaxies in denser local environments have their star formation rates affected prior to their aggregation into massive clusters, although the possibility of backsplash galaxies complicates the interpretation. Comment: Accepted for publication in Ap

    Distributed Low-rank Subspace Segmentation

    Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data. Low-Rank Representation (LRR), a convex formulation of the subspace segmentation problem, is provably and empirically accurate on small problems but does not scale to the massive sizes of modern vision datasets. Moreover, past work aimed at scaling up low-rank matrix factorization is not applicable to LRR given its non-decomposable constraints. In this work, we propose a novel divide-and-conquer algorithm for large-scale subspace segmentation that can cope with LRR's non-decomposable constraints and maintains LRR's strong recovery guarantees. This has immediate implications for the scalability of subspace segmentation, which we demonstrate on a benchmark face recognition dataset and in simulations. We then introduce novel applications of LRR-based subspace segmentation to large-scale semi-supervised learning for multimedia event detection, concept detection, and image tagging. In each case, we obtain state-of-the-art results and order-of-magnitude speedups.
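    A rough, self-contained illustration of LRR-style segmentation on clean synthetic data (a sketch under simplifying assumptions, not the paper's distributed algorithm): for noise-free X, the LRR problem min ||Z||_* subject to X = XZ is solved in closed form by the shape interaction matrix Z* = V_r V_r^T from the skinny SVD of X, and spectral clustering on the affinity |Z*| + |Z*|^T recovers the subspace membership. The synthetic subspaces, rank tolerance and cluster count below are assumptions made for the example.

```python
# Sketch only: closed-form LRR on noiseless data followed by spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(1)

# Two independent 3-dimensional subspaces in R^30, 40 points drawn from each.
ambient, subdim, per_cluster = 30, 3, 40
X = np.hstack([
    rng.normal(size=(ambient, subdim)) @ rng.normal(size=(subdim, per_cluster))
    for _ in range(2)
])  # columns are data points

# Skinny SVD and the closed-form noiseless LRR solution Z* = V_r V_r^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int((s > 1e-8 * s[0]).sum())  # numerical rank
Z = Vt[:r].T @ Vt[:r]

# Symmetric non-negative affinity from |Z| and spectral clustering into 2 groups.
W = np.abs(Z) + np.abs(Z).T
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(labels)
```

    For independent subspaces this affinity is approximately block diagonal, which is the structure underlying LRR's recovery guarantees; the paper's contribution is obtaining such a segmentation at scale despite LRR's non-decomposable constraints.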

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework