9,850 research outputs found

    Scalable and Interpretable One-class SVMs with Deep Learning and Random Fourier features

    Full text link
    One-class support vector machine (OC-SVM) for a long time has been one of the most effective anomaly detection methods and extensively adopted in both research as well as industrial applications. The biggest issue for OC-SVM is yet the capability to operate with large and high-dimensional datasets due to optimization complexity. Those problems might be mitigated via dimensionality reduction techniques such as manifold learning or autoencoder. However, previous work often treats representation learning and anomaly prediction separately. In this paper, we propose autoencoder based one-class support vector machine (AE-1SVM) that brings OC-SVM, with the aid of random Fourier features to approximate the radial basis kernel, into deep learning context by combining it with a representation learning architecture and jointly exploit stochastic gradient descent to obtain end-to-end training. Interestingly, this also opens up the possible use of gradient-based attribution methods to explain the decision making for anomaly detection, which has ever been challenging as a result of the implicit mappings between the input space and the kernel space. To the best of our knowledge, this is the first work to study the interpretability of deep learning in anomaly detection. We evaluate our method on a wide range of unsupervised anomaly detection tasks in which our end-to-end training architecture achieves a performance significantly better than the previous work using separate training.Comment: Accepted at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 201

    DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

    Full text link
    Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which has been used for enlarging training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel generative data augmentation framework optimized for this task. In particular, we propose to oversample infrequent normal samples - normal samples that occur with small probability, e.g., rare normal events. We show that these samples are responsible for false positives in anomaly detection. However, oversampling of infrequent normal samples is challenging for real-world high-dimensional data with multimodal distributions. To address this challenge, we propose to use a GAN variant known as the adversarial autoencoder (AAE) to transform the high-dimensional multimodal data distributions into low-dimensional unimodal latent distributions with well-defined tail probability. Then, we systematically oversample at the `edge' of the latent distributions to increase the density of infrequent normal samples. We show that our oversampling pipeline is a unified one: it is generally applicable to datasets with different complex data distributions. To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection. We validate our method by demonstrating consistent improvements across several real-world datasets.Comment: Published as a conference paper at ICDM 2018 (IEEE International Conference on Data Mining

    Unsupervised routine discovery in egocentric photo-streams

    Full text link
    The routine of a person is defined by the occurrence of activities throughout different days, and can directly affect the person's health. In this work, we address the recognition of routine related days. To do so, we rely on egocentric images, which are recorded by a wearable camera and allow to monitor the life of the user from a first-person view perspective. We propose an unsupervised model that identifies routine related days, following an outlier detection approach. We test the proposed framework over a total of 72 days in the form of photo-streams covering around 2 weeks of the life of 5 different camera wearers. Our model achieves an average of 76% Accuracy and 68% Weighted F-Score for all the users. Thus, we show that our framework is able to recognise routine related days and opens the door to the understanding of the behaviour of people

    Autoencoders for strategic decision support

    Full text link
    In the majority of executive domains, a notion of normality is involved in most strategic decisions. However, few data-driven tools that support strategic decision-making are available. We introduce and extend the use of autoencoders to provide strategically relevant granular feedback. A first experiment indicates that experts are inconsistent in their decision making, highlighting the need for strategic decision support. Furthermore, using two large industry-provided human resources datasets, the proposed solution is evaluated in terms of ranking accuracy, synergy with human experts, and dimension-level feedback. This three-point scheme is validated using (a) synthetic data, (b) the perspective of data quality, (c) blind expert validation, and (d) transparent expert evaluation. Our study confirms several principal weaknesses of human decision-making and stresses the importance of synergy between a model and humans. Moreover, unsupervised learning and in particular the autoencoder are shown to be valuable tools for strategic decision-making
    • …
    corecore