27 research outputs found

    A Novel Proposal for Outlier Detection in High Dimensional Space

    No full text

    An adaptive classification framework for unsupervised model updating in nonstationary environments

    No full text
    This paper introduces an adaptive framework that makes use of ensemble classification and self-training to maintain high classification performance in datasets affected by concept drift without the aid of external supervision to update the model of a classifier. The updating of the model of the framework is triggered by a mechanism that infers the presence of concept drift based on the analysis of the differences between the outputs of the different classifiers. In order to evaluate the performance of the proposed algorithm, comparisons were made with a set of unsupervised classification techniques and drift detection techniques. The results show that the framework is able to react more promptly to performance degradation than the existing methods and this leads to increased classification accuracy. In addition, the framework stores a smaller number of instances than a single-classifier approach.

    Detection of cross-channel anomalies

    Full text link
    The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portents of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive the extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.