55 research outputs found

    Contextual Outlier Interpretation

    Outlier detection plays an essential role in many data-driven applications by identifying isolated instances that differ from the majority. While many statistical learning and data mining techniques have been used to develop more effective outlier detection algorithms, the interpretation of detected outliers has not received much attention. Interpretation is becoming increasingly important for helping people trust and evaluate the developed models by providing intrinsic reasons why certain outliers are chosen. It is difficult, if not impossible, to simply apply feature selection for explaining outliers, due to the distinct characteristics of various detection models, the complicated structures of data in certain applications, and the imbalanced distribution of outliers and normal instances. In addition, the role of the contrastive contexts in which outliers are located, as well as the relation between outliers and contexts, is usually overlooked in interpretation. To tackle these issues, we propose a novel Contextual Outlier INterpretation (COIN) method to explain the abnormality of existing outliers spotted by detectors. The interpretability for an outlier is achieved from three aspects: outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework compared with existing interpretation approaches.

    Event Detection by Feature Unpredictability in Phase-Contrast Videos of Cell Cultures

    In this work we propose a novel framework for generic event monitoring in live cell culture videos, built on the assumption that unpredictable observations should correspond to biological events. We use a small set of event-free data to train a multioutput multikernel Gaussian process model that operates as an event predictor by performing autoregression on a bank of heterogeneous features extracted from consecutive frames of a video sequence. We show that the prediction error of this model can be used as a probability measure of the presence of relevant events, which can enable users to perform further analysis or monitoring of large-scale non-annotated data. We validate our approach on two phase-contrast sequence data sets containing mitosis and apoptosis events: a new private dataset of human bone cancer (osteosarcoma) cells and a benchmark dataset of stem cells.
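The idea of scoring frames by prediction error can be illustrated with a much simpler stand-in. The sketch below does not use the multioutput multikernel Gaussian process from the abstract; it assumes a single scalar feature per frame and swaps in a first-order linear autoregression fitted by least squares. The function names and the toy feature values are hypothetical, and the score is an unnormalised squared error rather than a probability.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b on an event-free series."""
    xs, ys = series[:-1], series[1:]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x > 0 else 0.0  # constant series: fall back to the mean
    b = mean_y - a * mean_x
    return a, b


def event_scores(series, a, b):
    """Squared one-step prediction error for every frame after the first."""
    return [(y - (a * x + b)) ** 2 for x, y in zip(series[:-1], series[1:])]


# Hypothetical per-frame feature values from an event-free training clip.
event_free = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
a, b = fit_ar1(event_free)

# Hypothetical monitored clip with an abrupt change at frame 3.
monitored = [1.0, 1.1, 1.2, 3.0, 3.1]
scores = event_scores(monitored, a, b)
most_surprising_frame = scores.index(max(scores)) + 1  # scores start at frame 1
```

Frames that the event-free model predicts well receive scores near zero, while the frame that breaks the learned trend receives a large score and would be flagged for inspection.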

    A Systematic Literature Survey on IDS

    The significance of network security has grown enormously, and various devices have been introduced to improve the security of a network. Network intrusion detection systems (NIDS) are among the most widely deployed such systems. Popular NIDS use a collection of signatures of known security threats and viruses, which are used to scan each packet's payload. Most IDSs lack the ability to detect novel or previously unknown attacks. One major class of IDSs, called anomaly detection systems, builds models of normal system or network behavior, with the goal of detecting both known and hidden attacks. Anomaly detection systems face many problems, including a high rate of false alarms, the ability to work in online mode, and flexibility. This paper presents a selective survey of incremental approaches for detecting anomalies in normal system and network traffic.

    Exploring Online Novelty Detection Using First Story Detection Models

    Online novelty detection is an important technology for understanding and exploiting streaming data. One application of online novelty detection is First Story Detection (FSD), which attempts to find the very first story about a new topic, e.g. the first news report discussing the “Beast from the East” hitting Ireland. Although hundreds of FSD models have been developed, the vast majority of these aim only at improving detection performance on some specific dataset, and very few focus on insight into novelty itself. We believe that online novelty detection, framed as an unsupervised learning problem, always requires a clear definition of novelty. Indeed, we argue that the definition of novelty is the key issue in designing a good detection model. Within the context of FSD, we first categorise online novelty detection models into three main categories, based on different definitions of novelty scores, and then compare the performance of these model categories in different feature spaces. Our experimental results show that the challenge of FSD varies across novelty scores (and corresponding model categories) and, furthermore, that the detection of novelty in the very popular Word2Vec feature space is more difficult than in a normal frequency-based feature space because of a loss of word specificity.

    Energy Consumption Data Based Machine Anomaly Detection


    Update Frequency and Background Corpus Selection in Dynamic TF-IDF Models for First Story Detection

    First Story Detection (FSD) requires a system to detect the very first story that mentions an event from a stream of stories. Nearest neighbour-based models, using traditional term vector document representations such as TF-IDF, currently achieve the state of the art in FSD. Because of its online nature, a dynamic term vector model that is incrementally updated during the detection process is usually adopted for FSD instead of a static model. However, very little research has investigated the selection of hyper-parameters and background corpora for a dynamic model. In this paper, we analyse how a dynamic term vector model works for FSD, and investigate the impact of different update frequencies and background corpora on FSD performance. Our results show that dynamic models with high update frequencies outperform static models and dynamic models with low update frequencies, and that the FSD performance of dynamic models does not always increase with higher update frequencies, but instead reaches a steady state once some update frequency threshold is reached. In addition, we demonstrate that different background corpora have very limited influence on dynamic models with high update frequencies in terms of FSD performance.
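A nearest-neighbour detector with a dynamic TF-IDF model, as described in this abstract, might be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the class name, the `update_every` parameter (standing in for update frequency), the smoothed IDF formula, and the choice of one minus the maximum cosine similarity as the novelty score are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class DynamicTfidfDetector:
    """Toy nearest-neighbour FSD with an incrementally refreshed IDF table."""

    def __init__(self, update_every=1):
        self.update_every = update_every  # stories between IDF refreshes (assumed knob)
        self.live_df = Counter()          # document frequencies seen so far
        self.n_seen = 0
        self.idf_df = Counter()           # snapshot actually used for weighting
        self.idf_n = 0
        self.history = []                 # raw term counts of past stories

    def _weigh(self, counts):
        # Smoothed IDF: log((N + 1) / (df + 1)) + 1, always >= 1.
        return {
            t: c * (math.log((self.idf_n + 1) / (self.idf_df[t] + 1)) + 1.0)
            for t, c in counts.items()
        }

    def score(self, tokens):
        """Return a novelty score in [0, 1]; higher means more novel."""
        counts = Counter(tokens)
        self.n_seen += 1
        self.live_df.update(counts.keys())
        if self.n_seen % self.update_every == 0:
            # Refresh the IDF snapshot from the live statistics.
            self.idf_df = self.live_df.copy()
            self.idf_n = self.n_seen
        vec = self._weigh(counts)
        nearest = max(
            (cosine(vec, self._weigh(past)) for past in self.history),
            default=0.0,
        )
        self.history.append(counts)
        return 1.0 - nearest


stories = [
    "storm hits ireland",
    "storm hits ireland",
    "election results announced",
]
detector = DynamicTfidfDetector(update_every=1)
scores = [detector.score(s.split()) for s in stories]
```

A story would be declared a first story when its score exceeds some threshold. Setting `update_every` to a large value approximates a static model, since the IDF snapshot is then rarely refreshed; a background corpus could be simulated by feeding its documents through `score` before the stream starts.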