15 research outputs found

    Contextual Outlier Interpretation

    Outlier detection plays an essential role in many data-driven applications to identify isolated instances that are different from the majority. While many statistical learning and data mining techniques have been used for developing more effective outlier detection algorithms, the interpretation of detected outliers has received little attention. Interpretation is becoming increasingly important to help people trust and evaluate the developed models by providing intrinsic reasons why certain outliers are chosen. It is difficult, if not impossible, to simply apply feature selection for explaining outliers due to the distinct characteristics of various detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, the role of the contrastive contexts in which outliers are located, as well as the relation between outliers and contexts, is usually overlooked in interpretation. To tackle the issues above, in this paper, we propose a novel Contextual Outlier INterpretation (COIN) method to explain the abnormality of existing outliers spotted by detectors. The interpretability for an outlier is achieved from three aspects: outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework compared with existing interpretation approaches.

    Bayesian Robust Tensor Factorization for Incomplete Multiway Data

    We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CP-rank tensor capturing the global information and a sparse tensor capturing the local information (also considered as outliers), thus providing a robust predictive distribution over missing entries. The low-CP-rank tensor is modeled by multilinear interactions between multiple latent factors on which column sparsity is enforced by a hierarchical prior, while the sparse tensor is modeled by a hierarchical view of the Student-t distribution that associates an individual hyperparameter with each element independently. For model learning, we develop an efficient closed-form variational inference under a fully Bayesian treatment, which can effectively prevent overfitting and scales linearly with data size. In contrast to existing related works, our method can perform model selection automatically and implicitly without the need for tuning parameters. More specifically, it can discover the ground-truth CP rank and automatically adapt the sparsity-inducing priors to various types of outliers. In addition, the tradeoff between the low-rank approximation and the sparse representation can be optimized in the sense of maximum model evidence. Extensive experiments and comparisons with many state-of-the-art algorithms on both synthetic and real-world datasets demonstrate the superiority of our method from several perspectives.
    Comment: in IEEE Transactions on Neural Networks and Learning Systems, 201
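    The decomposition described in this abstract can be written out as a short sketch. The notation below is assumed for illustration and is not taken from the listing: Y is the observed tensor, Omega the set of observed indices, X the low-CP-rank part with factor vectors a_r^(n), S the sparse outlier part, and E an additive noise term (the noise term itself is an assumption).

```latex
% Hedged sketch: symbols are illustrative assumptions, not the paper's notation.
\begin{align}
  \mathcal{Y}_{\Omega} &= \mathcal{X}_{\Omega} + \mathcal{S}_{\Omega} + \mathcal{E}_{\Omega}, \\
  \mathcal{X} &= \sum_{r=1}^{R} \mathbf{a}^{(1)}_{r} \circ \mathbf{a}^{(2)}_{r} \circ \cdots \circ \mathbf{a}^{(N)}_{r},
\end{align}
% Here \circ is the vector outer product and R is the CP rank.
% Column sparsity on the factor matrices lets unused rank-one terms shrink
% toward zero, which is how the rank can be inferred rather than hand-tuned.
```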

    Clutter suppression in ultrasound: performance evaluation and review of low-rank and sparse matrix decomposition methods

    Vessel diseases are often accompanied by abnormalities related to vascular shape and size. Therefore, a clear visualization of vasculature is of high clinical significance. Ultrasound color flow imaging (CFI) is one of the prominent techniques for flow visualization. However, clutter signals originating from slow-moving tissue are one of the main obstacles to obtaining a clear view of the vascular network. Enhancement of the vasculature by suppressing clutter is a significant and irreplaceable step for many applications of ultrasound CFI. Currently, this task is often performed by singular value decomposition (SVD) of the data matrix. This approach exhibits two well-known limitations. First, the performance of SVD is sensitive to the proper manual selection of the ranks corresponding to clutter and blood subspaces. Second, SVD is prone to failure in the presence of large random noise in the dataset. A potential solution to these issues is the decomposition into low-rank and sparse matrices (DLSM) framework. SVD is one of the algorithms for solving the minimization problem under the DLSM framework. Many other algorithms under DLSM avoid full SVD and use approximated SVD or SVD-free ideas, which may offer better performance with higher robustness and less computing time. In practice, these models separate blood from clutter based on the assumption that steady clutter represents a low-rank structure and that the moving blood component is sparse. In this paper, we present a comprehensive review of ultrasound clutter suppression techniques and explore the feasibility of low-rank and sparse decomposition schemes in ultrasound clutter suppression. We conduct this review study by adapting 106 DLSM algorithms and validating them against simulation, phantom, and in vivo rat datasets. Two conventional quality metrics, signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), are used for performance evaluation. In addition, computation times required by different algorithms for generating clutter-suppressed images are reported. Our extensive analysis shows that the DLSM framework can be successfully applied to ultrasound clutter suppression.
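    As a rough illustration of the SVD baseline this abstract mentions, the sketch below zeroes the largest singular components of a space-by-time data matrix. It is a minimal example under stated assumptions, not the paper's pipeline: the function name `svd_clutter_filter`, the `clutter_rank` parameter, and the synthetic data are all hypothetical, and clutter is assumed to occupy the highest-energy components.

```python
import numpy as np


def svd_clutter_filter(casorati, clutter_rank):
    """Zero the `clutter_rank` largest singular components of a
    (pixels x frames) matrix and return the remainder.

    Assumption: clutter dominates the highest-energy components.
    """
    u, s, vh = np.linalg.svd(casorati, full_matrices=False)
    s_filtered = s.copy()
    s_filtered[:clutter_rank] = 0.0  # drop the assumed clutter subspace
    return (u * s_filtered) @ vh


# Synthetic stand-in data (hypothetical): strong rank-1 "clutter" that is
# constant over time, plus a weak random "blood" signal.
rng = np.random.default_rng(0)
n_pixels, n_frames = 64, 40
clutter = 100.0 * np.outer(rng.standard_normal(n_pixels), np.ones(n_frames))
blood = rng.standard_normal((n_pixels, n_frames))
data = clutter + blood

filtered = svd_clutter_filter(data, clutter_rank=1)
```

    The manual choice of `clutter_rank` is exactly the sensitivity the abstract points out: too small a value leaves clutter in the image, too large a value removes blood signal.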

    Robust Multivariate Autoregression for Anomaly Detection in Dynamic Product Ratings

    User-provided rating data about products and services is a key feature of websites such as Amazon, TripAdvisor, or Yelp. Since these ratings are rather static but might change over time, a temporal analysis of rating distributions provides deeper insights into the evolution of a product's quality. Given a time series of rating distributions, in this work, we answer the following questions: (1) How to detect the base behavior of users regarding a product's evaluation over time? (2) How to detect points in time where the rating distribution differs from this base behavior, e.g., due to attacks or spontaneous changes in the product's quality? To achieve these goals, we model the base behavior of users regarding a product as a latent multivariate autoregressive process. This latent behavior is mixed with a sparse anomaly signal, finally leading to the observed data. We propose an efficient algorithm solving our objective and we present interesting findings on various real-world datasets.
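    The idea of an autoregressive base behavior plus a sparse anomaly signal can be sketched with a much simpler stand-in. The code below is not the paper's algorithm: it fits an ordinary least-squares VAR(1) model and flags time points whose one-step residual is far from the median, measured in median absolute deviations. The function names, the `threshold` value, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np


def fit_var1(series):
    """Least-squares VAR(1) fit x_t ~ A @ x_{t-1}; series has shape (T, d)."""
    past, future = series[:-1], series[1:]
    coef, *_ = np.linalg.lstsq(past, future, rcond=None)  # future ~ past @ coef
    return coef.T  # A, so that x_t ~ A @ x_{t-1}


def flag_anomalies(series, threshold=5.0):
    """Return time indices whose one-step residual norm is an outlier."""
    a = fit_var1(series)
    residuals = series[1:] - series[:-1] @ a.T
    norms = np.linalg.norm(residuals, axis=1)
    median = np.median(norms)
    mad = np.median(np.abs(norms - median)) + 1e-12  # avoid division by zero
    scores = (norms - median) / mad
    return np.flatnonzero(scores > threshold) + 1  # +1: residual i belongs to time i+1


# Synthetic stand-in data (hypothetical): a stable latent VAR(1) process
# with a single sparse anomaly added to the observation at t = 120.
rng = np.random.default_rng(1)
T, d = 200, 3
latent = np.zeros((T, d))
for t in range(1, T):
    latent[t] = 0.5 * latent[t - 1] + 0.1 * rng.standard_normal(d)
observed = latent.copy()
observed[120] += 5.0

flagged = flag_anomalies(observed)
```

    A plain least-squares fit is itself pulled by the anomaly, which is one motivation for the robust formulation the abstract describes; the sketch only shows where a sparse deviation from the base behavior would show up.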