15 research outputs found
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers does not receive much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
through providing intrinsic reasons why the certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of
contrastive contexts where outliers locate, as well as the relation between
outliers and contexts, are usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches
Bayesian Robust Tensor Factorization for Incomplete Multiway Data
We propose a generative model for robust tensor factorization in the presence
of both missing data and outliers. The objective is to explicitly infer the
underlying low-CP-rank tensor capturing the global information and a sparse
tensor capturing the local information (also considered as outliers), thus
providing the robust predictive distribution over missing entries. The
low-CP-rank tensor is modeled by multilinear interactions between multiple
latent factors on which the column sparsity is enforced by a hierarchical
prior, while the sparse tensor is modeled by a hierarchical view of Student-
distribution that associates an individual hyperparameter with each element
independently. For model learning, we develop an efficient closed-form
variational inference under a fully Bayesian treatment, which can effectively
prevent the overfitting problem and scales linearly with data size. In contrast
to existing related works, our method can perform model selection automatically
and implicitly without need of tuning parameters. More specifically, it can
discover the groundtruth of CP rank and automatically adapt the sparsity
inducing priors to various types of outliers. In addition, the tradeoff between
the low-rank approximation and the sparse representation can be optimized in
the sense of maximum model evidence. The extensive experiments and comparisons
with many state-of-the-art algorithms on both synthetic and real-world datasets
demonstrate the superiorities of our method from several perspectives.Comment: in IEEE Transactions on Neural Networks and Learning Systems, 201
Clutter suppression in ultrasound: performance evaluation and review of low-rank and sparse matrix decomposition methods
Vessel diseases are often accompanied by abnormalities related to vascular shape and size. Therefore, a clear visualization of vasculature is of high clinical significance. Ultrasound color flow imaging (CFI) is one of the prominent techniques for flow visualization. However, clutter signals originating from slow-moving tissue are one of the main obstacles to obtain a clear view of the vascular network. Enhancement of the vasculature by suppressing the clutters is a significant and irreplaceable step for many applications of ultrasound CFI. Currently, this task is often performed by singular value decomposition (SVD) of the data matrix. This approach exhibits two well-known limitations. First, the performance of SVD is sensitive to the proper manual selection of the ranks corresponding to clutter and blood subspaces. Second, SVD is prone to failure in the presence of large random noise in the dataset. A potential solution to these issues is using decomposition into low-rank and sparse matrices (DLSM) framework. SVD is one of the algorithms for solving the minimization problem under the DLSM framework. Many other algorithms under DLSM avoid full SVD and use approximated SVD or SVD-free ideas which may have better performance with higher robustness and less computing time. In practice, these models separate blood from clutter based on the assumption that steady clutter represents a low-rank structure and that the moving blood component is sparse. In this paper, we present a comprehensive review of ultrasound clutter suppression techniques and exploit the feasibility of low-rank and sparse decomposition schemes in ultrasound clutter suppression. We conduct this review study by adapting 106 DLSM algorithms and validating them against simulation, phantom, and in vivo rat datasets. Two conventional quality metrics, signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), are used for performance evaluation. In addition, computation times required by different algorithms for generating clutter suppressed images are reported. Our extensive analysis shows that the DLSM framework can be successfully applied to ultrasound clutter suppression
Robust Multivariate Autoregression for Anomaly Detection in Dynamic Product Ratings
ABSTRACT User provided rating data about products and services is one key feature of websites such as Amazon, TripAdvisor, or Yelp. Since these ratings are rather static but might change over time, a temporal analysis of rating distributions provides deeper insights into the evolution of a products' quality. Given a time-series of rating distributions, in this work, we answer the following questions: (1) How to detect the base behavior of users regarding a product's evaluation over time? (2) How to detect points in time where the rating distribution differs from this base behavior, e.g., due to attacks or spontaneous changes in the product's quality? To achieve these goals, we model the base behavior of users regarding a product as a latent multivariate autoregressive process. This latent behavior is mixed with a sparse anomaly signal finally leading to the observed data. We propose an efficient algorithm solving our objective and we present interesting findings on various real world datasets