A Flexible Framework for Anomaly Detection via Dimensionality Reduction
Anomaly detection is challenging, especially for large datasets in high
dimensions. Here we explore a general anomaly detection framework based on
dimensionality reduction and unsupervised clustering. We release DRAMA, a
Python package that implements this framework with a wide range
of built-in options. We test DRAMA on a wide variety of simulated and real
datasets, in up to 3000 dimensions, and find it robust and highly competitive
with commonly-used anomaly detection algorithms, especially in high dimensions.
The flexibility of the DRAMA framework allows for significant optimization once
some examples of anomalies are available, making it ideal for online anomaly
detection, active learning and highly unbalanced datasets.
Comment: 6 page
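To make the idea of combining dimensionality reduction with clustering concrete, here is a minimal sketch. It is not DRAMA's API or its actual algorithm: the function names, the choice of a single principal component (found by power iteration), the 1-D k-means step with quantile initialisation, and the "distance to nearest cluster centre" score are all assumptions made for illustration.

```python
import math


def top_component(rows, iters=100):
    """First principal direction of already-centred rows, via power iteration."""
    d = len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # w = X^T (X v): the covariance matrix applied to v, up to a constant.
        proj = [sum(x[j] * v[j] for j in range(d)) for x in rows]
        w = [sum(p * x[j] for p, x in zip(proj, rows)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return v
        v = [c / norm for c in w]
    return v


def kmeans_1d(z, k, iters=50):
    """Plain k-means on scalars; centres start at evenly spaced quantiles."""
    ordered = sorted(z)
    n = len(ordered)
    centres = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for val in z:
            nearest = min(range(k), key=lambda c: abs(val - centres[c]))
            groups[nearest].append(val)
        centres = [sum(g) / len(g) if g else centres[i]
                   for i, g in enumerate(groups)]
    return centres


def anomaly_scores(X, k=2):
    """Score each row by its distance to the nearest cluster centre
    in the reduced (here 1-D, by assumption) space."""
    n, d = len(X), len(X[0])
    mean = [sum(x[j] for x in X) / n for j in range(d)]
    centred = [[x[j] - mean[j] for j in range(d)] for x in X]
    v = top_component(centred)
    z = [sum(x[j] * v[j] for j in range(d)) for x in centred]
    centres = kmeans_1d(z, k)
    return [min(abs(val - c) for c in centres) for val in z]
```

Rows with the largest scores are the anomaly candidates. A one-component projection can miss anomalies that lie off the main axis of variation, which is one reason a real framework would offer several reduction and scoring options.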
Fast Approximate Geodesics for Deep Generative Models
The length of the geodesic between two data points along a Riemannian
manifold, induced by a deep generative model, yields a principled measure of
similarity. Current approaches are limited to low-dimensional latent spaces,
due to the computational complexity of solving a non-convex optimisation
problem. We propose finding shortest paths in a finite graph of samples from
the aggregate approximate posterior, which can be solved exactly, at greatly
reduced runtime, and without a notable loss in quality. Our approach is
therefore applicable to high-dimensional problems, e.g., in the
visual domain. We validate our approach empirically on a series of experiments
using variational autoencoders applied to image data, including the Chair,
FashionMNIST, and human movement data sets.
Comment: 28th International Conference on Artificial Neural Networks, 201
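The graph-based idea can be sketched as follows: connect latent samples to their nearest neighbours, weight each edge by how far apart the decoded points are, and run an exact shortest-path search. This is only an illustration under stated assumptions: `decode` is a hypothetical stand-in for a trained decoder (it maps a 1-D latent to the unit circle), the edge weight is a simple chord length in observation space, and Dijkstra's algorithm is used as the search. None of these choices is taken from the paper.

```python
import heapq
import math


def decode(z):
    """Hypothetical decoder stand-in: 1-D latent -> point on the unit circle."""
    return (math.cos(z), math.sin(z))


def build_knn_graph(latents, k):
    """Undirected k-nearest-neighbour graph over latent samples.
    Edge weight = Euclidean distance between decoded points (an assumption)."""
    decoded = [decode(z) for z in latents]
    graph = {i: {} for i in range(len(latents))}
    for i, zi in enumerate(latents):
        others = (j for j in range(len(latents)) if j != i)
        nearest = sorted(others, key=lambda j: abs(latents[j] - zi))[:k]
        for j in nearest:
            w = math.dist(decoded[i], decoded[j])
            graph[i][j] = w
            graph[j][i] = w
    return graph


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm; returns math.inf if target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


# Usage: 51 evenly spaced latents covering half of the circle.
latents = [i * math.pi / 50 for i in range(51)]
graph = build_knn_graph(latents, k=2)
length = shortest_path_length(graph, 0, 50)
```

In this toy setting the graph path follows the semicircle, so `length` comes out close to π (the true arc length), whereas the straight-line distance between the two decoded endpoints is 2.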
A Fuzzy Clustering Algorithm for High Dimensional Streaming Data
In this paper we propose a dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD) for high-dimensional datasets with streaming behavior. Such datasets arise in sensor networks, web click streams, and internet traffic flows, among other areas. They have two properties that separate them from other datasets: (a) they arrive as a stream, and (b) they are high-dimensional. Optimized fuzzy clustering algorithms have already been proposed for datasets that are either streaming or high-dimensional. To the best of our knowledge, however, no optimized fuzzy clustering algorithm has been proposed for datasets with both properties, i.e., high-dimensional data that arrive continuously as a stream. Experimental analysis shows that the proposed algorithm (sWFCM-HD) improves performance in terms of both memory consumption and execution time.
Keywords: K-Means, Fuzzy C-Means, Weighted Fuzzy C-Means, Dimension Reduction, Clustering
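The abstract does not spell out the algorithm, so the following is only a rough sketch of how weighted fuzzy C-means is commonly applied to a stream: each chunk is clustered together with the previous centres, which re-enter as weighted points that summarise the data already seen. The function names, the naive initialisation from the first points, and the fixed iteration count are assumptions, and the dimension-reduction step of sWFCM-HD is left out because the abstract gives no details about it.

```python
import math


def weighted_fcm(points, weights, centres, m=2.0, iters=50):
    """Weighted fuzzy C-means with fuzzifier m, starting from given centres."""
    c = len(centres)
    dims = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership u_ij is proportional to d_ij^(-2/(m-1)), normalised per point.
        memberships = []
        for x in points:
            d = [max(math.dist(x, ctr), 1e-12) for ctr in centres]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Centre i is the mean of the points, weighted by w_j * u_ij^m.
        new_centres = []
        for i in range(c):
            coef = [w * u[i] ** m for w, u in zip(weights, memberships)]
            total = sum(coef)
            new_centres.append(tuple(
                sum(cf * x[k] for cf, x in zip(coef, points)) / total
                for k in range(dims)))
        centres = new_centres
    # Each centre carries the total (weighted) membership mass assigned to it.
    centre_weights = [sum(w * u[i] for w, u in zip(weights, memberships))
                      for i in range(c)]
    return centres, centre_weights


def stream_cluster(chunks, c, m=2.0):
    """Cluster chunk by chunk; earlier data survive only as weighted centres."""
    centres, centre_weights = None, None
    for chunk in chunks:
        points = [tuple(p) for p in chunk]
        weights = [1.0] * len(points)
        if centres is None:
            init = points[:c]  # naive initialisation (assumption)
        else:
            points = list(centres) + points
            weights = list(centre_weights) + weights
            init = centres
        centres, centre_weights = weighted_fcm(points, weights, init, m)
    return centres, centre_weights
```

Only the c weighted centres are kept between chunks, so memory use depends on the chunk size rather than on the length of the stream, which is the kind of saving the abstract refers to.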