Unsupervised Anomaly Detection of High Dimensional Data with Low Dimensional Embedded Manifold
Anomaly detection techniques are meant to identify anomalies in large volumes of seemingly homogeneous data; doing so can lead to timely, pivotal, and actionable decisions and prevent potential human, financial, and informational loss. A frequently encountered situation in anomaly detection is the absence of prior knowledge about the nature of the anomalies. Such circumstances call for ‘unsupervised’ learning-based anomaly detection techniques. Compared with their ‘supervised’ counterparts, which have the luxury of a labeled training dataset containing both normal and anomalous samples, unsupervised problems are far more difficult. Moreover, high dimensional streaming data from the many interconnected sensors in modern industries make the task more challenging. Investigating how to address these challenges is the overarching theme of this dissertation.
In this dissertation, the fundamental issue of the similarity measure among observations, which is a central piece of any anomaly detection technique, is reassessed. The manifold hypothesis suggests the possibility of a low dimensional manifold structure embedded in high dimensional data. In the presence of such a structured space, traditional similarity measures fail to capture the true intrinsic similarity. In light of this, reevaluating the notion of a similarity measure seems more pressing than providing incremental improvements over any of the existing techniques. A graph theoretic similarity measure is proposed to differentiate, and thus identify, the anomalies from normal observations. Specifically, the minimum spanning tree (MST), a graph-based approach, is proposed to approximate the similarities among data points in the presence of a high dimensional structured space. It can track the structure of the embedded manifold better than the existing measures and helps to distinguish the anomalies from normal observations. This dissertation further investigates three different aspects of the anomaly detection problem and develops three sets of solution approaches, all of them revolving around the newly proposed MST based similarity measure.
In the first part of the dissertation, a local MST (LoMST) based anomaly detection approach is proposed to detect anomalies using the data in the original space. A two-step procedure is developed to detect both cluster and point anomalies. The next two sets of methods, proposed in the subsequent two parts of the dissertation, address anomaly detection in a reduced data space. In the second part of the dissertation, a neighborhood structure assisted version of the nonnegative matrix factorization approach (NS-NMF) is proposed. To detect anomalies, it uses the neighborhood information captured by a sparse MST similarity matrix along with the original attribute information. To meet industry demands, online versions of both LoMST and NS-NMF are also developed for real-time anomaly detection. In the last part of the dissertation, a graph regularized autoencoder is proposed which uses an MST regularizer in addition to the original loss function and is thus capable of maintaining the local invariance property. All of the approaches proposed in the dissertation are tested on 20 benchmark datasets and one real-life hydropower dataset. When compared with state-of-the-art approaches, all three approaches produce statistically significantly better outcomes.
“Industry 4.0” is now a reality, and it calls for anomaly detection techniques capable of processing large amounts of high dimensional data generated in real time. The proposed MST based similarity measure, together with the individual techniques developed in this dissertation, is equipped to tackle each of these issues and provides an effective and reliable real-time anomaly identification platform.
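As a rough illustration of how an MST can serve as a local similarity measure, the following sketch scores each point by the total edge weight of the MST built over that point and its k nearest neighbors. This is only a loose reading of the local MST idea, not the dissertation's LoMST procedure: the scoring rule, the Euclidean distance, the function names, and the toy data are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mst_weight(points):
    """Total edge weight of the MST over `points` (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax the edges from the new node to all remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def local_mst_scores(data, k=3):
    """Hypothetical score: MST weight over each point and its k nearest neighbors."""
    scores = []
    for i, p in enumerate(data):
        neighbors = sorted(
            (j for j in range(len(data)) if j != i),
            key=lambda j: euclidean(p, data[j]),
        )[:k]
        scores.append(mst_weight([p] + [data[j] for j in neighbors]))
    return scores


# Toy data (assumed): a tight cluster plus one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]
scores = local_mst_scores(data, k=3)
outlier = max(range(len(data)), key=lambda i: scores[i])
print(outlier, [round(s, 3) for s in scores])
```

A point far from the cluster needs at least one long edge to connect to its neighbors, so its local MST is much heavier than those of the clustered points. A practical detector would also have to handle cluster anomalies and threshold selection, which this sketch does not attempt.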
Shakeout: A New Approach to Regularized Deep Neural Network Training
Recent years have witnessed the success of deep neural networks in dealing
with plenty of practical problems. Dropout has played an essential role in
many successful deep neural networks by inducing regularization in model
training. In this paper, we present a new regularized training approach:
Shakeout. Instead of randomly discarding units as Dropout does at the training
stage, Shakeout randomly chooses to enhance or reverse each unit's contribution
to the next layer. This minor modification of Dropout has a useful statistical
trait: the regularizer induced by Shakeout adaptively combines L0, L1, and L2
regularization terms. Our classification experiments with representative
deep architectures on the image datasets MNIST, CIFAR-10 and ImageNet show that
Shakeout deals with over-fitting effectively and outperforms Dropout. We
empirically demonstrate that Shakeout leads to sparser weights under both
unsupervised and supervised settings. Shakeout also leads to a grouping
effect among the input units in a layer. In terms of how well the weights reflect
the importance of connections, Shakeout is superior to Dropout, which is valuable
for deep model compression. Moreover, we demonstrate that Shakeout can
effectively reduce the instability of the training process of the deep
architecture. Comment: Appears at T-PAMI 201
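The "enhance or reverse" behaviour can be sketched as a random perturbation of a layer's weight matrix. The exact formula below is an assumption based on the commonly cited form of Shakeout and is not stated in the abstract: each input unit is dropped with probability `tau`; a dropped unit's outgoing weights are replaced by `-c * sign(w)` (reversed), and a kept unit's weights become `(w + c * tau * sign(w)) / (1 - tau)` (enhanced). The names `tau`, `c`, and `shakeout_weights` are illustrative, and the details should be checked against the paper.

```python
import random


def sign(w):
    """Return -1, 0, or 1 according to the sign of w."""
    return (w > 0) - (w < 0)


def shakeout_weights(W, tau=0.5, c=0.1, rng=random):
    """Return a randomly perturbed copy of W, where W[i][j] links input j to output i.

    One Bernoulli mask is drawn per input unit j and shared by all outputs.
    Under the assumed formulation, the perturbed weights equal W in expectation.
    """
    n_in = len(W[0])
    keep = [rng.random() >= tau for _ in range(n_in)]  # kept with prob. 1 - tau
    out = []
    for row in W:
        new_row = []
        for j, w in enumerate(row):
            s = sign(w)
            if keep[j]:
                # Enhance: rescale and push the weight away from zero.
                new_row.append((w + c * tau * s) / (1.0 - tau))
            else:
                # Reverse: a small contribution with the opposite sign.
                new_row.append(-c * s)
        out.append(new_row)
    return out


# Monte Carlo check on assumed toy weights: the average perturbed
# weight should approach the original weight.
W = [[0.5, -1.0], [2.0, 0.0]]
rng = random.Random(0)
n_draws = 20000
mean = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(n_draws):
    sample = shakeout_weights(W, tau=0.5, c=0.1, rng=rng)
    for i in range(2):
        for j in range(2):
            mean[i][j] += sample[i][j] / n_draws
print(mean)
```

Setting `c = 0` in this sketch reduces it to inverted Dropout, which is one way to see Shakeout as a small modification of Dropout.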
Sparsity in Machine Learning: An Information Selecting Perspective
Today we are living in a world awash with data. Large volumes of data are acquired, analyzed and applied to tasks through machine learning algorithms in nearly every area of science, business, and industry. For example, medical scientists analyze the gene expression data from a single specimen to learn the underlying causes of disease (e.g. cancer) and choose the best treatment; retailers can learn more about customers' shopping habits from retail data and adjust their business strategies to better appeal to customers; suppliers can enhance supply chain success through supply chain systems built on knowledge sharing. However, it is also reasonable to doubt whether all the genes contribute to a disease, whether all the data obtained from existing customers can be applied to a new customer, and whether all shared knowledge in the supply network is useful for a specific supply scenario. Therefore, it is crucial to sort through the massive information provided by data and keep what we really need. This process is referred to as information selection, which keeps the information that helps improve the performance of the corresponding machine learning tasks and discards information that is useless or even harmful to task performance. Sparse learning is a powerful tool for achieving information selection. In this thesis, we apply sparse learning to two major areas in machine learning: feature selection and transfer learning.
Feature selection is a dimensionality reduction technique that selects a subset of representative features. Recently, feature selection combined with sparse learning has attracted significant attention due to its outstanding performance compared with traditional feature selection methods that ignore correlation between features. However, such methods are restricted by design to linear data transformations, a potential drawback given that the underlying correlation structures of data are often non-linear. To leverage a more sophisticated embedding than the linear model assumed by sparse learning, we propose an autoencoder-based unsupervised feature selection approach that uses a single-layer autoencoder in a joint framework of feature selection and manifold learning. Additionally, we incorporate spectral graph analysis of the projected data into the learning process to preserve local data geometry from the original data space to the low-dimensional feature space.
Transfer learning describes a set of methods that aim at transferring knowledge from related domains to alleviate the problems caused by limited or no labeled training data in machine learning tasks. Many transfer learning techniques have been proposed to deal with different application scenarios. However, due to the differences in data distribution, feature space, label space, etc., between the source domain and the target domain, it is necessary to select and transfer only relevant information from the source domain to improve the performance of the target learner. Otherwise, the target learner can be negatively impacted by weakly related knowledge from the source domain, which is referred to as negative transfer. In this thesis, we focus on two transfer learning scenarios in which limited labeled training data are available in the target domain. In the first scenario, no label information is available in the source data. In the second scenario, large amounts of labeled source data are available, but there is no overlap between the source and target label spaces. The transfer learning technique corresponding to the former case is called self-taught learning, while that for the latter case is called few-shot learning. We apply self-taught learning to visual, textual, and audio data. We also apply few-shot learning to wearable sensor based human activity data. For both cases, we propose a metric for the relevance between a target sample/class and a source sample/class, and then extract information from the related samples/classes for knowledge transfer, performing information selection so that negative transfer caused by weakly related source information can be alleviated. Experimental results show that transfer learning can provide better performance with information selection.
- …