10 research outputs found

    New Approaches in Multi-View Clustering

    Many real-world datasets can be naturally described by multiple views. As a result, multi-view learning has drawn much attention from both academia and industry. Compared to single-view learning, multi-view learning has demonstrated many advantages. Clustering has long served as a critical technique in data mining and machine learning. Recently, multi-view clustering has achieved great success in various applications. To provide a comprehensive review of typical multi-view clustering methods and their recent developments, this chapter summarizes five popular kinds of clustering methods and their multi-view versions: k-means, spectral clustering, matrix factorization, tensor decomposition, and deep learning. These are the most widely employed algorithms for single-view data, and considerable effort has been devoted to extending them to multi-view clustering. Moreover, many other multi-view clustering methods can be unified into the frameworks of these five methods. To promote further research and development of multi-view clustering, some popular open datasets are summarized in two categories. Finally, several open issues that deserve more exploration are pointed out.

    Multi-view shaker detection: Insights from a noise-immune influence analysis Perspective

    Entities whose changes significantly affect others in a networked system are called shakers. In recent years, several models have been proposed to detect such shakers among evolving entities. However, limited work has focused on shaker detection in the very short term, which has many real-world applications. For example, in financial markets, it can enable both investors and governors to respond quickly to rapid changes. Under the short-term setting, conventional methods may suffer from limited data samples and are sensitive to cynical manipulations, leading to unreliable results. Fortunately, multi-attribute evolution records are available, which can provide compatible and complementary information. In this paper, we investigate how to learn reliable influence results from short-term multi-attribute evolution records. We call entities with consistent influence across different views in the short term multi-view shakers, and we study the new problem of multi-view shaker detection. We identify the challenges as follows: (1) how to jointly detect short-term shakers and model conflicting influence results among different views? (2) how to filter spurious influence relations in each individual view for robust influence inference? In response, we propose a novel solution called Robust Influence Network, built from a noise-immune influence analysis perspective, in which possible outliers are modelled jointly with the multi-view shaker detection task. More specifically, we learn the influence relation from each view and transform the influence relations from different views into an intermediate representation. At the same time, we uncover both the inconsistent and the spurious outliers. Comment: 14 pages, 4 figures.

    Semi-supervised Variational Multi-view Anomaly Detection

    Multi-view anomaly detection (Multi-view AD) is a challenging problem due to inconsistent behaviors across multiple views. Meanwhile, learning useful representations with little or no supervision has attracted much attention in machine learning. Many recent advances in representation learning focus on deep generative models, such as the Variational Auto Encoder (VAE). In this study, by utilizing the representation learning ability of the VAE and manipulating the latent variables properly, we propose a novel Bayesian generative model as a semi-supervised multi-view anomaly detector, called MultiVAE. We conduct experiments to evaluate the performance of MultiVAE on multi-view data. The experimental results demonstrate that MultiVAE outperforms the state-of-the-art competitors across popular datasets for semi-supervised multi-view AD. As far as we know, this is the first work that applies VAE-based deep models to multi-view AD.

    A Flexible Outlier Detector Based on a Topology Given by Graph Communities

    CRUE-CSIC transformative agreement. Outlier detection is essential for optimal performance of machine learning methods and statistical predictive models. Detecting outliers is especially important in unbalanced problems with small sample sizes, since in such settings outliers become highly influential and significantly bias models. These experimental settings are common in medical applications, such as the diagnosis of rare pathologies, the outcome of experimental personalized treatments, or pandemic emergencies. In contrast to population-based methods, neighborhood-based local approaches, which compute an outlier score from the neighbors of each sample, are simple, flexible methods with the potential to perform well in unbalanced problems with small sample sizes. A main concern with local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters, such as the number of neighbors. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph that encodes mutual nearest neighbors in the feature space. In this way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world and synthetic data sets show that our approach outperforms both local and global strategies in multi-view and single-view settings.
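
    The idea of scoring a sample by the label heterogeneity of its mutual nearest neighbors can be sketched roughly as follows. This is a simplified illustration, not the authors' method: it uses each sample's direct mutual-neighbor set as its single neighborhood and skips the graph-community step described in the abstract. The function names, the fixed `k`, and the toy data are all assumptions made for the example.

```python
import math

def mutual_knn_graph(points, k):
    """Return adjacency sets linking samples that are in each other's k nearest neighbors."""
    n = len(points)
    knn = []
    for i in range(n):
        # Sort the other samples by Euclidean distance to sample i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(order[:k]))
    # Keep an edge only when the neighbor relation holds in both directions.
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]

def label_heterogeneity(points, labels, k):
    """Score each sample by the fraction of mutual neighbors with a different label."""
    graph = mutual_knn_graph(points, k)
    scores = []
    for i, neighbors in enumerate(graph):
        if not neighbors:
            # Assumption: a sample with no mutual neighbor is treated as maximally suspicious.
            scores.append(1.0)
        else:
            differing = sum(labels[j] != labels[i] for j in neighbors)
            scores.append(differing / len(neighbors))
    return scores

# Toy data: two well-separated groups; sample 3 sits in the first group
# but carries the label of the second one.
points = [(0.0, 0.0), (1.0, 0.2), (0.2, 1.2), (0.6, 0.5),
          (10.0, 10.0), (11.0, 10.3), (10.4, 11.2)]
labels = [0, 0, 0, 1, 1, 1, 1]
scores = label_heterogeneity(points, labels, k=3)
most_suspicious = max(range(len(scores)), key=scores.__getitem__)
```

    In this toy run the mislabeled sample receives the highest score because all of its mutual neighbors disagree with its label. Replacing the single neighborhood with graph communities, as the abstract describes, is what would remove the dependence on `k`.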

    Cross-aligned and Gumbel-refactored Autoencoders for Multi-view Anomaly Detection

    Multi-view anomaly detection (AD) is a challenging task due to the complicated data distributions across different views. Specifically, there are two types of anomalies in multi-view distributions: attribute anomalies, which exhibit a consistently anomalous pattern in each view, and class anomalies, which exhibit inconsistent traits (e.g., semantic label) across multiple views. Existing methods detect these anomalies in an unsupervised manner under the clustering assumption: normal data share a consistent clustering structure across views, while anomalous data exhibit inconsistent clusters across views. However, these methods fail for complex multi-view data distributions in which there are no obvious clusters. Moreover, existing models suffer from poor robustness, since they are undermined by anomalies at training time. To remove the clustering assumption, we propose a Cross-aligned and Gumbel-refactored AutoEncoders (CGAEs) model to effectively detect both types of multi-view anomalies. In CGAEs, we devise a cross-reconstruction module to detect class anomalies by recovering one view from another. Class anomalies lead to a high cross-reconstruction loss, since they do not carry the correct information in one view to generate another. We further design a view-alignment module to detect attribute anomalies through the alignment distance among multiple views in the latent space. Attribute anomalies have large distances, since they are less well aligned due to the small number of anomalous training instances. To handle the robustness issue, we propose a Gumbel-refactored reconstruction loss to replace the mean square error (MSE) of the original autoencoders. The cross-entropy loss is calculated between the discretized input and the Gumbel-sampled output, thus disregarding irrelevant details to achieve model robustness. Experimental results validate the superiority of the proposed CGAEs model on both benchmark datasets and real-world datasets.
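
    The abstract does not spell out the Gumbel-refactored loss, so the following is only a sketch of the general ingredients it mentions: discretizing an input value into bins, drawing a relaxed categorical sample with the standard Gumbel-softmax trick, and taking the cross entropy between the two. The bin count, the temperature, the decoder probabilities, and all function names are assumptions for illustration and may differ from what CGAEs actually does.

```python
import math
import random

def discretize(value, n_bins):
    """Map a value in [0, 1] to a bin index in 0..n_bins-1."""
    return min(int(value * n_bins), n_bins - 1)

def gumbel_softmax_sample(probs, temperature, rng):
    """Draw one relaxed categorical sample with the Gumbel-softmax trick.

    Assumes every entry of probs is strictly positive.
    """
    perturbed = []
    for p in probs:
        # Clamp u away from 0 so both logarithms stay finite.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((math.log(p) + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    top = max(perturbed)
    exps = [math.exp(x - top) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_bin, sample):
    """Cross entropy between a one-hot target bin and a relaxed sample."""
    return -math.log(max(sample[target_bin], 1e-12))

n_bins = 4
target = discretize(0.95, n_bins)           # the input value falls in the last bin
decoder_probs = [0.05, 0.05, 0.2, 0.7]      # hypothetical decoder output over bins
sample = gumbel_softmax_sample(decoder_probs, temperature=0.5, rng=random.Random(0))
loss = cross_entropy(target, sample)
```

    Because the loss only looks at which bin the input falls into, small within-bin differences are ignored, which is one plausible reading of "disregarding the irrelevant details" in the abstract.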

    Marginalized Multiview Ensemble Clustering

    Multiview clustering (MVC), which aims to explore the underlying cluster structure shared by multiview data, has drawn increasing research effort in recent years. To exploit the complementary information among multiple views, existing methods mainly learn a common latent subspace or develop a certain loss across different views, while ignoring higher-level information such as the basic partitions (BPs) generated by single-view clustering algorithms. In light of this, we propose a novel marginalized multiview ensemble clustering (M²VEC) method in this paper. Specifically, we solve MVC in an ensemble clustering (EC) way, which generates BPs for each view individually and seeks a consensus partition. In this way, we naturally leverage the complementary information of multiview data in the same partition space. To boost the robustness of our approach, a marginalized denoising process is adopted to mimic data corruption and noise, which provides robust partition-level representations for each view by training a single-layer autoencoder. A low-rank and sparse decomposition is seamlessly incorporated into the denoising process to explicitly capture the consistency information while compensating for the distinctness between heterogeneous features. Spectral consensus graph partitioning is also incorporated into our model, making M²VEC a unified optimization framework. Moreover, a multilayer M²VEC is delivered in a stacked fashion to encapsulate nonlinearity into partition-level representations for handling complex data. Experimental results on eight real-world data sets show the efficacy of our approach compared with several state-of-the-art multiview and EC methods. We also show that our method performs well with partial multiview data.
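
    The ensemble step of generating one basic partition per view and then seeking a consensus can be illustrated with a classical co-association matrix. This is a deliberately simple stand-in and not M²VEC itself: it omits the marginalized denoising, the low-rank and sparse decomposition, and the spectral partitioning, and instead forms the consensus by linking samples that share a cluster in a majority of partitions. The threshold, the function names, and the toy partitions are assumptions.

```python
def co_association(basic_partitions):
    """Fraction of basic partitions in which each pair of samples shares a cluster."""
    n = len(basic_partitions[0])
    m = len(basic_partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in basic_partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

def consensus_clusters(matrix, threshold=0.5):
    """Group samples into connected components of pairs whose co-association exceeds threshold."""
    n = len(matrix)
    assignment = [-1] * n
    cluster = 0
    for start in range(n):
        if assignment[start] != -1:
            continue
        assignment[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assignment[j] == -1 and matrix[i][j] > threshold:
                    assignment[j] = cluster
                    stack.append(j)
        cluster += 1
    return assignment

# One basic partition per view; label names need not agree across views,
# and the second view misplaces sample 2.
basic_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 5, 5, 5],
]
matrix = co_association(basic_partitions)
consensus = consensus_clusters(matrix)
```

    Working in the partition space means the views never have to share a feature representation: only the cluster memberships are compared, so the disagreement of a single view about sample 2 is outvoted by the other two.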

    Featured Anomaly Detection Methods and Applications

    Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people’s daily activities, e.g., mobile fraud detection, anomaly detection has become the first line of defence for protecting and securing public and personal property. Although anomaly detection methods have been under continuous development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and are fuelling demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods. The advantages and disadvantages of the strategies and methods are elaborated. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed, aiming to resolve specific needs of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. More specifically, the key contents of this thesis are as follows:
    1) Support Vector Data Description (SVDD) is investigated as a primary method for accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined, and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. A theoretical analysis of the parameter used in the model is also presented to ensure the validity of the relaxation of the decision boundary.
    2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is treated as contextual information to be integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants, which help in distinguishing the root causes of the anomalies.
    3) To further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed to realise one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with the extreme points, which constitute a dictionary of data representatives. Based on this dictionary, CHDD can represent and cluster all the normal data instances, so that anomaly detection is realised with a degree of interpretation.
    4) Besides better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also researched. Under the framework of Reinforcement Learning (RL), a time series anomaly detector that is continually trained to cope with the evolving patterns is designed. Because the anomaly detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.
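
    The convex-hull idea in item 3) can be illustrated in two dimensions: the extreme points of the normal data form a small dictionary, and a new point is flagged when it falls outside the hull they span. This sketch computes an exact 2-D hull with Andrew's monotone chain algorithm and is not the thesis's CHDD, which approximates the hull and also represents and clusters the normal instances. The function names and toy data are assumptions, and the inside test assumes the hull has at least three non-collinear points.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the extreme points in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

def is_inside(hull, query):
    """True if query lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], query) >= 0 for i in range(n))

# Normal training data: four corners plus interior points.
normal = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0),
          (2.0, 1.0), (1.0, 2.0), (3.0, 2.0)]
dictionary = convex_hull(normal)              # extreme points only
inlier_ok = is_inside(dictionary, (2.0, 1.5))
anomaly_flagged = not is_inside(dictionary, (5.0, 1.0))
```

    Only the four corner points survive in `dictionary`, which hints at why a hull-based description can double as a compact, interpretable set of data representatives.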