22 research outputs found

    Robustness meets low-rankness: unified entropy and tensor learning for multi-view subspace clustering

    In this paper, we develop the weighted error entropy-regularized tensor learning method for multi-view subspace clustering (WETMSC), which integrates noise disturbance removal and subspace structure discovery into one unified framework. Unlike most existing methods, which focus only on learning the affinity matrix for subspace discovery via different optimization models and simply assume that the noise is independent and identically distributed (i.i.d.), our WETMSC method adopts the weighted error entropy to characterize the underlying noise by assuming that the noise is independent and piecewise identically distributed (i.p.i.d.). Meanwhile, WETMSC constructs the self-representation tensor by stacking all self-representation matrices along the view dimension, preserving the high-order correlation of views based on the tensor nuclear norm. To solve the proposed nonconvex optimization problem, we design a half-quadratic (HQ) additive optimization technique and iteratively solve all subproblems under the alternating direction method of multipliers (ADMM) framework. Extensive comparison studies with state-of-the-art clustering methods on real-world datasets and synthetic noisy datasets demonstrate the superiority of the proposed WETMSC method.
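
    The tensor nuclear norm used as the low-rank regularizer above is commonly defined via the t-SVD: transform the stacked tensor to the Fourier domain along the view dimension and sum the singular values of the frontal slices. Below is a minimal sketch of that computation, assuming the standard t-SVD-based definition with 1/n3 normalization; the paper's exact formulation and weighting may differ.

```python
import numpy as np

def tensor_nuclear_norm(T):
    """Sum the singular values of the frontal slices of T in the
    Fourier domain (t-SVD-based TNN), normalized by slice count."""
    _, _, n3 = T.shape
    T_f = np.fft.fft(T, axis=2)  # per-view slices -> Fourier domain
    return sum(
        np.linalg.svd(T_f[:, :, k], compute_uv=False).sum()
        for k in range(n3)
    ) / n3

# Toy example: stack 3 per-view self-representation matrices (n x n x views)
rng = np.random.default_rng(0)
Z = np.stack([rng.random((50, 50)) for _ in range(3)], axis=2)
print(tensor_nuclear_norm(Z))
```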

    New Approaches in Multi-View Clustering

    Many real-world datasets can be naturally described by multiple views. Due to this, multi-view learning has drawn much attention from both academia and industry, and compared to single-view learning it has demonstrated plenty of advantages. Clustering has long served as a critical technique in data mining and machine learning, and recently multi-view clustering has achieved great success in various applications. To provide a comprehensive review of the typical multi-view clustering methods and their recent developments, this chapter summarizes five kinds of popular clustering methods and their multi-view learning versions: k-means, spectral clustering, matrix factorization, tensor decomposition, and deep learning. These clustering methods are the most widely employed algorithms for single-view data, and much effort has been devoted to extending them for multi-view clustering. Moreover, many other multi-view clustering methods can be unified into the frameworks of these five methods. To promote further research and development of multi-view clustering, some popular and open datasets are summarized in two categories. Furthermore, several open issues that deserve more exploration are pointed out at the end.
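
    As a concrete point of reference for the spectral-clustering family surveyed here, the simplest multi-view extension fuses per-view affinities into a consensus graph and clusters that graph. The sketch below assumes uniform view weights and an RBF affinity purely for illustration; the chapter covers far more sophisticated fusion schemes.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def naive_multiview_spectral(views, n_clusters, gamma=1.0):
    """Fuse per-view RBF affinities by uniform averaging, then run
    standard spectral clustering on the consensus graph."""
    W = sum(rbf_kernel(X, gamma=gamma) for X in views) / len(views)
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed",
                              random_state=0).fit_predict(W)

# Two toy "views" describing the same 100 samples
rng = np.random.default_rng(0)
views = [rng.normal(size=(100, 8)), rng.normal(size=(100, 12))]
labels = naive_multiview_spectral(views, n_clusters=3)
```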

    Contribution to Graph-based Multi-view Clustering: Algorithms and Applications

    In this thesis, we study unsupervised learning, specifically clustering methods for dividing data into meaningful groups. One major challenge is how to find an efficient algorithm with low computational complexity that can deal with different types and sizes of datasets.

    For this purpose, we propose two approaches. The first is named "Multi-view Clustering via Kernelized Graph and Nonnegative Embedding" (MKGNE), and the second is called "Multi-view Clustering via Consensus Graph Learning and Nonnegative Embedding" (MVCGE). These two approaches jointly solve four tasks: they estimate the unified similarity matrix over all views using kernel tricks, the unified spectral projection of the data, the cluster-indicator matrix, and the weight of each view, without additional parameters. With these two approaches, there is no need for any postprocessing such as k-means clustering.

    In a further study, we propose a method named "Multi-view Spectral Clustering via Constrained Nonnegative Embedding" (CNESE). This method can overcome the drawbacks of spectral clustering approaches, which only provide a nonlinear projection of the data on which an additional clustering step is required; this extra step can degrade the quality of the final clustering due to factors such as the initialization process or outliers. These drawbacks are overcome by introducing a nonnegative embedding matrix that gives the final clustering assignment directly. In addition, constraints are imposed on the targeted matrix to enhance the clustering performance.

    In line with the above methods, a new method called "Multi-view Spectral Clustering with a Self-taught Robust Graph Learning" (MCSRGL) has been developed. Different from other approaches, this method integrates two main paradigms into a one-step multi-view clustering model. First, we construct an additional graph using the cluster-label space, in addition to the graphs associated with the data space. Second, a smoothness constraint is exploited to constrain the cluster-label matrix and make it more consistent with the data views and the label view.

    Moreover, we propose two unified frameworks for multi-view clustering in Chapter 9. In these frameworks, we attempt to determine the view-based graphs, the consensus graph, the consensus spectral representation, and the soft clustering assignments. These methods retain the main advantages of the aforementioned methods and integrate the concepts of consensus and unified matrices. By using the unified matrices, we enforce the matrices of different views to be similar, thus reducing noise and inconsistency between the views.

    Extensive experiments were conducted on several public datasets of different types and sizes, ranging from face image datasets to document, handwritten-digit, and synthetic datasets. We provide several analyses of the proposed algorithms, including ablation studies, hyper-parameter sensitivity analyses, and computational costs. The experimental results show that the algorithms developed in this thesis are relevant and outperform several competing methods.
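
    A minimal sketch of the recipe these methods share follows: fuse per-view graphs into a consensus graph, then read cluster assignments directly from a nonnegative embedding, with no k-means postprocessing. The uniform view weights and the plain kNN graph are simplifying assumptions made here for brevity; the thesis learns the view weights and the graphs jointly.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def consensus_graph(views, k=10):
    """Uniformly average symmetrized kNN graphs over the views (the
    thesis learns the view weights instead of fixing them)."""
    W = sum(kneighbors_graph(X, k).toarray() for X in views)
    return (W + W.T) / (2 * len(views))  # symmetrize and normalize

def nonnegative_embedding(W, n_clusters, n_iter=200, seed=0):
    """Symmetric-NMF style factorization W ~ H H^T; labels are read
    off as the argmax of each row, so no k-means step is needed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], n_clusters))
    for _ in range(n_iter):  # damped multiplicative update
        H *= 0.5 + 0.5 * (W @ H) / np.maximum(H @ (H.T @ H), 1e-10)
    return H.argmax(axis=1)

rng = np.random.default_rng(1)
views = [rng.normal(size=(60, 5)), rng.normal(size=(60, 7))]
labels = nonnegative_embedding(consensus_graph(views), n_clusters=3)
```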

    Advances in nonnegative matrix factorization with application on data clustering.

    Clustering is an important direction in many fields, e.g., machine learning, data mining, and computer vision. It aims to divide data into groups (clusters) for the purposes of summarization or improved understanding. With the rapid development of new technology, high-dimensional data have become very common in many real-world applications, such as the large numbers of images returned by satellites, real-time video streams received by robots, large-scale text databases, and the mass of information on social networks (e.g., Facebook, Twitter). However, most existing clustering approaches are heavily restricted by the large number of features and tend to be inefficient or even infeasible. In this thesis, we focus on finding an optimal low-dimensional representation of high-dimensional data, based on the nonnegative matrix factorization (NMF) framework, for better clustering. Specifically, there are four methods, as follows:

    - Multiple Components Based Representation Learning. Real data are usually complex and contain various components. For example, face images have expressions and genders. Each component mainly reflects one aspect of the data and provides information the others do not have. Therefore, exploring the semantic information of multiple components, as well as the diversity among them, is of great benefit for understanding data comprehensively and in depth. To this end, we propose a novel multi-component nonnegative matrix factorization (MCNMF). Instead of seeking only one representation of the data, our approach learns multiple representations simultaneously, with the help of the Hilbert-Schmidt Independence Criterion (HSIC) as a diversity term. HSIC explores the diverse information among the representations, where each representation corresponds to one component. By integrating the multiple representations, a more comprehensive representation is then established. Extensive experimental results on real-world datasets have shown that MCNMF not only achieves more accurate performance than the state-of-the-art methods using the aggregated representation, but also interprets data from different aspects with the multiple representations, which is beyond what current NMF methods can offer.

    - Ordered Structure Preserving Representation Learning. Real-world applications often process data with an ordered structure, such as motion sequences and video clips, where consecutive neighbouring samples are very likely to share similar features unless a sudden change occurs. Traditional NMF, however, assumes the data samples and features to be independently distributed, making it unsuitable for the analysis of such data. To overcome this limitation, a novel NMF approach (ORNMF) is proposed to take full advantage of the ordered nature embedded in sequential data to improve the accuracy of the data representation. With an L2,1-norm based neighbour penalty term, ORNMF enforces the similarity of neighbouring data. ORNMF also adopts an L2,1-norm based loss function to improve its robustness against noises and outliers. Moreover, ORNMF can find the cluster boundaries and obtain the number of clusters without the number of clusters being given beforehand. A new iterative updating optimization algorithm is derived to solve ORNMF's objective function. Proofs of the convergence and correctness of the scheme are also presented. Experiments on both synthetic and real-world datasets have demonstrated the effectiveness of ORNMF.

    - Diversity Enhanced Multi-view Representation Learning. Multi-view learning aims to explore the correlations of different information, such as different features or modalities, to boost the performance of data analysis. Multi-view data are very common in many real-world applications because data are often collected from diverse domains or obtained from different feature extractors. For example, color and texture information can be utilized as different kinds of features in images and videos. Web pages can also be represented using multi-view features based on text and hyperlinks. Taken alone, these views will often be deficient or incomplete because different views describe distinct perspectives of the data. Therefore, we propose a Diverse Multi-view NMF (DiNMF) approach to explore diverse information among multi-view representations for more comprehensive learning. With a novel diversity regularization term, DiNMF explicitly enforces the orthogonality of different data representations. Importantly, DiNMF converges linearly and scales well to large-scale data. By taking the manifold structures into account, we further extend the approach under a graph-based model to preserve the locally geometrical structure of the manifolds in the multi-view setting. Compared to other multi-view NMF methods, the enhanced diversity of both approaches reduces the redundancy between the multi-view representations and improves the accuracy of the clustering results.

    - Constrained Multi-View Representation Learning. To incorporate prior information for accurate learning, we propose a novel semi-supervised multi-view NMF approach, which considers both the label constraints and the multi-view consistency simultaneously. In particular, the approach guarantees that data sharing the same label will have the same new representation and be mapped into the same class in the low-dimensional space, regardless of whether they come from the same view. Moreover, different from current NMF-based multi-view clustering methods, which require the weight factor of each view to be specified individually, we introduce a single parameter to control the distribution of the weighting factors. Consequently, the weight factor of each view can be assigned automatically depending on the dissimilarity between each new representation matrix and the consensus matrix. Besides, using the structured sparsity-inducing L2,1-norm, our method is robust against noises and hence achieves more stable clustering results.
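
    For orientation, all four methods build on the basic NMF machinery of multiplicative updates. The sketch below shows plain Frobenius-loss NMF with the classic Lee-Seung updates; it is a baseline only, not the thesis's algorithms, which add the HSIC diversity, ordered-structure, and L2,1-norm terms described above.

```python
import numpy as np

def nmf(X, r, n_iter=300, seed=0):
    """Plain NMF, X ~ W H, via Lee-Seung multiplicative updates for
    the Frobenius loss (small epsilon guards against division by zero)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / np.maximum(W.T @ W @ H, 1e-10)
        W *= (X @ H.T) / np.maximum(W @ H @ H.T, 1e-10)
    return W, H

# Columns are samples; a cluster assignment can be read from the
# largest coefficient of each sample's encoding.
X = np.abs(np.random.default_rng(1).normal(size=(100, 40)))
W, H = nmf(X, r=5)
labels = H.argmax(axis=0)
```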

    Improved K-means clustering algorithms : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, New Zealand

    The K-means clustering algorithm is designed to divide samples into subsets with the goal of maximizing intra-subset similarity and inter-subset dissimilarity, where the similarity measures the relationship between two samples. As an unsupervised learning technique, K-means is considered one of the most used clustering algorithms and has been applied in a variety of areas such as artificial intelligence, data mining, biology, psychology, marketing, and medicine. However, K-means is not robust, and its clustering result depends on the initialization, the similarity measure, and the predefined cluster number. Previous research focused on solving a subset of these issues but has not addressed them in a unified framework, and fixing one issue alone does not guarantee the best performance. Improving K-means, one of the most famous and widely used clustering algorithms, by solving its issues simultaneously is therefore both challenging and significant. This thesis conducts extensive research on the K-means clustering algorithm with the aim of improving it. First, we propose the Initialization-Similarity (IS) clustering algorithm to solve the initialization and similarity-measure issues of K-means in a unified way. Specifically, we propose to fix the initialization of the clustering by using sum-of-norms (SON) regularization, which outputs a new representation of the original samples, and to learn the similarity matrix based on the data distribution; the derived new representation is then used to conduct K-means clustering. Second, we propose a Joint Feature Selection with Dynamic Spectral (FSDS) clustering algorithm to solve the issues of cluster-number determination, the similarity measure, and the robustness of the clustering by selecting effective features and reducing the influence of outliers simultaneously. Specifically, we propose to learn the similarity matrix based on the data distribution and to add a rank constraint on the Laplacian matrix of the learned similarity matrix so that the cluster number is output automatically. Furthermore, the proposed algorithm employs the L2,1-norm as a sparse constraint on the regularization term and the loss function to remove redundant features and reduce the influence of outliers, respectively. Third, we propose a Joint Robust Multi-view (JRM) spectral clustering algorithm that conducts clustering for multi-view data while solving the initialization issue, cluster-number determination, similarity-measure learning, removal of redundant features, and reduction of outlier influence in a unified way. Finally, the proposed algorithms outperformed state-of-the-art clustering algorithms on real data sets, and we theoretically prove the convergence of the proposed optimization methods for the proposed objective functions.
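
    The rank constraint in FSDS builds on a standard fact from spectral graph theory: a similarity graph with exactly c connected components has a Laplacian with exactly c zero eigenvalues. The sketch below illustrates only this counting principle on an ideal block-diagonal similarity matrix; the thesis's algorithm instead learns the similarity matrix so that the constraint holds.

```python
import numpy as np

def estimate_cluster_number(W, tol=1e-8):
    """Count the (near-)zero eigenvalues of the graph Laplacian of a
    similarity matrix W; for a graph with c connected components,
    this count equals c."""
    L = np.diag(W.sum(axis=1)) - W  # unnormalized Laplacian
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < tol))

# Block-diagonal similarity matrix with 3 perfect clusters
W = np.zeros((9, 9))
for block in (slice(0, 3), slice(3, 6), slice(6, 9)):
    W[block, block] = 1.0
print(estimate_cluster_number(W))  # -> 3
```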

    Information Theory and Machine Learning

    The recent successes of machine learning, especially regarding systems based on deep neural networks, have encouraged further research activities and raised a new set of challenges in understanding and designing complex machine learning algorithms. New applications require learning algorithms to be distributed, have transferable learning results, use computation resources efficiently, converge quickly in online settings, have performance guarantees, satisfy fairness or privacy constraints, incorporate domain knowledge on model structures, etc. A new wave of developments in statistical learning theory and information theory has set out to address these challenges. This Special Issue, "Machine Learning and Information Theory", aims to collect recent results in this direction, reflecting a diverse spectrum of visions and efforts to extend conventional theories and develop analysis tools for these complex machine learning systems.

    Metric and Representation Learning

    All data has some inherent mathematical structure. I am interested in understanding the intrinsic geometric and probabilistic structure of data in order to design effective algorithms and tools that can be applied to machine learning and across all branches of science. The focus of this thesis is to increase the effectiveness of machine learning techniques by developing a mathematical and algorithmic framework with which, given any type of data, we can learn an optimal representation. Representation learning is done for many reasons: to fix corruption, given corrupted data; to learn a low-dimensional or simpler representation, given high-dimensional data or a very complex representation; or because the current representation does not capture the important geometric features of the data. One of the many challenges in representation learning is determining ways to judge the quality of the learned representation. In many cases, the consensus is that if $d$ is the natural metric on the representation, then this metric should provide meaningful information about the data. Many examples of this can be seen in areas such as metric learning, manifold learning, and graph embedding. However, most algorithms that solve these problems learn a representation in a metric space first and then extract a metric. A large part of my research explores what happens if the order is switched, that is, if we learn the appropriate metric first and the embedding later. The philosophy behind this approach is that understanding the inherent geometry of the data is the most crucial part of representation learning. Often, studying the properties of the appropriate metric on the input data indicates the type of space we should be seeking for the representation, hence giving us more robust representations. Optimizing for the appropriate metric can also help overcome issues such as missing and noisy data. My projects fall into three different areas of representation learning: 1) geometric and probabilistic analysis of representation learning methods; 2) developing methods to learn optimal metrics on large datasets; 3) applications. In the first category, we have three projects: first, designing optimal training data for denoising autoencoders; second, formulating a new optimal transport problem and understanding its geometric structure; third, analyzing the robustness to perturbations of the solutions obtained from the classical multidimensional scaling algorithm versus that of the true solutions to the multidimensional scaling problem. For learning an optimal metric, we are given a dissimilarity matrix $\hat{D}$, some function $f$, and a subset $S$ of the space of all metrics, and we want to find $D \in S$ that minimizes $f(D, \hat{D})$. In this thesis, we consider the version of the problem in which $S$ is the space of metrics defined on a fixed graph: given a graph $G$, we let $S$ be the space of all metrics defined via $G$. For this $S$, we consider a sparse objective function as well as convex objective functions. We also study the problem of learning a tree, and we show how the ideas behind learning the optimal metric can be applied to dimensionality reduction in the presence of missing data. Finally, we look at an application to real-world data, specifically the reconstruction of ancient Greek text.
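
    The classical multidimensional scaling algorithm analyzed in the third project admits a compact statement: double-center the squared dissimilarities to estimate a Gram matrix, then embed with its top eigenvectors. A minimal sketch follows; the robustness analysis in the thesis concerns how these solutions behave under perturbations of the input dissimilarities, which this toy code does not address.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical multidimensional scaling: double-center the squared
    dissimilarity matrix, then embed with the top eigenvectors."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix estimate
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:dim] # largest eigenvalues first
    lam = np.clip(eigvals[idx], 0, None)  # clip tiny negatives
    return eigvecs[:, idx] * np.sqrt(lam)

# Points on a line: pairwise distances recover the 1-D structure
X = np.arange(5, dtype=float).reshape(-1, 1)
D = np.abs(X - X.T)
print(classical_mds(D, dim=1).round(2))
```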
    PhD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/169738/1/rsonthal_1.pd