780 research outputs found

    Fast Robust PCA on Graphs

    Get PDF
    Mining useful clusters from high dimensional data has received significant attention of the computer vision and pattern recognition community in the recent years. Linear and non-linear dimensionality reduction has played an important role to overcome the curse of dimensionality. However, often such methods are accompanied with three different problems: high computational complexity (usually associated with the nuclear norm minimization), non-convexity (for matrix factorization methods) and susceptibility to gross corruptions in the data. In this paper we propose a principal component analysis (PCA) based solution that overcomes these three issues and approximates a low-rank recovery method for high dimensional datasets. We target the low-rank recovery by enforcing two types of graph smoothness assumptions, one on the data samples and the other on the features by designing a convex optimization problem. The resulting algorithm is fast, efficient and scalable for huge datasets with O(nlog(n)) computational complexity in the number of data samples. It is also robust to gross corruptions in the dataset as well as to the model parameters. Clustering experiments on 7 benchmark datasets with different types of corruptions and background separation experiments on 3 video datasets show that our proposed model outperforms 10 state-of-the-art dimensionality reduction models. Our theoretical analysis proves that the proposed model is able to recover approximate low-rank representations with a bounded error for clusterable data

    Deep Clustering: A Comprehensive Survey

    Full text link
    Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this paper we provide a comprehensive survey for deep clustering in views of data sources. With different data sources and initial conditions, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering, semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of deep clustering

    Low-Rank Matrices on Graphs: Generalized Recovery & Applications

    Get PDF
    Many real world datasets subsume a linear or non-linear low-rank structure in a very low-dimensional space. Unfortunately, one often has very little or no information about the geometry of the space, resulting in a highly under-determined recovery problem. Under certain circumstances, state-of-the-art algorithms provide an exact recovery for linear low-rank structures but at the expense of highly inscalable algorithms which use nuclear norm. However, the case of non-linear structures remains unresolved. We revisit the problem of low-rank recovery from a totally different perspective, involving graphs which encode pairwise similarity between the data samples and features. Surprisingly, our analysis confirms that it is possible to recover many approximate linear and non-linear low-rank structures with recovery guarantees with a set of highly scalable and efficient algorithms. We call such data matrices as \textit{Low-Rank matrices on graphs} and show that many real world datasets satisfy this assumption approximately due to underlying stationarity. Our detailed theoretical and experimental analysis unveils the power of the simple, yet very novel recovery framework \textit{Fast Robust PCA on Graphs

    Schroedinger Eigenmaps for Manifold Alignment of Multimodal Hyperspectral Images

    Get PDF
    Multimodal remote sensing is an upcoming field as it allows for many views of the same region of interest. Domain adaption attempts to fuse these multimodal remotely sensed images by utilizing the concept of transfer learning to understand data from different sources to learn a fused outcome. Semisupervised Manifold Alignment (SSMA) maps multiple Hyperspectral images (HSIs) from high dimensional source spaces to a low dimensional latent space where similar elements reside closely together. SSMA preserves the original geometric structure of respective HSIs whilst pulling similar data points together and pushing dissimilar data points apart. The SSMA algorithm is comprised of a geometric component, a similarity component and dissimilarity component. The geometric component of the SSMA method has roots in the original Laplacian Eigenmaps (LE) dimension reduction algorithm and the projection functions have roots in the original Locality Preserving Projections (LPP) dimensionality reduction framework. The similarity and dissimilarity component is a semisupervised component that allows expert labeled information to improve the image fusion process. Spatial-Spectral Schroedinger Eigenmaps (SSSE) was designed as a semisupervised enhancement to the LE algorithm by augmenting the Laplacian matrix with a user-defined potential function. However, the user-defined enhancement has yet to be explored in the LPP framework. The first part of this thesis proposes to use the Spatial-Spectral potential within the LPP algorithm, creating a new algorithm we call the Schroedinger Eigenmap Projections (SEP). Through experiments on publicly available data with expert-labeled ground truth, we perform experiments to compare the performance of the SEP algorithm with respect to the LPP algorithm. The second part of this thesis proposes incorporating the Spatial Spectral potential from SSSE into the SSMA framework. Using two multi-angled HSI’s, we explore the impact of incorporating this potential into SSMA

    Disentangled Variational Auto-Encoder for Semi-supervised Learning

    Full text link
    Semi-supervised learning is attracting increasing attention due to the fact that datasets of many domains lack enough labeled data. Variational Auto-Encoder (VAE), in particular, has demonstrated the benefits of semi-supervised learning. The majority of existing semi-supervised VAEs utilize a classifier to exploit label information, where the parameters of the classifier are introduced to the VAE. Given the limited labeled data, learning the parameters for the classifiers may not be an optimal solution for exploiting label information. Therefore, in this paper, we develop a novel approach for semi-supervised VAE without classifier. Specifically, we propose a new model called Semi-supervised Disentangled VAE (SDVAE), which encodes the input data into disentangled representation and non-interpretable representation, then the category information is directly utilized to regularize the disentangled representation via the equality constraint. To further enhance the feature learning ability of the proposed VAE, we incorporate reinforcement learning to relieve the lack of data. The dynamic framework is capable of dealing with both image and text data with its corresponding encoder and decoder networks. Extensive experiments on image and text datasets demonstrate the effectiveness of the proposed framework.Comment: 6 figures, 10 pages, Information Sciences 201

    A Survey on Knowledge Graphs: Representation, Acquisition and Applications

    Full text link
    Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review of knowledge graph covering overall research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications, and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning, are reviewed. We further explore several emerging topics, including meta relational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions

    PCA using graph total variation

    Get PDF
    Mining useful clusters from high dimensional data has received sig- nificant attention of the signal processing and machine learning com- munity in the recent years. Linear and non-linear dimensionality reduction has played an important role to overcome the curse of di- mensionality. However, often such methods are accompanied with problems such as high computational complexity (usually associated with the nuclear norm minimization), non-convexity (for matrix fac- torization methods) or susceptibility to gross corruptions in the data. In this paper we propose a convex, robust, scalable and efficient Prin- cipal Component Analysis (PCA) based method to approximate the low-rank representation of high dimensional datasets via a two-way graph regularization scheme. Compared to the exact recovery meth- ods, our method is approximate, in that it enforces a piecewise con- stant assumption on the samples using a graph total variation and a piecewise smoothness assumption on the features using a graph Tikhonov regularization. Futhermore, it retrieves the low-rank rep- resentation in a time that is linear in the number of data samples. Clustering experiments on 3 benchmark datasets with different types of corruptions show that our proposed model outperforms 7 state-of- the-art dimensionality reduction models

    Towards Data-centric Graph Machine Learning: Review and Outlook

    Full text link
    Data-centric AI, with its primary focus on the collection, management, and utilization of data to drive AI models and applications, has attracted increasing attention in recent years. In this article, we conduct an in-depth and comprehensive review, offering a forward-looking outlook on the current efforts in data-centric AI pertaining to graph data-the fundamental data structure for representing and capturing intricate dependencies among massive and diverse real-life entities. We introduce a systematic framework, Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of the graph data lifecycle, including graph data collection, exploration, improvement, exploitation, and maintenance. A thorough taxonomy of each stage is presented to answer three critical graph-centric questions: (1) how to enhance graph data availability and quality; (2) how to learn from graph data with limited-availability and low-quality; (3) how to build graph MLOps systems from the graph data-centric view. Lastly, we pinpoint the future prospects of the DC-GML domain, providing insights to navigate its advancements and applications.Comment: 42 pages, 9 figure
    • …
    corecore