Spectral Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning
Self-supervised methods have received tremendous attention thanks to their
seemingly heuristic approach to learning representations that respect the
semantics of the data without any apparent supervision in the form of labels. A
growing body of literature is already being published in an attempt to build a
coherent and theoretically grounded understanding of the workings of a zoo of
losses used in modern self-supervised representation learning methods. In this
paper, we attempt to provide an understanding from the perspective of a Laplace
operator and connect the inductive bias stemming from the augmentation process
to a low-rank matrix completion problem. To this end, we leverage the results
from low-rank matrix completion to provide theoretical analysis on the
convergence of modern SSL methods and a key property that affects their
downstream performance.
Comment: 12 pages, 3 figures
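The Laplace-operator perspective this abstract describes can be illustrated with a minimal spectral-embedding sketch. The toy graph (two cliques joined by a bridge edge) and all choices below are mine, not from the paper: representations are read off the smoothest nontrivial eigenvectors of the symmetric normalized graph Laplacian.

```python
import numpy as np

def spectral_embedding(A, dim=2):
    """Embed graph nodes using eigenvectors of the symmetric normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # L_sym = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues ascending
    # Skip the trivial (constant-direction) eigenvector; keep the next `dim`.
    return vecs[:, 1:dim + 1]

# Toy graph: two 5-node cliques joined by a single bridge edge.
n = 10
A = np.zeros((n, n))
A[:5, :5] = 1; A[5:, 5:] = 1
np.fill_diagonal(A, 0)
A[4, 5] = A[5, 4] = 1
emb = spectral_embedding(A, dim=1)
# The sign of the Fiedler coordinate separates the two cliques.
print((emb[:5, 0] > 0).all() != (emb[5:, 0] > 0).all())
```

On this graph the one-dimensional embedding already encodes the cluster structure, which is the kind of inductive bias the abstract ties to the augmentation process.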
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references
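The combination the abstract describes, a Laplacian eigenspace coordinate fed into a finite mixture model to get fuzzy memberships, can be sketched as follows. The toy network, the deterministic EM initialization, and the two-component choice are mine, not from the paper:

```python
import numpy as np

def fiedler_coordinate(A):
    """1-D Laplacian eigenspace coordinate: second-smallest eigenvector of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1]

def gmm_1d(x, k=2, iters=200):
    """EM for a 1-D Gaussian mixture; returns the soft (fuzzy) membership matrix."""
    mu = np.linspace(x.min(), x.max(), k)      # deterministic, spread-out init
    var = np.full(k, x.var() + 1e-9)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances from the responsibilities
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
        pi = nk / len(x)
    return r

# Toy network: two 5-node cliques joined by one bridge edge.
n = 10
A = np.zeros((n, n))
A[:5, :5] = 1; A[5:, 5:] = 1
np.fill_diagonal(A, 0)
A[4, 5] = A[5, 4] = 1
memberships = gmm_1d(fiedler_coordinate(A))   # rows sum to 1: fuzzy memberships
labels = memberships.argmax(axis=1)
```

The membership matrix is probabilistic rather than hard, which is what makes overlapping regions of influence representable.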
Data Representation for Learning and Information Fusion in Bioinformatics
This thesis deals with the rigorous application of nonlinear dimension reduction and data organization techniques to biomedical data analysis. The Laplacian Eigenmaps algorithm is representative of these methods and has been widely applied in manifold learning and related areas. While their asymptotic manifold recovery behavior has been well-characterized, the clustering properties of Laplacian embeddings with finite data are largely motivated by heuristic arguments. We develop a precise bound, characterizing cluster structure preservation under Laplacian embeddings. From this foundation, we introduce flexible and mathematically well-founded approaches for information fusion and feature representation. These methods are applied to three substantial case studies in bioinformatics, illustrating their capacity to extract scientifically valuable information from complex data.
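The Laplacian Eigenmaps algorithm the abstract refers to can be sketched on feature vectors with the classical heat-kernel construction. The two-blob data, the kernel width, and the assertion that the embedding preserves cluster structure on this toy example are my own illustration, not the thesis's bound:

```python
import numpy as np

def laplacian_eigenmaps(X, dim=1, t=1.0):
    """Laplacian Eigenmaps: heat-kernel weights W_ij = exp(-||xi - xj||^2 / t),
    then the smoothest nontrivial solutions of L f = lam D f (via L_sym)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / t)
    np.fill_diagonal(W, 0.0)
    d_is = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian; its eigenvectors map back to the
    # generalized eigenvectors of (D - W) f = lam D f via f = D^{-1/2} v.
    L_sym = np.eye(len(X)) - d_is[:, None] * W * d_is[None, :]
    vals, vecs = np.linalg.eigh(L_sym)
    return (d_is[:, None] * vecs)[:, 1:dim + 1]  # drop the trivial eigenvector

# Two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(4.0, 0.3, (20, 2))])
Y = laplacian_eigenmaps(X, dim=1, t=2.0)
# Cluster structure survives the embedding: the two blobs get opposite signs.
```

This finite-sample cluster-preservation behavior is exactly what the thesis sets out to bound precisely rather than argue heuristically.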
Robust Principal Component Analysis?
This paper is about a curious phenomenon. Suppose we have a data matrix,
which is the superposition of a low-rank component and a sparse component. Can
we recover each component individually? We prove that under some suitable
assumptions, it is possible to recover both the low-rank and the sparse
components exactly by solving a very convenient convex program called Principal
Component Pursuit; among all feasible decompositions, simply minimize a
weighted combination of the nuclear norm and the L1 norm. This suggests the
possibility of a principled approach to robust principal component analysis
since our methodology and results assert that one can recover the principal
components of a data matrix even though a positive fraction of its entries are
arbitrarily corrupted. This extends to the situation where a fraction of the
entries are missing as well. We discuss an algorithm for solving this
optimization problem, and present applications in the area of video
surveillance, where our methodology allows for the detection of objects in a
cluttered background, and in the area of face recognition, where it offers a
principled way of removing shadows and specularities in images of faces.
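The convex program the abstract names, Principal Component Pursuit, is commonly solved by ADMM with alternating singular-value and soft thresholding. The sketch below follows that standard scheme; the step size mu, the synthetic rank-2 matrix, and the 5% corruption level are my choices, not from the paper:

```python
import numpy as np

def pcp(M, lam=None, mu=None, iters=500, tol=1e-7):
    """Principal Component Pursuit via ADMM:
    minimize ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    n1, n2 = M.shape
    lam = lam or 1.0 / np.sqrt(max(n1, n2))          # weight from the paper's theory
    mu = mu or n1 * n2 / (4.0 * np.abs(M).sum())     # common step-size heuristic
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        # Singular value thresholding -> low-rank update
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Entrywise soft thresholding -> sparse update
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # Dual ascent on the constraint L + S = M
        Y += mu * (M - L - S)
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S

rng = np.random.default_rng(0)
L0 = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))   # rank-2 component
S0 = np.zeros((60, 60))
idx = rng.random((60, 60)) < 0.05                          # 5% corrupted entries
S0[idx] = rng.normal(scale=10.0, size=idx.sum())
L_hat, S_hat = pcp(L0 + S0)
print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))     # relative error, expected small
```

Note how the corruption magnitudes are large and arbitrary; recovery relies only on the low-rank-plus-sparse structure, which is the "curious phenomenon" the abstract describes.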
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Vision
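Sparse coding, representing a signal as a linear combination of a few dictionary elements, can be sketched with Orthogonal Matching Pursuit, one standard greedy solver (a choice of mine; the monograph covers a range of methods). The random dictionary and planted 3-sparse code below are synthetic:

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal Matching Pursuit: greedily select k dictionary atoms for x."""
    residual, support = x.copy(), []
    for _ in range(k):
        # Pick the atom most correlated with the current residual
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Least-squares fit on the selected atoms, then refresh the residual
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

rng = np.random.default_rng(0)
D = rng.normal(size=(32, 64))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
true = np.zeros(64); true[[3, 17, 42]] = [1.5, -2.0, 1.0]
x = D @ true                              # signal built from 3 atoms
code = omp(D, x, k=3)
print(np.allclose(D @ code, x))
```

In the dictionary-learning setting the abstract emphasizes, D itself would also be optimized to fit the data rather than drawn at random.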
Nonlinear Dimensionality Reduction Methods in Climate Data Analysis
Linear dimensionality reduction techniques, notably principal component
analysis, are widely used in climate data analysis as a means to aid in the
interpretation of datasets of high dimensionality. These linear methods may not
be appropriate for the analysis of data arising from nonlinear processes
occurring in the climate system. Numerous techniques for nonlinear
dimensionality reduction have been developed recently that may provide a
potentially useful tool for the identification of low-dimensional manifolds in
climate data sets arising from nonlinear dynamics. In this thesis I apply three
such techniques to the study of El Nino/Southern Oscillation variability in
tropical Pacific sea surface temperatures and thermocline depth, comparing
observational data with simulations from coupled atmosphere-ocean general
circulation models from the CMIP3 multi-model ensemble.
The three methods used here are a nonlinear principal component analysis
(NLPCA) approach based on neural networks, the Isomap isometric mapping
algorithm, and Hessian locally linear embedding. I use these three methods to
examine El Nino variability in the different data sets and assess the
suitability of these nonlinear dimensionality reduction approaches for climate
data analysis.
I conclude that although, for the application presented here, analysis using
NLPCA, Isomap and Hessian locally linear embedding does not provide additional
information beyond that already provided by principal component analysis, these
methods are effective tools for exploratory data analysis.
Comment: 273 pages, 76 figures; University of Bristol Ph.D. thesis; version
with high-resolution figures available from
http://www.skybluetrades.net/thesis/ian-ross-thesis.pdf (52Mb download
- …
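Of the three methods the thesis compares, Isomap is the easiest to sketch end to end: build a k-nearest-neighbor graph, approximate geodesic distances by shortest paths, then apply classical MDS. The spiral stand-in for a nonlinear low-dimensional manifold, and the neighborhood size, are my own choices, not the thesis's climate data:

```python
import numpy as np

def isomap(X, n_neighbors=8, dim=1):
    """Isomap sketch: kNN graph -> geodesic distances (Floyd-Warshall) -> classical MDS."""
    n = len(X)
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    # Keep only each point's n_neighbors nearest edges; np.inf marks a non-edge
    G = np.full((n, n), np.inf)
    rows = np.arange(n)[:, None]
    nn = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]
    G[rows, nn] = d[rows, nn]
    G = np.minimum(G, G.T)                # symmetrize: undirected graph
    np.fill_diagonal(G, 0.0)
    # All-pairs shortest paths (Floyd-Warshall) approximate geodesic distance
    for k in range(n):
        G = np.minimum(G, G[:, k, None] + G[None, k, :])
    # Classical MDS on the squared geodesic distances
    J = np.eye(n) - 1.0 / n               # double-centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# A 1-D spiral manifold embedded in the plane; Isomap should "unroll" it
t = np.linspace(0.5, 3 * np.pi, 200)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
emb = isomap(X, n_neighbors=8, dim=1)
corr = np.corrcoef(emb[:, 0], t)[0, 1]   # coordinate should track position along the spiral
```

Linear PCA cannot unroll such a curve, which is the motivation the abstract gives for trying nonlinear methods on ENSO variability.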