Contrastive Learning Is Spectral Clustering On Similarity Graph
Contrastive learning is a powerful self-supervised learning method, but our theoretical understanding of how and why it works is limited. In this paper, we prove that contrastive learning with the standard InfoNCE loss is equivalent to spectral clustering on the similarity graph. Using this equivalence as a building block, we extend our analysis to the CLIP model and rigorously characterize how similar multi-modal objects are embedded together. Motivated by our theoretical insights, we introduce the kernel mixture loss, incorporating novel kernel functions that outperform the standard Gaussian kernel on several vision datasets.
Comment: We express our gratitude to the anonymous reviewers for their valuable feedback
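For a batch of anchor embeddings z_i with positives z'_i, the standard InfoNCE loss is -(1/n) Σ_i log( k(z_i, z'_i) / Σ_j k(z_i, z'_j) ), where every other positive in the batch acts as a negative. The pure-Python sketch below assumes a Gaussian kernel k(a, b) = exp(-||a - b||² / (2τ)); the function names and the `tau=0.5` default are illustrative and not taken from the paper, and the paper's kernel mixture loss is not reproduced. For unit-norm embeddings, ||a - b||² = 2 - 2⟨a, b⟩, so the Gaussian kernel differs from exp(⟨a, b⟩/τ) only by the constant factor exp(-1/τ), which cancels in the ratio. That is why the usual cosine-similarity InfoNCE can be read as a Gaussian-kernel loss.

```python
import math

def gaussian_kernel(a, b, tau=0.5):
    """Gaussian kernel exp(-||a - b||^2 / (2 * tau)); tau is an assumed bandwidth."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * tau))

def exp_dot_kernel(a, b, tau=0.5):
    """exp(<a, b> / tau): the similarity used by cosine InfoNCE on unit vectors."""
    return math.exp(sum(x * y for x, y in zip(a, b)) / tau)

def info_nce(anchors, positives, kernel=gaussian_kernel):
    """Batch InfoNCE loss: positives[i] is the positive for anchors[i];
    every positives[j] with j != i acts as a negative for anchors[i]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        sims = [kernel(anchors[i], positives[j]) for j in range(n)]
        total += -math.log(sims[i] / sum(sims))
    return total / n

# Usage: four well-separated, perfectly aligned pairs give a loss below log(4).
pairs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(info_nce(pairs, pairs))
```

If all embeddings collapse to one point, every similarity is equal and the loss becomes log(n), its value for an uninformative embedding.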
Tailored graph ensembles as proxies or null models for real networks II: results on directed graphs
We develop new mathematical tools with which to quantify the macroscopic topological structure of large directed networks. This is achieved via a statistical mechanical analysis of constrained maximum entropy ensembles of directed random graphs with prescribed joint distributions of in- and out-degrees and prescribed degree-degree correlation functions. We calculate exact and explicit formulae for the leading orders in system size of the Shannon entropies and complexities of these ensembles, and of information-theoretic distances. The results are applied to data on gene regulation networks.
Comment: 21 pages, 1 figure, submitted to J. Phys.
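One of the quantities these ensembles are constrained by is the joint in/out-degree distribution p(k_in, k_out) of a directed graph. The sketch below computes only that empirical distribution and its plain Shannon entropy; it is not the paper's ensemble entropy or complexity formulae, and it ignores the degree-degree correlation functions. The edge-list input format and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

def joint_degree_distribution(n_nodes, edges):
    """Empirical joint distribution p(k_in, k_out) of a directed graph.

    `edges` is an iterable of (source, target) pairs over nodes 0..n_nodes-1.
    Keys of the returned dict are (k_in, k_out) pairs; values are fractions of nodes.
    """
    k_in = [0] * n_nodes
    k_out = [0] * n_nodes
    for src, dst in edges:
        k_out[src] += 1
        k_in[dst] += 1
    counts = Counter(zip(k_in, k_out))
    return {pair: c / n_nodes for pair, c in counts.items()}

def shannon_entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution given as {value: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Usage: a directed star 0 -> 1, 0 -> 2, 0 -> 3 has one hub and three sinks.
star = joint_degree_distribution(4, [(0, 1), (0, 2), (0, 3)])
print(star, shannon_entropy(star))
```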
Local-global nested graph kernels using nested complexity traces
In this paper, we propose two novel local-global nested graph kernels, namely the nested aligned kernel and the nested reproducing kernel, drawing on depth-based complexity traces. Both nested kernels gauge the nested depth complexity trace through a family of K-layer expansion subgraphs rooted at the centroid vertex, i.e., the vertex with the minimum variance of shortest-path lengths to the remaining vertices. Specifically, for a pair of graphs, we begin by computing the centroid depth-based complexity traces rooted at the centroid vertices. The first nested kernel is defined by measuring the global alignment kernel, which is based on the dynamic time warping framework, between the complexity traces. Since the global alignment kernel incorporates the whole spectrum of alignment costs between the complexity traces, this nested kernel can provide rich statistical measures. The second nested kernel, on the other hand, is defined by measuring the basic reproducing kernel between the complexity traces. Since the associated reproducing kernel requires only O(1) time, this nested kernel has very low computational complexity. We show theoretically that both proposed nested kernels can simultaneously reflect the local and global graph characteristics in terms of the nested complexity traces. Experiments on standard graph datasets abstracted from bioinformatics and computer vision databases demonstrate the effectiveness and efficiency of the proposed graph kernels.
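The centroid vertex defined above can be found with one breadth-first search per vertex. The sketch below assumes a connected, unweighted, undirected graph with at least two vertices, stored as an adjacency dict, and uses population variance. It also reads a K-layer expansion subgraph as the set of vertices within K hops of the root, which is an interpretation for illustration. The depth-based complexity trace (an entropy computed on each expansion subgraph) and the two nested kernels are not reproduced; all function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph given as {vertex: [neighbours]}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centroid_vertex(adj):
    """Vertex whose shortest-path lengths to all other vertices have minimum variance.

    Assumes a connected graph with at least two vertices; ties go to the first
    such vertex in the iteration order of `adj`.
    """
    best, best_var = None, float("inf")
    for u in adj:
        d = [length for v, length in bfs_distances(adj, u).items() if v != u]
        mean = sum(d) / len(d)
        var = sum((x - mean) ** 2 for x in d) / len(d)
        if var < best_var:
            best, best_var = u, var
    return best

def expansion_vertex_sets(adj, root, max_k):
    """Vertex sets of the K-layer expansion subgraphs rooted at `root`, for K = 1..max_k,
    taken here to be all vertices within K hops of the root (an assumed reading)."""
    dist = bfs_distances(adj, root)
    return [{v for v, length in dist.items() if length <= k} for k in range(1, max_k + 1)]

# Usage: on the path 0-1-2-3-4 the middle vertex has the flattest distance profile.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
c = centroid_vertex(path)
print(c, expansion_vertex_sets(path, c, 2))
```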
Learning Theory and Approximation
Learning theory studies data structures from samples and aims at understanding unknown function relations behind them. This leads to interesting theoretical problems that can often be attacked with methods from Approximation Theory. This workshop, the second of its kind at the MFO, concentrated on the following recent topics: learning of manifolds and the geometry of data; sparsity and dimension reduction; error analysis and algorithmic aspects, including kernel-based methods for regression and classification; and the application of multiscale aspects and of refinement algorithms to learning.
Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods
Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature, collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance on a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.
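Kernel PCA is the simplest of the reproducing-kernel extensions listed above: it centers the kernel matrix in feature space, K_c = H K H with H = I - (1/n) 1 1ᵀ, and projects the training points onto its leading eigenvectors. The NumPy sketch below assumes a Gaussian (RBF) kernel with an illustrative `gamma=1.0`; it covers neither out-of-sample projection nor the supervised methods (PLS, CCA, OPLS), and the function names are hypothetical rather than the tutorial's own code.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def linear_kernel(X, Y):
    """Linear kernel; with it, kernel PCA reduces to ordinary PCA."""
    return X @ Y.T

def kernel_pca(X, n_components=2, kernel=rbf_kernel):
    """Project the training points X (n x d) onto the leading kernel principal components."""
    n = X.shape[0]
    K = kernel(X, X)
    # Center the data in feature space: Kc = H K H, H = I - (1/n) * ones.
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ K @ H
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # Scale the dual coefficients so each feature-space axis has unit norm;
    # the training projections Kc @ alphas then equal eigvecs * sqrt(eigvals).
    alphas = eigvecs / np.sqrt(eigvals)
    return Kc @ alphas

# Usage: two non-linear components for a small random data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)) * np.array([3.0, 1.0, 0.3])
print(kernel_pca(X, n_components=2).shape)
```

Passing `kernel=linear_kernel` recovers the ordinary PCA scores up to the sign of each component, which is a quick sanity check on the centering and scaling.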