
    Contrastive Learning Is Spectral Clustering On Similarity Graph

    Contrastive learning is a powerful self-supervised learning method, but we have a limited theoretical understanding of how it works and why it works. In this paper, we prove that contrastive learning with the standard InfoNCE loss is equivalent to spectral clustering on the similarity graph. Using this equivalence as the building block, we extend our analysis to the CLIP model and rigorously characterize how similar multi-modal objects are embedded together. Motivated by our theoretical insights, we introduce the kernel mixture loss, incorporating novel kernel functions that outperform the standard Gaussian kernel on several vision datasets.
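As a rough illustration of the InfoNCE loss named in this abstract, the sketch below uses the commonly cited batch form: each anchor is scored against every candidate by cosine similarity scaled by a temperature, and the loss is the cross-entropy of picking the matching candidate. The function names, the cosine similarity, and the default temperature are assumptions made for illustration; the paper's exact formulation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative sketch).

    positives[i] is the positive for anchors[i]; every other
    positives[j] acts as a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature
                  for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # -log softmax probability of the true pair (i, i).
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)
```

When every pair is correctly aligned the loss is close to zero, and when all candidates are indistinguishable it equals the logarithm of the batch size.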

    Tailored graph ensembles as proxies or null models for real networks II: results on directed graphs

    We generate new mathematical tools with which to quantify the macroscopic topological structure of large directed networks. This is achieved via a statistical mechanical analysis of constrained maximum entropy ensembles of directed random graphs with prescribed joint distributions for in- and outdegrees and prescribed degree-degree correlation functions. We calculate exact and explicit formulae for the leading orders in the system size of the Shannon entropies and complexities of these ensembles, and for information-theoretic distances. The results are applied to data on gene regulation networks. Comment: 21 pages, 1 figure, submitted to J. Phys.

    Local-global nested graph kernels using nested complexity traces

    In this paper, we propose two novel local-global nested graph kernels, namely the nested aligned kernel and the nested reproducing kernel, drawing on depth-based complexity traces. Both of the nested kernels gauge the nested depth complexity trace through a family of K-layer expansion subgraphs rooted at the centroid vertex, i.e., the vertex with minimum shortest path length variance to the remaining vertices. Specifically, for a pair of graphs, we commence by computing the centroid depth-based complexity traces rooted at the centroid vertices. The first nested kernel is defined by measuring the global alignment kernel, which is based on the dynamic time warping framework, between the complexity traces. Since the required global alignment kernel incorporates the whole spectrum of alignment costs between the complexity traces, this nested kernel can provide rich statistical measures. The second nested kernel, on the other hand, is defined by measuring the basic reproducing kernel between the complexity traces. Since the associated reproducing kernel only requires time complexity O(1), this nested kernel has very low computational complexity. We theoretically show that both of the proposed nested kernels can simultaneously reflect the local and global graph characteristics in terms of the nested complexity traces. Experiments on standard graph datasets abstracted from bioinformatics and computer vision databases demonstrate the effectiveness and efficiency of the proposed graph kernels.
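The centroid vertex defined in this abstract (the vertex whose shortest-path lengths to the remaining vertices have minimum variance) can be sketched as follows. This is a minimal illustration that assumes an unweighted, undirected, connected graph stored as an adjacency dictionary; the function names and the tie-breaking rule (first vertex encountered) are assumptions, not taken from the paper.

```python
from collections import deque
from statistics import pvariance


def bfs_distances(adj, source):
    """Hop-count shortest-path lengths from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def centroid_vertex(adj):
    """Return the vertex with minimum variance of shortest-path lengths
    to all other vertices (ties go to the first vertex encountered)."""
    best, best_var = None, float("inf")
    for v in adj:
        dist = bfs_distances(adj, v)
        lengths = [d for u, d in dist.items() if u != v]
        if len(lengths) != len(adj) - 1:
            raise ValueError("graph must be connected")
        # A single-vertex graph has no other vertices; treat variance as 0.
        var = pvariance(lengths) if lengths else 0.0
        if var < best_var:
            best, best_var = v, var
    return best


if __name__ == "__main__":
    # Path graph 0-1-2-3-4: the middle vertex has the most uniform distances.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(centroid_vertex(path))
```

On the five-vertex path, the middle vertex sees distances 1, 1, 2, 2, which have lower variance than the distances seen from any other vertex.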

    Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods

    Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance in a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.