
    SEAL: Simultaneous Label Hierarchy Exploration And Learning

    Label hierarchy is an important source of external knowledge that can enhance classification performance. However, most existing methods rely on predefined label hierarchies that may not match the data distribution. To address this issue, we propose Simultaneous label hierarchy Exploration And Learning (SEAL), a new framework that explores the label hierarchy by augmenting the observed labels with latent labels that follow a prior hierarchical structure. Our approach uses the 1-Wasserstein metric over the tree metric space as its objective function, which enables us to simultaneously learn a data-driven label hierarchy and perform (semi-)supervised learning. We evaluate our method on several datasets and show that it achieves superior results in both supervised and semi-supervised scenarios and reveals insightful label structures. Our implementation is available at https://github.com/tzq1999/SEAL.
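
    The 1-Wasserstein distance over a tree metric has a well-known closed form: sum, over every edge, the edge weight times the absolute difference between the two distributions' masses in the subtree hanging below that edge. The NumPy sketch below illustrates the quantity SEAL optimizes; the toy tree, weights, and distributions are hypothetical, not taken from the paper.

```python
import numpy as np

def tree_wasserstein(mu, nu, parent, edge_w):
    """1-Wasserstein distance between two distributions on a tree.

    Closed form: sum over edges of edge_w[i] * |mass(mu, subtree_i) -
    mass(nu, subtree_i)|, where subtree_i hangs below the edge from
    node i to parent[i]. Nodes are 0..n-1, the root is node 0 with
    parent -1, and children must have larger indices than their parents.
    """
    sub_mu, sub_nu = mu.astype(float), nu.astype(float)  # working copies
    dist = 0.0
    for i in range(len(parent) - 1, 0, -1):   # leaves first, root last
        dist += edge_w[i] * abs(sub_mu[i] - sub_nu[i])
        sub_mu[parent[i]] += sub_mu[i]        # push subtree mass upward
        sub_nu[parent[i]] += sub_nu[i]
    return dist

# toy hierarchy: root 0 -> {1, 2}; 1 -> {3, 4}; 2 -> {5, 6}
parent = np.array([-1, 0, 0, 1, 1, 2, 2])
edge_w = np.array([0.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5])  # weight of edge to parent
mu = np.array([0, 0, 0, 0.7, 0.3, 0.0, 0.0])  # all mass on leaves under node 1
nu = np.array([0, 0, 0, 0.0, 0.0, 0.5, 0.5])  # all mass on leaves under node 2
print(tree_wasserstein(mu, nu, parent, edge_w))  # 3.0: every unit travels distance 3
```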

    OTMatch: Improving Semi-Supervised Learning with Optimal Transport

    Semi-supervised learning has made remarkable strides by effectively utilizing a limited amount of labeled data while capitalizing on the abundant information present in unlabeled data. However, current algorithms often prioritize aligning image predictions with specific classes generated through self-training techniques, thereby neglecting the inherent relationships among these classes. In this paper, we present a new approach called OTMatch, which leverages semantic relationships among classes by employing an optimal transport loss function. By utilizing optimal transport, our proposed method consistently outperforms established state-of-the-art methods. Notably, OTMatch achieves error-rate reductions of 3.18%, 3.46%, and 1.28% over the current state-of-the-art method, FreeMatch, on CIFAR-10 with 1 label per class, STL-10 with 4 labels per class, and ImageNet with 100 labels per class, respectively. This demonstrates the effectiveness and superiority of our approach in harnessing semantic relationships to enhance learning performance in a semi-supervised setting.
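
    The abstract does not spell out the loss, but an optimal transport loss between class distributions is commonly computed with Sinkhorn iterations against a class-to-class cost matrix. A minimal sketch under that assumption follows; the cost matrix (for instance, one minus a cosine similarity between class embeddings) and all names are illustrative, not taken from the paper.

```python
import torch

def sinkhorn_ot_loss(p, q, cost, eps=0.1, n_iters=50):
    """Entropic OT cost between batched class distributions p and q.

    p, q: (B, C) probability vectors (e.g., student predictions and
    teacher pseudo-labels); cost: (C, C) class-distance matrix.
    Returns the mean transport cost <T, cost> over the batch.
    """
    K = torch.exp(-cost / eps)                      # (C, C) Gibbs kernel
    u = torch.ones_like(p)
    for _ in range(n_iters):                        # alternating scalings
        v = q / (u @ K).clamp_min(1e-9)
        u = p / (v @ K.T).clamp_min(1e-9)
    T = u.unsqueeze(2) * K.unsqueeze(0) * v.unsqueeze(1)  # (B, C, C) plans
    return (T * cost).sum(dim=(1, 2)).mean()
```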

    Multiplexed Streaming Codes for Messages With Different Decoding Delays in Channel with Burst and Random Erasures

    In a real-time transmission scenario, messages are transmitted through a channel that is subject to packet loss, and the destination must recover them within a required deadline. In this paper, we consider a setup where two types of messages with distinct decoding deadlines are transmitted through a channel that can introduce either a burst erasure of length at most $B$ or $N$ random erasures. The message with the shorter decoding deadline $T_u$ is referred to as the urgent message, while the one with decoding deadline $T_v$ ($T_v > T_u$) is referred to as the less urgent message. We propose a merging method that encodes the two message streams of different urgency levels into a single flow, and we consider the scenario where $T_v > T_u + B$. We establish a closed-form upper bound on the sum rate achievable by any coding strategy based on this merging approach. Moreover, we present explicit constructions over a finite field whose size scales quadratically with the imposed delay that attain this upper bound. For a given parameter configuration, we rigorously demonstrate that the sum rate of our proposed streaming codes consistently surpasses that of separate encoding, which serves as the baseline for comparison.
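
    To make the channel model concrete, the sketch below checks only whether an erasure pattern is admissible under the abstract's channel: within a window, either one burst of length at most B or at most N arbitrary erasures. It does not implement the proposed streaming codes; the interface is illustrative.

```python
def admissible(erasures, window, B, N):
    """True if the lost packets inside `window` form either a single
    burst of length at most B or at most N arbitrary erasures.

    erasures: sorted time indices of lost packets;
    window: (start, end), inclusive on both ends.
    """
    start, end = window
    lost = [t for t in erasures if start <= t <= end]
    if len(lost) <= N:                      # few enough random erasures
        return True
    span = lost[-1] - lost[0] + 1
    return span == len(lost) and span <= B  # contiguous burst within B

# a length-3 burst is admissible for B=3, N=2; three scattered losses are not
print(admissible([4, 5, 6], (0, 9), B=3, N=2))   # True
print(admissible([1, 4, 8], (0, 9), B=3, N=2))   # False
```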

    Privacy-Preserving Polynomial Computing Over Distributed Data

    In this letter, we delve into a scenario where a user aims to compute polynomial functions using their own data as well as data obtained from distributed sources. To accomplish this, the user enlists the assistance of $N$ distributed workers, thereby defining a problem we refer to as privacy-preserving polynomial computing over distributed data. To address this challenge, we propose an approach founded upon Lagrange encoding. Our method not only withstands the presence of stragglers and Byzantine workers but also preserves security: even if a coalition of $X$ workers collude, they are unable to acquire any knowledge pertaining to the data originating from the distributed sources or the user.
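
    Lagrange encoding here presumably follows the Lagrange Coded Computing recipe: interpolate a polynomial through the data blocks plus random padding blocks over a finite field, and hand each worker one evaluation of it; any coalition of up to X workers then observes only jointly uniform values. A minimal sketch under that assumption; the field size, interpolation points, and interface are illustrative.

```python
import numpy as np

P = 2_147_483_647   # a prime modulus; assumes data blocks fit in F_P

def lagrange_encode(blocks, alphas, num_random, rng):
    """Give worker i the share u(alphas[i]), where u interpolates the
    data blocks plus `num_random` random blocks at points 1, 2, ....
    The random blocks provide privacy against up to `num_random`
    colluding workers (Shamir-style masking)."""
    padded = list(blocks) + [int(rng.integers(P)) for _ in range(num_random)]
    betas = list(range(1, len(padded) + 1))      # interpolation points
    shares = []
    for a in alphas:                             # evaluate u at each worker's point
        s = 0
        for j, x_j in enumerate(padded):
            ell = 1                              # Lagrange basis l_j(a)
            for k in range(len(padded)):
                if k != j:
                    ell = ell * (a - betas[k]) % P * pow(betas[j] - betas[k], -1, P) % P
            s = (s + x_j * ell) % P
        shares.append(s)
    return shares

rng = np.random.default_rng(0)
data = [12, 34, 56]                   # the user's data blocks
alphas = [10, 11, 12, 13, 14]         # one point per worker, disjoint from betas
print(lagrange_encode(data, alphas, num_random=1, rng=rng))
```

    Each worker would then evaluate the requested polynomial on its share; since composing a polynomial with u is again a polynomial in the evaluation point, the user can interpolate the result from sufficiently many honest replies, which is the usual route to tolerating stragglers and Byzantine workers.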

    Kernel-SSL: Kernel KL Divergence for Self-Supervised Learning

    Contrastive learning usually compares one positive anchor sample with many negative samples to perform Self-Supervised Learning (SSL). Alternatively, non-contrastive learning, as exemplified by methods like BYOL, SimSiam, and Barlow Twins, accomplishes SSL without the explicit use of negative samples. Inspired by the existing analysis for contrastive learning, we provide a reproducing kernel Hilbert space (RKHS) understanding of many existing non-contrastive learning methods. Subsequently, we propose a novel loss function, Kernel-SSL, which directly optimizes the mean embedding and the covariance operator within the RKHS. In experiments, Kernel-SSL outperforms state-of-the-art methods by a large margin on ImageNet under the linear evaluation protocol. Specifically, with 100 epochs of pre-training, our method outperforms SimCLR by 4.6%.
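
    Both statistics named in the abstract reduce to Gram-matrix computations: the squared MMD compares the two views' empirical RKHS mean embeddings, and the squared Hilbert-Schmidt distance between uncentered covariance operators is a mean of squared kernel values. The sketch below combines the two terms; it is one plausible instantiation, not necessarily the paper's exact Kernel-SSL loss.

```python
import torch

def rbf_kernel(x, y, sigma=1.0):
    """RBF Gram matrix k(x_i, y_j) for two batches of features."""
    return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))

def kernel_ssl_loss(z1, z2, sigma=1.0, lam=1.0):
    """Match RKHS statistics of two augmented views z1, z2 of shape (B, d):
    squared MMD between mean embeddings plus the squared Hilbert-Schmidt
    distance between uncentered covariance operators."""
    k11 = rbf_kernel(z1, z1, sigma)
    k22 = rbf_kernel(z2, z2, sigma)
    k12 = rbf_kernel(z1, z2, sigma)
    mean_gap = k11.mean() + k22.mean() - 2 * k12.mean()               # MMD^2
    cov_gap = (k11**2).mean() + (k22**2).mean() - 2 * (k12**2).mean()
    return mean_gap + lam * cov_gap
```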

    RelationMatch: Matching In-batch Relationships for Semi-supervised Learning

    Semi-supervised learning has achieved notable success by leveraging very few labeled examples and exploiting the wealth of information derived from unlabeled data. However, existing algorithms usually focus on aligning predictions on paired data points augmented from an identical source, and overlook the inter-point relationships within each batch. This paper introduces a novel method, RelationMatch, which exploits in-batch relationships with a matrix cross-entropy (MCE) loss function. Through the application of MCE, our proposed method consistently surpasses established state-of-the-art methods, such as FixMatch and FlexMatch, across a variety of vision datasets. Notably, we observed a substantial accuracy improvement of 15.21% over FlexMatch on the STL-10 dataset using only 40 labels. Moreover, we apply MCE to supervised learning scenarios and observe consistent improvements as well.
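
    A minimal sketch of the in-batch idea: form a relation matrix for each augmented view from pairwise agreement of class predictions, then penalize the cross-entropy between the two relation matrices row by row. The paper's matrix cross-entropy may instead be defined through the matrix logarithm; the element-wise version below is a simplified stand-in.

```python
import torch

def relation_matrix(probs):
    """(B, B) in-batch relation matrix: pairwise agreement between class
    probability vectors, with each row normalized to a distribution."""
    sim = probs @ probs.T
    return sim / sim.sum(dim=1, keepdim=True)

def matrix_cross_entropy(p_weak, p_strong):
    """Cross-entropy between relation matrices of the weakly and strongly
    augmented views; the weak view serves as the (detached) target."""
    r_w = relation_matrix(p_weak).detach()
    r_s = relation_matrix(p_strong)
    return -(r_w * torch.log(r_s.clamp_min(1e-9))).sum(dim=1).mean()
```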

    Contrastive Learning Is Spectral Clustering On Similarity Graph

    Contrastive learning is a powerful self-supervised learning method, but we have a limited theoretical understanding of how and why it works. In this paper, we prove that contrastive learning with the standard InfoNCE loss is equivalent to spectral clustering on the similarity graph. Using this equivalence as the building block, we extend our analysis to the CLIP model and rigorously characterize how similar multi-modal objects are embedded together. Motivated by our theoretical insights, we introduce the kernel mixture loss, incorporating novel kernel functions that outperform the standard Gaussian kernel on several vision datasets.
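
    For reference, this is the standard InfoNCE loss the equivalence is about: each positive pair is scored against every other sample in the batch, and the paper's result says that minimizing it amounts to spectral clustering on the augmentation similarity graph.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """Standard InfoNCE for paired embeddings (z1_i, z2_i) of shape (B, d):
    the i-th row of the similarity matrix is a softmax classification
    problem whose correct answer is index i."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                    # (B, B) scaled cosine sims
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```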

    Information Flow in Self-Supervised Learning

    In this paper, we provide a comprehensive toolbox for understanding and enhancing self-supervised learning (SSL) methods through the lens of matrix information theory. Specifically, by leveraging the principles of matrix mutual information and joint entropy, we offer a unified analysis of both contrastive and feature-decorrelation based methods. Furthermore, we propose the matrix variational masked auto-encoder (M-MAE) method, grounded in matrix information theory, as an enhancement to masked image modeling. The empirical evaluations underscore the effectiveness of M-MAE compared with state-of-the-art methods, including a 3.9% improvement in linear probing with ViT-Base and a 1% improvement in fine-tuning with ViT-Large, both on ImageNet.
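
    A common estimator in matrix information theory treats the unit-trace Gram matrix of a feature batch as a density matrix and takes its von Neumann entropy. The sketch below uses that estimator; the paper's exact definitions (and the M-MAE loss built on them) may differ in details.

```python
import torch
import torch.nn.functional as F

def matrix_entropy(z, eps=1e-8):
    """Von Neumann-style entropy of a feature batch z of shape (B, d):
    normalize rows, form the Gram matrix, scale it to unit trace, and
    return -tr(rho log rho) via the eigenvalues of rho."""
    z = F.normalize(z, dim=1)
    rho = (z @ z.T) / z.size(0)     # unit trace: each diagonal entry is 1/B
    evals = torch.linalg.eigvalsh(rho).clamp_min(eps)
    return -(evals * evals.log()).sum()
```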

    Unveiling the Dynamics of Information Interplay in Supervised Learning

    In this paper, we use matrix information theory as an analytical tool to study the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process. Specifically, inspired by the theory of Neural Collapse, we introduce the matrix mutual information ratio (MIR) and the matrix entropy difference ratio (HDR) to assess the interactions between data representations and classification heads in supervised learning, and we determine the theoretically optimal values of MIR and HDR when Neural Collapse occurs. Our experiments show that MIR and HDR can effectively explain many phenomena occurring in neural networks, for example, the standard supervised training dynamics, linear mode connectivity, and the performance of label smoothing and pruning. Additionally, we use MIR and HDR to gain insights into the dynamics of grokking, an intriguing phenomenon observed in supervised training where the model demonstrates generalization capabilities long after it has learned to fit the training data. Furthermore, we introduce MIR and HDR as loss terms in supervised and semi-supervised learning to optimize the information interactions among samples and classification heads. The empirical results demonstrate that MIR and HDR not only aid in understanding the dynamics throughout the training process but can also enhance the training procedure itself.
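
    The abstract does not define MIR and HDR precisely, so the sketch below labels its assumptions explicitly: joint entropy is taken as the entropy of the Hadamard product of unit-trace Gram matrices, MIR normalizes the resulting matrix mutual information by the joint entropy, and HDR is a relative gap between the two marginal entropies. Treat it as an illustration of the kind of computation involved, not the paper's exact definitions.

```python
import torch
import torch.nn.functional as F

def unit_trace_gram(z):
    """Unit-trace Gram matrix of row-normalized features, shape (B, B)."""
    z = F.normalize(z, dim=1)
    return (z @ z.T) / z.size(0)

def m_entropy(k, eps=1e-8):
    """Von Neumann entropy -tr(k log k) of a unit-trace PSD matrix."""
    evals = torch.linalg.eigvalsh(k).clamp_min(eps)
    return -(evals * evals.log()).sum()

def mir_hdr(feats, head_rows):
    """feats: (B, d) representations; head_rows: (B, d) classifier weight
    rows gathered per sample by its label (an assumed pairing).
    Joint entropy via the Hadamard product is an assumption as well."""
    kf, kh = unit_trace_gram(feats), unit_trace_gram(head_rows)
    kj = kf * kh                        # Hadamard product stays PSD
    kj = kj / kj.trace()                # renormalize to unit trace
    hf, hh, hj = m_entropy(kf), m_entropy(kh), m_entropy(kj)
    mir = (hf + hh - hj) / hj           # matrix mutual information ratio
    hdr = (hf - hh) / (hf + hh)         # matrix entropy difference ratio
    return mir, hdr
```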

    DJ-1 can inhibit microtubule associated protein 1B formed aggregates

    Background: Abnormal accumulation and aggregation of microtubule-associated proteins (MAPs) play an important role in the pathogenesis of neurodegenerative diseases. Loss-of-function mutations of DJ-1/PARK7 can cause early-onset Parkinson's disease (PD). DJ-1, a molecular chaperone, can inhibit α-synuclein aggregation. Currently, little is known about whether loss of DJ-1 function contributes to abnormal MAP aggregation in neurodegenerative disorders such as PD. Results: We present evidence that DJ-1 can bind to the microtubule-associated protein 1B light chain (MAP1B-LC). Overexpression of DJ-1 prevented MAP1B-LC aggregation in HEK293T and SH-SY5Y cells, while DJ-1 knockdown (KD) enhanced MAP1B-LC aggregation in SH-SY5Y cells. An increase in insoluble MAP1B-LC was also observed in the brains of DJ-1-null mice. Moreover, in DJ-1 KD SH-SY5Y cells, overexpression of MAP1B-LC led to endoplasmic reticulum (ER) stress-induced apoptosis. Conclusion: Our results suggest that DJ-1 acts as a molecular chaperone that inhibits MAP1B aggregation, the loss of which leads to neuronal apoptosis. Our study provides novel insight into the mechanisms underlying the pathogenesis of Parkinson's disease.