SEAL: Simultaneous Label Hierarchy Exploration And Learning
Label hierarchy is an important source of external knowledge that can enhance
classification performance. However, most existing methods rely on predefined
label hierarchies that may not match the data distribution. To address this
issue, we propose Simultaneous label hierarchy Exploration And Learning (SEAL),
a new framework that explores the label hierarchy by augmenting the observed
labels with latent labels that follow a prior hierarchical structure. Our
approach uses a 1-Wasserstein metric over the tree metric space as an objective
function, which enables us to simultaneously learn a data-driven label
hierarchy and perform (semi-)supervised learning. We evaluate our method on
several datasets and show that it achieves superior results in both supervised
and semi-supervised scenarios and reveals insightful label structures. Our
implementation is available at https://github.com/tzq1999/SEAL
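For concreteness, the closed-form 1-Wasserstein distance on a tree metric, the building block that SEAL's objective relies on, can be computed in a few lines. The tree, edge weights, and label distributions below are a toy illustration, not the hierarchy learned in the paper.

```python
# Minimal sketch of the closed-form 1-Wasserstein distance on a tree metric
# (toy hierarchy; not SEAL's learned label hierarchy).
import numpy as np

def tree_wasserstein(parent, edge_weight, mu, nu):
    """W1 between distributions mu, nu over tree nodes.

    parent[i]      : parent index of node i (root has parent -1)
    edge_weight[i] : weight of the edge (i, parent[i]); ignored for the root
    Closed form: sum over edges of w_e * |subtree mass under e in mu - in nu|.
    Assumes nodes are topologically ordered so that parent[i] < i.
    """
    sub_mu, sub_nu = np.array(mu, dtype=float), np.array(nu, dtype=float)
    total = 0.0
    for i in range(len(parent) - 1, 0, -1):   # process leaves first, skip root
        total += edge_weight[i] * abs(sub_mu[i] - sub_nu[i])
        sub_mu[parent[i]] += sub_mu[i]
        sub_nu[parent[i]] += sub_nu[i]
    return total

# toy hierarchy: root 0 with children 1 and 2; node 2 has children 3 and 4
parent      = [-1, 0, 0, 2, 2]
edge_weight = [0.0, 1.0, 1.0, 0.5, 0.5]
mu = [0.0, 0.5, 0.0, 0.5, 0.0]   # observed label distribution
nu = [0.0, 0.5, 0.0, 0.0, 0.5]   # augmented (latent) label distribution
print(tree_wasserstein(parent, edge_weight, mu, nu))  # 0.25 + 0.25 = 0.5
```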
OTMatch: Improving Semi-Supervised Learning with Optimal Transport
Semi-supervised learning has made remarkable strides by effectively utilizing
a limited amount of labeled data while capitalizing on the abundant information
present in unlabeled data. However, current algorithms often prioritize
aligning image predictions with specific classes generated through
self-training techniques, thereby neglecting the inherent relationships that
exist within these classes. In this paper, we present a new approach called
OTMatch, which leverages semantic relationships among classes by employing an
optimal transport loss function. By utilizing optimal transport, our proposed
method consistently outperforms established state-of-the-art methods. Notably,
OTMatch achieves error rate reductions of 3.18%, 3.46%, and 1.28% over the
current state-of-the-art method, FreeMatch, on CIFAR-10 with 1 label per
class, STL-10 with 4 labels per class, and ImageNet with 100 labels per class,
respectively. This demonstrates the effectiveness and superiority of our
approach in harnessing semantic relationships to enhance learning performance
in a semi-supervised setting.
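To make the optimal-transport ingredient concrete, the sketch below computes an entropic OT loss between a predicted class distribution and a pseudo-label distribution, using a cost matrix derived from hypothetical class embeddings (e.g., rows of a classifier weight matrix). It illustrates the general mechanism; OTMatch's exact loss may be formulated differently.

```python
# Hedged sketch of an optimal-transport loss with a semantic cost matrix
# (illustrative only; not necessarily OTMatch's exact formulation).
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=200):
    """Entropic OT plan between histograms a, b under cost matrix C."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_loss(pred_hist, pseudo_hist, class_emb):
    # cost = 1 - cosine similarity between (hypothetical) class embeddings
    emb = class_emb / np.linalg.norm(class_emb, axis=1, keepdims=True)
    C = 1.0 - emb @ emb.T
    plan = sinkhorn(pred_hist, pseudo_hist, C)
    return float((plan * C).sum())

rng = np.random.default_rng(0)
num_classes = 10
class_emb = rng.normal(size=(num_classes, 32))        # stand-in class embeddings
pred_hist = rng.dirichlet(np.ones(num_classes))       # averaged model predictions
pseudo_hist = rng.dirichlet(np.ones(num_classes))     # pseudo-label frequencies
print(ot_loss(pred_hist, pseudo_hist, class_emb))
```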
Multiplexed Streaming Codes for Messages With Different Decoding Delays in Channel with Burst and Random Erasures
In a real-time transmission scenario, messages are transmitted through a
channel that is subject to packet loss. The destination must recover the
messages within the required deadline. In this paper, we consider a setup where
two different types of messages with distinct decoding deadlines are
transmitted through a channel that can introduce either burst erasures of
bounded length or random erasures. The message with the shorter decoding
deadline is referred to as the urgent message, while the other, with the
longer decoding deadline, is referred to as the less urgent message.
  We propose a merging method to encode the two message streams of different
urgency levels into a single flow. Focusing on a specific regime of the two
deadlines, we establish that any coding strategy based on this merging approach admits a
closed-form upper bound on its achievable sum rate. Moreover, we present
explicit constructions over a finite field whose size scales quadratically with
the imposed delay, and we show that they achieve this upper bound. For a given
parameter configuration, we rigorously demonstrate that the sum rate of our
proposed streaming codes consistently surpasses that of separate encoding,
which serves as a baseline for comparison.
Privacy-Preserving Polynomial Computing Over Distributed Data
In this letter, we delve into a scenario where a user aims to compute
polynomial functions using their own data as well as data obtained from
distributed sources. To accomplish this, the user enlists the assistance of
distributed workers, thereby defining a problem we refer to as
privacy-preserving polynomial computing over distributed data. To address this
challenge, we propose an approach founded upon Lagrange encoding. Our method
not only withstands stragglers and Byzantine workers but also preserves
security. Specifically, even if a coalition of workers colludes, it cannot
acquire any knowledge about the data originating from the distributed sources
or from the user.
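As a rough illustration of the Lagrange-encoding idea (in the spirit of Lagrange coded computing, not necessarily the letter's exact construction), the sketch below encodes toy data blocks together with a random mask, lets workers evaluate a polynomial on their shares, and decodes the desired results from a subset of responses. The field size, worker count, and the polynomial f are illustrative choices.

```python
# Minimal, self-contained sketch of Lagrange encoding for straggler-tolerant,
# privacy-preserving polynomial evaluation over a prime field (illustrative).
import random

P = 2_147_483_647                # prime field size
K, T = 3, 1                      # K data blocks; privacy against T colluding workers
f = lambda x: (x * x + 5) % P    # polynomial the user wants to evaluate, deg(f) = 2

def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(len(xs)-1) interpolating polynomial at x (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

# --- user side: encode K data blocks plus T random masks --------------------
data  = [11, 22, 33]                                # X_1..X_K (toy scalars)
masks = [random.randrange(P) for _ in range(T)]     # randomness for privacy
alphas = list(range(1, K + T + 1))                  # interpolation points of u(x)
N = 8                                               # number of workers
betas = list(range(K + T + 1, K + T + 1 + N))       # evaluation points for workers
shares = [lagrange_eval(alphas, data + masks, b) for b in betas]

# --- worker side: each worker applies f to its encoded share ----------------
results = [f(s) for s in shares]

# --- user side: decode f(X_k) from enough non-straggler responses -----------
# f(u(x)) has degree (K + T - 1) * deg(f) = 6, so any 7 responses suffice
need = (K + T - 1) * 2 + 1
xs, ys = betas[:need], results[:need]
recovered = [lagrange_eval(xs, ys, a) for a in alphas[:K]]
print(recovered, [f(x) for x in data])   # the two lists should match
```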
Kernel-SSL: Kernel KL Divergence for Self-Supervised Learning
Contrastive learning usually compares a positive anchor sample against many
negative samples to perform Self-Supervised Learning (SSL). Alternatively,
non-contrastive learning, as exemplified by methods like BYOL, SimSiam, and
Barlow Twins, accomplishes SSL without the explicit use of negative samples.
Inspired by the existing analysis for contrastive learning, we provide a
reproducing kernel Hilbert space (RKHS) understanding of many existing
non-contrastive learning methods. Subsequently, we propose a novel loss
function, Kernel-SSL, which directly optimizes the mean embedding and the
covariance operator within the RKHS. In experiments, our method Kernel-SSL
outperforms state-of-the-art methods by a large margin on ImageNet under the
linear evaluation setting. Specifically, with 100 epochs of pre-training, our
method outperforms SimCLR by 4.6%.
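One plausible instantiation of an RKHS-level objective, aligning the kernel mean embeddings of two views (an MMD term) with a simple covariance-operator regularizer, is sketched below; the actual Kernel-SSL loss may be defined differently.

```python
# Hedged sketch of an RKHS-style SSL objective (illustrative, not the paper's loss).
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_ssl_loss(z1, z2, sigma=1.0, lam=0.1):
    K11, K22, K12 = (rbf_kernel(z1, z1, sigma),
                     rbf_kernel(z2, z2, sigma),
                     rbf_kernel(z1, z2, sigma))
    # squared MMD = distance between the two views' kernel mean embeddings
    mmd2 = K11.mean() + K22.mean() - 2 * K12.mean()
    # crude covariance-operator regularizer: discourage collapse by rewarding
    # a large trace of the centered Gram matrix
    n = len(z1)
    H = np.eye(n) - np.ones((n, n)) / n
    cov_term = -np.trace(H @ K11 @ H) / n
    return mmd2 + lam * cov_term

rng = np.random.default_rng(0)
z1 = rng.normal(size=(16, 8))                    # embeddings of view 1 (toy)
z2 = z1 + 0.1 * rng.normal(size=(16, 8))         # embeddings of view 2 (toy)
print(kernel_ssl_loss(z1, z2))
```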
RelationMatch: Matching In-batch Relationships for Semi-supervised Learning
Semi-supervised learning has achieved notable success by leveraging very few
labeled data and exploiting the wealth of information derived from unlabeled
data. However, existing algorithms usually focus on aligning predictions on
paired data points augmented from an identical source, and overlook the
inter-point relationships within each batch. This paper introduces a novel
method, RelationMatch, which exploits in-batch relationships with a matrix
cross-entropy (MCE) loss function. Through the application of MCE, our proposed
method consistently surpasses the performance of established state-of-the-art
methods, such as FixMatch and FlexMatch, across a variety of vision datasets.
Notably, we observe a substantial 15.21% accuracy improvement over FlexMatch on
the STL-10 dataset using only 40 labels. Moreover, we apply MCE to supervised
learning scenarios and observe consistent improvements as well.
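A hedged sketch of the in-batch relation idea: build relation (Gram) matrices from the predictions of two augmented views and compare them with a von-Neumann-style matrix cross-entropy. Both the relation matrices and the MCE formula below are illustrative assumptions rather than the paper's exact definitions.

```python
# Illustrative matrix cross-entropy between in-batch relation matrices.
import numpy as np

def matrix_log(S, eps=1e-8):
    """Log of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.log(np.clip(w, eps, None))) @ V.T

def matrix_cross_entropy(P, Q):
    """tr(-P log Q + Q): a von-Neumann-style cross-entropy between PSD matrices."""
    return float(np.trace(-P @ matrix_log(Q) + Q))

def relation_matrix(probs):
    """In-batch relation matrix: trace-normalized Gram matrix of predictions."""
    R = probs @ probs.T
    return R / np.trace(R)

rng = np.random.default_rng(0)
softmax = lambda x: np.exp(x) / np.exp(x).sum(-1, keepdims=True)
logits_weak   = rng.normal(size=(8, 10))                        # weak-aug predictions
logits_strong = logits_weak + 0.2 * rng.normal(size=(8, 10))    # strong-aug predictions
loss = matrix_cross_entropy(relation_matrix(softmax(logits_weak)),
                            relation_matrix(softmax(logits_strong)))
print(loss)
```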
Contrastive Learning Is Spectral Clustering On Similarity Graph
Contrastive learning is a powerful self-supervised learning method, but we
have a limited theoretical understanding of how it works and why it works. In
this paper, we prove that contrastive learning with the standard InfoNCE loss
is equivalent to spectral clustering on the similarity graph. Using this
equivalence as the building block, we extend our analysis to the CLIP model and
rigorously characterize how similar multi-modal objects are embedded together.
Motivated by our theoretical insights, we introduce the kernel mixture loss,
incorporating novel kernel functions that outperform the standard Gaussian
kernel on several vision datasets.
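For reference, the standard InfoNCE loss that the equivalence result concerns can be written compactly as follows; the temperature and toy data are illustrative.

```python
# Minimal reference implementation of the standard InfoNCE loss on paired views.
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE with z2[i] as the positive for z1[i] and all other z2 as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # positives sit on the diagonal

rng = np.random.default_rng(0)
z1 = rng.normal(size=(32, 64))                       # embeddings of view 1
z2 = z1 + 0.1 * rng.normal(size=(32, 64))            # embeddings of view 2
print(info_nce(z1, z2))
```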
Information Flow in Self-Supervised Learning
In this paper, we provide a comprehensive toolbox for understanding and
enhancing self-supervised learning (SSL) methods through the lens of matrix
information theory. Specifically, by leveraging the principles of matrix mutual
information and joint entropy, we offer a unified analysis for both contrastive
and feature decorrelation based methods. Furthermore, we propose the matrix
variational masked auto-encoder (M-MAE) method, grounded in matrix information
theory, as an enhancement to masked image modeling. The empirical evaluations
underscore the effectiveness of M-MAE compared with the state-of-the-art
methods, including a 3.9% improvement when linear probing ViT-Base and a 1%
improvement when fine-tuning ViT-Large, both on ImageNet.
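A minimal sketch of the matrix-information quantities this analysis builds on: a matrix entropy of a normalized Gram matrix, and a matrix mutual information that uses the Hadamard product as the joint kernel. The estimators in the paper may differ in normalization details.

```python
# Hedged sketch of matrix entropy and matrix mutual information (illustrative).
import numpy as np

def gram(Z, sigma=1.0):
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def matrix_entropy(K, eps=1e-12):
    rho = K / np.trace(K)                            # density-matrix normalization
    w = np.clip(np.linalg.eigvalsh(rho), eps, None)
    return float(-(w * np.log(w)).sum())

def matrix_mutual_information(Z1, Z2):
    K1, K2 = gram(Z1), gram(Z2)
    joint = K1 * K2                                  # Hadamard product as joint kernel
    return matrix_entropy(K1) + matrix_entropy(K2) - matrix_entropy(joint)

rng = np.random.default_rng(0)
Z1 = rng.normal(size=(16, 8))                        # e.g. features of one view
Z2 = Z1 + 0.1 * rng.normal(size=(16, 8))             # e.g. features of another view
print(matrix_mutual_information(Z1, Z2))
```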
Unveiling the Dynamics of Information Interplay in Supervised Learning
In this paper, we use matrix information theory as an analytical tool to
analyze the dynamics of the information interplay between data representations
and classification head vectors in the supervised learning process.
Specifically, inspired by the theory of Neural Collapse, we introduce matrix
mutual information ratio (MIR) and matrix entropy difference ratio (HDR) to
assess the interactions of data representation and class classification heads
in supervised learning, and we determine the theoretical optimal values for MIR
and HDR when Neural Collapse happens. Our experiments show that MIR and HDR can
effectively explain many phenomena occurring in neural networks, for example,
the standard supervised training dynamics, linear mode connectivity, and the
performance of label smoothing and pruning. Additionally, we use MIR and HDR to
gain insights into the dynamics of grokking, which is an intriguing phenomenon
observed in supervised training, where the model demonstrates generalization
capabilities long after it has learned to fit the training data. Furthermore,
we introduce MIR and HDR as loss terms in supervised and semi-supervised
learning to optimize the information interactions among samples and
classification heads. The empirical results provide evidence of the method's
effectiveness, demonstrating that MIR and HDR not only aid in understanding the
dynamics throughout the training process but can also enhance the training
procedure itself.
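The ratio-style diagnostics can be illustrated with the same matrix-entropy machinery as in the previous sketch. The specific MIR and HDR formulas below (mutual information over joint entropy, and a normalized entropy difference) are assumptions for illustration, not the paper's exact definitions, and the classification head is represented by its logits as a proxy for its weight vectors.

```python
# Rough, self-contained illustration of MIR/HDR-style ratios (illustrative formulas).
import numpy as np

def gram(Z, sigma=1.0):
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def entropy(K, eps=1e-12):
    w = np.clip(np.linalg.eigvalsh(K / np.trace(K)), eps, None)
    return float(-(w * np.log(w)).sum())

def mir_hdr(features, head_logits):
    """features: (n, d) representations; head_logits: (n, C) classifier outputs."""
    Kz, Kw = gram(features), gram(head_logits)
    Hz, Hw, Hj = entropy(Kz), entropy(Kw), entropy(Kz * Kw)
    mir = (Hz + Hw - Hj) / Hj            # mutual information / joint entropy
    hdr = abs(Hz - Hw) / (Hz + Hw)       # normalized entropy difference
    return mir, hdr

rng = np.random.default_rng(0)
features = rng.normal(size=(32, 16))
head_logits = features @ rng.normal(size=(16, 10))   # a toy linear classification head
print(mir_hdr(features, head_logits))
```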
DJ-1 can inhibit microtubule associated protein 1 B formed aggregates
Background: Abnormal accumulation and aggregation of microtubule associated proteins (MAPs) play an important role in the pathogenesis of neurodegenerative diseases. Loss-of-function mutations of DJ-1/Park7 can cause early-onset Parkinson's disease (PD). DJ-1, a molecular chaperone, can inhibit α-synuclein aggregation. Currently, little is known about whether loss of DJ-1 function contributes to abnormal MAP aggregation in neurodegenerative disorders such as PD.
Results: We present evidence that DJ-1 can bind to the microtubule associated protein 1B light chain (MAP1B-LC). Overexpression of DJ-1 prevented MAP1B-LC aggregation in HEK293T and SH-SY5Y cells, while DJ-1 knockdown (KD) enhanced MAP1B-LC aggregation in SH-SY5Y cells. An increase in insoluble MAP1B-LC was also observed in the brains of DJ-1 null mice. Moreover, in DJ-1 KD SH-SY5Y cells, overexpression of MAP1B-LC led to endoplasmic reticulum (ER) stress-induced apoptosis.
Conclusion: Our results suggest that DJ-1 acts as a molecular chaperone to inhibit MAP1B aggregation, which would otherwise lead to neuronal apoptosis. Our study provides a novel insight into the mechanisms underlying the pathogenesis of Parkinson's disease (PD).