Data-Efficient Learning via Minimizing Hyperspherical Energy
Deep learning on large-scale data is dominant nowadays. The unprecedented
scale of data has been arguably one of the most important driving forces for
the success of deep learning. However, there still exist scenarios where
collecting data or labels could be extremely expensive, e.g., medical imaging
and robotics. To fill this gap, this paper considers the problem of
data-efficient learning from scratch using a small amount of representative
data. First, we characterize this problem by active learning on homeomorphic
tubes of spherical manifolds, which naturally generates a feasible hypothesis
class. With homologous topological properties, we identify an important
connection -- finding tube manifolds is equivalent to minimizing hyperspherical
energy (MHE) in physical geometry. Inspired by this connection, we propose an
MHE-based active learning (MHEAL) algorithm, and provide comprehensive
theoretical guarantees for MHEAL, covering convergence and generalization
analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide
range of applications on data-efficient learning, including deep clustering,
distribution matching, version space sampling, and deep active learning.
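To make the selection objective concrete, here is a minimal sketch of greedy sample selection that keeps the hyperspherical energy of the chosen set low, so the selected samples spread out on the unit sphere. This is an illustrative approximation of the idea, not the paper's MHEAL algorithm; the function name `mheal_select` and the greedy seeding are assumptions.

```python
import numpy as np

def mheal_select(features, n_select, eps=1e-8):
    """Greedily pick samples whose unit-normalized features keep the
    hyperspherical energy (sum of inverse pairwise distances) of the
    selected set low, i.e. samples that spread out on the sphere."""
    X = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    chosen = [0]  # seed with the first sample for determinism
    while len(chosen) < n_select:
        best_i, best_e = None, np.inf
        for i in range(len(X)):
            if i in chosen:
                continue
            S = X[chosen + [i]]
            iu, ju = np.triu_indices(len(S), k=1)
            e = np.sum(1.0 / (np.linalg.norm(S[iu] - S[ju], axis=1) + eps))
            if e < best_e:
                best_i, best_e = i, e
        chosen.append(best_i)
    return chosen

# the point opposite on the sphere is preferred over a near-duplicate
pts = np.array([[1.0, 0.0], [0.99, 0.1], [-1.0, 0.0]])
print(mheal_select(pts, 2))  # [0, 2]
```

Given the three points above, the greedy step skips the near-duplicate of the seed and picks the antipodal point, since a close pair contributes a large inverse-distance term to the energy.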
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Large text-to-image diffusion models have impressive capabilities in
generating photorealistic images from text prompts. How to effectively guide or
control these powerful models to perform different downstream tasks becomes an
important open problem. To tackle this challenge, we introduce a principled
finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image
diffusion models to downstream tasks. Unlike existing methods, OFT can provably
preserve hyperspherical energy which characterizes the pairwise neuron
relationship on the unit hypersphere. We find that this property is crucial for
preserving the semantic generation ability of text-to-image diffusion models.
To improve finetuning stability, we further propose Constrained Orthogonal
Finetuning (COFT) which imposes an additional radius constraint to the
hypersphere. Specifically, we consider two important finetuning text-to-image
tasks: subject-driven generation where the goal is to generate subject-specific
images given a few images of a subject and a text prompt, and controllable
generation where the goal is to enable the model to take in additional control
signals. We empirically show that our OFT framework outperforms existing
methods in generation quality and convergence speed. Comment: NeurIPS 2023 (43 pages, 34 figures, project page:
https://oft.wyliu.com/)
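The energy-preservation property follows from a simple fact: applying one orthogonal matrix to every neuron leaves all pairwise angles on the hypersphere, and hence the hyperspherical energy, unchanged. The sketch below illustrates this with an orthogonal matrix built via the Cayley transform; it is an illustration of the invariance, not the authors' OFT implementation, and the function names are hypothetical.

```python
import numpy as np

def cayley_orthogonal(S):
    """Cayley transform: maps a skew-symmetric S to an orthogonal R."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(I - S, I + S)  # R = (I - S)^{-1}(I + S)

def cosines(M):
    """Pairwise cosine similarities between the rows of M."""
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    return Mn @ Mn.T

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
R = cayley_orthogonal(A - A.T)  # orthogonal "finetuning" matrix
W = rng.normal(size=(6, 4))     # 6 neurons living in R^4
W_ft = W @ R.T                  # rotate every neuron by the same R
# pairwise angles -- and hence hyperspherical energy -- are unchanged
assert np.allclose(cosines(W), cosines(W_ft))
```

The Cayley parameterization is convenient because any skew-symmetric matrix yields a valid orthogonal matrix (I - S is always invertible when S is skew-symmetric), so the transform can be optimized freely.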
Improving Singing Voice Separation with the Wave-U-Net Using Minimum Hyperspherical Energy
In recent years, deep learning has surpassed traditional approaches to the problem of singing voice separation. The Wave-U-Net is a recent deep network architecture that operates directly on the time domain. The standard Wave-U-Net is trained with data augmentation and early stopping to prevent overfitting. Minimum hyperspherical energy (MHE) regularization has recently proven to increase generalization in image classification problems by encouraging a diversified filter configuration. In this work, we apply MHE regularization to the 1D filters of the Wave-U-Net. We evaluated this approach for separating the vocal part from mixed music audio recordings on the MUSDB18 dataset. We found that adding MHE regularization to the loss function consistently improves singing voice separation, as measured by the Signal to Distortion Ratio on test recordings, leading to the current best time-domain system for singing voice extraction.
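As a minimal sketch of what such a regularizer computes over a filter bank (the function name and the inverse-power form with s=2 are assumptions; the paper's exact formulation may differ):

```python
import numpy as np

def mhe_regularizer(filters, s=2.0, eps=1e-6):
    """Mean inverse-power pairwise distance of unit-normalized filters:
    small when the filter bank is diverse, large when filters collapse."""
    F = filters.reshape(len(filters), -1)
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + eps)
    iu, ju = np.triu_indices(len(F), k=1)
    dists = np.linalg.norm(F[iu] - F[ju], axis=1)
    return np.mean((dists + eps) ** (-s))

# orthogonal (diverse) filters incur less energy than near-duplicates
rng = np.random.default_rng(0)
diverse = np.eye(4)
clustered = np.ones((4, 4)) + 0.01 * rng.normal(size=(4, 4))
assert mhe_regularizer(diverse) < mhe_regularizer(clustered)
```

In training, a term like this would be summed over the convolutional layers and added to the separation loss with a small weight, pushing filters apart on the hypersphere.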
Hyperspherically Regularized Networks for Self-Supervision
This work used the Cirrus UK National Tier-2 HPC Service at EPCC (http://www.cirrus.ac.uk). Access was granted through the project ec173 - Next gen self-supervised learning systems for vision tasks. Preprint.