Architecture, Dataset and Model-Scale Agnostic Data-free Meta-Learning
The goal of data-free meta-learning is to learn useful prior knowledge from a
collection of pre-trained models without accessing their training data.
However, existing works solve the problem only in parameter space, which (i)
ignores the rich data knowledge contained in the pre-trained models; (ii)
cannot scale to large-scale pre-trained models; and (iii) can only meta-learn
pre-trained models that share the same network architecture. To address these issues,
we propose a unified framework, dubbed PURER, which contains: (1) ePisode
cUrriculum inveRsion (ECI) during data-free meta training; and (2) invErsion
calibRation following inner loop (ICFIL) during meta testing. During meta
training, we propose ECI to perform pseudo episode training so that the meta
model learns to adapt quickly to new, unseen tasks. Specifically, we progressively synthesize a
sequence of pseudo episodes by distilling the training data from each
pre-trained model. The ECI adaptively increases the difficulty level of pseudo
episodes according to the real-time feedback of the meta model. We formulate
the optimization process of meta training with ECI as an adversarial form in an
end-to-end manner. During meta testing, we further propose a simple
plug-and-play supplement, ICFIL, used only at meta-test time to narrow the gap
between the meta-training and meta-testing task distributions. Extensive experiments
in various real-world scenarios demonstrate the superior performance of our method.
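The abstract stays high level, but the core inversion step behind pseudo-episode generation can be sketched: synthetic inputs are optimized so that a frozen pre-trained model assigns them chosen pseudo-labels, and the resulting samples form a support/query episode. The PyTorch sketch below shows only that basic step; the function names, the total-variation prior, and all hyperparameters are illustrative assumptions, and the curriculum that ECI builds from the meta model's real-time feedback is omitted.

```python
import torch
import torch.nn.functional as F

def invert_pseudo_episode(pretrained, num_classes, shots=5, steps=200, lr=0.1,
                          image_shape=(3, 32, 32), device="cpu"):
    """Distill a pseudo episode from a frozen pre-trained model (illustrative sketch).

    Synthetic images are optimized so the frozen model assigns them the pseudo-labels
    we picked. The difficulty scheduling of ECI (driven by the meta model's feedback)
    is intentionally left out.
    """
    pretrained.eval()
    labels = torch.arange(num_classes, device=device).repeat_interleave(shots)
    images = torch.randn(len(labels), *image_shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([images], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        logits = pretrained(images)
        # Classification loss pulls the synthetic images toward the pseudo-labels;
        # the total-variation term is a simple smoothness prior on the images.
        loss = F.cross_entropy(logits, labels) + 1e-4 * total_variation(images)
        loss.backward()
        opt.step()
    return images.detach(), labels

def total_variation(x):
    # Mean absolute difference between neighboring pixels (vertical + horizontal).
    return (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
           (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
```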
Symmetric Pruning in Quantum Neural Networks
Many fundamental properties of a quantum system are captured by its
Hamiltonian and ground state. Despite the significance of ground state
preparation (GSP), this task is classically intractable for large-scale
Hamiltonians. Quantum neural networks (QNNs), which harness the power of modern
quantum machines, have emerged as a leading protocol for tackling this issue. As
such, how to enhance the performance of QNNs has become a crucial topic in GSP.
Empirical evidence has shown that QNNs with handcrafted symmetric ansatzes generally
exhibit better trainability than those with asymmetric ansatzes, yet a
theoretical explanation has been lacking. To fill this knowledge gap,
here we propose the effective quantum neural tangent kernel (EQNTK) and connect
this concept with over-parameterization theory to quantify the convergence of
QNNs towards the global optima. We find that the advantage of symmetric
ansatzes stems from their large EQNTK value combined with a low effective dimension,
which requires only a few parameters and a shallow quantum circuit to reach the
over-parameterization regime that permits a benign loss landscape and fast
convergence. Guided by EQNTK, we further devise a symmetric pruning (SP) scheme
to automatically tailor a symmetric ansatz from an over-parameterized and
asymmetric one, which greatly improves the performance of QNNs when explicit
symmetry information of the Hamiltonian is unavailable. Extensive numerical
simulations are conducted to validate the analytical results of EQNTK and the
effectiveness of SP.
Comment: Accepted to International Conference on Learning Representations (ICLR) 202
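The abstract does not reproduce the EQNTK definition, but the link between a large tangent-kernel value and fast convergence can be made concrete with a schematic gradient-flow argument on the GSP cost. The construction below is a generic quantum-NTK-style quantity, not necessarily the paper's exact EQNTK.

```latex
% Schematic only: a generic QNTK-style quantity for the GSP cost.
\[
  C(\boldsymbol{\theta}) = \langle 0 |\, U^{\dagger}(\boldsymbol{\theta})\, H\, U(\boldsymbol{\theta})\, | 0 \rangle,
  \qquad
  K(\boldsymbol{\theta}) = \sum_{l} \left( \frac{\partial C}{\partial \theta_{l}} \right)^{\!2}.
\]
% Under gradient flow on the parameters, the cost decreases at a rate set by K:
\[
  \dot{\theta}_{l} = -\,\frac{\partial C}{\partial \theta_{l}}
  \;\;\Longrightarrow\;\;
  \frac{\mathrm{d}C}{\mathrm{d}t}
    = \sum_{l} \frac{\partial C}{\partial \theta_{l}}\, \dot{\theta}_{l}
    = -\,K(\boldsymbol{\theta}).
\]
```

On this reading, an ansatz whose kernel value is large while only a few eigen-directions of the corresponding kernel matrix are non-negligible (a low effective dimension) converges quickly with few parameters and shallow depth, which is the regime the abstract attributes to symmetric ansatzes.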
Assessing the wind energy potential of China in considering its variability/intermittency
While wind energy has seen massive deployment over the last decades, its intermittency has hindered its use and led to curtailment. Quantifying and mitigating the intermittency/variability of wind energy is imperative for both the research community and industry, yet no consensus methods exist. The present study makes a first attempt to quantify the cost of the variability/intermittency of wind energy when paired with a battery energy storage system, aiming to comprehensively assess the spatial distribution of exploitable wind energy in China. Taking into account the abundance of wind resources, land-use type, landforms, and the variability of wind energy, the study finds that the most abundant wind resources are located on the Tibetan Plateau, in the Hexi Corridor, and in Inner Mongolia. In the near future, wind farms equipped with the advanced energy storage technology projected for 2030 or 2050 could provide stable wind energy at market-comparable prices, lower than the current price of coal-fired electricity (about 0.5 CNY/kWh). It is worth noting that the variability of wind energy on the Qinghai-Tibet Plateau could demand a large storage capacity and therefore lead to unaffordable costs. The proposed methodology can be applied to other regions worldwide, and the results could serve as a scientific foundation for policy makers planning wind power development in mainland China.
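To illustrate how a variability cost could enter such an assessment, the sketch below sizes the battery energy needed to firm an hourly wind-power series to a constant output and then adds an annualized storage cost onto a wind LCOE. The firming rule, the function names, and every cost figure are assumptions for illustration only, not the study's actual model or numbers.

```python
import numpy as np

def firming_storage_requirement(wind_mw, target_mw=None, efficiency=0.9):
    """Estimate the usable storage (MWh) needed to smooth an hourly wind series
    to a constant target output. A simple firming heuristic, not the study's model."""
    wind_mw = np.asarray(wind_mw, dtype=float)
    if target_mw is None:
        target_mw = wind_mw.mean()
    soc, min_soc, max_soc = 0.0, 0.0, 0.0   # state of charge in MWh
    for p in wind_mw:
        surplus = p - target_mw             # MW over one hour -> MWh
        soc += surplus * efficiency if surplus > 0 else surplus
        min_soc, max_soc = min(min_soc, soc), max(max_soc, soc)
    return max_soc - min_soc                # energy swing the battery must cover

def levelized_cost_with_storage(wind_lcoe, storage_mwh, annual_energy_mwh,
                                storage_cost_per_mwh=150_000, lifetime_years=10):
    """Add an annualized storage cost to a wind LCOE (CNY/kWh).
    All cost figures here are placeholders, not values from the study."""
    annual_storage_cost = storage_cost_per_mwh * storage_mwh / lifetime_years
    return wind_lcoe + annual_storage_cost / (annual_energy_mwh * 1000.0)
```

Feeding in a year of hourly output for a candidate site would then give a first-order estimate of how much its variability inflates the delivered cost per kWh.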
Continual Learning From a Stream of APIs
Continual learning (CL) aims to learn new tasks without forgetting previous
tasks. However, existing CL methods require a large amount of raw data, which
is often unavailable due to copyright considerations and privacy risks.
Instead, stakeholders usually release pre-trained machine learning models as a
service (MLaaS), which users can access via APIs. This paper considers two
practical-yet-novel CL settings: data-efficient CL (DECL-APIs) and data-free CL
(DFCL-APIs), which achieve CL from a stream of APIs with partial or no raw
data. Performing CL under these two new settings poses several challenges:
unavailable full raw data, unknown model parameters, heterogeneous models of
arbitrary architecture and scale, and catastrophic forgetting of previous APIs.
To overcome these issues, we propose a novel data-free cooperative continual
distillation learning framework that distills knowledge from a stream of APIs
into a CL model by generating pseudo data, just by querying APIs. Specifically,
our framework includes two cooperative generators and one CL model, forming
their training as an adversarial game. We first use the CL model and the
current API as fixed discriminators to train generators via a derivative-free
method. Generators adversarially generate hard and diverse synthetic data to
maximize the response gap between the CL model and the API. Next, we train the
CL model by minimizing the gap between the responses of the CL model and the
black-box API on synthetic data, to transfer the API's knowledge to the CL
model. Furthermore, we propose a new regularization term based on network
similarity to prevent catastrophic forgetting of previous APIs. Our method
performs comparably to classic CL with full raw data on MNIST and SVHN in
the DFCL-APIs setting. In the DECL-APIs setting, our method achieves 0.97x,
0.75x, and 0.69x the performance of classic CL on CIFAR10, CIFAR100, and
MiniImageNet, respectively.
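A minimal sketch of the adversarial game described above, under several assumptions: the black-box API is a callable returning class probabilities, the response gap is measured with a KL divergence, and the generator is updated with a simple NES-style zeroth-order estimator standing in for the paper's derivative-free method. The names (api_query, response_gap, and so on) are hypothetical, and the cooperative use of two generators and the network-similarity regularizer are omitted.

```python
import torch
import torch.nn.functional as F

def response_gap(cl_model, api_query, synthetic_x):
    """Mean KL gap between the CL model and the black-box API on synthetic data."""
    with torch.no_grad():
        api_probs = api_query(synthetic_x)          # API returns class probabilities
    cl_log_probs = F.log_softmax(cl_model(synthetic_x), dim=1)
    return F.kl_div(cl_log_probs, api_probs, reduction="batchmean")

def generator_step_zeroth_order(generator, cl_model, api_query, z,
                                sigma=0.01, n_dirs=8, lr=1e-3):
    """Derivative-free (NES-style) update: push the generator toward samples that
    MAXIMIZE the CL-model/API gap, i.e. hard and diverse synthetic data."""
    base = torch.nn.utils.parameters_to_vector(generator.parameters())
    grad_est = torch.zeros_like(base)
    for _ in range(n_dirs):
        eps = torch.randn_like(base)
        torch.nn.utils.vector_to_parameters(base + sigma * eps, generator.parameters())
        with torch.no_grad():
            gap = response_gap(cl_model, api_query, generator(z))
        grad_est += gap.item() * eps
    grad_est /= (n_dirs * sigma)
    # Gradient ascent on the estimated gap.
    torch.nn.utils.vector_to_parameters(base + lr * grad_est, generator.parameters())

def cl_model_step(cl_model, api_query, generator, z, optimizer):
    """Distillation step: the CL model minimizes its gap to the API on synthetic data."""
    optimizer.zero_grad()
    loss = response_gap(cl_model, api_query, generator(z).detach())
    loss.backward()
    optimizer.step()
```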
BadLabel: A Robust Perspective on Evaluating and Enhancing Label-noise Learning
Label-noise learning (LNL) aims to increase the model's generalization given
training data with noisy labels. To facilitate practical LNL algorithms,
researchers have proposed different label noise types, ranging from
class-conditional to instance-dependent noises. In this paper, we introduce a
novel label noise type called BadLabel, which can degrade the performance of
existing LNL algorithms by a large margin. BadLabel is crafted
based on the label-flipping attack against standard classification, where
specific samples are selected and their labels are flipped to other labels so
that the loss values of clean and noisy labels become indistinguishable. To
address the challenge posed by BadLabel, we further propose a robust LNL method
that perturbs the labels in an adversarial manner at each epoch to make the
loss values of clean and noisy labels again distinguishable. Once we select a
small set of (mostly) clean labeled data, we can apply the techniques of
semi-supervised learning to train the model accurately. Empirically, our
experimental results demonstrate that existing LNL algorithms are vulnerable to
the newly introduced BadLabel noise type, while our proposed robust LNL method
can effectively improve the generalization performance of the model under
various types of label noise. The new dataset of noisy labels and the source
code of the robust LNL algorithms are available at
https://github.com/zjfheart/BadLabels
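One plausible reading of how such noise could be crafted, sketched in PyTorch: randomly selected samples have their labels flipped to the wrong class on which a standard-trained model's loss is smallest, so the flipped samples no longer stand out by loss value. The selection rule, the noise rate, and the function name are illustrative assumptions rather than the paper's exact attack.

```python
import torch
import torch.nn.functional as F

def craft_lowloss_flip_noise(model, x, y, noise_rate=0.4):
    """Simplified BadLabel-style noise: flip selected labels to the wrong class the
    model already scores highest, so the noisy samples keep a small loss value.
    Not the paper's exact attack; selection and optimization details are omitted."""
    model.eval()
    with torch.no_grad():
        logits = model(x)                                        # [N, C]
    n_flip = int(noise_rate * len(y))
    flip_idx = torch.randperm(len(y), device=y.device)[:n_flip]  # random selection
    noisy_y = y.clone()
    masked = logits[flip_idx].clone()
    masked.scatter_(1, y[flip_idx].unsqueeze(1), float("-inf"))  # exclude the true class
    noisy_y[flip_idx] = masked.argmax(dim=1)                     # most confident wrong class
    return noisy_y
```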
Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels
In recent years, research on learning with noisy labels has focused on
devising novel algorithms that can achieve robustness to noisy training labels
while generalizing to clean data. These algorithms often incorporate
sophisticated techniques, such as noise modeling, label correction, and
co-training. In this study, we demonstrate that a simple baseline using
cross-entropy loss, combined with widely used regularization strategies such as
learning rate decay, model weight averaging, and data augmentation, can
outperform state-of-the-art methods. Our findings suggest that employing a
combination of regularization strategies can be more effective than intricate
algorithms in tackling the challenges of learning with noisy labels. While some
of these regularization strategies have been utilized in previous noisy label
learning research, their full potential has not been thoroughly explored. Our
results encourage a reevaluation of benchmarks for learning with noisy labels
and prompt reconsideration of the role of specialized learning algorithms
designed for training with noisy labels.
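A sketch of what such a baseline might look like in PyTorch: plain cross-entropy training combined with cosine learning-rate decay, a running average of the model weights, and standard augmentations applied through the training data pipeline. The abstract does not give the exact recipe, so every hyperparameter, the scheduler, and the averaging scheme below are illustrative.

```python
import torch
import torch.nn.functional as F
from torch.optim.swa_utils import AveragedModel
from torchvision import transforms

# Standard augmentations of the kind the abstract refers to; this transform would
# be passed to the training dataset when building train_loader.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def train_simple_baseline(model, train_loader, epochs=100, lr=0.1, device="cpu"):
    """Cross-entropy training plus the regularizers named in the abstract:
    cosine learning-rate decay and an averaged copy of the weights.
    No noise modeling, label correction, or co-training."""
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    averaged = AveragedModel(model)              # running average of model weights

    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)  # plain cross-entropy on noisy labels
            loss.backward()
            optimizer.step()
            averaged.update_parameters(model)
        scheduler.step()
    return averaged                              # evaluate with the averaged weights
```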