Learning to Generalize Provably in Learning to Optimize
Learning to optimize (L2O) has gained increasing popularity, as it automates
the design of optimizers through data-driven approaches. However, current L2O
methods often suffer from poor generalization in at least two respects:
(i) applying the L2O-learned optimizer to unseen optimizees, in terms of
lowering their loss function values (optimizer generalization, or
``generalizable learning of optimizers''); and (ii) the test performance of an
optimizee (itself a machine learning model), trained by the optimizer, in
terms of accuracy on unseen data (optimizee generalization, or ``learning
to generalize''). While optimizer generalization has recently been studied,
optimizee generalization (or learning to generalize) has not been
rigorously studied in the L2O context; closing that gap is the aim of this paper. We first
theoretically establish an implicit connection between the local entropy and
the Hessian, and hence unify their roles in the handcrafted design of
generalizable optimizers as equivalent metrics of the landscape flatness of
loss functions. We then propose to incorporate these two metrics as
flatness-aware regularizers into the L2O framework in order to meta-train
optimizers to learn to generalize, and we theoretically show that such
generalization ability can be learned during the L2O meta-training process and
then transferred to the optimizee loss function. Extensive experiments
consistently validate the effectiveness of our proposals, with substantially
improved generalization on multiple sophisticated L2O models and diverse
optimizees. Our code is available at:
https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy

Comment: This paper is accepted at AISTATS 202
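To make one of the two flatness metrics concrete: the Hessian's trace can be estimated without ever forming the Hessian, using Hutchinson's estimator with finite-difference Hessian-vector products. The sketch below is a minimal illustration of such a flatness proxy, not the paper's exact regularizer; the quadratic test loss is purely illustrative.

```python
import numpy as np

def hutchinson_hessian_trace(grad_fn, w, n_samples=50, eps=1e-4, seed=0):
    """Estimate tr(H) of a loss at w via Hutchinson's estimator, using
    finite-difference Hessian-vector products:
    Hv ~ (g(w + eps*v) - g(w - eps*v)) / (2*eps)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=w.shape)            # Rademacher probe
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)
        total += v @ hv                                      # E[v^T H v] = tr(H)
    return total / n_samples

# Illustrative quadratic loss L(w) = 0.5 w^T A w, so H = A and tr(H) = 6
A = np.diag([1.0, 2.0, 3.0])
tr_est = hutchinson_hessian_trace(lambda w: A @ w, np.ones(3))
print(tr_est)  # ~6.0 (exact for a quadratic, up to floating point)
```

Such an estimate can be added to the meta-training objective as a penalty that steers the learned optimizer toward flat minima; for a quadratic loss the finite-difference Hessian-vector product is exact, so the example above recovers the trace precisely.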
Rényi Fair Inference
Machine learning algorithms have been increasingly deployed in critical
automated decision-making systems that directly affect human lives. When these
algorithms are only trained to minimize the training/test error, they could
suffer from systematic discrimination against individuals based on their
sensitive attributes such as gender or race. Recently, there has been a surge
of interest in the machine learning community in developing algorithms for fair machine learning. In
particular, many adversarial learning procedures have been proposed to impose
fairness. Unfortunately, these algorithms either can only impose fairness up to
first-order dependence between the variables, or they lack computational
convergence guarantees. In this paper, we use Rényi correlation as a measure
of fairness of machine learning models and develop a general training framework
to impose fairness. In particular, we propose a min-max formulation which
balances the accuracy and fairness when solved to optimality. For the case of
discrete sensitive attributes, we suggest an iterative algorithm with a
theoretical convergence guarantee for solving the proposed min-max problem. Our
algorithm and analysis are then specialized to the fair classification and fair
clustering problems under the disparate impact doctrine. Finally, the performance of
the proposed Rényi fair inference framework is evaluated on the Adult and Bank
datasets.

Comment: 11 pages, 1 figure
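For the discrete-sensitive-attribute case that the abstract highlights, the Rényi (Hirschfeld–Gebelein–Rényi maximal) correlation has a closed form: it equals the second-largest singular value of the matrix Q with entries P(x,y)/sqrt(P(x)P(y)), a classical result due to Witsenhausen. The sketch below computes the fairness measure itself (assuming strictly positive marginals); it is not the paper's min-max training procedure.

```python
import numpy as np

def renyi_correlation(joint):
    """HGR maximal correlation of two discrete variables from their joint
    pmf matrix `joint` (rows: values of X, columns: values of Y).
    Equals the second-largest singular value of
    Q[x, y] = P(x, y) / sqrt(P(x) * P(y))."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)                      # marginal of X
    py = joint.sum(axis=0)                      # marginal of Y
    Q = joint / np.sqrt(np.outer(px, py))
    s = np.linalg.svd(Q, compute_uv=False)
    return s[1]  # s[0] == 1 corresponds to constant functions

# Independent variables -> maximal correlation 0
indep = np.outer([0.5, 0.5], [0.3, 0.7])
print(renyi_correlation(indep))   # ~0.0

# Perfectly dependent variables -> maximal correlation 1
dep = np.array([[0.5, 0.0], [0.0, 0.5]])
print(renyi_correlation(dep))     # 1.0
```

In a fairness-aware training loop, such a quantity (between predictions and a sensitive attribute) would serve as the regularizer that the min-max formulation drives toward zero.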
Closed-loop automatic gradient design for liquid chromatography using Bayesian optimization
Contemporary complex samples require sophisticated methods for full analysis. This work describes the development of a Bayesian optimization algorithm for the automated and unsupervised development of gradient programs. The algorithm was tailored to LC using a Gaussian process model with a novel covariance kernel. To facilitate unsupervised learning, the algorithm was designed to interface directly with the chromatographic system. Single-objective and multi-objective Bayesian optimization strategies were investigated for the separation of two complex (n > 18 and n > 80) dye mixtures. Both approaches found satisfactory optima in under 35 measurements. The multi-objective strategy proved powerful and flexible in exploring the Pareto front. The performance difference between the single-objective and multi-objective strategies was further investigated using a retention-modeling example. An additional advantage of the multi-objective approach is that it allows a trade-off to be made between multiple objectives without prior knowledge. In general, the Bayesian optimization strategy was found to be particularly suitable for, though not limited to, cases where retention modeling is not possible, although its scalability may be limited in the number of parameters that can be optimized simultaneously.
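The closed-loop logic described here (fit a Gaussian process to the measurements taken so far, let an acquisition function propose the next gradient program, measure, refit, repeat) can be sketched in a few lines. Everything below is an illustrative assumption rather than the paper's method: the squared-exponential kernel stands in for the tailored covariance kernel, the single normalized "slope" parameter stands in for a full gradient program, the upper-confidence-bound acquisition stands in for the actual strategy, and `resolution_score` is a synthetic stand-in for the instrument.

```python
import numpy as np

# Hypothetical stand-in for the instrument: a separation-quality score as a
# function of one normalized gradient-slope parameter. In the real closed
# loop, this value would be measured on the chromatographic system.
def resolution_score(x):
    return np.exp(-((x - 0.63) ** 2) / 0.02)

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel (illustrative; the paper uses a novel kernel)."""
    return np.exp(-0.5 * ((np.asarray(a)[:, None] - np.asarray(b)[None, :]) / ls) ** 2)

def propose_next(X, y, cand, kappa=2.0, noise=1e-6):
    """GP posterior on candidate points + upper-confidence-bound acquisition."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, cand)
    mu = Ks.T @ np.linalg.solve(K, y)                         # posterior mean
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0), 0.0, None)
    return cand[np.argmax(mu + kappa * np.sqrt(var))]         # explore + exploit

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 3)             # small initial design
y = resolution_score(X)
cand = np.linspace(0.0, 1.0, 201)
for _ in range(15):                      # closed loop: propose, "measure", refit
    x_next = propose_next(X, y, cand)
    X = np.append(X, x_next)
    y = np.append(y, resolution_score(x_next))
best = X[np.argmax(y)]
print(best)                              # settles near the optimum at 0.63
```

The multi-objective variant would replace the single score with a vector of objectives and an acquisition function that trades them off along the Pareto front.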
Principled Architecture-aware Scaling of Hyperparameters
Training a high-quality deep neural network requires choosing suitable
hyperparameters, which is a non-trivial and expensive process. Current works
try to optimize hyperparameters automatically, or to design principles for
setting them, such that they generalize to diverse unseen scenarios. However, most designs or
optimization methods are agnostic to the choice of network structures, and thus
largely ignore the impact of neural architectures on hyperparameters. In this
work, we precisely characterize the dependence of initializations and maximal
learning rates on the network architecture, which includes the network depth,
width, convolutional kernel size, and connectivity patterns. By requiring that
every parameter be maximally updated, with the same mean squared change in
pre-activations, we can generalize our initialization and learning rates across
MLPs (multi-layer perceptrons) and CNNs (convolutional neural networks) with
sophisticated graph topologies. We verify our principles with comprehensive
experiments. More importantly, our strategy further sheds light on advancing
current benchmarks for architecture design. A fair comparison of AutoML
algorithms requires accurate network rankings. However, we demonstrate that
network rankings can easily change when the networks in a benchmark are
trained better with our architecture-aware learning rates and initializations.
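To illustrate the kind of rule the abstract describes, here is a hypothetical fan-in-based sketch in the spirit of maximal-update scaling: initialize each layer so pre-activations have unit RMS for unit-RMS inputs, and shrink the layer's learning rate with its fan-in so an update perturbs pre-activations by a width-independent amount. The paper's actual prescription also accounts for depth, kernel size, and connectivity patterns, which this sketch ignores.

```python
import numpy as np

def fanin_init(fan_in, fan_out, rng):
    """Initialize weights so pre-activations have unit RMS when inputs do."""
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_out, fan_in))

def layer_lr(base_lr, fan_in):
    """Hypothetical per-layer rate: shrink with fan-in so an SGD step changes
    pre-activations by a width-independent amount (illustrative rule only)."""
    return base_lr / fan_in

rng = np.random.default_rng(0)
for width in (64, 256, 1024):
    W = fanin_init(width, width, rng)
    x = rng.choice([-1.0, 1.0], size=width)        # unit-RMS input
    rms = float(np.sqrt(np.mean((W @ x) ** 2)))
    print(width, round(rms, 2), layer_lr(0.1, width))
# the pre-activation RMS stays near 1.0 at every width, while the
# per-layer learning rate shrinks as the layer gets wider
```

The point of such width-aware scaling is exactly the one the abstract makes about benchmarks: if the tuned learning rate depends on the architecture, then comparing architectures under one shared hyperparameter setting can reorder their rankings.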