50 research outputs found

    Learning to Generalize Provably in Learning to Optimize

    Learning to optimize (L2O) has gained increasing popularity; it automates the design of optimizers through data-driven approaches. However, current L2O methods often suffer from poor generalization in at least two respects: (i) applying the L2O-learned optimizer to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or "generalizable learning of optimizers"); and (ii) the test performance of an optimizee (itself a machine learning model), trained by the optimizer, in terms of accuracy on unseen data (optimizee generalization, or "learning to generalize"). While optimizer generalization has recently been studied, optimizee generalization (learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, and hence unify their roles in the handcrafted design of generalizable optimizers as equivalent metrics of the flatness of the loss landscape. We then propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers to learn to generalize, and we theoretically show that such generalization ability can be learned during L2O meta-training and then transferred to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals, with substantially improved generalization on multiple sophisticated L2O models and diverse optimizees. Our code is available at: https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy.
    Comment: This paper is accepted at AISTATS 202
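    To make the flatness idea concrete, the sketch below adds a Hessian-based flatness penalty to an ordinary training loss, with the Hessian trace approximated by a Hutchinson estimator. It is only an illustration of the kind of flatness-aware regularizer the abstract refers to: the toy model, the penalty weight `rho`, and the use of plain SGD are assumptions, not the paper's L2O meta-training setup.

```python
# Sketch: flatness-aware regularization via a Hutchinson estimate of the
# Hessian trace. Model, data, and the weight `rho` are illustrative only.
import torch
import torch.nn as nn

def hutchinson_hessian_trace(loss, params, n_samples=1):
    """Estimate trace(H) of `loss` w.r.t. `params` using Rademacher probes."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    trace_est = 0.0
    for _ in range(n_samples):
        vs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]  # +/-1 probes
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        # Hessian-vector products; create_graph keeps the penalty differentiable.
        hvs = torch.autograd.grad(gv, params, create_graph=True)
        trace_est = trace_est + sum((hv * v).sum() for hv, v in zip(hvs, vs))
    return trace_est / n_samples

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
rho = 1e-3  # assumed weight on the flatness penalty

x, y = torch.randn(64, 10), torch.randn(64, 1)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    flat_penalty = hutchinson_hessian_trace(loss, params)
    (loss + rho * flat_penalty).backward()
    opt.step()
```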

    Rényi Fair Inference

    Machine learning algorithms have been increasingly deployed in critical automated decision-making systems that directly affect human lives. When these algorithms are trained only to minimize the training/test error, they can systematically discriminate against individuals on the basis of sensitive attributes such as gender or race. Recently, there has been a surge of interest in the machine learning community in developing algorithms for fair machine learning. In particular, many adversarial learning procedures have been proposed to impose fairness. Unfortunately, these algorithms either can only impose fairness up to first-order dependence between the variables, or they lack computational convergence guarantees. In this paper, we use Rényi correlation as a measure of fairness of machine learning models and develop a general training framework to impose fairness. In particular, we propose a min-max formulation that balances accuracy and fairness when solved to optimality. For the case of discrete sensitive attributes, we suggest an iterative algorithm with a theoretical convergence guarantee for solving the proposed min-max problem. Our algorithm and analysis are then specialized to fair classification and fair clustering under the disparate impact doctrine. Finally, the performance of the proposed Rényi fair inference framework is evaluated on the Adult and Bank datasets.
    Comment: 11 pages, 1 figure
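    For discrete variables, the Hirschfeld-Gebelein-Rényi (maximal) correlation can be computed from the joint distribution as the second-largest singular value of a normalized co-occurrence matrix. The sketch below estimates that quantity from empirical frequencies as a fairness diagnostic; the estimation from samples and the toy example are assumptions, not the paper's min-max training formulation.

```python
# Sketch: Hirschfeld-Gebelein-Renyi maximal correlation between two discrete
# variables, estimated from empirical joint frequencies. For discrete variables
# it equals the second-largest singular value of
#   Q[i, j] = P(x=i, y=j) / sqrt(P(x=i) * P(y=j)).
import numpy as np

def renyi_correlation(x, y):
    """x, y: integer-coded 1-D arrays of equal length."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1.0)      # empirical joint counts
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    q = joint / np.sqrt(np.outer(px, py) + 1e-12)
    s = np.linalg.svd(q, compute_uv=False)     # largest singular value is always 1
    return s[1] if len(s) > 1 else 0.0

# Example: predictions that partially leak a binary sensitive attribute.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)                     # sensitive attribute
yhat = np.where(rng.random(1000) < 0.8, a, 1 - a)     # agrees with `a` 80% of the time
print(renyi_correlation(yhat, a))                     # near 0 would indicate independence
```

    In a min-max training scheme like the one described above, a differentiable estimate of this quantity would be penalized during training; the closed-form SVD version here is only suitable for monitoring discrete attributes.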

    Closed-loop automatic gradient design for liquid chromatography using Bayesian optimization

    Contemporary complex samples require sophisticated methods for full analysis. This work describes the development of a Bayesian optimization algorithm for automated and unsupervised development of gradient programs. The algorithm was tailored to LC using a Gaussian process model with a novel covariance kernel. To facilitate unsupervised learning, the algorithm was designed to interface directly with the chromatographic system. Single-objective and multi-objective Bayesian optimization strategies were investigated for the separation of two complex dye mixtures (n > 18 and n > 80). Both approaches found satisfactory optima in under 35 measurements. The multi-objective strategy was found to be powerful and flexible in terms of exploring the Pareto front. The performance difference between the single-objective and multi-objective strategies was further investigated using a retention-modelling example. An additional advantage of the multi-objective approach is that it allows a trade-off to be made between multiple objectives without prior knowledge. In general, the Bayesian optimization strategy was found to be particularly suitable for, but not limited to, cases where retention modelling is not possible, although its scalability may be limited in terms of the number of parameters that can be optimized simultaneously.
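    A minimal single-objective Bayesian-optimization loop in the spirit of the work is sketched below, using an off-the-shelf Gaussian process surrogate and expected improvement. The one-dimensional toy objective and the Matérn kernel are stand-ins for the paper's chromatography-specific kernel and closed-loop instrument interface.

```python
# Sketch: single-objective Bayesian optimization with a GP surrogate and
# expected improvement. The toy objective replaces real LC measurements.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Hypothetical separation-quality score for a gradient parameter x in [0, 1].
    return np.sin(12 * x) * x + 0.5 * x

def expected_improvement(mu, sigma, best):
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(3, 1))                 # small initial design
y = np.array([objective(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
candidates = np.linspace(0, 1, 501).reshape(-1, 1)

for _ in range(20):                                # measurement budget
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))         # "run" the next experiment

print("best score:", y.max(), "at x =", X[np.argmax(y)][0])
```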

    Principled Architecture-aware Scaling of Hyperparameters

    Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process. Current works try to automatically optimize hyperparameters, or to design principles for them, such that they can generalize to diverse unseen scenarios. However, most designs and optimization methods are agnostic to the choice of network structure, and thus largely ignore the impact of neural architectures on hyperparameters. In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture, including the network depth, width, convolutional kernel size, and connectivity patterns. By requiring every parameter to be maximally updated with the same mean squared change in pre-activations, we can generalize our initialization and learning rates across MLPs (multi-layer perceptrons) and CNNs (convolutional neural networks) with sophisticated graph topologies. We verify our principles with comprehensive experiments. More importantly, our strategy further sheds light on advancing current benchmarks for architecture design. A fair comparison of AutoML algorithms requires accurate network rankings; however, we demonstrate that network rankings can easily change when the networks in a benchmark are trained better with our architecture-aware learning rates and initializations.
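    The sketch below illustrates the general idea of tying initialization scale and learning rate to the architecture. The concrete rules used here (fan-in-scaled initialization variance and a learning rate divided by depth) are common placeholder heuristics, not the paper's derived prescriptions, which also account for kernel size and connectivity patterns.

```python
# Sketch: architecture-aware hyperparameter scaling with placeholder rules.
import math
import torch
import torch.nn as nn

def build_mlp(width, depth):
    layers, d_in = [], 784
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.ReLU()]
        d_in = width
    layers.append(nn.Linear(d_in, 10))
    return nn.Sequential(*layers)

def architecture_aware_init(model):
    # Placeholder rule: weight variance scaled by 1/fan_in of each layer.
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, std=1.0 / math.sqrt(m.in_features))
            nn.init.zeros_(m.bias)

def scaled_learning_rate(base_lr, depth):
    # Placeholder rule: deeper networks get a smaller maximal step size.
    return base_lr / depth

depth, width = 8, 256
model = build_mlp(width, depth)
architecture_aware_init(model)
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_learning_rate(0.4, depth))
```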