Fast Large-Scale Discrete Optimization Based on Principal Coordinate Descent
Binary optimization, a representative subclass of discrete optimization,
plays an important role in mathematical optimization and has various
applications in computer vision and machine learning. Usually, binary
optimization problems are NP-hard and difficult to solve due to the binary
constraints, especially when the number of variables is very large. Existing
methods often suffer from high computational costs or large accumulated
quantization errors, or are only designed for specific tasks. In this paper, we
propose a fast algorithm to find effective approximate solutions for general
binary optimization problems. The proposed algorithm iteratively solves
minimization problems related to the linear surrogates of loss functions, which
leads to the updating of some binary variables most impacting the value of loss
functions in each step. Our method supports a wide class of empirical objective
functions, with or without restrictions on the numbers of -1s and 1s in the
binary variables. Furthermore, we prove the convergence of our algorithm and
derive explicit convergence rates for objective functions with Lipschitz
continuous gradients, a condition commonly satisfied in practice.
Extensive experiments on several binary optimization tasks and large-scale
datasets demonstrate the superiority of the proposed algorithm over several
state-of-the-art methods in terms of both effectiveness and efficiency.
Comment: 14 pages
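The core update is easy to sketch: linearize the loss at the current binary point and flip the coordinates whose flips most decrease that surrogate. Below is a minimal NumPy sketch of this generic idea; the function name surrogate_flip_descent, the flip budget k, and the stopping rule are illustrative assumptions, not the paper's principal-coordinate-descent algorithm.

```python
import numpy as np

def surrogate_flip_descent(grad_f, x0, k=10, n_iters=100):
    """Sketch of linear-surrogate descent over x in {-1, +1}^n.

    grad_f:  callable returning the gradient of the loss at x (assumes the
             loss has Lipschitz continuous gradients, as in the abstract).
    x0:      initial binary point, entries in {-1, +1}.
    k:       flip budget per iteration (an illustrative knob).
    """
    x = x0.astype(float).copy()
    for _ in range(n_iters):
        g = grad_f(x)
        # Flipping coordinate i changes the linear surrogate g @ x by
        # -2 * g[i] * x[i], so coordinates with the largest positive
        # g[i] * x[i] promise the biggest decrease in the loss.
        gain = g * x
        top = np.argsort(-gain)[:k]
        top = top[gain[top] > 0]
        if top.size == 0:        # no flip decreases the surrogate: stop
            break
        x[top] *= -1             # flip the most impactful variables
    return x

# Toy usage: minimize ||A x - b||^2 over binary x.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
grad = lambda x: 2 * A.T @ (A @ x - b)
x_hat = surrogate_flip_descent(grad, np.sign(rng.standard_normal(20)), k=3)
```

Under a restriction on the numbers of -1s and 1s, the flip set would additionally be chosen in sign-preserving pairs; the unconstrained sketch above omits that detail.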
Differentiable Architecture Pruning for Transfer Learning
We propose a new gradient-based approach for extracting sub-architectures
from a given large model. In contrast to existing pruning methods, which are
unable to disentangle the network architecture and the corresponding weights,
our architecture-pruning scheme produces transferable new structures that can
be successfully retrained to solve different tasks. We focus on a
transfer-learning setup where architectures can be trained on a large data set
but very few data points are available for fine-tuning them on new tasks. We
define a new gradient-based algorithm that trains architectures of arbitrarily
low complexity independently of the attached weights. Given a search space
defined by an existing large neural model, we reformulate the architecture
search task as a complexity-penalized subset-selection problem and solve it
through a two-temperature relaxation scheme. We provide theoretical convergence
guarantees and validate the proposed transfer-learning strategy on real data.
Comment: 19 pages (main + appendix), 7 figures and 1 table, Workshop @ ICML 2021, 24th July 2021
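The two ingredients the abstract names, a complexity-penalized subset selection over architecture gates and a temperature-based relaxation, can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions: GatedLinear, tau_fwd, tau_pen, and lam are hypothetical names, the sharper second temperature inside the penalty is one plausible reading of the two-temperature scheme, and the paper's actual algorithm may differ.

```python
import torch

class GatedLinear(torch.nn.Module):
    """Linear layer whose output units are gated by learnable logits."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = torch.nn.Parameter(0.01 * torch.randn(d_out, d_in))
        self.alpha = torch.nn.Parameter(torch.zeros(d_out))  # gate logits

    def forward(self, x, tau_fwd=1.0):
        gate = torch.sigmoid(self.alpha / tau_fwd)  # soft subset selection
        return (x @ self.weight.t()) * gate

def architecture_step(layer, x, y, opt, tau_fwd=1.0, tau_pen=0.1, lam=1e-3):
    """One gradient step on the gate logits; the optimizer never touches
    the weights, mirroring the separation of architecture and weights."""
    pred = layer(x, tau_fwd)
    task_loss = torch.nn.functional.mse_loss(pred, y)
    # A sharper temperature pushes the penalty toward counting near-binary
    # gates, i.e. the number of retained units (the complexity penalty).
    complexity = torch.sigmoid(layer.alpha / tau_pen).sum()
    loss = task_loss + lam * complexity
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage: optimize only alpha so the attached weights stay fixed.
layer = GatedLinear(32, 16)
opt = torch.optim.Adam([layer.alpha], lr=0.05)
x, y = torch.randn(8, 32), torch.randn(8, 16)
architecture_step(layer, x, y, opt)
```

Optimizing only alpha keeps the structure search independent of the weights, matching the disentanglement the abstract emphasizes; thresholding the learned gates then yields a sub-architecture that can be retrained on a new task.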