
    Oracle complexity separation in convex optimization

    Ubiquitous in machine learning, regularized empirical risk minimization problems are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient, or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework to combine optimal algorithms for different types of oracles in order to achieve separate optimal oracle complexity for each block, i.e., for each block the corresponding oracle is called the optimal number of times for a given accuracy. As a particular example, we demonstrate that for a combination of a full gradient oracle and either a stochastic gradient oracle or a coordinate descent oracle, our approach leads to the optimal number of oracle calls separately for the full gradient part and the stochastic/coordinate descent part.

    Oracle Complexity Separation in Convex Optimization

    Many convex optimization problems have a structured objective function written as a sum of functions with different types of oracles (full gradient, coordinate derivative, stochastic gradient) and different evaluation complexities of these oracles. In the strongly convex case these functions also have different condition numbers, which eventually define the iteration complexity of first-order methods and the number of oracle calls required to achieve a given accuracy. Motivated by the desire to call the more expensive oracle fewer times, in this paper we consider minimization of a sum of two functions and propose a generic algorithmic framework to separate oracle complexities for each component in the sum. As a specific example, for the $\mu$-strongly convex problem $\min_{x\in \mathbb{R}^n} h(x) + g(x)$ with $L_h$-smooth function $h$ and $L_g$-smooth function $g$, a special case of our algorithm requires, up to a logarithmic factor, $O(\sqrt{L_h/\mu})$ first-order oracle calls for $h$ and $O(\sqrt{L_g/\mu})$ first-order oracle calls for $g$. Our general framework also covers the setting of strongly convex objectives, the setting when $g$ is given by a coordinate derivative oracle, and the setting when $g$ has a finite-sum structure and is available through a stochastic gradient oracle. In the latter two cases we obtain, respectively, accelerated random coordinate descent and accelerated variance reduction methods with oracle complexity separation.
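    To make the oracle-separation idea concrete, here is a minimal, unaccelerated sketch in Python: the expensive full-gradient oracle for $h$ is queried once per outer iteration, while $g$ is handled in an inner loop through a cheap stochastic-gradient oracle. The quadratic $h$, the finite-sum $g$, and the step sizes and iteration counts are assumptions made purely for illustration; this is not the paper's accelerated method.

```python
import numpy as np

# Toy oracle-separation sketch for min_x h(x) + g(x) (illustrative only):
#   h(x) = ||x||^2 / 2           -> "expensive" full-gradient oracle
#   g(x) = ||Ax - b||^2 / (2m)   -> "cheap" stochastic-gradient oracle over m terms
rng = np.random.default_rng(0)
n, m = 20, 100
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def grad_h(x):
    # expensive oracle: full gradient of h
    return x

def stoch_grad_g(x):
    # cheap oracle: gradient of one randomly sampled term of g
    i = rng.integers(m)
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(n)
outer_steps, inner_steps, eta = 50, 200, 1e-3
for _ in range(outer_steps):
    gh = grad_h(x)                   # one expensive call per outer iteration
    for _ in range(inner_steps):     # many cheap calls per outer iteration
        x -= eta * (gh + stoch_grad_g(x))

print("final objective:", 0.5 * x @ x + 0.5 * np.mean((A @ x - b) ** 2))
```

    With the expensive gradient frozen at the outer point, the inner loop pays only for cheap stochastic oracle calls; this separation of call counts per oracle is the access pattern the framework above formalizes and accelerates.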

    Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes

    We propose a family of recursive cutting-plane algorithms to solve feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, in order to find a point within a ball of radius $\epsilon$ with a separation oracle in dimension $d$ -- or to minimize $1$-Lipschitz convex functions to accuracy $\epsilon$ over the unit ball -- our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}{\epsilon})$ bits of memory and make $\mathcal O((C\frac{d}{p}\ln \frac{1}{\epsilon})^p)$ oracle calls, for some universal constant $C \geq 1$. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}{\epsilon}\gg\ln d$. While several works gave lower-bound trade-offs (impossibility results) -- we make explicit here their dependence on $\ln\frac{1}{\epsilon}$, showing that these also hold in any sub-polynomial regime -- to the best of our knowledge this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle complexity of gradient descent.
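    As a toy illustration of the ingredients involved (a separation oracle and a cutting-plane update), the sketch below solves a small feasibility problem with the classical central-cut ellipsoid method. It is a single-block simplification: the recursive block decomposition, the memory accounting, and the Vaidya-type construction of approximate separation vectors from the paper are not reproduced, and the hidden target ball is an assumption for the demo.

```python
import numpy as np

# Toy feasibility problem: find a point within distance eps of a hidden target
# inside the unit ball, using only a separation oracle and ellipsoid cuts.
d, eps = 5, 1e-3
rng = np.random.default_rng(1)
target = rng.uniform(-0.3, 0.3, d)      # hidden feasible ball centre (demo assumption)

def separation_oracle(x):
    """Return None if x is eps-close to the target, otherwise a vector g
    such that g.(y - x) <= 0 for every feasible point y."""
    if np.linalg.norm(x - target) <= eps:
        return None
    return x - target

c = np.zeros(d)                         # ellipsoid centre, start at the origin
P = np.eye(d)                           # ellipsoid shape matrix: unit ball contains the target
for it in range(10_000):
    g = separation_oracle(c)
    if g is None:
        print(f"feasible point found after {it} oracle calls")
        break
    gn = g / np.sqrt(g @ P @ g)         # normalise the cut in the ellipsoid metric
    Pg = P @ gn
    c = c - Pg / (d + 1)                # central-cut ellipsoid update
    P = (d**2 / (d**2 - 1)) * (P - (2 / (d + 1)) * np.outer(Pg, Pg))
```

    The single global ellipsoid state here costs a full $d\times d$ matrix of memory; the paper's algorithms instead optimize $p$ blocks of roughly $d/p$ variables sequentially, which is where the memory/oracle-call trade-off in the bounds above comes from.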

    Reducing Revenue to Welfare Maximization: Approximation Algorithms and other Generalizations

    It was recently shown in [http://arxiv.org/abs/1207.5518] that revenue optimization can be computationally efficiently reduced to welfare optimization in all multi-dimensional Bayesian auction problems with arbitrary (possibly combinatorial) feasibility constraints and independent additive bidders with arbitrary (possibly combinatorial) demand constraints. This reduction provides a poly-time solution to the optimal mechanism design problem in all auction settings where welfare optimization can be solved efficiently, but it is fragile to approximation and cannot provide solutions to settings where welfare maximization can only be tractably approximated. In this paper, we extend the reduction to accommodate approximation algorithms, providing an approximation-preserving reduction from (truthful) revenue maximization to (not necessarily truthful) welfare maximization. The mechanisms output by our reduction choose allocations via black-box calls to welfare approximation on randomly selected inputs, thereby also generalizing our earlier structural results on optimal multi-dimensional mechanisms to approximately optimal mechanisms. Unlike [http://arxiv.org/abs/1207.5518], our results here are obtained through novel uses of the Ellipsoid algorithm and other optimization techniques over non-convex regions.

    A new Lenstra-type Algorithm for Quasiconvex Polynomial Integer Minimization with Complexity 2^O(n log n)

    We study the integer minimization of a quasiconvex polynomial with quasiconvex polynomial constraints. We propose a new algorithm that is an improvement upon the best known algorithm due to Heinz (Journal of Complexity, 2005). This improvement is achieved by applying a new modern Lenstra-type algorithm, finding optimal ellipsoid roundings, and considering sparse encodings of polynomials. For the bounded case, our algorithm attains a time complexity of $s (r l M d)^{O(1)} 2^{2n \log_2(n) + O(n)}$, where $M$ is a bound on the number of monomials in each polynomial and $r$ is the binary encoding length of a bound on the feasible region. In the general case, the complexity is $s l^{O(1)} d^{O(n)} 2^{2n \log_2(n) + O(n)}$. In each case we assume $d \geq 2$ is a bound on the total degree of the polynomials and $l$ bounds the maximum binary encoding size of the input.
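    To make the problem being solved concrete, the sketch below minimizes a quasiconvex (in fact convex) polynomial over the integer points of a small box subject to a quasiconvex polynomial constraint, by plain enumeration. This is only a brute-force baseline for the bounded case; it is not the Lenstra-type algorithm, and the example polynomials and bounds are assumptions for the demo.

```python
import itertools

# Brute-force baseline for bounded quasiconvex polynomial integer minimization:
# enumerate all integer points of a box and keep the best feasible one.
# Demo instance (an assumption, not from the paper):
#   minimize   f(x, y) = (x - 1)^2 + (y - 2)^2      (convex, hence quasiconvex)
#   subject to g(x, y) = x^2 + y^2 - 9 <= 0         (quasiconvex constraint)
#   over integer points with -5 <= x, y <= 5.

def f(x, y):
    return (x - 1) ** 2 + (y - 2) ** 2

def g(x, y):
    return x ** 2 + y ** 2 - 9

best_point, best_value = None, float("inf")
for x, y in itertools.product(range(-5, 6), repeat=2):
    if g(x, y) <= 0 and f(x, y) < best_value:
        best_point, best_value = (x, y), f(x, y)

print("optimal integer point:", best_point, "objective:", best_value)
# Expected output: optimal integer point: (1, 2) objective: 0
```

    Enumeration like this is exponential in the box side length and in the number of variables $n$, which is exactly the dependence that Lenstra-type machinery improves to $2^{2n \log_2(n) + O(n)}$ times a factor polynomial in the encoding size.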