
    Fast Rates in Online Convex Optimization by Exploiting the Curvature of Feasible Sets

    In this paper, we explore online convex optimization (OCO) and introduce a new analysis that provides fast rates by exploiting the curvature of feasible sets. In online linear optimization, it is known that if the average gradient of the loss functions is larger than a certain value, the curvature of the feasible set can be exploited by the follow-the-leader (FTL) algorithm to achieve logarithmic regret. This paper reveals that algorithms adaptive to the curvature of loss functions can also leverage the curvature of feasible sets. We first prove that if an optimal decision is on the boundary of the feasible set and the gradient of the underlying loss function is non-zero, then the algorithm achieves a regret upper bound of $O(\rho \log T)$ in stochastic environments. Here, $\rho > 0$ is the radius of the smallest sphere that includes the optimal decision and encloses the feasible set. Unlike existing approaches, ours works directly with convex loss functions, simultaneously exploits the curvature of the loss functions, and achieves logarithmic regret using only a local property of the feasible set. Additionally, it achieves $O(\sqrt{T})$ regret even in adversarial environments, where FTL suffers $\Omega(T)$ regret, and attains an $O(\rho \log T + \sqrt{C \rho \log T})$ regret bound in corrupted stochastic environments with corruption level $C$. Furthermore, by extending our analysis, we establish a regret upper bound of $O\big(T^{\frac{q-2}{2(q-1)}} (\log T)^{\frac{q}{2(q-1)}}\big)$ for $q$-uniformly convex feasible sets, where uniformly convex sets include strongly convex sets and $\ell_p$-balls for $p \in (1, \infty)$. This bound bridges the gap between the $O(\log T)$ bound for strongly convex sets ($q = 2$) and the $O(\sqrt{T})$ bound for non-curved sets ($q \to \infty$).
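    The role of set curvature is easiest to see in the linear special case the abstract mentions: when the cumulative gradient has a non-zero mean, FTL on a Euclidean ball snaps to a boundary point that barely moves between rounds. A minimal sketch of that special case (not the paper's curvature-adaptive algorithm; the dimension, horizon, and noise scale are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, radius = 5, 1000, 1.0
mean_grad = rng.normal(size=d)        # stochastic environment: E[g_t] != 0

cum_grad, alg_loss = np.zeros(d), 0.0
for t in range(T):
    # FTL decision: minimize <cum_grad, x> over the ball of radius `radius`;
    # the minimizer sits on the boundary, at -radius * cum_grad / ||cum_grad||.
    norm = np.linalg.norm(cum_grad)
    x = -radius * cum_grad / norm if norm > 0 else np.zeros(d)
    g = mean_grad + 0.1 * rng.normal(size=d)   # noisy linear loss gradient
    alg_loss += g @ x
    cum_grad += g

# Best fixed decision in hindsight: min over the ball of <cum_grad, x> = -radius * ||cum_grad||.
best_loss = -radius * np.linalg.norm(cum_grad)
print("regret:", alg_loss - best_loss)         # grows roughly logarithmically in T
```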

    Perspectives on Incorporating Expert Feedback into Model Updates

    Machine learning (ML) practitioners are increasingly tasked with developing models that are aligned with non-technical experts' values and goals. However, there has been insufficient consideration of how practitioners should translate domain expertise into ML updates. In this paper, we consider how to capture interactions between practitioners and experts systematically. We devise a taxonomy to match expert feedback types with practitioner updates. A practitioner may receive feedback from an expert at the observation or domain level, and convert this feedback into updates to the dataset, loss function, or parameter space. We review existing work from ML and human-computer interaction to describe this feedback-update taxonomy, and highlight the insufficient consideration given to incorporating feedback from non-technical experts. We end with a set of open questions that naturally arise from our proposed taxonomy and the subsequent survey.
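    As a reading aid, the taxonomy can be pictured as pairs of a feedback level and an update target. The toy encoding below is ours, not the paper's; the names simply mirror the categories stated in the abstract:

```python
from enum import Enum

class FeedbackLevel(Enum):
    OBSERVATION = "observation-level"   # feedback about individual predictions
    DOMAIN = "domain-level"             # feedback about global rules or constraints

class UpdateTarget(Enum):
    DATASET = "dataset"                 # e.g., relabel or augment examples
    LOSS = "loss function"              # e.g., add a penalty term
    PARAMETERS = "parameter space"      # e.g., constrain or edit model weights

# An interaction in the taxonomy is a (feedback level, update target) pair.
example_interaction = (FeedbackLevel.OBSERVATION, UpdateTarget.DATASET)
```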

    Online Caching with no Regret: Optimistic Learning via Recommendations

    The design of effective online caching policies is an increasingly important problem for content distribution networks, online social networks, and edge computing services, among other areas. This paper proposes a new algorithmic toolbox for tackling this problem through the lens of optimistic online learning. We build upon the Follow-the-Regularized-Leader (FTRL) framework, which is developed further here to include predictions for the file requests, and we design online caching algorithms for bipartite networks with fixed-size caches or elastic leased caches subject to time-average budget constraints. The predictions are provided by a content recommendation system that influences the users' viewing activity and hence can naturally reduce the caching network's uncertainty about future requests. We also extend the framework to learn and utilize the best request predictor in cases where many are available. We prove that the proposed optimistic learning caching policies can achieve sub-zero performance loss (regret) for perfect predictions, and maintain the sub-linear regret bound $O(\sqrt{T})$, which is the best achievable bound for policies that do not use predictions, even for arbitrarily bad predictions. The performance of the proposed algorithms is evaluated with detailed trace-driven numerical tests.
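    A minimal single-cache sketch of the optimistic FTRL idea, assuming a Euclidean regularizer and fractional caching (the paper's bipartite and elastic-cache settings are richer, and the prediction below is a stand-in for the recommender's output): with this regularizer, the FTRL step reduces to projecting the scaled cumulative-plus-predicted utility gradient onto the cache polytope.

```python
import numpy as np

def project_capped_simplex(z, C):
    # Euclidean projection onto {y in [0,1]^N : sum(y) = C}, via bisection
    # on the dual threshold (a standard routine, written out for clarity).
    lo, hi = z.min() - 1.0, z.max()
    for _ in range(60):
        tau = 0.5 * (lo + hi)
        if np.clip(z - tau, 0.0, 1.0).sum() > C:
            lo = tau          # cache over capacity: raise the threshold
        else:
            hi = tau
    return np.clip(z - 0.5 * (lo + hi), 0.0, 1.0)

rng = np.random.default_rng(1)
N, C, T, eta = 100, 10, 2000, 0.05
popularity = rng.dirichlet(np.ones(N))      # hidden request distribution
cum, util = np.zeros(N), 0.0
for t in range(T):
    pred = popularity                        # stand-in for the recommender's prediction
    # Optimistic FTRL step: include the prediction of the *next* utility gradient.
    y = project_capped_simplex(eta * (cum + pred), C)
    req = rng.choice(N, p=popularity)
    util += y[req]                           # fractional hit on the requested file
    cum[req] += 1.0
print("average hit rate:", util / T)
```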

    Strong Convexity of Sets in Riemannian Manifolds

    Convex curvature properties are important in designing and analyzing convex optimization algorithms in Hilbertian or Riemannian settings. In the Hilbertian setting, strongly convex sets are well studied. Herein, we propose various definitions of strong convexity for uniquely geodesic sets in a Riemannian manifold. We study their relationships, propose tools to determine the geodesically strongly convex nature of sets, and analyze the convergence of optimization algorithms over those sets. In particular, we demonstrate that the Riemannian Frank-Wolfe algorithm enjoys a global linear convergence rate when the Riemannian scaling inequalities hold.
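    For intuition, here is the Hilbertian special case the abstract builds on: Frank-Wolfe over a strongly convex set (a Euclidean ball), where the linear minimization oracle has a closed form. This is only an analogue sketch, not the paper's Riemannian algorithm, which replaces line segments with geodesics; the standard $2/(t+2)$ step shown gives the generic $O(1/t)$ rate, and the curvature-driven linear rate requires line search and the scaling inequalities.

```python
import numpy as np

rng = np.random.default_rng(2)
d, radius = 10, 1.0
M = rng.normal(size=(d, d))
A = M @ M.T + np.eye(d)                  # strongly convex quadratic objective
b = 3.0 * rng.normal(size=d)             # large b pushes the optimum to the boundary
f = lambda x: 0.5 * x @ A @ x - b @ x

x = np.zeros(d)
for t in range(500):
    g = A @ x - b
    s = -radius * g / np.linalg.norm(g)  # linear minimization oracle over the ball
    gamma = 2.0 / (t + 2.0)              # standard open-loop step size
    x += gamma * (s - x)
print("f(x) =", f(x))
```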

    Online Learning and Bandits with Queried Hints

    We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step the online policy can probe and find out which of a small number ($k$) of choices has a better reward (or loss) before making its choice. In this model, we derive algorithms whose regret bounds have exponentially better dependence on the time horizon compared to the classic regret bounds. In particular, we show that probing with $k = 2$ suffices to achieve time-independent regret bounds for online linear and convex optimization. The same number of probes improves the regret bound of stochastic MAB with independent arms from $O(\sqrt{nT})$ to $O(n^2 \log T)$, where $n$ is the number of arms and $T$ is the horizon length. For stochastic MAB, we also consider a stronger model where a probe reveals the reward values of the probed arms, and show that in this case $k = 3$ probes suffice to achieve parameter-independent constant regret, $O(n^2)$. Such regret bounds cannot be achieved even with full feedback after the play, showcasing the power of limited "advice" via probing before making the play. We also present extensions to the setting where the hints can be imperfect, and to the case of stochastic MAB where the rewards of the arms can be correlated.
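    The $k = 2$ interaction model is simple to simulate: before committing, the policy compares the realized rewards of two candidate arms and plays the better one. The candidate rule below (the two highest empirical means) is an illustrative stand-in, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 10, 5000
means = rng.uniform(0.2, 0.8, size=n)        # hidden Bernoulli arm means
counts = np.ones(n)
sums = rng.binomial(1, means).astype(float)  # one initial pull per arm

total = 0.0
for t in range(T):
    emp = sums / counts
    i, j = np.argsort(emp)[-2:]              # probe the two most promising arms
    ri = rng.binomial(1, means[i])
    rj = rng.binomial(1, means[j])
    # The probe reveals which arm's realized reward is better; play that arm.
    a, r = (i, ri) if ri >= rj else (j, rj)
    counts[a] += 1.0
    sums[a] += r
    total += r
print("average reward:", total / T, "vs best mean:", means.max())
```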

    Optimistic No-regret Algorithms for Discrete Caching

    We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning, where the caching policy has access to a prediction oracle (provided by, e.g., a neural network). The successive file requests are assumed to be generated by an adversary, and no assumption is made on the accuracy of the oracle. In this setting, we provide a universal lower bound for prediction-assisted online caching and proceed to design a suite of policies with a range of performance-complexity trade-offs. All proposed policies offer sublinear regret bounds commensurate with the accuracy of the oracle. Our results substantially improve upon all recently proposed online caching policies, which, being unable to exploit the oracle predictions, offer only $O(\sqrt{T})$ regret. In this pursuit, we design, to the best of our knowledge, the first comprehensive optimistic Follow-the-Perturbed-Leader policy, which generalizes beyond the caching problem. We also study the problem of caching files with different sizes and the bipartite network caching problem. Finally, we evaluate the efficacy of the proposed policies through extensive numerical experiments using real-world traces.
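    A minimal sketch of the optimistic FTPL idea for a single cache of unit-size files: rank files by perturbed cumulative request counts plus the oracle's prediction of the next request, and cache the top $C$. The perturbation scale and the prediction source are illustrative assumptions, not the paper's tuned choices.

```python
import numpy as np

rng = np.random.default_rng(4)
N, C, T = 200, 20, 5000
popularity = rng.dirichlet(np.ones(N) * 0.3)    # skewed hidden request process
cum, hits = np.zeros(N), 0
for t in range(1, T + 1):
    noise = rng.normal(size=N) * np.sqrt(t)     # FTPL perturbation, O(sqrt(t)) scale
    pred = popularity                            # stand-in for the prediction oracle
    cache = np.argsort(cum + pred + noise)[-C:]  # discrete decision: cache top-C files
    req = rng.choice(N, p=popularity)
    hits += int(req in cache)
    cum[req] += 1.0
print("hit rate:", hits / T)
```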