An efficient algorithm for learning with semi-bandit feedback
We consider the problem of online combinatorial optimization under
semi-bandit feedback. The goal of the learner is to sequentially select its
actions from a combinatorial decision set so as to minimize its cumulative
loss. We propose a learning algorithm for this problem based on combining the
Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss
estimation procedure called Geometric Resampling (GR). Contrary to previous
solutions, the resulting algorithm can be efficiently implemented for any
decision set where efficient offline combinatorial optimization is possible at
all. Assuming that the elements of the decision set can be described with
d-dimensional binary vectors with at most m non-zero entries, we show that the
expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a
side result, we also improve the best known regret bounds for FPL in the full
information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m)
over previous bounds for this algorithm.
Comment: submitted to ALT 201
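As a concrete illustration of FPL combined with Geometric Resampling, here is a minimal sketch for the simplest decision set, the m-subsets of {1,...,d} (so the offline oracle is just an argpartition). The learning rate eta and the resampling cap are illustrative choices, not the tuned values from the paper.

```python
import numpy as np

def fpl_action(loss_est, eta, m, rng):
    """Follow-the-Perturbed-Leader: perturb the cumulative loss estimates
    with i.i.d. exponential noise and call the offline oracle, which for
    the 'pick the m best of d coordinates' decision set is an argpartition."""
    z = rng.exponential(size=loss_est.shape[0])
    chosen = np.argpartition(eta * loss_est - z, m)[:m]
    v = np.zeros(loss_est.shape[0])
    v[chosen] = 1.0
    return v

def geometric_resampling(loss_est, eta, m, i, cap, rng):
    """Geometric Resampling: estimate 1/p_i (the inverse probability that
    coordinate i appears in the FPL action) by redrawing the perturbed
    leader until i shows up; the count is geometric with mean 1/p_i.
    The cap keeps the running time and the estimate bounded."""
    for k in range(1, cap + 1):
        if fpl_action(loss_est, eta, m, rng)[i] == 1.0:
            return k
    return cap

# Illustrative parameters (not the tuned values from the paper).
d, m, T, eta, cap = 10, 3, 200, 0.1, 50
rng = np.random.default_rng(0)
loss_est = np.zeros(d)
total_loss = 0.0
for t in range(T):
    losses = rng.uniform(size=d)          # stand-in for an arbitrary loss sequence
    v = fpl_action(loss_est, eta, m, rng)
    total_loss += losses @ v              # semi-bandit: only chosen coords observed
    ks = {i: geometric_resampling(loss_est, eta, m, i, cap, rng)
          for i in np.flatnonzero(v)}
    for i, k in ks.items():
        loss_est[i] += k * losses[i]      # GR loss estimate, unbiased up to the cap
```

Note the key property claimed in the abstract: the only access to the decision set is through the offline optimization call inside `fpl_action`.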
Best-of-Both-Worlds Algorithms for Partial Monitoring
This study considers the partial monitoring problem with k actions and d outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is O(m^2 k^4 log(T) log(k_Pi T) / Delta_min) in the stochastic regime and O(m k^(2/3) sqrt(T log(T) log k_Pi)) in the adversarial regime, where T is the number of rounds, m is the maximum number of distinct observations per action, Delta_min is the minimum suboptimality gap, and k_Pi is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is O(c_G^2 m^2 log(T) log(k_Pi T) / Delta_min^2) in the stochastic regime and O((c_G^2 m^2 log(T) log(k_Pi T))^(1/3) T^(2/3)) in the adversarial regime, where c_G is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
Comment: 31 pages
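The follow-the-regularized-leader framework the abstract builds on can be shown in its simplest form: FTRL on the probability simplex with a negative-entropy regularizer and a decreasing (adaptive) learning rate, whose closed-form iterate is exponential weights on the cumulative loss. This is only the generic template, not the paper's partial-monitoring algorithm, which additionally handles the feedback structure via exploration by optimization.

```python
import numpy as np

def ftrl_neg_entropy(loss_matrix, c=1.0):
    """Follow-the-Regularized-Leader on the probability simplex with the
    negative-entropy regularizer and adaptive learning rate eta_t ~ 1/sqrt(t).
    With this regularizer the FTRL optimum has a closed form: exponential
    weights on the cumulative loss.  Returns the realized regret against
    the best fixed action in hindsight."""
    T, k = loss_matrix.shape
    cum = np.zeros(k)
    total = 0.0
    for t in range(T):
        eta = c / np.sqrt(t + 1.0)               # adaptive learning rate
        w = np.exp(-eta * (cum - cum.min()))     # shift for numerical stability
        p = w / w.sum()                          # FTRL iterate on the simplex
        total += p @ loss_matrix[t]
        cum += loss_matrix[t]
    return total - loss_matrix.sum(axis=0).min()
```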
Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method
We give a new approach to the dictionary learning (also known as "sparse coding") problem of recovering an unknown n x m matrix A (for m >= n) from examples of the form y = Ax + e, where x is a random vector in R^m with at most tau*m nonzero coordinates, and e is a random noise vector in R^n with bounded magnitude. For the case m = O(n), our algorithm recovers every column of A within arbitrarily good constant accuracy in time m^O(log m / log(1/tau)), in particular achieving polynomial time if tau = m^(-delta) for any delta > 0, and time m^O(log m) if tau is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector x to be much sparser -- at most sqrt(n) nonzero coordinates -- and there were intrinsic barriers preventing these algorithms from applying for denser x.
We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor T, given access to a tensor T' that is tau-close to T in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of T and T' have similar structures.
Our algorithm is based on a novel approach to using and analyzing the Sum of
Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and
it can be viewed as an indication of the utility of this very general and
powerful tool for unsupervised learning problems.
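For intuition about "tensor decomposition under spectral-norm noise", here is the easy exact-rank-one warm-up, not the SoS algorithm (which works under far more general conditions): unfold the noisy tensor into a matrix and read off the top singular vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
a = rng.standard_normal(n)
a /= np.linalg.norm(a)

T3 = np.einsum('i,j,k->ijk', a, a, a)             # exact rank-one tensor a (x) a (x) a
E = rng.standard_normal((n, n, n))
E *= 0.01 / np.linalg.norm(E.reshape(n, -1), 2)   # noise with spectral norm 0.01
M = (T3 + E).reshape(n, n * n)                    # the tensor "considered as a matrix"

u = np.linalg.svd(M, full_matrices=False)[0][:, 0]
a_hat = u * np.sign(u @ a)                        # top singular vector, sign fixed
err = np.linalg.norm(a_hat - a)                   # small, by matrix perturbation bounds
```

In the regime the abstract targets (constant spectral-norm noise, overcomplete dictionaries) this naive unfolding breaks down, which is exactly where the Sum-of-Squares machinery comes in.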
Approximations for Throughput Maximization
In this paper we study the classical problem of throughput maximization. In this problem we have a collection of n jobs, each having a release time r_j, deadline d_j, and processing time p_j. They have to be scheduled non-preemptively on m identical parallel machines. The goal is to find a schedule which maximizes the number of jobs scheduled entirely in their [r_j, d_j] window. This problem has been studied extensively (even for the case of m = 1). Several special cases of the problem remain open. Bar-Noy et al. [STOC1999] presented an algorithm with ratio 1 - 1/(1 + 1/m)^m for m machines, which approaches 1 - 1/e as m increases. For m = 1, Chuzhoy-Ostrovsky-Rabani [FOCS2001] presented an algorithm with approximation ratio 1 - 1/e - eps (for any eps > 0). Recently Im-Li-Moseley [IPCO2017] presented an algorithm with ratio 1 - 1/e + eps_0 for some absolute constant eps_0 > 0 for any fixed m. They also presented an algorithm with ratio 1 - O(sqrt(log m / m)) - eps for general m which approaches 1 as m grows. The approximability of the problem for m = 1 remains a major open question. Even for the case of m = 1 and c = 3 distinct processing times the problem is open (Sgall [ESA2012]). In this paper we study the case of m = 1 and show that if there are c distinct processing times, i.e. the p_j's come from a set of size c, then there is a (1 - eps)-approximation that runs in time O(n^poly(c/eps) log T), where T is the largest deadline. Therefore, for constant c and constant eps this yields a PTAS. Our algorithm is based on proving structural properties for a near optimum solution that allow one to use dynamic programming with pruning.
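To make the scheduling model concrete, here is a simple deadline-ordered greedy baseline for the m = 1 case. It is only a heuristic illustrating the input model, not the paper's structured dynamic program, and it is not a (1 - eps)-approximation in general.

```python
from typing import List, Tuple

def edf_greedy(jobs: List[Tuple[float, float, float]]) -> int:
    """Heuristic baseline for single-machine throughput maximization:
    consider jobs in deadline order and start each at the earliest
    feasible time, keeping it only if it fits entirely inside its
    window.  Each job j is a triple (release r_j, deadline d_j,
    processing time p_j) and runs non-preemptively."""
    free_at = 0.0            # time the machine becomes free
    scheduled = 0
    for r, d, p in sorted(jobs, key=lambda j: j[1]):
        start = max(free_at, r)
        if start + p <= d:   # job fits entirely in its [r, d] window
            free_at = start + p
            scheduled += 1
    return scheduled
```

For example, `edf_greedy([(0, 2, 2), (0, 3, 1), (2, 5, 2)])` schedules all three jobs back to back, while a job with p_j > d_j - r_j can never be scheduled.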
Stackelberg Network Pricing Games
We study a multi-player one-round game termed Stackelberg Network Pricing
Game, in which a leader can set prices for a subset of priceable edges in a
graph. The other edges have a fixed cost. Based on the leader's decision one or
more followers optimize a polynomial-time solvable combinatorial minimization
problem and choose a minimum cost solution satisfying their requirements based
on the fixed costs and the leader's prices. The leader receives as revenue the
total amount of prices paid by the followers for priceable edges in their
solutions, and the problem is to find revenue maximizing prices. Our model
extends several known pricing problems, including single-minded and unit-demand
pricing, as well as Stackelberg pricing for certain follower problems like
shortest path or minimum spanning tree. Our first main result is a tight analysis of a single-price algorithm for the single follower game, which provides a (1 + eps) log m-approximation for any eps > 0, where m is the number of priceable edges. This can be extended to provide a (1 + eps)(log k + log m)-approximation for the general problem and k followers. The latter result is essentially best possible, as the problem is shown to be hard to approximate within O(log^eps k + log^eps m). If followers have demands, the single-price algorithm provides a (1 + eps) m^2-approximation, and the problem is hard to approximate within O(m^eps) for some eps > 0. Our second main result is a polynomial time algorithm for revenue maximization in the special case of Stackelberg bipartite vertex cover, which is based on non-trivial max-flow and LP-duality techniques. Our results can be extended to provide constant-factor approximations for any constant number of followers.
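The single-price idea can be sketched for a shortest-path follower: put one uniform price on every priceable edge, let the follower re-optimize, and keep the price with the highest revenue. In the actual analysis the candidate prices come from the follower's cost structure; the candidate list and the tiny graph below are purely illustrative.

```python
import heapq
from itertools import count

def follower_best_response(adj, src, dst, price):
    """Follower: Dijkstra over edge costs, where every priceable edge
    costs `price` and fixed edges keep their own cost.  Returns the
    min path cost and the number of priceable edges on that path.
    adj maps node -> list of (neighbor, fixed_cost, is_priceable)."""
    tie = count()                            # tie-breaker for heap entries
    heap = [(0.0, next(tie), src, 0)]
    done = set()
    while heap:
        cost, _, u, k = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            return cost, k
        for v, w, is_priceable in adj[u]:
            c = price if is_priceable else w
            heapq.heappush(heap, (cost + c, next(tie), v, k + is_priceable))
    return float('inf'), 0

def single_price(adj, src, dst, candidates):
    """Single-price algorithm: charge the same price on all priceable
    edges, try each candidate price, keep the revenue maximizer
    (revenue = price times the number of priceable edges bought)."""
    best_price, best_rev = 0.0, 0.0
    for p in candidates:
        _, k = follower_best_response(adj, src, dst, p)
        if p * k > best_rev:
            best_price, best_rev = p, p * k
    return best_price, best_rev
```

On a graph with a fixed s-t edge of cost 5 and a two-edge priceable s-a-t path, the best uniform price is the largest candidate keeping the priceable path strictly cheaper than 5.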
An Efficient Interior-Point Method for Online Convex Optimization
A new algorithm for regret minimization in online convex optimization is described. The regret of the algorithm after T time periods is O(sqrt(T log T)), which is the minimum possible up to a logarithmic term. In addition, the new algorithm is adaptive, in the sense that the regret bounds hold not only for the T time periods but also for every sub-interval [s, t]. The running time of the algorithm matches that of newly introduced interior point algorithms for regret minimization: in n-dimensional space, during each iteration the new algorithm essentially solves a system of linear equations of order n, rather than solving some constrained convex optimization problem in n dimensions and possibly many constraints.
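To illustrate the "one linear system of order n per iteration" cost model, here is a sketch of an online-Newton-style update on a toy quadratic loss over the unit ball. This is not the paper's interior-point method, only a minimal example with the same per-round computational footprint.

```python
import numpy as np

def ons_quadratic(c, rounds=50, gamma=1.0, eps=1.0):
    """Online-Newton-style updates on the toy quadratic loss
    f(x) = ||x - c||^2 over the unit ball.  Each round does a rank-one
    update of the curvature matrix A and one linear solve of order n;
    no constrained convex program in n dimensions is solved."""
    n = c.shape[0]
    A = eps * np.eye(n)                         # regularized curvature estimate
    x = np.zeros(n)
    for _ in range(rounds):
        g = 2.0 * (x - c)                       # gradient of this round's loss
        A += np.outer(g, g)                     # rank-one curvature update
        x = x - np.linalg.solve(A, g) / gamma   # the order-n linear system
        nrm = np.linalg.norm(x)
        if nrm > 1.0:                           # projection back onto the unit ball
            x /= nrm
    return x
```

The contrast the abstract draws is exactly this: `np.linalg.solve` on an n-by-n system per round, versus invoking a general constrained convex solver per round.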