Malleable Scheduling Beyond Identical Machines
In malleable job scheduling, jobs can be executed simultaneously on multiple machines, with the processing time depending on the number of allocated machines. Jobs are required to be executed non-preemptively and in unison, in the sense that they occupy, during their execution, the same time interval over all the machines of the allocated set. In this work, we study generalizations of malleable job scheduling inspired by standard scheduling on unrelated machines. Specifically, we introduce a general model of malleable job scheduling, where each machine has a (possibly different) speed for each job, and the processing time of a job j on a set of allocated machines S depends on the total speed of S for j. For machines with unrelated speeds, we show that the optimal makespan cannot be approximated within a factor less than e/(e-1), unless P = NP. On the positive side, we present polynomial-time algorithms with approximation ratios 2e/(e-1) for machines with unrelated speeds, 3 for machines with uniform speeds, and 7/3 for restricted assignments on identical machines. Our algorithms are based on deterministic LP rounding and result in sparse schedules, in the sense that each machine shares at most one job with other machines. We also prove lower bounds on the integrality gap of 1 + phi for unrelated speeds (where phi is the golden ratio) and 2 for uniform speeds and restricted assignments. To indicate the generality of our approach, we show that it also yields constant-factor approximation algorithms (i) for minimizing the sum of weighted completion times; and (ii) for a variant in which the effective speed of a set of allocated machines is determined by the L_p norm of their speeds.
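A minimal sketch of the processing-time model described above, assuming (as the abstract suggests but does not spell out) that a job's time on an allocated set is its total work divided by the set's total speed for that job; the names t_j and speeds are illustrative, not the paper's notation:

```python
def processing_time(t_j, speeds, S):
    """Time to run job j in unison on machine set S: work divided by
    the total (unrelated) speed of S for job j."""
    total_speed = sum(speeds[i] for i in S)
    return t_j / total_speed

# Hypothetical instance: one job, two machines with different speeds for it.
t_j = 12.0                 # total work of job j
speeds = {0: 2.0, 1: 4.0}  # per-machine speed for job j (unrelated speeds)

print(processing_time(t_j, speeds, {0}))     # machine 0 alone: 6.0
print(processing_time(t_j, speeds, {0, 1}))  # both machines in unison: 2.0
```

Allocating more machines never hurts in this model, but the marginal benefit shrinks, which is what makes the allocation/makespan trade-off nontrivial.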
Last Switch Dependent Bandits with Monotone Payoff Functions
In a recent work, Laforgue et al. introduce the model of last switch dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment. Examples include satiation, where consecutive plays of the same action lead to decreased performance, or deprivation, where the payoff of an action increases after an interval of inactivity. In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on the payoffs, its approximation guarantee (almost) matches the state-of-the-art for the special and well-studied class of recharging bandits (also known as delay-dependent). In this attempt, we develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states. We believe that these novel elements could potentially be used for approaching richer classes of action-induced nonstationary bandits (e.g., special instances of restless bandits). In the case where the model parameters are initially unknown, we develop an online learning adaptation of our algorithm for which we provide sublinear regret guarantees against its full-information counterpart.

Comment: Accepted to the 40th International Conference on Machine Learning (ICML 2023).
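A toy sketch of the delay-dependent (recharging) special case mentioned above, where each arm's payoff is a nondecreasing function of the delay since its previous pull; the payoff functions here are invented examples, and the paper's LSD model is more general:

```python
import math

def total_payoff(sequence, payoff_fns):
    """Evaluate a fixed arm-pulling sequence when each arm's reward
    depends only on the number of rounds since it was last played."""
    last_pull = {}
    total = 0.0
    for t, arm in enumerate(sequence):
        delay = t - last_pull.get(arm, -1)  # rounds since last play of this arm
        total += payoff_fns[arm](delay)
        last_pull[arm] = t
    return total

# Two hypothetical monotone recharging arms.
fns = {0: lambda d: min(d, 3),           # recharges linearly for up to 3 rounds
       1: lambda d: 1 - math.exp(-d)}    # saturating recharge
print(total_payoff([0, 1, 0, 1], fns))
```

Even evaluating a sequence is easy; the planning problem studied in the paper, finding an (approximately) optimal sequence, is NP-hard.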
Recommended from our members
Online decision-making and learning under structured non-stationarity
Many well-studied online decision-making and learning models rely on the assumption that the environment remains intact and unaffected by the agent’s actions. This assumption, however, can be considered somewhat unrealistic for several real-life applications. In recommendation systems, for instance, the preferences of a particular user (and, thus, the “best” products to recommend) can change over time. This change can occur either naturally (e.g., the user discovers a new category of desired products) or as a response to past recommendations (e.g., incessantly recommended products can be perceived as “spam” and, thus, be ignored in the future). In this dissertation, we consider generalizations of multi-armed bandit and optimal-stopping settings, in which the payoff distribution of each action is a function of the number of rounds passed since its last play. We study such generalizations in the planning regime, where knowledge of the prior distributions is assumed, and the learning regime, where priors are learned online using the collected samples.
Single-Sample Prophet Inequalities via Greedy-Ordered Selection
We study single-sample prophet inequalities (SSPIs), i.e., prophet inequalities where only a single sample from each prior distribution is available. Besides a direct, and optimal, SSPI for the basic single choice problem [Rubinstein et al., 2020], most existing SSPI results were obtained via an elegant, but inherently lossy, reduction to order-oblivious secretary (OOS) policies [Azar et al., 2014]. Motivated by this discrepancy, we develop an intuitive and versatile greedy-based technique that yields SSPIs directly rather than through the reduction to OOSs. Our results can be seen as generalizing and unifying a number of existing results in the area of prophet and secretary problems. Our algorithms significantly improve on the competitive guarantees for a number of interesting scenarios (including general matching with edge arrivals, bipartite matching with vertex arrivals, and certain matroids), and capture new settings (such as budget additive combinatorial auctions). Complementing our algorithmic results, we also consider mechanism design variants. Finally, we analyze the power and limitations of different SSPI approaches by providing a partial converse to the reduction from SSPI to OOS given by Azar et al.
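A minimal sketch of a single-sample threshold rule for the basic single-choice problem (in the spirit of the direct SSPI credited above to Rubinstein et al., 2020): take one sample per distribution, use the maximum sample as a threshold, and accept the first arriving value that meets it. Variable names and the instance below are illustrative assumptions:

```python
def sspi_single_choice(samples, values):
    """Accept the first value that is at least the maximum of the
    one-per-distribution samples; 0.0 if nothing is accepted."""
    threshold = max(samples)
    for v in values:
        if v >= threshold:
            return v
    return 0.0

# Hypothetical instance: one sample and one fresh arriving value per distribution.
samples = [0.5, 0.2, 0.4]
values  = [0.3, 0.6, 0.9]
print(sspi_single_choice(samples, values))  # accepts 0.6, the first value >= 0.5
```

The point of the abstract's greedy-ordered technique is to extend this kind of direct, sample-based thresholding beyond single choice, to matchings, matroids, and other combinatorial constraints, without routing through the lossy OOS reduction.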