
    The on-line shortest path problem under partial monitoring

    The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a decision maker has to choose in each round of a game a path between two distinguished vertices such that the loss of the chosen path (defined as the sum of the weights of its edges) is as small as possible. In a setting generalizing the multi-armed bandit problem, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this problem, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of edge weights, by a quantity proportional to $1/\sqrt{n}$ that depends only polynomially on the number of edges of the graph. The algorithm can be implemented with complexity linear in the number of rounds n and in the number of edges. An extension to the so-called label-efficient setting is also given, in which the decision maker is informed about the weights of the edges on the chosen path at only $m \ll n$ time instances in total. Another extension is shown in which the decision maker competes against a time-varying path, a generalization of the problem of tracking the best expert. A version of the multi-armed bandit setting for shortest paths is also discussed, where the decision maker learns only the total weight of the chosen path but not the weights of the individual edges on the path. Applications to routing in packet-switched networks, along with simulation results, are also presented. Comment: 35 pages.
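
    A minimal Python sketch, under simplifying assumptions, of the flavor of such a bandit shortest-path learner: an exponentially weighted distribution over paths is maintained implicitly through edge-level loss estimates, paths are sampled over a small DAG, and the observed edge weights are importance-weighted by the probability that the edge was used. The toy graph, the learning rate, and the omission of the explicit exploration used in the paper to control the variance of the estimates are all assumptions of this sketch, not the paper's algorithm.

```python
import math
import random

# Toy DAG: edges (u, v); the node list is assumed to be in topological order.
nodes = ["s", "a", "b", "t"]
edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
out = {v: [] for v in nodes}
for (u, v) in edges:
    out[u].append(v)

eta = 0.05                        # learning rate (illustrative)
Lhat = {e: 0.0 for e in edges}    # cumulative importance-weighted loss estimates

def backward_sums():
    """G[v] = total exp(-eta * estimated loss) over all v -> t paths."""
    G = {"t": 1.0}
    for v in reversed(nodes[:-1]):
        G[v] = sum(math.exp(-eta * Lhat[(v, u)]) * G[u] for u in out[v])
    return G

def forward_sums(G):
    """F[v] = total exp(-eta * estimated loss) over all s -> v paths."""
    F = {v: 0.0 for v in nodes}
    F["s"] = 1.0
    for v in nodes:
        for u in out[v]:
            F[u] += F[v] * math.exp(-eta * Lhat[(v, u)])
    return F

def sample_path(G):
    """Draw a path with probability proportional to exp(-eta * estimated path loss)."""
    path, v = [], "s"
    while v != "t":
        weights = [math.exp(-eta * Lhat[(v, u)]) * G[u] / G[v] for u in out[v]]
        u = random.choices(out[v], weights=weights)[0]
        path.append((v, u))
        v = u
    return path

for t in range(500):
    weights_t = {e: random.random() for e in edges}   # adversarial edge weights (hidden)
    G = backward_sums()
    F = forward_sums(G)
    path = sample_path(G)
    for (u, v) in path:                               # feedback: edges on the chosen path only
        q = F[u] * math.exp(-eta * Lhat[(u, v)]) * G[v] / G["s"]  # P(edge on chosen path)
        Lhat[(u, v)] += weights_t[(u, v)] / q         # unbiased importance-weighted estimate
```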

    An efficient algorithm for learning with semi-bandit feedback

    We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with $d$-dimensional binary vectors with at most $m$ non-zero entries, we show that the expected regret of our algorithm after $T$ rounds is $O(m\sqrt{dT\log d})$. As a side result, we also improve the best known regret bounds for FPL in the full information setting to $O(m^{3/2}\sqrt{T\log d})$, gaining a factor of $\sqrt{d/m}$ over previous bounds for this algorithm. Comment: submitted to ALT 201
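
    A minimal Python sketch of the FPL + Geometric Resampling scheme described above, under simplifying assumptions: the decision set is the toy class "pick m of d components" (so the offline oracle is trivial), the constants eta, M, and T are arbitrary, and resampling rollouts are drawn per component for clarity rather than shared across components as an efficient implementation would.

```python
import numpy as np

d, m = 6, 2          # binary action vectors with m of d ones (toy decision set)
T = 1000
eta = 0.1            # perturbation scale / learning rate (illustrative)
M = 50               # cap on the number of geometric resampling rollouts

def oracle(scores):
    """Offline combinatorial optimizer: here, the indicator of the m smallest scores."""
    v = np.zeros(d)
    v[np.argsort(scores)[:m]] = 1.0
    return v

def fpl_action(Lhat):
    """FPL: perturb the cumulative loss estimates and call the offline oracle."""
    Z = np.random.exponential(size=d)
    return oracle(eta * Lhat - Z)

Lhat = np.zeros(d)   # cumulative loss estimates
for t in range(T):
    losses = np.random.rand(d)          # adversary's losses (not seen in advance)
    V = fpl_action(Lhat)                # play this action
    observed = losses * V               # semi-bandit feedback: losses of used components only

    # Geometric Resampling: for each used component i, redraw perturbations until a
    # fresh copy of the action also uses i; the (capped) number of redraws K estimates
    # 1 / P(V_i = 1), giving importance-weighted loss estimates without knowing P.
    K = np.zeros(d)
    for i in range(d):
        if V[i] == 1:
            for k in range(1, M + 1):
                if fpl_action(Lhat)[i] == 1:
                    break
            K[i] = k
    Lhat += K * observed
```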

    First-order regret bounds for combinatorial semi-bandits

    We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algorithms that guarantee that the learner's expected regret grows as $\widetilde{O}(\sqrt{T})$ with the number of rounds $T$. In this paper, we propose an algorithm that improves this scaling to $\widetilde{O}(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting. Comment: To appear at COLT 201
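
    To make the quantities in the bound concrete, here is a tiny Python sketch (with a placeholder uniform-random learner, not the paper's algorithm) that runs the semi-bandit protocol and compares the horizon-based benchmark sqrt(T) with the first-order benchmark sqrt(L_T^*); when the best action's total loss is small, the latter is much smaller. The decision set, loss scale, and horizon are illustrative.

```python
import itertools
import numpy as np

d, m, T = 6, 2, 500
actions = []                                  # toy combinatorial set: all m-of-d subsets
for idx in itertools.combinations(range(d), m):
    v = np.zeros(d)
    v[list(idx)] = 1.0
    actions.append(v)

rng = np.random.default_rng(0)
learner_loss = 0.0
cum = np.zeros(d)                             # cumulative component losses (for the benchmark)
for t in range(T):
    losses = rng.random(d) * 0.1              # small losses make L_T^* much smaller than T
    V = actions[rng.integers(len(actions))]   # placeholder learner: uniform random action
    learner_loss += float(losses @ V)         # only losses * V would be observed (semi-bandit)
    cum += losses

L_star = min(float(cum @ a) for a in actions)  # L_T^*: total loss of the best fixed action
print(f"sqrt(T) = {T ** 0.5:.1f}   sqrt(L_T^*) = {L_star ** 0.5:.1f}   "
      f"regret = {learner_loss - L_star:.1f}")
```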

    Online Multi-task Learning with Hard Constraints

    We discuss multi-task online learning when a decision maker has to deal simultaneously with M tasks. The tasks are related, which is modeled by imposing that the M-tuple of actions taken by the decision maker needs to satisfy certain constraints. We give natural examples of such restrictions and then discuss a general class of tractable constraints, for which we introduce computationally efficient ways of selecting actions, essentially by reducing to an on-line shortest path problem. We briefly discuss "tracking" and "bandit" versions of the problem and extend the model in various ways, including non-additive global losses and uncountably infinite sets of tasks.
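
    A minimal Python sketch of the shortest-path reduction mentioned above, for one assumed tractable constraint class in which only consecutive tasks are coupled (here: consecutive tasks must choose different actions): feasible M-tuples correspond to source-to-sink paths in a layered DAG, so the best feasible tuple for a given loss assignment is found by a shortest-path / dynamic-programming pass. The constraint, the loss values, and the sizes are illustrative.

```python
M, K = 3, 3                     # M tasks, K actions per task
losses = [[0.2, 0.5, 0.1],      # losses[i][a] = loss of action a on task i
          [0.4, 0.1, 0.3],
          [0.3, 0.2, 0.6]]

def allowed(a, b):
    """Assumed pairwise constraint between consecutive tasks."""
    return a != b

# Shortest path in the layered graph via dynamic programming:
# best[b] = minimal total loss of a feasible prefix ending with action b.
best = list(losses[0])
choice = [[None] * K]
for i in range(1, M):
    new_best, back = [], []
    for b in range(K):
        val, arg = min((best[a], a) for a in range(K) if allowed(a, b))
        new_best.append(val + losses[i][b])
        back.append(arg)
    best, choice = new_best, choice + [back]

# Recover the best feasible M-tuple by backtracking.
b = min(range(K), key=lambda a: best[a])
tuple_rev = [b]
for i in range(M - 1, 0, -1):
    b = choice[i][b]
    tuple_rev.append(b)
print("best feasible action tuple:", tuple_rev[::-1], "loss:", min(best))
```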

    Új módszerek az adattömörítésben = New methods in data compression

    We designed universal, limited-delay codes for the lossy compression of individual sequences that perform asymptotically as well as the best time-varying code from a reference family (matched to the source sequence in hindsight) that may change the employed base code several times. We provided efficient, low-complexity implementations for the cases where the base reference class is the set of traditional or certain network scalar quantizers. We developed routing algorithms for communication networks that provide asymptotically as good QoS parameters (such as packet loss ratio or delay) as the best fixed path in the network matched to the varying conditions in hindsight; notably, the performance and complexity of these methods scale with the size of the network (rather than with the number of paths) while the rate of convergence in time remains optimal. Experiments indicate that data for which bytes are not the natural choice of symbols compress poorly with standard byte-based implementations of lossless data compression algorithms, while bit-level algorithms perform reasonably on byte-based data (in addition to having computational advantages from operating on a smaller alphabet); we explained this phenomenon by analyzing how block Markov sources can be approximated by higher-order symbol-based Markov models. Finally, we solved a sequential, on-line version of the bin packing problem, which can be applied to schedule transmissions for certain resource-limited sensors.
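
    A toy Python sketch of the online-learning flavor of the compression results above: a quantizer is picked in each round from a small reference class of uniform scalar quantizers by exponentially weighted averaging of past distortions. The quantizer family, the squared-error distortion, the learning rate, and the restriction to a fixed (rather than time-varying) reference code are all simplifying assumptions; the project's limited-delay schemes are considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(1)
levels = [2, 4, 8, 16]                  # reference class: uniform quantizers on [0, 1]
eta = 0.5                               # learning rate (illustrative)
cum_dist = np.zeros(len(levels))        # cumulative distortion of each reference quantizer

def quantize(x, L):
    """Uniform L-level quantizer on [0, 1]: map x to the midpoint of its cell."""
    step = 1.0 / L
    return (np.floor(x / step) + 0.5) * step

for t in range(1000):
    x = rng.random()                                    # next source symbol
    w = np.exp(-eta * (cum_dist - cum_dist.min()))      # exponential weights
    k = rng.choice(len(levels), p=w / w.sum())          # randomized quantizer choice
    _ = quantize(x, levels[k])                          # emit the quantized value
    # The encoder sees x, so the distortion of every reference quantizer is computable.
    cum_dist += (np.array([quantize(x, L) for L in levels]) - x) ** 2
```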

    Minimax Policies for Combinatorial Prediction Games

    We address the online linear optimization problem when the actions of the forecaster are represented by binary vectors. Our goal is to understand the magnitude of the minimax regret for the worst possible set of actions. We study the problem under three different assumptions on the feedback: full information, and the partial-information models of the so-called "semi-bandit" and "bandit" problems. We consider both $L_\infty$- and $L_2$-type restrictions on the losses assigned by the adversary. We formulate a general strategy using Bregman projections on top of a potential-based gradient descent, which generalizes the ones studied in the series of papers Gyorgy et al. (2007), Dani et al. (2008), Abernethy et al. (2008), Cesa-Bianchi and Lugosi (2009), Helmbold and Warmuth (2009), Koolen et al. (2010), Uchiya et al. (2010), Kale et al. (2010) and Audibert and Bubeck (2010). We provide simple proofs that recover most of the previous results. We propose new upper bounds for the semi-bandit game. Moreover, we derive lower bounds for all three feedback assumptions. With the only exception of the bandit game, the upper and lower bounds are tight up to a constant factor. Finally, we answer a question asked by Koolen et al. (2010) by showing that the exponentially weighted average forecaster is suboptimal against $L_\infty$ adversaries.
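
    A bare-bones Python sketch of the structure of such a strategy (a gradient step in the dual of a potential, followed by a Bregman projection), specialized purely for brevity to the probability simplex with the negative-entropy potential, where the projection is a renormalization and the method reduces to exponentially weighted averaging; the paper's general formulation covers combinatorial action sets, other potentials, and the partial-feedback games. The losses and the learning rate are illustrative.

```python
import numpy as np

d, T, eta = 5, 200, 0.1
x = np.full(d, 1.0 / d)              # current point in the probability simplex
rng = np.random.default_rng(2)

for t in range(T):
    loss = rng.random(d)             # full-information linear loss vector
    # Gradient step in the dual of the negative-entropy potential:
    # the unnormalized update is x * exp(-eta * loss).
    y = x * np.exp(-eta * loss)
    # Bregman projection onto the simplex; for the negative entropy
    # this is simply renormalization, recovering exponential weights.
    x = y / y.sum()
```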