10 research outputs found

    The Complexity of POMDPs with Long-run Average Objectives

    Full text link
    We study the problem of approximation of optimal values in partially-observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives, rewards are associated with every transition of the POMDP and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximation of optimal values, and that the related decision problem is recursively enumerable complete.
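
    As an illustration of the payoff definition only (not the paper's method), the following minimal Python sketch estimates the long-run average reward of a small hypothetical POMDP under a fixed finite-memory strategy by a long truncated simulation; the toy model, the strategy, and all names are assumptions made for this example.

    import random

    # Hypothetical 2-state POMDP with 2 observations and 2 actions (illustrative only).
    # transition[state][action] is a list of (probability, next_state, reward) triples.
    transition = {
        0: {0: [(0.9, 0, 1.0), (0.1, 1, 0.0)], 1: [(0.5, 0, 0.0), (0.5, 1, 2.0)]},
        1: {0: [(0.8, 1, 0.0), (0.2, 0, 1.0)], 1: [(1.0, 1, 0.5)]},
    }
    observation = {0: 0, 1: 1}  # deterministic observations, a simplification

    def finite_memory_strategy(memory, obs):
        # Two memory states; the action and the memory update depend only on the
        # current memory state and the last observation.
        action = (memory + obs) % 2
        return action, obs  # the new memory simply records the last observation

    def estimate_long_run_average(horizon=100_000, seed=0):
        # The long-run average payoff is the limit of the running mean of rewards;
        # here it is approximated by the mean over a long finite run.
        rng = random.Random(seed)
        state, memory, total = 0, 0, 0.0
        for _ in range(horizon):
            action, memory = finite_memory_strategy(memory, observation[state])
            r, acc = rng.random(), 0.0
            for prob, nxt, reward in transition[state][action]:
                acc += prob
                if r <= acc:
                    total += reward
                    state = nxt
                    break
        return total / horizon

    print(f"estimated long-run average reward: {estimate_long_run_average():.3f}")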

    Prophet Inequalities: Separating Random Order from Order Selection

    Full text link
    Prophet inequalities are a central object of study in optimal stopping theory. A gambler is sent values online, sampled from an instance of independent distributions, in an adversarial, random, or selected order, depending on the model. When observing each value, the gambler either accepts it as a reward or irrevocably rejects it and proceeds to observe the next value. The goal of the gambler, who cannot see the future, is to maximise the expected value of the reward while competing against the expectation of a prophet (the offline maximum). In other words, one seeks to maximise the gambler-to-prophet ratio of the expectations. The model in which the gambler first selects the arrival order and then observes the values is known as Order Selection. Recently it has been shown that in this model a ratio of 0.7251 can be attained for any instance. If the gambler chooses the arrival order (uniformly) at random, we obtain the Random Order model. The worst-case ratio over all possible instances has been extensively studied for at least 40 years. Still, it is not known whether carefully choosing the order, or simply taking it at random, benefits the gambler. We prove that, in the Random Order model, no algorithm can achieve a ratio larger than 0.7235, thus showing for the first time that there is a real benefit in choosing the order.
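
    As a hedged illustration of the gambler-to-prophet ratio in the Random Order model (not the algorithms or bounds from the paper), the Python sketch below simulates a classical single-threshold rule that accepts the first value exceeding half the expected maximum; the toy instance and the threshold choice are assumptions for this example.

    import random
    import statistics

    def estimate_ratio(samplers, trials=200_000, seed=1):
        # Random Order model: the arrival order is uniformly random; the gambler
        # accepts the first value exceeding a fixed threshold (half the expected maximum).
        rng = random.Random(seed)
        expected_max = statistics.mean(
            max(s(rng) for s in samplers) for _ in range(trials)
        )
        threshold = expected_max / 2
        gambler, prophet = 0.0, 0.0
        for _ in range(trials):
            values = [s(rng) for s in samplers]
            prophet += max(values)  # the prophet always takes the offline maximum
            for i in rng.sample(range(len(values)), len(values)):
                if values[i] >= threshold:
                    gambler += values[i]  # accept and stop
                    break
        return gambler / prophet

    # Toy instance (an assumption): a safe value of 1 and a rare high value.
    samplers = [lambda rng: 1.0, lambda rng: 10.0 if rng.random() < 0.1 else 0.0]
    print(f"estimated gambler-to-prophet ratio: {estimate_ratio(samplers):.3f}")

    The classical guarantee for such a fixed-threshold rule is 1/2; the question the paper addresses is how much more the gambler gains from a random order versus a carefully selected one.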

    Prophet inequality through Schur-convexity and optimal control

    No full text
    Thesis submitted for the degree of Magíster en Ciencias de la Ingeniería, Mención Matemáticas Aplicadas, and for the professional title of Ingeniero Civil Matemático. In the classical optimal stopping problem known as the prophet inequality, realizations of independent non-negative random variables are revealed sequentially. A gambler who knows the distributions, but cannot see into the future, must decide when to stop and take the last revealed value. Her goal is to maximize the expectation of what she obtains, and her performance is measured by the worst-case ratio between her expected reward and the expected reward of a prophet (who can see the future and therefore always picks the maximum). In the seventies, Krengel and Sucheston, and Garling [16] established that the gambler's performance can be bounded by a constant and that 1/2 is the best such constant. Over the last decade, the prophet inequality has resurfaced as an important problem because of its connection with posted price mechanisms, a theory used in online sales. A variant of particular interest is Prophet Secretary, where the only difference is that the realizations are revealed in random order. For this variant, several algorithms achieve a performance of 1 − 1/e ≈ 0.63, and recently Azar et al. [2] improved this result. Regarding upper bounds, it is known that the gambler cannot do better than 0.745, in the limit over the size of the instance. This thesis derives a way of analyzing strategies that depend only on time: given an instance, a decreasing sequence of thresholds is computed and used to decide whether to stop; the gambler takes the first value that exceeds the threshold corresponding to the time at which it is revealed. Specifically, we consider a robust class of strategies that we call blind strategies. They generalize the idea of fixing a single threshold for the whole process and consist of fixing a function, independent of the instance, that determines how to compute the thresholds once the instance is known. The main result is that the gambler achieves a performance of at least 0.669, improving on the state of the art (Azar et al. [2]) both for Prophet Secretary and for the variant in which the gambler is free to choose the order in which the variables are revealed (Beyhaghi et al. [3]). The analysis reduces to studying the distribution of the stopping time induced by these strategies, through the theory of Schur-convexity. We also show that strategies of this type cannot achieve more than 0.675, by computing the gambler's optimal performance against two particular instances via an optimal control problem. Finally, we show that the broader class of non-adaptive strategies cannot achieve more than √3 − 1 ≈ 0.73, a bound that also improves the state of the art among upper bounds for simple strategies (Azar et al. [2]). A strategy is non-adaptive if the decision to stop depends on the value, the identity, and the time at which the variable is revealed, but not on the identities of the previously revealed variables. Funding: CONICYT-Chile, ECOS-CONICYT, Google, and CMM - Conicyt PIA AFB17000.
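
    A minimal Python sketch of the flavor of such time-dependent threshold ("blind") strategies, not the thesis's actual threshold curve: a curve alpha, fixed in advance and independent of the instance, prescribes how demanding the gambler is over time, and once the instance is known each threshold is set to the corresponding quantile of the maximum. The toy instance, the curve, and the Monte Carlo quantile estimation are assumptions for this example.

    import random

    def quantile_of_max(samplers, q, rng, n_samples=50_000):
        # Empirical q-quantile of the maximum of one draw from each distribution.
        draws = sorted(max(s(rng) for s in samplers) for _ in range(n_samples))
        return draws[min(int(q * n_samples), n_samples - 1)]

    def blind_thresholds(samplers, alpha, rng):
        # Decreasing thresholds T_1 >= ... >= T_n: T_t is chosen so that
        # P(max >= T_t) is roughly alpha(t/n), for an increasing curve alpha.
        n = len(samplers)
        return [quantile_of_max(samplers, 1.0 - alpha((t + 1) / n), rng) for t in range(n)]

    def run_blind_strategy(samplers, alpha, trials=20_000, seed=2):
        rng = random.Random(seed)
        thresholds = blind_thresholds(samplers, alpha, rng)
        gambler, prophet = 0.0, 0.0
        for _ in range(trials):
            values = [s(rng) for s in samplers]
            prophet += max(values)
            order = rng.sample(range(len(values)), len(values))  # random arrival order
            for t, i in enumerate(order):
                if values[i] >= thresholds[t]:  # take the first value above the current threshold
                    gambler += values[i]
                    break
        return gambler / prophet

    # Toy instance and a simple increasing acceptance curve (both assumptions).
    samplers = [lambda rng: rng.random() for _ in range(5)]
    alpha = lambda x: 0.3 + 0.7 * x  # demanding early, lenient late
    print(f"estimated performance against the prophet: {run_blind_strategy(samplers, alpha):.3f}")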

    Relation between the number of peaks and the number of reciprocal sign epistatic interactions

    Get PDF
    Empirical studies of fitness landscapes suggest that they may be rugged, that is, have multiple fitness peaks. Such fitness landscapes, those with multiple peaks, necessarily contain a special local structure called reciprocal sign epistasis (Poelwijk et al. in J Theor Biol 272:141–144, 2011). Here, we investigate the quantitative relationship between the number of fitness peaks and the number of reciprocal sign epistatic interactions. Previously, it has been shown (Poelwijk et al. in J Theor Biol 272:141–144, 2011) that pairwise reciprocal sign epistasis is a necessary but not sufficient condition for the existence of multiple peaks. Applying discrete Morse theory, which to our knowledge has never been used in this context, we extend this result by giving the minimal number of reciprocal sign epistatic interactions required to create a given number of peaks.
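
    To make the two counted objects concrete, here is a small illustrative Python sketch (not the paper's Morse-theoretic argument) that, for a toy fitness landscape on the Boolean hypercube, counts local peaks and counts (locus pair, background) combinations showing reciprocal sign epistasis, using the standard square characterization; the landscape values are assumptions for this example.

    from itertools import product, combinations

    def neighbors(genotype):
        # All genotypes one point mutation away from `genotype` (a tuple of 0/1).
        return [genotype[:k] + (1 - genotype[k],) + genotype[k + 1:] for k in range(len(genotype))]

    def count_peaks(fitness):
        # A peak is a genotype strictly fitter than all of its one-mutant neighbours.
        return sum(all(fitness[g] > fitness[h] for h in neighbors(g)) for g in fitness)

    def has_reciprocal_sign_epistasis(fitness, i, j, background):
        # Within the 2x2 square spanned by loci i and j on a fixed background,
        # reciprocal sign epistasis holds exactly when the two genotypes on one
        # diagonal are both strictly fitter than both genotypes on the other diagonal.
        def f(bi, bj):
            geno = list(background)
            geno[i], geno[j] = bi, bj
            return fitness[tuple(geno)]
        f00, f01, f10, f11 = f(0, 0), f(0, 1), f(1, 0), f(1, 1)
        return min(f00, f11) > max(f01, f10) or min(f01, f10) > max(f00, f11)

    def count_reciprocal_sign_epistasis(fitness, num_loci):
        # Count (unordered locus pair, background) combinations showing the pattern.
        count = 0
        for i, j in combinations(range(num_loci), 2):
            rest = [k for k in range(num_loci) if k not in (i, j)]
            for bits in product((0, 1), repeat=len(rest)):
                background = [0] * num_loci
                for k, b in zip(rest, bits):
                    background[k] = b
                count += has_reciprocal_sign_epistasis(fitness, i, j, background)
        return count

    # Toy 3-locus landscape with two peaks (values are assumptions for illustration).
    L = 3
    fitness = {g: 0.0 for g in product((0, 1), repeat=L)}
    fitness[(0, 0, 0)] = 3.0
    fitness[(1, 1, 1)] = 2.0
    fitness[(1, 1, 0)] = 1.5
    fitness[(1, 0, 0)] = 1.0
    print("peaks:", count_peaks(fitness))
    print("reciprocal sign epistatic interactions:", count_reciprocal_sign_epistasis(fitness, L))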

    Repeated prophet inequality with near-optimal bounds

    No full text
    In the modern sample-driven Prophet Inequality, an adversary chooses a sequence of n items with values v1,v2,…,vn to be presented to a decision maker (DM). The process proceeds in two phases. In the first phase (the sampling phase), some items, possibly selected at random, are revealed to the DM, but she can never accept them. In the second phase, the DM is presented with the other items in random order and in an online fashion. For each item, she must make an irrevocable decision to either accept the item and stop the process or reject the item forever and proceed to the next item. The goal of the DM is to maximize the expected value as compared to a Prophet (or offline algorithm) that has access to all information. In this setting, the sampling phase has no cost and is not part of the optimization process. However, in many scenarios, the samples are obtained as part of the decision-making process. We model this aspect as a two-phase Prophet Inequality in which an adversary chooses a sequence of 2n items with values v1,v2,…,v2n, the items are randomly ordered, and the Prophet Inequality problem is played in two phases, over the first n items and over the remaining items, respectively. We show that some basic algorithms achieve a ratio of at most 0.450. We present an algorithm that achieves a ratio of at least 0.495. Finally, we show that every algorithm achieves a ratio of at most 0.502. Hence our algorithm is near-optimal.
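
    As a toy illustration of the sample-then-decide idea described in the first part of the abstract (not the paper's two-phase variant nor its near-optimal algorithm), the Python sketch below uses the maximum of the n sampled values as the threshold for the online half; the i.i.d. instance is an assumption for this example.

    import random

    def sample_then_threshold_ratio(sampler, n, trials=100_000, seed=3):
        # 2n values in uniformly random order: the first n are samples that can only
        # be observed, and the decision maker accepts the first later value that
        # beats the best sample. The benchmark is the offline maximum of all values.
        rng = random.Random(seed)
        gambler, prophet = 0.0, 0.0
        for _ in range(trials):
            values = [sampler(rng) for _ in range(2 * n)]
            rng.shuffle(values)
            samples, online = values[:n], values[n:]
            prophet += max(values)
            threshold = max(samples)
            gambler += next((v for v in online if v > threshold), 0.0)
        return gambler / prophet

    sampler = lambda rng: rng.expovariate(1.0)  # toy i.i.d. instance (an assumption)
    print(f"estimated ratio: {sample_then_threshold_ratio(sampler, n=10):.3f}")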

    Finite-memory strategies in POMDPs with long-run average objectives

    No full text
    Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with a long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies notably that approximating the long-run value is recursively enumerable, as well as a weak continuity property of the value with respect to the transition function.

    Faster algorithm for turn-based stochastic games with bounded treewidth

    No full text
    Turn-based stochastic games (aka simple stochastic games) are two-player zero-sum games played on directed graphs with probabilistic transitions. The goal of player-max is to maximize the probability of reaching a target state against the adversarial player-min. These games lie in NP ∩ coNP and are among the rare combinatorial problems in this complexity class for which the existence of a polynomial-time algorithm is a major open question. While a randomized sub-exponential-time algorithm exists, all known deterministic algorithms require exponential time in the worst case. An important open question has been whether faster algorithms can be obtained when parametrized by the treewidth of the game graph. Even a deterministic sub-exponential-time algorithm for turn-based stochastic games of constant treewidth has remained elusive. In this work, our main result is a deterministic algorithm that, given a game with n states, treewidth at most t, and bit-complexity log D of the probabilistic transition function, solves turn-based stochastic games in time O((t n² log D)^(t log n)). In particular, our algorithm runs in quasi-polynomial time for games with constant or poly-logarithmic treewidth.
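
    A quick check of the quasi-polynomial claim, reading the exponent of the stated bound as t log n: for constant treewidth t = O(1), O((t n² log D)^(t log n)) = (n² log D)^(O(log n)) = 2^(O(log n · (log n + log log D))), which is n^(O(log n)) up to the dependence on log D, i.e. quasi-polynomial in the input size.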