149 research outputs found

    Optimal Release of Inventory Using Online Auctions: The Two Item Case

    Get PDF
    In this paper we analyze policies for optimally disposing inventory using online auctions. We assume a seller has a ïŹxed number of items to sell using a sequence of, possibly overlapping, single-item auctions. The decision the seller must make is when to start each auction. The decision involves a trade-oïŹ€ between a holding cost for each period an item remains unsold, and a higher expected ïŹnal price the fewer the number of simultaneous auctions underway. Consequently the seller must trade-oïŹ€ the expected marginal gain for the ongoing auctions with the expected marginal cost of the unreleased items by further deferring their release. We formulate the problem as a discrete time Markov Decision Problem and consider two cases. In the ïŹrst case we assume the auctions are guaranteed to be successful, while in the second case we assume there is a positive probability that an auction receives no bids. The reason for considering these two cases are that they require diïŹ€erent analysis. We derive conditions to ensure that the optimal release policy is a control limit policy in the current price of the ongoing auctions, and provide several illustration of results. The paper focuses on the two item case which has suïŹƒcient complexity to raise challenging questions

    Learning Risk Preferences in Markov Decision Processes: an Application to the Fourth Down Decision in Football

    Full text link
    For decades, National Football League (NFL) coaches' observed fourth down decisions have been largely inconsistent with prescriptions based on statistical models. In this paper, we develop a framework to explain this discrepancy using a novel inverse optimization approach. We model the fourth down decision and the subsequent sequence of plays in a game as a Markov decision process (MDP), the dynamics of which we estimate from NFL play-by-play data from the 2014 through 2022 seasons. We assume that coaches' observed decisions are optimal but that the risk preferences governing their decisions are unknown. This yields a novel inverse decision problem for which the optimality criterion, or risk measure, of the MDP is the estimand. Using the quantile function to parameterize risk, we estimate which quantile-optimal policy yields the coaches' observed decisions as minimally suboptimal. In general, we find that coaches' fourth-down behavior is consistent with optimizing low quantiles of the next-state value distribution, which corresponds to conservative risk preferences. We also find that coaches exhibit higher risk tolerances when making decisions in the opponent's half of the field than in their own, and that league average fourth down risk tolerances have increased over the seasons in our data.Comment: 33 pages, 9 figure

    Optimal Strategies in Infinite-state Stochastic Reachability Games

    Full text link
    We consider perfect-information reachability stochastic games for 2 players on infinite graphs. We identify a subclass of such games, and prove two interesting properties of it: first, Player Max always has optimal strategies in games from this subclass, and second, these games are strongly determined. The subclass is defined by the property that the set of all values can only have one accumulation point -- 0. Our results nicely mirror recent results for finitely-branching games, where, on the contrary, Player Min always has optimal strategies. However, our proof methods are substantially different, because the roles of the players are not symmetric. We also do not restrict the branching of the games. Finally, we apply our results in the context of recently studied One-Counter stochastic games

    Decision Problems for Nash Equilibria in Stochastic Games

    Get PDF
    We analyse the computational complexity of finding Nash equilibria in stochastic multiplayer games with ω\omega-regular objectives. While the existence of an equilibrium whose payoff falls into a certain interval may be undecidable, we single out several decidable restrictions of the problem. First, restricting the search space to stationary, or pure stationary, equilibria results in problems that are typically contained in PSPACE and NP, respectively. Second, we show that the existence of an equilibrium with a binary payoff (i.e. an equilibrium where each player either wins or loses with probability 1) is decidable. We also establish that the existence of a Nash equilibrium with a certain binary payoff entails the existence of an equilibrium with the same payoff in pure, finite-state strategies.Comment: 22 pages, revised versio

    Solving Stochastic B\"uchi Games on Infinite Arenas with a Finite Attractor

    Full text link
    We consider games played on an infinite probabilistic arena where the first player aims at satisfying generalized B\"uchi objectives almost surely, i.e., with probability one. We provide a fixpoint characterization of the winning sets and associated winning strategies in the case where the arena satisfies the finite-attractor property. From this we directly deduce the decidability of these games on probabilistic lossy channel systems.Comment: In Proceedings QAPL 2013, arXiv:1306.241

    Computing Distances between Probabilistic Automata

    Full text link
    We present relaxed notions of simulation and bisimulation on Probabilistic Automata (PA), that allow some error epsilon. When epsilon is zero we retrieve the usual notions of bisimulation and simulation on PAs. We give logical characterisations of these notions by choosing suitable logics which differ from the elementary ones, L with negation and L without negation, by the modal operator. Using flow networks, we show how to compute the relations in PTIME. This allows the definition of an efficiently computable non-discounted distance between the states of a PA. A natural modification of this distance is introduced, to obtain a discounted distance, which weakens the influence of long term transitions. We compare our notions of distance to others previously defined and illustrate our approach on various examples. We also show that our distance is not expansive with respect to process algebra operators. Although L without negation is a suitable logic to characterise epsilon-(bi)simulation on deterministic PAs, it is not for general PAs; interestingly, we prove that it does characterise weaker notions, called a priori epsilon-(bi)simulation, which we prove to be NP-difficult to decide.Comment: In Proceedings QAPL 2011, arXiv:1107.074

    Approximating the termination value of one-counter MDPs and stochastic games

    Get PDF
    One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are 1-player, and 2-player turn-based zero-sum, stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (minimize, respectively) the probability of hitting counter value 0, starting at a given control state and given counter value. Recently, we studied qualitative decision problems ("is the optimal termination value equal to 1?") for OC-MDPs (and OC-SSGs) and showed them to be decidable in polynomial time (in NP intersection coNP, respectively). However, quantitative decision and approximation problems ("is the optimal termination value at least p", or "approximate the termination value within epsilon") are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open even whether any of these quantitative termination problems are computable. In this paper we show that all quantitative approximation problems for the termination value for OC-MDPs and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon>0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error epsilon, and furthermore we can compute epsilon-optimal strategies for both players in the game. A key ingredient in our proofs is a subtle martingale, derived from solving certain linear programs that we can associate with a maximizing OC-MDP. An application of Azuma's inequality on these martingales yields a computable bound for the "wealth" at which a "rich person's strategy" becomes epsilon-optimal for OC-MDPs
    • 

    corecore