
    Optimal Bounds for the Change-Making Problem

    The change-making problem is the problem of representing a given value with the fewest coins possible. We investigate the problem of determining whether the greedy algorithm produces an optimal representation of all amounts for a given set of coin denominations 1 = c_1 < c_2 < ... < c_m. Chang and Gill show that if the greedy algorithm is not always optimal, then there exists a counterexample x in the range c_3 <= x < c_m(c_m c_{m-1} + c_m - 3c_{m-1}) / (c_m - c_{m-1}). To test for the existence of such a counterexample, Chang and Gill propose computing and comparing the greedy and optimal representations of all x in this range. In this paper we show that if a counterexample exists, then the smallest one lies in the range c_3 + 1 < x < c_m + c_{m-1}, and these bounds are tight. Moreover, we give a simple test for the existence of a counterexample that does not require the calculation of optimal representations. In addition, we give a complete characterization of three-coin systems and an efficient algorithm for all systems with a fixed number of coins. Finally, we show that a related problem is coNP-complete.
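    As a concrete illustration of the bound above, here is a minimal Python sketch that checks a coin system by brute force over the tight range c_3 + 1 < x < c_m + c_{m-1}. Note that it computes optimal representations by dynamic programming, which the paper's own test deliberately avoids; all function names are ours.

```python
def greedy_count(x, coins):
    """Number of coins the greedy algorithm uses for amount x."""
    count = 0
    for c in reversed(coins):  # coins sorted ascending, coins[0] == 1
        count += x // c
        x %= c
    return count

def optimal_count(x, coins):
    """Fewest coins summing to x, by dynamic programming."""
    best = [0] + [float("inf")] * x
    for amount in range(1, x + 1):
        best[amount] = 1 + min(best[amount - c] for c in coins if c <= amount)
    return best[x]

def greedy_is_optimal(coins):
    """True iff greedy is optimal for every amount (needs m >= 3 coins)."""
    assert coins[0] == 1 and coins == sorted(coins) and len(coins) >= 3
    lo = coins[2] + 2            # smallest x with c_3 + 1 < x
    hi = coins[-1] + coins[-2]   # largest candidate is c_m + c_{m-1} - 1
    return all(greedy_count(x, coins) <= optimal_count(x, coins)
               for x in range(lo, hi))

print(greedy_is_optimal([1, 3, 4]))   # False: 6 = 3 + 3, greedy gives 4 + 1 + 1
print(greedy_is_optimal([1, 5, 10]))  # True
```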

    Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits

    We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action. Compared with the linear model, two curious phenomena arise in non-linear models: first, in addition to the "learning phase" with a standard parametric rate for estimation or regret, there is a "burn-in period" with a fixed cost determined by the non-linear function; second, achieving the smallest burn-in cost requires new exploration algorithms. For a special family of non-linear functions named ridge functions in the literature, we derive upper and lower bounds on the optimal burn-in cost and, in addition, on the entire learning trajectory during the burn-in period via differential equations. In particular, a two-stage algorithm that first finds a good initial action and then treats the problem as locally linear is statistically optimal. In contrast, several classical algorithms, such as UCB and algorithms relying on regression oracles, are provably suboptimal. Comment: Title change; add a new lower bound for linear bandits in Theorem 1
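    A schematic rendering of the two-stage idea in Python. Everything here is an illustrative assumption, not the authors' procedure: a finite 1-D action grid, a toy link function that is flat near zero (which is what makes the burn-in costly), and a plain UCB rule standing in for the locally linear stage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: mean reward g(theta * a) for a scalar parameter theta.
theta = 0.6
g = lambda z: max(0.0, z - 0.4) ** 2          # flat near 0: weak signal there
actions = np.linspace(0.0, 1.0, 21)
pull = lambda i: g(theta * actions[i]) + 0.05 * rng.standard_normal()

B, T = 420, 4000  # burn-in budget and total horizon (illustrative constants)

# Stage 1 (burn-in): uniform exploration to locate an action where the
# link has non-trivial slope.
counts = np.zeros(len(actions))
sums = np.zeros(len(actions))
for t in range(B):
    i = t % len(actions)
    sums[i] += pull(i)
    counts[i] += 1

# Stage 2: exploit local linearity around the best burn-in action; a plain
# UCB rule on its neighborhood stands in for the locally linear phase.
means = sums / counts
center = int(np.argmax(means))
local = [i for i in range(len(actions)) if abs(i - center) <= 2]
for t in range(B, T):
    ucb = [means[i] + np.sqrt(2 * np.log(t) / counts[i]) for i in local]
    i = local[int(np.argmax(ucb))]
    sums[i] += pull(i)
    counts[i] += 1
    means[i] = sums[i] / counts[i]

print("chosen action:", actions[int(np.argmax(means))])
```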

    Metareasoning for Planning Under Uncertainty

    The conventional model for online planning under uncertainty assumes that an agent can stop and plan without incurring costs for the time spent planning. However, planning time is not free in most real-world settings. For example, an autonomous drone is subject to nature's forces, like gravity, even while it thinks, and must either pay a price for counteracting these forces to stay in place, or grapple with the state change caused by acquiescing to them. Policy optimization in these settings requires metareasoning: a process that trades off the cost of planning and the potential policy improvement that can be achieved. We formalize and analyze the metareasoning problem for Markov Decision Processes (MDPs). Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking. For reasons we discuss, optimal general metareasoning turns out to be impractical, motivating approximations. We present approximate metareasoning procedures that rely on special properties of the BRTDP planning algorithm and explore the effectiveness of our methods on a variety of problems. Comment: Extended version of IJCAI 2015 paper
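    The trade-off lends itself to a simple stopping rule. Below is a minimal Python sketch; the planner interface and its upper/lower value bounds (in the spirit of BRTDP, which maintains both) are our assumptions, not the paper's procedure.

```python
# Hypothetical planner interface: planner.step() tightens planner.upper and
# planner.lower, bounds on the optimal start-state value; think_cost is the
# per-step cost of planning (e.g., the energy a drone spends hovering while
# it thinks).

def plan_with_metareasoning(planner, think_cost, max_steps=10_000):
    steps = 0
    while steps < max_steps:
        gap = planner.upper - planner.lower  # best-case remaining improvement
        if gap <= think_cost:                # further thinking cannot pay off
            break
        planner.step()
        steps += 1
    return planner.best_action()
```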

    Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

    Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds for very large multiagent planning problems by subdividing them into sub-problems and, for each sub-problem, making optimistic assumptions about the influence that the rest of the system will exert on it. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning. Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015)
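    A minimal Python sketch of the decomposition argument, with an entirely hypothetical sub-problem interface: each term is optimistic about its incoming influence, so the sum dominates the value of any joint policy.

```python
def influence_optimistic_upper_bound(subproblems):
    """Sum of per-sub-problem values, each under its best-case incoming
    influence; every term over-estimates, so the sum upper-bounds the
    optimal value of the full joint problem."""
    total = 0.0
    for sub in subproblems:
        # sub.influences() enumerates candidate influences exerted by the
        # rest of the system; sub.solve(inf) returns the sub-problem's
        # optimal value under that fixed influence (hypothetical names).
        total += max(sub.solve(inf) for inf in sub.influences())
    return total

# A heuristic joint solution with value V is then provably within
# influence_optimistic_upper_bound(subs) - V of optimal.
```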