
    Chasing Ghosts: Competing with Stateful Policies

    We consider sequential decision making in a setting where regret is measured with respect to a set of stateful reference policies, and feedback is limited to observing the rewards of the actions performed (the so-called "bandit" setting). If either the reference policies are stateless rather than stateful, or the feedback includes the rewards of all actions (the so-called "expert" setting), previous work shows that the optimal regret grows like $\Theta(\sqrt{T})$ in terms of the number of decision rounds $T$. The difficulty in our setting is that the decision maker unavoidably loses track of the internal states of the reference policies, and thus cannot reliably attribute rewards observed in a certain round to any of the reference policies. In fact, in this setting it is impossible for the algorithm to estimate which policy gives the highest (or even approximately highest) total reward. Nevertheless, we design an algorithm that achieves expected regret that is sublinear in $T$, of the form $O(T/\log^{1/4} T)$. Our algorithm is based on a certain local repetition lemma that may be of independent interest. We also show that no algorithm can guarantee expected regret better than $O(T/\log^{3/2} T)$.
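    To give a feel for how slowly these bounds become sublinear, here is a minimal Python sketch that evaluates the per-round versions of the rates quoted in the abstract, that is, each rate divided by $T$. The function names are hypothetical and all constant factors hidden by the asymptotic notation are ignored, so the printed numbers only illustrate growth rates, not actual regret values.

```python
import math

def per_round_log_rate(T: int, exponent: float) -> float:
    """(T / log(T)**exponent) / T, i.e. 1 / log(T)**exponent (constants ignored)."""
    return 1.0 / math.log(T) ** exponent

def per_round_sqrt_rate(T: int) -> float:
    """sqrt(T) / T, the per-round rate of the stateless or full-feedback settings."""
    return 1.0 / math.sqrt(T)

for T in (10**3, 10**6, 10**9):
    print(
        T,
        per_round_log_rate(T, 0.25),   # form of the upper bound in the abstract
        per_round_log_rate(T, 1.5),    # form of the lower bound in the abstract
        per_round_sqrt_rate(T),        # Theta(sqrt(T)) regime, for comparison
    )
```

    Under these assumptions, all three per-round quantities shrink as $T$ grows, but the logarithmic rates shrink far more slowly than $1/\sqrt{T}$.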

    The phase transition in inhomogeneous random graphs

    We introduce a very general model of an inhomogeneous random graph with independence between the edges, which scales so that the number of edges is linear in the number of vertices. This scaling corresponds to the $p=c/n$ scaling for $G(n,p)$ used to study the phase transition; also, it seems to be a property of many large real-world graphs. Our model includes as special cases many models previously studied. We show that under one very weak assumption (that the expected number of edges is 'what it should be'), many properties of the model can be determined, in particular the critical point of the phase transition, and the size of the giant component above the transition. We do this by relating our random graphs to branching processes, which are much easier to analyze. We also consider other properties of the model, showing, for example, that when there is a giant component, it is 'stable': for a typical random graph, no matter how we add or delete $o(n)$ edges, the size of the giant component does not change by more than $o(n)$. Comment: 135 pages; revised and expanded slightly. To appear in Random Structures and Algorithms
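    The branching-process connection can be illustrated in the classical homogeneous special case $G(n, c/n)$, which the abstract mentions as the reference scaling. The sketch below is not the paper's general model: it assumes the standard fact that the asymptotic fraction of vertices in the giant component equals the survival probability $\rho$ of a Poisson($c$) branching process, which solves $\rho = 1 - e^{-c\rho}$. The function name and the iteration count are hypothetical choices.

```python
import math

def giant_component_fraction(c: float, iterations: int = 200) -> float:
    """Survival probability of a Poisson(c) branching process.

    Solves rho = 1 - exp(-c * rho) by fixed-point iteration from rho = 1.
    For c <= 1 the iteration tends to 0 (no giant component); for c > 1 it
    tends to the unique positive solution.
    """
    rho = 1.0
    for _ in range(iterations):
        rho = 1.0 - math.exp(-c * rho)
    return rho

for c in (0.5, 1.5, 2.0, 4.0):
    print(c, giant_component_fraction(c))
```

    In this special case the critical point is $c = 1$: below it the iteration collapses to zero, and above it the returned value is the limiting fraction of vertices in the giant component.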

    Bounding Bloat in Genetic Programming

    While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat (unnecessary growth of solutions) slowing down optimization. Theoretical analyses could so far not bound bloat and required explicit assumptions on the magnitude of bloat. In this paper we analyze bloat in mutation-based genetic programming for the two test functions ORDER and MAJORITY. We overcome previous assumptions on the magnitude of bloat and give matching or close-to-matching upper and lower bounds for the expected optimization time. In particular, we show that the (1+1) GP takes (i) $\Theta(T_{init} + n \log n)$ iterations with bloat control on ORDER as well as MAJORITY; and (ii) $O(T_{init} \log T_{init} + n (\log n)^3)$ and $\Omega(T_{init} + n \log n)$ (and $\Omega(T_{init} \log T_{init})$ for $n=1$) iterations without bloat control on MAJORITY. Comment: An extended abstract has been published at GECCO 201