
    Deterministic Graph Exploration with Advice

    We consider the task of graph exploration. An $n$-node graph has unlabeled nodes, and all ports at any node of degree $d$ are arbitrarily numbered $0,\dots,d-1$. A mobile agent has to visit all nodes and stop. The exploration time is the number of edge traversals. We consider the problem of how much knowledge the agent has to have a priori, in order to explore the graph in a given time, using a deterministic algorithm. This a priori information (advice) is provided to the agent by an oracle, in the form of a binary string, whose length is called the size of advice. We consider two types of oracles. The instance oracle knows the entire instance of the exploration problem, i.e., the port-numbered map of the graph and the starting node of the agent in this map. The map oracle knows the port-numbered map of the graph but does not know the starting node of the agent. We first consider exploration in polynomial time, and determine the exact minimum size of advice to achieve it. This size is $\log\log\log n - \Theta(1)$, for both types of oracles. When advice is large, there are two natural time thresholds: $\Theta(n^2)$ for a map oracle, and $\Theta(n)$ for an instance oracle, that can be achieved with sufficiently large advice. We show that, with a map oracle, time $\Theta(n^2)$ cannot be improved in general, regardless of the size of advice. We also show that the smallest size of advice to achieve this time is larger than $n^\delta$, for any $\delta < 1/3$. For an instance oracle, advice of size $O(n\log n)$ is enough to achieve time $O(n)$. We show that, with any advice of size $o(n\log n)$, the time of exploration must be at least $n^\epsilon$, for any $\epsilon < 2$, and with any advice of size $O(n)$, the time must be $\Omega(n^2)$. We also investigate minimum advice sufficient for fast exploration of Hamiltonian graphs.
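
    To make the setting concrete, here is a minimal Python sketch of the port-numbered exploration model described above: an agent performing plain depth-first traversal by port numbers with backtracking, counting edge traversals. It illustrates the model only, not the advice-based algorithms of the paper; the names (PortGraph, explore_dfs) and the example triangle are assumptions made for the illustration.

        # Minimal sketch of the port-numbered exploration model (illustration
        # only, not the advice-based algorithms of the paper). Names are
        # hypothetical.

        class PortGraph:
            """Anonymous graph: node ids are internal bookkeeping only."""
            def __init__(self, adj):
                # adj[v][p] = (u, q): port p of node v leads to port q of
                # node u; ports at a node of degree d are numbered 0..d-1.
                self.adj = adj

        def explore_dfs(graph, start):
            """Depth-first exploration by port numbers, with backtracking.

            Returns the number of edge traversals used to visit every node.
            Internal ids are used only to detect already-visited nodes; a
            real anonymous agent would need advice or markers for this.
            """
            visited = {start}
            traversals = 0

            def visit(v, back_port):
                nonlocal traversals
                for p in range(len(graph.adj[v])):
                    if p == back_port:
                        continue                 # skip the arrival port
                    u, q = graph.adj[v][p]
                    traversals += 1              # traverse v -> u
                    if u in visited:
                        traversals += 1          # immediately return u -> v
                    else:
                        visited.add(u)
                        visit(u, q)
                        traversals += 1          # backtrack u -> v

            visit(start, None)
            return traversals

        # Example: a triangle with an arbitrary (but consistent) port numbering.
        triangle = PortGraph({
            0: [(1, 0), (2, 1)],
            1: [(0, 0), (2, 0)],
            2: [(1, 1), (0, 1)],
        })
        print(explore_dfs(triangle, 0))  # 8 traversals; plain DFS uses O(m)

    The advice studied above is exactly the extra information that allows a deterministic agent to explore within a prescribed time bound instead of relying on such a generic traversal.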

    Universal Learning of Repeated Matrix Games

    We study and compare the learning dynamics of two universal learning algorithms, one based on Bayesian learning and the other on prediction with expert advice. Both approaches have strong asymptotic performance guarantees. When confronted with the task of finding good long-term strategies in repeated 2x2 matrix games, they behave quite differently.
    Comment: 16 LaTeX pages, 8 eps figures
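
    For readers unfamiliar with the second approach, the following sketch shows a generic prediction-with-expert-advice learner (exponential weights, i.e. Hedge) playing the row side of a repeated 2x2 game against a uniformly random opponent. The payoff matrix, learning rate, and opponent are illustrative assumptions; this is a textbook learner, not the universal algorithms analysed in the paper.

        import numpy as np

        # Generic Hedge / exponential-weights learner for the row player of a
        # repeated 2x2 matrix game (illustrative sketch only).

        # Assumed example payoffs: payoff[i, j] is the row player's reward for
        # row action i against column action j.
        payoff = np.array([[3.0, 0.0],
                           [5.0, 1.0]])

        rng = np.random.default_rng(0)
        eta = 0.1                      # learning rate, an illustrative choice
        cum_rewards = np.zeros(2)      # cumulative payoff of each pure row action
        total = 0.0

        for t in range(1000):
            # Numerically stable softmax over scaled cumulative rewards.
            logits = eta * cum_rewards
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()

            row_action = rng.choice(2, p=probs)
            col_action = rng.integers(2)          # stand-in opponent: uniform random
            total += payoff[row_action, col_action]

            # Full-information update: each pure action observes its own payoff.
            cum_rewards += payoff[:, col_action]

        print("average payoff per round:", total / 1000)
        print("final mixed strategy over row actions:", np.round(probs, 3))

    Against this opponent the expert-advice learner settles on the dominant row action; the paper's point is that the Bayesian and expert-advice approaches can nevertheless behave quite differently when searching for good long-term strategies in such games.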

    Topology recognition with advice

    In topology recognition, each node of an anonymous network has to deterministically produce an isomorphic copy of the underlying graph, with all ports correctly marked. This task is usually unfeasible without any a priori information. Such information can be provided to nodes as advice. An oracle knowing the network can give a (possibly different) string of bits to each node, and all nodes must reconstruct the network using this advice, after a given number of rounds of communication. During each round each node can exchange arbitrary messages with all its neighbors and perform arbitrary local computations. The time of completing topology recognition is the number of rounds it takes, and the size of advice is the maximum length of a string given to nodes. We investigate tradeoffs between the time in which topology recognition is accomplished and the minimum size of advice that has to be given to nodes. We provide upper and lower bounds on the minimum size of advice that is sufficient to perform topology recognition in a given time, in the class of all graphs of size $n$ and diameter $D \le \alpha n$, for any constant $\alpha < 1$. In most cases, our bounds are asymptotically tight.
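
    The trivial large-advice end of this trade-off can be illustrated as follows: the oracle serializes the entire port-numbered map into a binary string given to every node, and each node reconstructs the topology locally in zero rounds of communication. The sketch below assumes an arbitrary JSON-based encoding and hypothetical helper names (oracle_advice, node_decode); it is not one of the paper's constructions.

        import json

        # Trivial large-advice scheme (illustration only): the oracle encodes
        # the whole port-numbered map as a bit string, and every node decodes
        # it without any communication. The JSON encoding is an assumption.

        def oracle_advice(port_map):
            """port_map[v][p] = (u, q): port p of node v leads to port q of node u."""
            text = json.dumps(sorted((v, p, u, q)
                                     for v, ports in port_map.items()
                                     for p, (u, q) in enumerate(ports)))
            return ''.join(f'{byte:08b}' for byte in text.encode('ascii'))

        def node_decode(advice_bits):
            """Run locally at each node: rebuild the port-numbered map from advice."""
            data = bytes(int(advice_bits[i:i + 8], 2)
                         for i in range(0, len(advice_bits), 8)).decode('ascii')
            port_map = {}
            for v, p, u, q in json.loads(data):
                port_map.setdefault(v, {})[p] = (u, q)
            return port_map

        # Example: a path 0 - 1 - 2 with arbitrary port numbers.
        path = {0: [(1, 0)], 1: [(0, 0), (2, 0)], 2: [(1, 1)]}
        advice = oracle_advice(path)
        print(len(advice), "bits of advice")    # size of advice for this trivial scheme
        print(node_decode(advice) == {0: {0: (1, 0)},
                                      1: {0: (0, 0), 1: (2, 0)},
                                      2: {0: (1, 1)}})

    Real constructions aim to get away with far shorter advice than this full map; the bounds above quantify how small the advice can be for a given number of rounds.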

    Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

    In recent years, state-of-the-art game-playing agents often involve policies that are trained in self-play processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project with future goals including the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates, rather than MCTS visit counts. We empirically evaluate various properties of resulting policies, in a variety of board games.
    Comment: Accepted at the IEEE Conference on Games (CoG) 2019
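
    One plausible reading of such an objective is to directly ascend the gradient of the policy's expected value at a state, weighting actions by value estimates instead of visit counts. The sketch below illustrates this for a tabular softmax policy at a single state, with placeholder numbers standing in for MCTS value estimates; it is not the authors' exact derivation or implementation.

        import numpy as np

        # Sketch of one plausible reading of the training signal described
        # above: ascend the gradient of the policy's expected value at a
        # state, using per-action value estimates (placeholders standing in
        # for MCTS value estimates) instead of MCTS visit counts. Not the
        # authors' exact derivation or implementation.

        num_actions = 4
        logits = np.zeros(num_actions)            # tabular softmax policy, one state
        q_mcts = np.array([0.1, 0.6, 0.3, -0.2])  # placeholder MCTS value estimates
        lr = 0.5

        def softmax(x):
            z = np.exp(x - x.max())
            return z / z.sum()

        for step in range(200):
            pi = softmax(logits)
            # Gradient of sum_a pi(a) * Q(a) w.r.t. the logits of a softmax
            # policy: grad_k = pi_k * (Q_k - sum_a pi_a Q_a).
            baseline = pi @ q_mcts
            logits += lr * pi * (q_mcts - baseline)   # gradient ascent step

        # Probability mass shifts toward the highest-value action (index 1),
        # i.e. the learned policy is not exploratory.
        print(np.round(softmax(logits), 3))

    In contrast, minimising a cross-entropy loss against MCTS visit counts would pull the policy toward the search's exploratory visit distribution, which is the behaviour the paper seeks to avoid.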