Deterministic Graph Exploration with Advice
We consider the task of graph exploration. An n-node graph has unlabeled
nodes, and all ports at any node of degree d are arbitrarily numbered
0, ..., d-1. A mobile agent has to visit all nodes and stop. The exploration
time is the number of edge traversals. We consider the problem of how much
knowledge the agent has to have a priori, in order to explore the graph in a
given time, using a deterministic algorithm. This a priori information (advice)
is provided to the agent by an oracle, in the form of a binary string, whose
length is called the size of advice. We consider two types of oracles. The
instance oracle knows the entire instance of the exploration problem, i.e., the
port-numbered map of the graph and the starting node of the agent in this map.
The map oracle knows the port-numbered map of the graph but does not know the
starting node of the agent.
We first consider exploration in polynomial time, and determine the exact
minimum size of advice to achieve it. This size is log log log n - Θ(1),
for both types of oracles.
When advice is large, there are two natural time thresholds: Θ(n^2)
for a map oracle, and Θ(n) for an instance oracle, that can be achieved
with sufficiently large advice. We show that, with a map oracle, time Θ(n^2)
cannot be improved in general, regardless of the size of advice.
We also show that the smallest size of advice to achieve this time is larger
than n^δ, for any δ < 1/3.
For an instance oracle, advice of size O(n log n) is enough to achieve time
O(n). We show that, with any advice of size o(n log n), the time of
exploration must be at least n^ε, for any ε < 2, and with any
advice of size O(n), the time must be Ω(n^2).
We also investigate the minimum advice sufficient for fast exploration of
Hamiltonian graphs.
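Since the abstract only sketches the model, here is a minimal Python sketch of it, not of the paper's algorithms: a port-numbered graph, an agent exploring depth-first, and exploration time counted as edge traversals. The adjacency encoding, and the global visited set (information a real agent lacks at unlabeled nodes, which is exactly what advice must compensate for), are assumptions made for the demo.

```python
# Minimal sketch of the exploration model (assumed encoding, not the
# paper's algorithm): adj[u][p] = (v, q) means port p at node u leads
# to node v, entering v at port q. Nodes are unlabeled in the model;
# the `visited` set below stands in for knowledge a real agent would
# only have via advice.

def explore(adj, start):
    """Depth-first exploration; returns the number of edge traversals."""
    visited = {start}
    traversals = 0

    def dfs(u):
        nonlocal traversals
        for p in range(len(adj[u])):      # try ports 0, ..., deg(u)-1
            v, _entry_port = adj[u][p]
            traversals += 1               # traverse the edge u -> v
            if v not in visited:
                visited.add(v)
                dfs(v)
            traversals += 1               # walk back v -> u
    dfs(start)
    return traversals                     # the agent stops back at `start`

# Example: the path 0 - 1 - 2 with arbitrary port numbers.
path = [[(1, 0)], [(0, 0), (2, 0)], [(1, 1)]]
print(explore(path, 0))                   # 8 traversals: 2 per port scanned
```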
Universal Learning of Repeated Matrix Games
We study and compare the learning dynamics of two universal learning
algorithms, one based on Bayesian learning and the other on prediction with
expert advice. Both approaches have strong asymptotic performance guarantees.
When confronted with the task of finding good long-term strategies in repeated
2x2 matrix games, they behave quite differently.
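As a concrete illustration of the "prediction with expert advice" side of this comparison (a sketch under assumptions, not the paper's setup), the following pits the Hedge multiplicative-weights rule, with the two pure actions as its experts, against Tit-for-Tat in repeated Prisoner's Dilemma; the payoffs, opponent, and learning rate are all choices made for the demo.

```python
import math
import random

# Sketch under assumptions: Hedge (multiplicative weights) as the
# "prediction with expert advice" learner, two experts = the pure
# actions C and D, repeated Prisoner's Dilemma against Tit-for-Tat.

PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ['C', 'D']

def hedge_vs_tit_for_tat(rounds=1000, eta=0.1, seed=0):
    rng = random.Random(seed)
    weights = {a: 1.0 for a in ACTIONS}   # one expert per pure action
    opponent = 'C'                        # Tit-for-Tat opens cooperatively
    total = 0.0
    for _ in range(rounds):
        z = sum(weights.values())
        probs = [weights[a] / z for a in ACTIONS]
        action = rng.choices(ACTIONS, weights=probs)[0]
        total += PAYOFF[(action, opponent)]
        for a in ACTIONS:                 # full-information Hedge update;
            weights[a] *= math.exp(eta * PAYOFF[(a, opponent)] / 5)  # payoffs scaled to [0, 1]
        opponent = action                 # Tit-for-Tat mirrors our last move
    return total / rounds, dict(zip(ACTIONS, probs))

avg, mixture = hedge_vs_tit_for_tat()
print(f"average payoff {avg:.2f}, final mixture {mixture}")
```

Because defection dominates cooperation in every single round, the weights drift toward always defecting, yielding an average payoff near 1 rather than the 3 that sustained cooperation against Tit-for-Tat would earn; this illustrates, in miniature, why finding good long-term strategies in repeated games is the hard part for such learners.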
Topology recognition with advice
In topology recognition, each node of an anonymous network has to
deterministically produce an isomorphic copy of the underlying graph, with all
ports correctly marked. This task is usually infeasible without any a priori
information. Such information can be provided to nodes as advice. An oracle
knowing the network can give a (possibly different) string of bits to each
node, and all nodes must reconstruct the network using this advice, after a
given number of rounds of communication. During each round each node can
exchange arbitrary messages with all its neighbors and perform arbitrary local
computations. The time of completing topology recognition is the number of
rounds it takes, and the size of advice is the maximum length of a string given
to nodes.
We investigate tradeoffs between the time in which topology recognition is
accomplished and the minimum size of advice that has to be given to nodes. We
provide upper and lower bounds on the minimum size of advice that is sufficient
to perform topology recognition in a given time, in the class of all graphs of
size n and diameter D ≤ αn, for any constant α < 1. In most
cases, our bounds are asymptotically tight.
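To make the setting concrete, here is a sketch of one classical use of advice (an illustration of the model, not the paper's construction): give every node a unique identifier, roughly ⌈log₂ n⌉ bits of advice, and let nodes flood their port-labelled adjacency lists; after D synchronous rounds every node holds the full labelled map. The graph encoding is an assumption for the demo.

```python
# Sketch, not the paper's construction: advice = a unique ID per node
# (so node u "knows" it is u); nodes then flood port-labelled edges.
# adj[u][p] = (v, q): port p of node u leads to port q of node v.

def topology_recognition(adj, diameter):
    n = len(adj)
    # initial knowledge: each node's own port-labelled edges (u, p, v, q)
    known = [{(u, p, v, q) for p, (v, q) in enumerate(adj[u])}
             for u in range(n)]
    for _ in range(diameter):                 # synchronous rounds
        sent = [set(k) for k in known]        # messages may be arbitrarily large
        for u in range(n):
            for v, _q in adj[u]:
                known[v] |= sent[u]           # v hears everything u knows
    return known

# Example: the path 0 - 1 - 2 (diameter 2, arbitrary port numbers).
path = [[(1, 0)], [(0, 0), (2, 0)], [(1, 1)]]
views = topology_recognition(path, diameter=2)
assert all(view == views[0] for view in views)   # every node holds the full map
```

The paper's tradeoff asks precisely how much of this advice can be saved when more rounds are available, and vice versa.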
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
In recent years, state-of-the-art game-playing agents often involve policies
that are trained in self-playing processes where Monte Carlo tree search (MCTS)
algorithms and trained policies iteratively improve each other. The strongest
results have been obtained when policies are trained to mimic the search
behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design,
includes an element of exploration, policies trained in this manner are also
likely to exhibit a similar extent of exploration. In this paper, we are
interested in learning policies for a project whose future goals include the
extraction of interpretable strategies, rather than state-of-the-art
game-playing performance. For these goals, we argue that such an extent of
exploration is undesirable, and we propose a novel objective function for
training policies that are not exploratory. We derive a policy gradient
expression for maximising this objective function, which can be estimated using
MCTS value estimates, rather than MCTS visit counts. We empirically evaluate
various properties of resulting policies in a variety of board games.
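To illustrate the contrast the abstract draws (a toy softmax-policy sketch under assumed inputs, not the paper's derivation): the standard AlphaZero-style target trains the policy toward the MCTS visit distribution, while the proposed alternative ascends the expected MCTS value estimate under the policy, whose gradient is weighted by Q estimates rather than visit counts.

```python
import numpy as np

# Toy sketch for a single state with 3 actions; `visits` and `qs` are
# assumed outputs of an MCTS run, not data from the paper.

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def cross_entropy_grad(logits, visits):
    """AlphaZero-style: mimic the (exploratory) MCTS visit distribution.
    Gradient of CE(pi_mcts, pi_theta) with respect to the logits."""
    return softmax(logits) - visits / visits.sum()

def value_grad(logits, qs):
    """Alternative: ascend J = E_{a ~ pi_theta}[Q_mcts(s, a)].
    For a softmax policy, dJ/dlogit_k = pi_k * (Q_k - E[Q]); negated
    here so both functions act as loss gradients."""
    pi = softmax(logits)
    return -pi * (qs - pi @ qs)

logits = np.zeros(3)
visits = np.array([50.0, 30.0, 20.0])   # MCTS spreads visits to explore
qs = np.array([0.60, 0.55, -0.20])      # value estimates rank action 0 best
for _ in range(500):
    logits -= 0.5 * value_grad(logits, qs)
print(softmax(logits))  # probability mass shifts toward the best-valued action
```

Minimising the cross-entropy loss instead would reproduce the visit distribution, keeping roughly 50/30/20 of the mass and, with it, the exploration baked into the search.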