The General Video Game AI competitions have been the testing ground for several game-playing techniques, such as evolutionary computation, tree search, and hyper-heuristic-based or knowledge-based algorithms. So far, the metrics used to evaluate agent performance have been the win ratio, the game score, and the length of games. In this paper we provide a wider set of metrics and a comparison method for evaluating and comparing agents. The metrics and the comparison method give shallow introspection into the agent's decision-making process and can be applied to any agent regardless of its algorithmic nature. In this work, they are used to measure the impact of the terms that compose the tree policy of an MCTS-based agent, comparing it against several baseline agents. The results clearly show the promise of such a general approach and its usefulness for understanding the behaviour of an AI agent; in particular, the comparison with baseline agents helps to reveal the shape of the agent's decision landscape. The presented metrics and comparison method represent a step toward more descriptive ways of logging and analysing agent behaviours.
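For context, the tree policy of an MCTS agent typically balances an exploitation term against an exploration term. A common instance is the UCB1 (UCT) rule sketched below; the abstract does not specify the exact policy or constants analysed in the paper, so this formula is illustrative only:

\[
a^{*} = \operatorname*{arg\,max}_{a \in A(s)} \left[ Q(s, a) + C \sqrt{\frac{\ln N(s)}{N(s, a)}} \right]
\]

where \(Q(s, a)\) is the average reward of taking action \(a\) from state \(s\) (exploitation), \(N(s)\) and \(N(s, a)\) are visit counts of the state and of the state-action pair, and \(C\) is the exploration constant. Metrics of the kind proposed in the paper can then quantify how each such term shapes the agent's decisions relative to baseline agents.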