
    Unsupervised Basis Function Adaptation for Reinforcement Learning

    When using reinforcement learning (RL) algorithms it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on an agent's performance, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is currently interest among researchers in the potential for allowing RL algorithms to adaptively generate (i.e. to learn) approximation architectures. One relatively unexplored method of adapting approximation architectures involves using feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. In this article we will: (a) informally discuss the potential advantages offered by such methods; (b) introduce a new algorithm based on such methods which adapts a state aggregation approximation architecture on-line and is designed for use in conjunction with SARSA; (c) provide theoretical results, in a policy evaluation setting, regarding this particular algorithm's complexity, convergence properties and potential to reduce VF error; and finally (d) test experimentally the extent to which this algorithm can improve performance given a number of different test problems. Taken together our results suggest that our algorithm (and potentially such methods more generally) can provide a versatile and computationally lightweight means of significantly boosting RL performance given suitable conditions which are commonly encountered in practice.

    Unsupervised Basis Function Adaptation for Reinforcement Learning

    When using reinforcement learning (RL) algorithms to evaluate a policy it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on the accuracy of the VF estimate, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is a large amount of interest in the potential for allowing RL algorithms to adaptively generate approximation architectures. We investigate a method of adapting approximation architectures which uses feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. This method is "unsupervised" in the sense that it makes no direct reference to reward or the VF estimate. We introduce an algorithm based upon this idea which adapts a state aggregation approximation architecture on-line. A common method of scoring a VF estimate is to weight the squared Bellman error of each state-action by the probability of that state-action occurring. Adopting this scoring method, and assuming $S$ states, we demonstrate theoretically that - provided (1) the number of cells $X$ in the state aggregation architecture is of order $\sqrt{S}\log_2{S}\ln{S}$ or greater, (2) the policy and transition function are close to deterministic, and (3) the prior for the transition function is uniformly distributed - our algorithm, used in conjunction with a suitable RL algorithm, can guarantee a score which is arbitrarily close to zero as $S$ becomes large. It is able to do this despite having only $O(X \log_2 S)$ space complexity and negligible time complexity. The results take advantage of certain properties of the stationary distributions of Markov chains.
    Comment: Extended abstract submitted (3 March 2017) for 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 201
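
The scoring method named in this abstract (squared Bellman error of each state-action, weighted by the probability of that state-action occurring) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function name, the array layout, and the two-state example are all assumptions made here, and the aggregation is fixed by hand rather than adapted on-line.

```python
import numpy as np

def weighted_bellman_score(Q, R, P, pi, w, gamma):
    """Weighted squared Bellman error of a VF estimate under a fixed policy.

    Q, R, pi, w have shape (S, A); P has shape (S, A, S).
    w holds the state-action weights (assumed to sum to 1).
    """
    V = (pi * Q).sum(axis=1)          # value of each state under the policy
    target = R + gamma * (P @ V)      # one-step Bellman backup, shape (S, A)
    return float((w * (target - Q) ** 2).sum())

# Hypothetical example: two states, one action, both states move to state 1.
R = np.array([[0.0], [1.0]])
P = np.zeros((2, 1, 2))
P[:, 0, 1] = 1.0
pi = np.ones((2, 1))
gamma = 0.5

# State aggregation: both states share a single cell with estimate 2.0.
cell = np.array([0, 0])
theta = np.array([[2.0]])
Q_agg = theta[cell]

w_uniform = np.array([[0.5], [0.5]])   # every state-action weighted equally
w_visited = np.array([[0.0], [1.0]])   # weights from the long-run visit frequency

score_uniform = weighted_bellman_score(Q_agg, R, P, pi, w_uniform, gamma)
score_visited = weighted_bellman_score(Q_agg, R, P, pi, w_visited, gamma)
```

In this toy case the single cell is exact for the only state visited in the long run, so the visit-weighted score is zero even though the uniformly weighted score is not. That is the intuition behind spending approximation detail on frequently visited states.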

    An algorithm for finding Hamiltonian Cycles in Cubic Planar Graphs

    We first prove a one-to-one correspondence between finding Hamiltonian cycles in cubic planar graphs and finding trees with specific properties in dual graphs. Using this information, we construct an exact algorithm for finding Hamiltonian cycles in cubic planar graphs. The worst case time complexity of our algorithm is $O(2^n)$.

    Distance labellings of Cayley graphs of semigroups

    This paper establishes connections between the structure of a semigroup and the minimum spans of distance labellings of its Cayley graphs. We show that certain general restrictions on the minimum spans are equivalent to the semigroup being combinatorial, and that other restrictions are equivalent to the semigroup being a right zero band. We obtain a description of the structure of all semigroups $S$ and their subsets $C$ such that $\mathrm{Cay}(S,C)$ is a disjoint union of complete graphs, and show that this description is also equivalent to several restrictions on the minimum span of $\mathrm{Cay}(S,C)$. We then describe all graphs with minimum spans satisfying the same restrictions, and give examples to show that a fairly straightforward upper bound for the minimum spans of the underlying undirected graphs of Cayley graphs turns out to be sharp even for the class of combinatorial semigroups.

    Degree Bounded Bottleneck Spanning Trees in Three Dimensions

    The geometric $\delta$-minimum spanning tree problem ($\delta$-MST) is the problem of finding a minimum spanning tree for a set of points in a normed vector space, such that no vertex in the tree has a degree which exceeds $\delta$, and the sum of the lengths of the edges in the tree is minimum. The similarly defined geometric $\delta$-minimum bottleneck spanning tree problem ($\delta$-MBST) is the problem of finding a degree bounded spanning tree such that the length of the longest edge is minimum. For point sets that lie in the Euclidean plane, both of these problems have been shown to be NP-hard for certain specific values of $\delta$. In this paper, we investigate the $\delta$-MBST problem in 3-dimensional Euclidean space and 3-dimensional rectilinear space. We show that the problems are NP-hard for certain values of $\delta$, and we provide inapproximability results for these cases. We also describe new approximation algorithms for solving these 3-dimensional variants, and then analyse their worst-case performance.
    Comment: 35 pages, 22 figures

    A Flow-dependent Quadratic Steiner Tree Problem in the Euclidean Plane

    We introduce a flow-dependent version of the quadratic Steiner tree problem in the plane. An instance of the problem on a set of embedded sources and a sink asks for a directed tree $T$ spanning these nodes and a bounded number of Steiner points, such that $\displaystyle\sum_{e \in E(T)}f(e)|e|^2$ is a minimum, where $f(e)$ is the flow on edge $e$. The edges are uncapacitated and the flows are determined additively, i.e., the flow on an edge leaving a node $u$ will be the sum of the flows on all edges entering $u$. Our motivation for studying this problem is its utility as a model for relay augmentation of wireless sensor networks. In these scenarios one seeks to optimise power consumption -- which is predominantly due to communication and, in free space, is proportional to the square of transmission distance -- in the network by introducing additional relays. We prove several geometric and combinatorial results on the structure of optimal and locally optimal solution-trees (under various strategies for bounding the number of Steiner points) and describe a geometric linear-time algorithm for constructing such trees with known topologies.
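
The objective in this abstract, the sum over tree edges of flow times squared length, can be evaluated for a given tree as in the sketch below. It only scores a fixed tree and does not construct an optimal one. The function name, the `parent`/`supply` representation, and the assumption that each source injects its own supply on top of the flow entering it are illustrative choices made here, not details from the paper.

```python
def flow_dependent_cost(pos, parent, supply):
    """Sum of f(e) * |e|^2 over the edges of a tree directed towards the sink.

    pos:    node -> (x, y) coordinates
    parent: node -> next node on the way to the sink (the sink has no entry)
    supply: node -> flow injected at that node (assumed; Steiner points inject 0)
    """
    children = {}
    for u, p in parent.items():
        children.setdefault(p, []).append(u)

    def out_flow(u):
        # Additive flows: own supply plus everything entering from children.
        return supply.get(u, 0.0) + sum(out_flow(c) for c in children.get(u, []))

    cost = 0.0
    for u, p in parent.items():
        (x1, y1), (x2, y2) = pos[u], pos[p]
        cost += out_flow(u) * ((x1 - x2) ** 2 + (y1 - y2) ** 2)
    return cost

# Hypothetical instance: two unit sources, one Steiner point, one sink.
pos = {"a": (0.0, 0.0), "b": (0.0, 2.0), "s": (1.0, 1.0), "t": (3.0, 1.0)}
parent = {"a": "s", "b": "s", "s": "t"}
supply = {"a": 1.0, "b": 1.0}
total = flow_dependent_cost(pos, parent, supply)
```

Here the edge from the Steiner point to the sink carries both units of flow, so its squared length counts twice, which is why Steiner points in this model tend to sit closer to the sink than in the flow-free problem.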

    An exact algorithm for the bottleneck 2-connected $k$-Steiner network problem in $L_p$ planes

    We present the first exact polynomial time algorithm for constructing optimal geometric bottleneck 2-connected Steiner networks containing at most $k$ Steiner points, where $k>2$ is a constant. Given a set of $n$ vertices embedded in an $L_p$ plane, the objective of the problem is to find a 2-connected network, spanning the given vertices and at most $k$ additional vertices, such that the length of the longest edge is minimised. In contrast to the discrete version of this problem the additional vertices may be located anywhere in the plane. The problem is motivated by the modelling of relay-augmentation for the optimisation of energy consumption in wireless ad hoc networks. Our algorithm employs Voronoi diagrams and properties of block-cut-vertex decompositions of graphs to find an optimal solution in $O(n^k\log^{\frac{5k}{2}}n)$ steps when $1<p<\infty$ and in $O(n^2\log^{\frac{7k}{2}+1}n)$ steps when $p\in\{1,\infty\}$.

    Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges

    Issues regarding explainable AI involve four components: users, laws & regulations, explanations and algorithms. Together these components provide a context in which explanation methods can be evaluated regarding their adequacy. The goal of this chapter is to bridge the gap between expert users and lay users. Different kinds of users are identified and their concerns revealed, relevant statements from the General Data Protection Regulation are analyzed in the context of Deep Neural Networks (DNNs), a taxonomy for the classification of existing explanation methods is introduced, and finally, the various classes of explanation methods are analyzed to verify if user concerns are justified. Overall, it is clear that (visual) explanations can be given about various aspects of the influence of the input on the output. However, it is noted that explanation methods or interfaces for lay users are missing and we speculate which criteria these methods / interfaces should satisfy. Finally it is noted that two important concerns are difficult to address with explanation methods: the concern about bias in datasets that leads to biased DNNs, as well as the suspicion about unfair outcomes.
    Comment: 14 pages, 1 figure, This article will appear as a chapter in Explainable and Interpretable Models in Computer Vision and Machine Learning Springer series on Challenges in Machine Learning

    A geometric characterisation of the quadratic min-power centre

    For a given set of nodes in the plane the min-power centre is a point such that the cost of the star centred at this point and spanning all nodes is minimised. The cost of the star is defined as the sum of the costs of its nodes, where the cost of a node is an increasing function of the length of its longest incident edge. The min-power centre problem provides a model for optimally locating a cluster-head amongst a set of radio transmitters; however, the problem can also be formulated within a bicriteria location model involving the 1-centre and a generalized Fermat-Weber point, making it suitable for a variety of facility location problems. We use farthest point Voronoi diagrams and Delaunay triangulations to provide a complete geometric description of the min-power centre of a finite set of nodes in the Euclidean plane when cost is a quadratic function. This leads to a new linear-time algorithm for its construction when the convex hull of the nodes is given. We also provide an upper bound for the performance of the centroid as an approximation to the quadratic min-power centre. Finally, we briefly describe the relationship between solutions under quadratic cost and solutions under more general cost functions.
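
The star cost described in this abstract can be written out for the quadratic case as in the sketch below. It reads the definition as follows, which is an interpretation made here rather than a formula quoted from the paper: each given node has a single incident edge (to the centre), and the centre's longest incident edge is the one to the farthest node, so the cost is the sum of squared distances plus the largest squared distance. The function names are hypothetical, and the code only evaluates candidate centres; it is not the linear-time construction the abstract refers to.

```python
def quadratic_star_cost(nodes, centre):
    """Cost of the star at `centre`: sum of squared distances to the nodes,
    plus the largest squared distance (the centre's own cost)."""
    cx, cy = centre
    sq = [(x - cx) ** 2 + (y - cy) ** 2 for x, y in nodes]
    return sum(sq) + max(sq)

def centroid(nodes):
    """Arithmetic mean of the node coordinates."""
    n = len(nodes)
    return (sum(x for x, _ in nodes) / n, sum(y for _, y in nodes) / n)

# Hypothetical instance: the four corners of a square of side 2.
nodes = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
cost_at_centroid = quadratic_star_cost(nodes, centroid(nodes))
cost_at_corner = quadratic_star_cost(nodes, (0.0, 0.0))
```

The two terms correspond to the bicriteria view mentioned in the abstract: the sum of squared distances is minimised at the centroid, while the maximum term pulls the optimum towards the 1-centre. In this symmetric example the two coincide.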

    REVIEW ARTICLE Ras Proteins: Recent Advances and New Functions
