
    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here resembles work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file.
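The exploration/exploitation trade-off can be made concrete with an epsilon-greedy agent on a multi-armed bandit, one of the simplest settings in which it arises. The sketch below is illustrative and not taken from the survey: the Bernoulli reward model, the function name `epsilon_greedy_bandit`, and the parameter values are assumptions.

```python
import random
from typing import List, Tuple


def epsilon_greedy_bandit(
    true_means: List[float], steps: int, epsilon: float, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """Run an epsilon-greedy agent on a Bernoulli bandit (hypothetical setup).

    Returns the agent's sample-average value estimates and per-arm pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # sample-average reward estimate per arm
    counts = [0] * n_arms       # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniformly random arm
        else:
            # exploit: arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8], steps=5000, epsilon=0.1)
```

With epsilon = 0.1 roughly a tenth of the pulls are spent exploring, so every arm keeps being sampled while most pulls go to the arm the agent currently believes is best.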

    ATTac-2000: An Adaptive Autonomous Bidding Agent

    The First Trading Agent Competition (TAC) was held from June 22nd to July 8th, 2000. TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. This article describes ATTac-2000, the first-place finisher in TAC. ATTac-2000 uses a principled bidding strategy that includes several elements of adaptivity. In addition to the success at the competition, isolated empirical results are presented indicating the robustness and effectiveness of ATTac-2000's adaptive strategy.

    Contingent planning under uncertainty via stochastic satisfiability

    We describe a new planning technique that efficiently solves probabilistic propositional contingent planning problems by converting them into instances of stochastic satisfiability (SSAT) and solving these problems instead. We make fundamental contributions in two areas: the solution of SSAT problems and the solution of stochastic planning problems. This is the first work extending the planning-as-satisfiability paradigm to stochastic domains. Our planner, ZANDER, can solve arbitrary, goal-oriented, finite-horizon partially observable Markov decision processes (POMDPs). An empirical study comparing ZANDER to seven other leading planners shows that its performance is competitive on a range of problems. © 2003 Elsevier Science B.V. All rights reserved
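In stochastic satisfiability, variables are bound in a fixed order by a prefix that mixes choice (existential) variables, which are set to maximize the result, and randomized variables, which are true with a given probability; the value of an instance is the maximum probability that the CNF formula is satisfied. The brute-force evaluator below is a minimal sketch of that definition under an assumed encoding (integer literals and a list of `(variable, kind, probability)` entries, all names hypothetical). It is exponential and is not ZANDER's algorithm.

```python
from typing import Dict, List, Optional, Tuple

# A literal is a nonzero int: +v means variable v is true, -v means it is false.
Clause = List[int]
# Prefix entry: (variable, kind, probability). kind is "E" (choice) or "R" (randomized);
# probability is P(variable = True) and is ignored for "E" variables.
PrefixEntry = Tuple[int, str, float]


def satisfied(cnf: List[Clause], assignment: Dict[int, bool]) -> bool:
    """True if every clause has at least one literal made true by the assignment."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf)


def ssat_value(
    prefix: List[PrefixEntry], cnf: List[Clause], assignment: Optional[Dict[int, bool]] = None
) -> float:
    """Maximum probability of satisfaction, evaluating the prefix left to right."""
    assignment = {} if assignment is None else assignment
    if not prefix:
        return 1.0 if satisfied(cnf, assignment) else 0.0
    (var, kind, p), rest = prefix[0], prefix[1:]
    v_true = ssat_value(rest, cnf, {**assignment, var: True})
    v_false = ssat_value(rest, cnf, {**assignment, var: False})
    if kind == "E":
        return max(v_true, v_false)  # choice variable: take the better branch
    return p * v_true + (1 - p) * v_false  # randomized variable: expectation


xor_cnf = [[1, 2], [-1, -2]]  # satisfied exactly when x1 != x2
blind = ssat_value([(1, "E", 0.0), (2, "R", 0.7)], xor_cnf)
contingent = ssat_value([(2, "R", 0.7), (1, "E", 0.0)], xor_cnf)
```

Placing the randomized variable before the choice variable lets the choice depend on the observed outcome, which is how contingent plans appear in this encoding: `contingent` evaluates to 1.0, while `blind`, where the choice must be made before the outcome is known, evaluates to 0.7.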

    In Pursuit of Patent Quality (And Reflection of Reification)

    Non

    Hall-Effect for Neutral Atoms

    It is shown that polarizable neutral systems can drift in crossed magnetic and electric fields. The drift velocity is perpendicular to both fields, but, contrary to the drift velocity of a charged particle, it exists only if the fields vary in space or in time. We develop an adiabatic theory of this phenomenon and analyze the conditions for its experimental observation. Rydberg atoms are the most suitable objects for observing this effect. The effect can be applied to the separation of excited atoms. Comment: RevTex, 4 pages; to be published in Pis'ma v ZhET

    False-Name Manipulation in Weighted Voting Games is Hard for Probabilistic Polynomial Time

    False-name manipulation refers to the question of whether a player in a weighted voting game can increase her power by splitting into several players and distributing her weight among these false identities. Analogously to this splitting problem, the beneficial merging problem asks whether a coalition of players can increase their power in a weighted voting game by merging their weights. Aziz et al. [ABEP11] analyze the problem of whether merging or splitting players in weighted voting games is beneficial in terms of the Shapley-Shubik and the normalized Banzhaf index, and so do Rey and Rothe [RR10] for the probabilistic Banzhaf index. All of these results provide only NP-hardness lower bounds for these problems, leaving the question of their exact complexity open. For the Shapley-Shubik and the probabilistic Banzhaf index, we raise these lower bounds to hardness for PP, "probabilistic polynomial time", and provide matching upper bounds for beneficial merging and, whenever the number of false identities is fixed, also for beneficial splitting, thus resolving previous conjectures in the affirmative. It follows from our results that beneficial merging and splitting for these two power indices cannot be solved in NP unless the polynomial hierarchy collapses, which is considered highly unlikely.
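For small games both indices can be computed by brute force, which makes the splitting question easy to experiment with. The sketch below assumes the usual definitions for a weighted voting game with integer weights and quota q: player i is pivotal for a coalition S of other players if w(S) < q <= w(S) + w_i; the probabilistic Banzhaf index is the fraction of all 2^(n-1) such coalitions in which i is pivotal, and the Shapley-Shubik index weights each pivotal S by |S|!(n-|S|-1)!/n!. All function names are hypothetical, and the enumeration is exponential, so it says nothing about the PP-completeness results.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def pivotal(weights: List[int], quota: int, i: int, coalition: Sequence[int]) -> bool:
    """True if adding player i turns the losing coalition into a winning one."""
    w = sum(weights[j] for j in coalition)
    return w < quota <= w + weights[i]


def banzhaf(weights: List[int], quota: int, i: int) -> Fraction:
    """Probabilistic Banzhaf index: share of coalitions of the others in which i swings."""
    others = [j for j in range(len(weights)) if j != i]
    swings = sum(
        pivotal(weights, quota, i, c)
        for r in range(len(others) + 1)
        for c in combinations(others, r)
    )
    return Fraction(swings, 2 ** len(others))


def shapley_shubik(weights: List[int], quota: int, i: int) -> Fraction:
    """Shapley-Shubik index: share of player orderings in which i is pivotal."""
    n = len(weights)
    others = [j for j in range(n) if j != i]
    total = Fraction(0)
    for r in range(len(others) + 1):
        for c in combinations(others, r):
            if pivotal(weights, quota, i, c):
                # orderings in which exactly the players of c precede i
                total += Fraction(factorial(r) * factorial(n - r - 1), factorial(n))
    return total


def split_power(
    index: Callable[[List[int], int, int], Fraction],
    weights: List[int], quota: int, i: int, parts: List[int],
) -> Fraction:
    """Total index of the false identities after player i splits its weight into parts."""
    assert sum(parts) == weights[i], "parts must sum to the original weight"
    new_weights = [w for j, w in enumerate(weights) if j != i] + list(parts)
    first = len(weights) - 1  # the false identities occupy the last positions
    return sum(index(new_weights, quota, k) for k in range(first, len(new_weights)))
```

In the game [3; 2, 1], splitting the first player's weight into two players of weight 1 raises its Shapley-Shubik power from 1/2 to 2/3, whereas in [2; 2, 1] the same split lowers it from 1 to 2/3; in the first game the probabilistic Banzhaf total stays at 1/2.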

    Learning Mazes with Aliasing States: An LCS Algorithm with Associative Perception

    Learning classifier systems (LCSs) belong to a class of algorithms based on the principle of self-organization and have frequently been applied to the task of solving mazes, an important type of reinforcement learning (RL) problem. Maze problems represent a simplified virtual model of real environments that can be used for developing core algorithms of many real-world applications related to the problem of navigation. However, the best achievements of LCSs in maze problems are still mostly limited to non-aliasing environments, while LCS complexity seems to obstruct a proper analysis of the reasons for failure. We construct a new LCS agent that has a simpler and more transparent performance mechanism but can still solve mazes better than existing algorithms. We use the structure of a predictive LCS model, strip out the evolutionary mechanism, simplify the reinforcement learning procedure, and equip the agent with the ability of associative perception, adopted from psychology. To improve our understanding of the nature and structure of maze environments, we analyze mazes used in research over the last two decades, introduce a set of maze complexity characteristics, and develop a set of new maze environments. We then run our new LCS with associative perception through the old and new aliasing mazes, which represent partially observable Markov decision problems (POMDPs), and demonstrate that it performs at least as well as, and in some cases better than, other published systems.
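Perceptual aliasing, which is what makes such mazes partially observable, can be checked mechanically once an observation model is fixed. The sketch below assumes an agent that senses only whether each of its eight neighbouring cells is a wall; it groups free cells by that observation, and any group with more than one cell is a set of aliasing states. It does not implement the paper's LCS or its maze complexity characteristics; the grid encoding ("#" for walls) and the function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Eight neighbour offsets (row, column), clockwise starting from north.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def observation(maze: List[str], r: int, c: int) -> Tuple[bool, ...]:
    """True where a neighbouring cell is a wall; cells outside the grid count as walls."""
    obs = []
    for dr, dc in OFFSETS:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < len(maze) and 0 <= cc < len(maze[rr])
        obs.append(not inside or maze[rr][cc] == "#")
    return tuple(obs)


def aliased_groups(maze: List[str]) -> List[List[Tuple[int, int]]]:
    """Groups of free cells that produce identical observations (aliasing states)."""
    groups: Dict[Tuple[bool, ...], List[Tuple[int, int]]] = defaultdict(list)
    for r, row in enumerate(maze):
        for c, cell in enumerate(row):
            if cell != "#":
                groups[observation(maze, r, c)].append((r, c))
    return [cells for cells in groups.values() if len(cells) > 1]


maze = [
    "#######",
    "#.#.#.#",
    "#.....#",
    "#######",
]
groups = aliased_groups(maze)
```

In this small maze, cells (2, 2) and (2, 4) have identical surroundings, so an agent that relies only on its current observation cannot tell these two positions apart.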

    Next nearest neighbour Ising models on random graphs

    This paper develops results for the next nearest neighbour Ising model on random graphs. Besides being an essential ingredient in classic models for frustrated systems, second neighbour interactions arise naturally in several applications, such as the colour diversity problem and graphical games. We demonstrate ensembles of random graphs, including regular connectivity graphs, that have a periodic variation of free energy with either the ratio of nearest to next nearest couplings or the mean number of nearest neighbours. When the coupling ratio is an integer, paramagnetic phases can be found at zero temperature. This is shown to be related to the locked or unlocked nature of the interactions. For anti-ferromagnetic couplings, spin glass phases are demonstrated at low temperature. The interaction structure is formulated as a factor graph, and the solution on a tree is developed. The replica symmetric and energetic one-step replica symmetry breaking solutions are developed using the cavity method. Within these frameworks we calculate the phase diagram and demonstrate the existence of dynamical transitions at zero temperature for cases of anti-ferromagnetic coupling on regular and inhomogeneous random graphs. Comment: 55 pages, 15 figures, version 2 with minor revisions, to be published in J. Stat. Mec
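To make the model concrete (though not the paper's cavity-method analysis), the sketch below takes next-nearest neighbours to be vertex pairs at graph distance exactly two, evaluates an assumed Hamiltonian H(s) = -J1 Σ_nn s_i s_j - J2 Σ_nnn s_i s_j, and finds the ground-state energy of a small graph by exhaustive enumeration. The sign convention, function names and the 4-cycle example are assumptions for illustration.

```python
from itertools import product
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph
Pair = Tuple[int, int]


def nearest_pairs(g: Graph) -> Set[Pair]:
    """Edges of the graph as ordered pairs (smaller vertex first)."""
    return {(min(a, b), max(a, b)) for a in g for b in g[a]}


def next_nearest_pairs(g: Graph) -> Set[Pair]:
    """Pairs of vertices at graph distance exactly 2."""
    pairs = set()
    for v in g:
        for u in g[v]:
            for w in g[u]:
                if w != v and w not in g[v]:
                    pairs.add((min(v, w), max(v, w)))
    return pairs


def energy(spins: Dict[int, int], nn: Set[Pair], nnn: Set[Pair], j1: float, j2: float) -> float:
    """Assumed Hamiltonian: -J1 * sum over nn of s_i s_j - J2 * sum over nnn of s_i s_j."""
    e_nn = sum(spins[a] * spins[b] for a, b in nn)
    e_nnn = sum(spins[a] * spins[b] for a, b in nnn)
    return -j1 * e_nn - j2 * e_nnn


def ground_state_energy(g: Graph, j1: float, j2: float) -> float:
    """Minimum energy over all 2^n spin configurations (small graphs only)."""
    nn, nnn = nearest_pairs(g), next_nearest_pairs(g)
    vertices = sorted(g)
    return min(
        energy(dict(zip(vertices, s)), nn, nnn, j1, j2)
        for s in product((-1, 1), repeat=len(vertices))
    )


cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

On the 4-cycle with J1 = 1 and J2 = -1 the two terms compete: the fully aligned state pays for its next-nearest bonds, and the minimum energy of -2 is shared by several configurations, a small instance of the frustration that second neighbour interactions introduce.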