5 research outputs found

    A Study of Cooperative Mechanisms for Faster Reinforcement Learning

    Using pure reinforcement learning to solve a multi-stage decision problem is computationally equivalent to performing a search over the entire state space. When a priori knowledge is not available for guidance, this search can be excessive, limiting the applicability of reinforcement learning to real-world tasks. Cooperative mechanisms help reduce search by providing the learner with shorter-latency feedback and auxiliary sources of trial-and-error experience. These mechanisms are based on the observation that in nature, intelligent agents exist in a cooperative social environment that helps structure and guide learning. In this context, learning involves information transfer as much as it does trial-and-error discovery. Two general cooperative mechanisms are described: Learning-with-an-External-Critic (LEC) and Learning-By-Watching (LBW). Specific algorithms for each are studied empirically in a simple grid world and shown to significantly improve agent adaptability. Analytical results for both, under various learning conditions, are also provided. These results indicate that while an unbiased search can be expected to require time exponential in the size of the state space, the LEC and LBW algorithms require at most time linear in the size of the state space; under appropriate conditions they are independent of the state space size, requiring time proportional only to the length of the optimal solution path. The issue of behavior interpretation is also discussed.
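
    As a concrete illustration of the LEC idea, the following is a minimal sketch of tabular Q-learning in which an external critic's immediate evaluation is folded into the reward before the standard backup. It is a sketch under stated assumptions, not the paper's algorithm: the grid world, the critic signal, and all names (Q, choose_action, lec_update) are illustrative.

        import random
        from collections import defaultdict

        ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
        ACTIONS = ["up", "down", "left", "right"]
        Q = defaultdict(float)  # maps (state, action) -> estimated return

        def choose_action(state):
            # Epsilon-greedy action selection over the current Q estimates.
            if random.random() < EPSILON:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q[(state, a)])

        def lec_update(state, action, next_state, env_reward, critic_reward):
            # Hypothetical LEC-style backup: the external critic's immediate
            # evaluation is added to the (possibly long-delayed) environment
            # reward, shortening the feedback latency the learner experiences.
            reward = env_reward + critic_reward
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    An LBW variant could feed observed (state, action, next_state, reward) tuples from another agent's behavior through the same update, supplying the auxiliary trial-and-error experience the abstract describes.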

    Thesis Proposal: Scaling Reinforcement Learning Systems

    Thesis proposal. Reinforcement learning systems are interesting because they meet three major criteria for animate control, namely: competence, responsiveness, and autonomous adaptability. Unfortunately, these systems have not been scaled to complex task domains. For my thesis I propose to study three separate problems that arise when scaling reinforcement learning systems to larger task domains: the propagation problem, the transfer problem, and the attention problem. The propagation problem arises when the number of states in the problem domain is scaled up and the distance the system must traverse to reach reinforcement increases. The transfer problem occurs when reinforcement learning systems are applied to problem-solving tasks where it is desirable to transfer knowledge useful for solving one problem to another. The attention problem arises when a system with a fixed-length input vector is applied to task domains containing an arbitrary number of objects. Each of these problems is discussed along with possible approaches to its solution. A schedule for performing the research is also given.
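
    To make the propagation problem concrete, the sketch below (an illustrative assumption, not taken from the proposal) uses a one-dimensional corridor of N states with a single reward at the far end. With one-step tabular backups, value information crawls backward only one state per pass, so the effort needed grows with the distance to reinforcement.

        N, ALPHA, GAMMA = 10, 1.0, 0.9
        V = [0.0] * (N + 1)  # V[N] is the terminal state; its value stays 0

        def sweep():
            # One left-to-right pass under the fixed "move right" policy.
            # Each state backs up from its right-hand neighbour's value as
            # it stood before this iteration reached that neighbour.
            for s in range(N):
                reward = 1.0 if s == N - 1 else 0.0  # reward only at the goal
                V[s] += ALPHA * (reward + GAMMA * V[s + 1] - V[s])

        for i in range(1, 4):
            sweep()
            reached = sum(1 for v in V[:N] if v > 0)
            print(f"after sweep {i}: value has reached {reached} of {N} states")

    Running this prints 1, 2, 3 reached states after the first three sweeps: reinforcement propagates one state per pass, so doubling the corridor length doubles the passes required.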

    Learning to Perceive and Act

    This paper considers adaptive control architectures that integrate active sensory-motor systems with decision systems based on reinforcement learning. One unavoidable consequence of active perception is that the agent's internal representation often confounds external world states. We call this phenomenon perceptual aliasing and show that it destabilizes existing reinforcement learning algorithms with respect to the optimal decision policy. We then describe a new decision system that overcomes these difficulties for a restricted class of decision problems. The system incorporates a perceptual subcycle within the overall decision cycle and uses a modified learning algorithm to suppress the effects of perceptual aliasing. The result is a control architecture that learns not only how to solve a task but also where to focus its attention in order to collect the necessary sensory information.
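
    The destabilizing effect of perceptual aliasing can be seen in a toy example. In the sketch below (illustrative, not from the paper), two functionally distinct world states s1 and s2 map to the same internal percept p; because the same action earns different returns in the two states, value backups at p receive conflicting targets and the estimate oscillates instead of converging.

        ALPHA = 0.5
        q_p = 0.0  # value estimate for the single aliased percept p

        # Same action, different true one-step return in each world state.
        returns = {"s1": +1.0, "s2": -1.0}

        for step in range(6):
            world_state = "s1" if step % 2 == 0 else "s2"
            target = returns[world_state]  # the agent cannot tell s1 from s2
            q_p += ALPHA * (target - q_p)  # conflicting backups at percept p
            print(f"step {step}: world={world_state}, Q(p)={q_p:+.3f}")

    The printed estimate flips sign on every step, so no fixed greedy policy at p is stable, which is the instability the paper's perceptual subcycle is designed to suppress.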

    Reinforcement learning for the adaptive control of perception and action

    Thesis (Ph.D.), University of Rochester, Dept. of Computer Science, 1992. This dissertation applies reinforcement learning to the adaptive control of active sensory-motor systems. Active sensory-motor systems, in addition to providing for overt action, also support active, selective sensing of the environment. The principal advantage of this active approach to perception is that the agent's internal representation can be made highly task-specific, thus avoiding wasteful sensory processing and the representation of irrelevant information. One unavoidable consequence of active perception is that improper control can lead to internal states that confound functionally distinct states in the external world. This phenomenon, called perceptual aliasing, is shown to destabilize existing reinforcement learning algorithms with respect to optimal control. To overcome these difficulties, an approach to adaptive control called the Consistent Representation (CR) method is developed. This method is used to construct systems that learn not only the overt actions needed to solve a task but also where to focus their attention in order to collect necessary sensory information. The principle of the CR method is to separate control into two stages: an identification stage followed by an overt stage. The identification stage generates the task-specific internal representation that is used by the overt control stage. Adaptive identification is accomplished by a technique that involves the detection and suppression of perceptually aliased internal states. Q-learning is used for adaptive overt control. The technique is then extended with two cooperative learning mechanisms, called Learning with an External Critic (LEC) and Learning By Watching (LBW), which significantly improve learning. Cooperative mechanisms exploit the presence of helpful agents in the environment to supply auxiliary sources of trial-and-error experience and to decrease the latency between the execution and evaluation of an action.
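
    A minimal sketch of the two-stage decision cycle follows. It assumes a simple stand-in test for aliasing (persistent TD errors of both signs at a percept); every name here is an illustrative assumption rather than the dissertation's actual formulation.

        from collections import defaultdict

        recent_errors = defaultdict(list)  # percept -> recent TD errors

        def looks_aliased(percept, window=8):
            # Stand-in aliasing test: persistently large TD errors of both
            # signs suggest the percept confounds distinct world states.
            errs = recent_errors[percept][-window:]
            return len(errs) == window and max(errs) > 0.5 and min(errs) < -0.5

        def decision_cycle(sense, execute, perceptual_action, overt_action):
            percept = sense()
            # Identification stage: redirect the sensors until the internal
            # representation stops looking aliased.
            while looks_aliased(percept):
                execute(perceptual_action(percept))
                percept = sense()
            # Overt stage: an ordinary Q-learning policy acts on the now
            # consistent internal state.
            return execute(overt_action(percept))

    The design point this sketch captures is the separation of concerns: perceptual actions are selected to make the representation consistent, and only then does the overt Q-learning policy act on it.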

    The Rochester Robot

    The Rochester Robot is a unique design built to study the use of real-time vision in cognition and movement. Its major feature is the robot head, which consists of binocular cameras that can be moved at over 400 degrees per second. During rapid movements, visual data can be analyzed with a pipeline processor and used to control the motion of the body. The body is a PUMA 761 six-degree-of-freedom arm with a two-meter-radius workspace and a top speed of about 100 cm/second. These features combine to give the robot the capability of reacting to features in its environment in complicated ways, in real time.