
    Towards Feature Selection In Actor-Critic Algorithms

    Choosing features for the critic in actor-critic algorithms with function approximation is known to be a challenge. Too few critic features can lead to degeneracy of the actor gradient, and too many features may lead to slower convergence of the learner. In this paper, we show that a well-studied class of actor policies satisfies the known requirements for convergence when the actor features are selected carefully. We demonstrate that two popular representations for value-based methods, barycentric interpolators and graph Laplacian proto-value functions, can be used to represent the actor so as to satisfy these conditions. A consequence of this work is a generalization of proto-value function methods to the continuous-action actor-critic domain. Finally, we analyze the performance of this approach using a simulation of a torque-limited inverted pendulum.
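Barycentric interpolators represent a function over a continuous state space through weights on the vertices of a simplex that contains the state. The sketch below computes such weights on a regular grid using the Kuhn (Freudenthal) triangulation, a common choice for barycentric value interpolation. The grid layout, the function name `barycentric_features`, and the use of the result as a policy feature vector are illustrative assumptions, not the construction used in the paper.

```python
import numpy as np

def barycentric_features(x, lo, hi, n):
    """Barycentric weights of point x on a regular grid (Kuhn triangulation).

    Assumed layout (illustrative): n[i] >= 2 vertices along dimension i,
    evenly spaced over [lo[i], hi[i]]. Returns one entry per grid vertex;
    at most d + 1 entries are non-zero and they sum to 1.
    """
    x = np.asarray(x, dtype=float)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    n = np.asarray(n, dtype=int)
    dims = tuple(int(v) for v in n)
    d = x.size
    step = (hi - lo) / (n - 1)
    # Continuous grid coordinate, clipped so the point stays inside the grid.
    g = np.clip((x - lo) / step, 0.0, n - 1)
    # Lower corner of the enclosing cell (never the last vertex).
    base = np.minimum(np.floor(g).astype(int), n - 2)
    frac = g - base  # fractional position inside the cell, in [0, 1]
    # The Kuhn simplex is reached by stepping from the base corner along
    # dimensions in order of decreasing fractional part.
    order = np.argsort(-frac, kind="stable")
    sf = frac[order]
    weights = np.empty(d + 1)
    weights[0] = 1.0 - sf[0]
    weights[1:d] = sf[:-1] - sf[1:]
    weights[d] = sf[-1]
    phi = np.zeros(int(np.prod(dims)))
    vertex = base.copy()
    phi[np.ravel_multi_index(tuple(vertex), dims)] += weights[0]
    for k in range(d):
        vertex[order[k]] += 1
        phi[np.ravel_multi_index(tuple(vertex), dims)] += weights[k + 1]
    return phi

# Hypothetical 2-D state (e.g. angle and angular velocity) on a 3 x 5 grid.
phi = barycentric_features([0.3, 1.7], lo=[0.0, 0.0], hi=[1.0, 2.0], n=[3, 5])
print(np.flatnonzero(phi))  # → [3 8 9]
```

Because the weights reproduce the query point exactly as a convex combination of simplex vertices, `phi` could, for instance, serve as the feature vector of a linear policy mean; that use is an assumption for illustration only.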

    Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data

    In sequence modeling, we often wish to represent complex interactions between labels, such as when performing multiple cascaded labeling tasks on the same sequence or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, forming a distributed state representation as in dynamic Bayesian networks (DBNs), and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.
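The contrast with linear-chain CRFs is easiest to see in the single-chain case, where the partition function can be computed exactly by the forward algorithm in O(T K^2) time rather than by enumerating all K^T label sequences. The NumPy sketch below assumes log-potentials are given as a (T, K) unary array and a single (K, K) transition array tied across time steps; the function names are hypothetical. It illustrates only the linear-chain baseline, not DCRF inference, which in general needs approximate belief propagation.

```python
import itertools
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def log_partition(unary, pairwise):
    """Forward algorithm for a linear-chain CRF.

    unary: (T, K) log-potentials for each position and label.
    pairwise: (K, K) transition log-potentials, shared across positions.
    """
    alpha = unary[0]
    for t in range(1, unary.shape[0]):
        # alpha_new[j] = logsumexp_i(alpha[i] + pairwise[i, j]) + unary[t, j]
        alpha = _logsumexp(alpha[:, None] + pairwise, axis=0) + unary[t]
    return _logsumexp(alpha, axis=0)

def log_partition_brute_force(unary, pairwise):
    """Same quantity by enumerating every label sequence (exponential in T)."""
    T, K = unary.shape
    scores = []
    for y in itertools.product(range(K), repeat=T):
        s = sum(unary[t, y[t]] for t in range(T))
        s += sum(pairwise[y[t - 1], y[t]] for t in range(1, T))
        scores.append(s)
    return _logsumexp(np.array(scores), axis=0)

rng = np.random.default_rng(0)
u, p = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))
print(log_partition(u, p), log_partition_brute_force(u, p))  # the two agree
```

The brute-force version is included only as a check; it enumerates K^T sequences, which is exactly the cost the forward recursion avoids.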

    Online Tool Selection with Learned Grasp Prediction Models

    Deep learning-based grasp prediction models have become an industry standard for robotic bin-picking systems. To maximize pick success, production environments are often equipped with several end-effector tools that can be swapped on the fly, depending on the target object. Tool changes, however, take time. Choosing the order of grasps to perform, and the corresponding tool-change actions, can improve system throughput; this is the topic of our work. The main challenge in planning tool changes is uncertainty: we typically cannot see objects in the bin that are currently occluded. Inspired by queuing and admission control problems, we model the problem as a Markov decision process (MDP) whose goal is to maximize expected throughput, and we pursue an approximate solution based on model predictive control, in which at each time step we plan based only on the currently visible objects. Unique to our method is the idea of void zones: geometric boundaries within which an unknown object will be present and which therefore cannot be accounted for during planning. Our planning problem can be solved using integer linear programming (ILP). However, we find that an approximate solution based on sparse tree search yields near-optimal performance in a fraction of the time. We also explore how to measure the performance of tool-change planning: we find that throughput alone can fail to capture delicate and smooth behavior, and we propose a principled alternative. Finally, we demonstrate our algorithms on both synthetic and real-world bin-picking tasks.
    Comment: 14 pages (including the cover page), 5 figures, Technical Report, OSARO In

    Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences

    Conditional random fields (CRFs) for sequence modeling have several advantages over joint models such as HMMs, including the ability to relax the strong independence assumptions made in those models and the ability to incorporate arbitrary overlapping features. Previous work has focused on linear-chain CRFs, which correspond to finite-state machines and have efficient exact inference algorithms. Often, however, we wish to label sequence data in multiple interacting ways, for example, performing part-of-speech tagging and noun-phrase segmentation simultaneously, increasing joint accuracy by sharing information between them. We presen

    Evaluating Main Parameters Effects of Near-Field Earthquakes on the Behavior of Concrete Structures with Moment Frame System

    Amplitude and frequency content are two important characteristics of earthquake ground motion that differ between near-fault and far-fault earthquakes, yet most standards do not consider the effects of near-field earthquakes in loading. Studying and comparing these effects on structures is therefore necessary. In this paper, the performance of structures under near-fault and far-fault earthquakes is investigated for two near-fault sites and two far-fault sites. To obtain the performance point of a six-story structural model with an intermediate moment-resisting frame system, site-specific design spectra for the near-fault and far-fault sites, derived from seismic hazard analysis, are used. The results of this work are an evaluation of the effects of near-fault and far-fault earthquakes on the performance point, based on the spectra of Iran's Standard 2800, and a comparison of the performance under near-fault and far-fault spectra with that under the Iranian standard spectrum. Finally, after presenting the results of time-history analyses, some suggestions are proposed for design corrections to the regulations for near-field earthquakes.

    Concurrent decision making in Markov decision processes

    This dissertation investigates concurrent decision making and coordination in systems that can execute multiple actions simultaneously to perform tasks more efficiently. Concurrent decision making is a fundamental problem in many areas of robotics, control, and computer science, and in artificial intelligence in particular it is recognized as a formidable challenge. By concurrent decision making we refer to a class of problems that require agents to accomplish long-term goals by executing multiple activities concurrently. In general, the problem is difficult to solve because it requires learning and planning with a combinatorial set of interacting concurrent activities with uncertain outcomes that compete for limited system resources. The dissertation presents a general framework for modeling the concurrent decision-making problem based on semi-Markov decision processes (SMDPs). Our approach is based on a centralized control formalism, in which we assume that a central control mechanism initiates, executes, and monitors concurrent activities. This view also captures the type of concurrency that exists in single-agent domains, where a single agent can perform multiple activities simultaneously by exploiting the degrees of freedom (DOF) in the system. We present a set of coordination mechanisms, employed by our model, for monitoring the execution and termination of concurrent activities. These mechanisms incorporate various natural activity completion schemes based on the individual termination of each activity. We provide theoretical results that establish the correctness of the model semantics, and these results allow us to apply standard SMDP learning and planning techniques to the concurrent decision-making problem. However, SMDP solution methods do not scale to concurrent decision-making systems with many degrees of freedom.
    This problem is a classic example of the curse of dimensionality in the action space: the size of the set of concurrent activities grows exponentially as the system admits more degrees of freedom. To alleviate this problem, we develop a novel decision-theoretic framework motivated by the coarticulation phenomenon studied in speech and motor control research. The key idea of this approach is that in many concurrent decision-making problems, the overall objective can be viewed as the concurrent optimization of a set of interacting and possibly simpler subgoals for which the agent has already acquired the necessary skills. We show that applying coarticulation to systems with excess degrees of freedom naturally generates concurrency. We present a set of theoretical results that characterize the efficiency of concurrent decision making based on the coarticulation framework, compared with the case in which the agent may only execute activities sequentially (i.e., without coarticulation). (Abstract shortened by UMI.)
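Two points in the abstract above can be made concrete in a small tabular sketch: the set of joint activities is the Cartesian product of per-DOF choices, so its size grows exponentially with the number of degrees of freedom, and a standard SMDP Q-learning backup discounts the successor value by gamma raised to the activity's duration. This is generic SMDP Q-learning under assumed names (`joint_activities`, `smdp_q_update`, and the activity labels are hypothetical); it is not the dissertation's coarticulation method or its termination schemes.

```python
import itertools
from collections import defaultdict

def joint_activities(per_dof_activities):
    """All ways to pick one activity per degree of freedom (DOF).

    The count is the product of the per-DOF choices, i.e. exponential in
    the number of DOF when every DOF has the same number of choices.
    """
    return list(itertools.product(*per_dof_activities))

def smdp_q_update(Q, s, o, reward, duration, s_next, options, alpha=0.1, gamma=0.95):
    """One tabular SMDP Q-learning backup for joint activity o.

    reward: discounted reward accumulated while o executed (assumed given).
    duration: number of primitive time steps tau that o lasted.
    """
    best_next = max(Q[(s_next, o2)] for o2 in options)
    target = reward + gamma ** duration * best_next
    Q[(s, o)] += alpha * (target - Q[(s, o)])
    return Q[(s, o)]

# Hypothetical example: 3 DOF, each with 3 activities -> 27 joint activities.
options = joint_activities([["reach", "hold", "idle"]] * 3)
Q = defaultdict(float)
print(len(options))  # 27
print(smdp_q_update(Q, "s0", options[0], reward=1.0, duration=2,
                    s_next="s1", options=options))  # 0.1
```

Treating every joint activity as a separate SMDP action, as done here, is exactly what stops scaling as the number of DOF grows.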