80 research outputs found

    Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

    Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation. Comment: 16 pages, 3 figures
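
The abstract above describes value iteration under a Kullback-Leibler constraint relative to a reference distribution. As a rough illustration only, the sketch below implements the commonly used free-energy (soft) backup for the known-model case; it is not the paper's generalized scheme and does not model uncertainty. The function name, the argument layout (`P[s][a][s2]`, `R[s][a]`, `prior[s][a]`), and the inverse temperature `beta` are assumptions made for this example.

```python
import math


def free_energy_value_iteration(P, R, prior, beta, gamma=0.9, tol=1e-8, max_iter=10_000):
    """KL-regularized value iteration for a known model (illustrative sketch).

    P[s][a][s2]  transition probabilities (assumed layout)
    R[s][a]      immediate rewards
    prior[s][a]  reference policy the agent pays a KL cost to deviate from
    beta         inverse temperature: large beta approaches standard value
                 iteration, small beta keeps the policy close to the prior
    """
    n_states, n_actions = len(P), len(P[0])

    def q_values(V, s):
        # One-step lookahead value of each action in state s.
        return [
            R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
            for a in range(n_actions)
        ]

    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = []
        for s in range(n_states):
            q = q_values(V, s)
            m = max(q)  # shift by the maximum for a numerically stable log-sum-exp
            z = sum(prior[s][a] * math.exp(beta * (q[a] - m)) for a in range(n_actions))
            new_V.append(m + math.log(z) / beta)
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break

    # Resulting policy: prior reweighted by exponentiated action values.
    policy = []
    for s in range(n_states):
        q = q_values(V, s)
        m = max(q)
        w = [prior[s][a] * math.exp(beta * (q[a] - m)) for a in range(n_actions)]
        total = sum(w)
        policy.append([x / total for x in w])
    return V, policy
```

With a uniform prior, a large `beta` should recover nearly greedy behaviour, while a very small `beta` should leave the policy close to the prior.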

    Information theoretic approach to interactive learning

    The principles of statistical mechanics and information theory play an important role in learning and have inspired both theory and the design of numerous machine learning algorithms. The new aspect in this paper is a focus on integrating feedback from the learner. A quantitative approach to interactive learning and adaptive behavior is proposed, integrating model- and decision-making into one theoretical framework. This paper follows simple principles by requiring that the observer's world model and action policy should result in maximal predictive power at minimal complexity. Classes of optimal action policies and of optimal models are derived from an objective function that reflects this trade-off between prediction and complexity. The resulting optimal models then summarize, at different levels of abstraction, the process's causal organization in the presence of the learner's actions. A fundamental consequence of the proposed principle is that the learner's optimal action policies balance exploration and control as an emerging property. Interestingly, the explorative component is present in the absence of policy randomness, i.e. in the optimal deterministic behavior. This is a direct result of requiring maximal predictive power in the presence of feedback. Comment: 6 pages

    Empowerment for Continuous Agent-Environment Systems

    This paper develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, but also by considerations stemming from curiosity-driven learning. Empowerment measures, for agent-environment systems with stochastic transitions, how much influence an agent has on its environment, but only that influence that can be sensed by the agent's sensors. It is an information-theoretic generalization of joint controllability (influence on environment) and observability (measurement by sensors) of the environment by the agent, both controllability and observability being usually defined in control theory as the dimensionality of the control/observation spaces. Earlier work has shown that empowerment has various interesting and relevant properties, e.g., it allows us to identify salient states using only the dynamics, and it can act as intrinsic reward without requiring an external reward. However, in this previous work empowerment was limited to the case of small-scale and discrete domains, and furthermore state transition probabilities were assumed to be known. The goal of this paper is to extend empowerment to the significantly more important and relevant case of continuous vector-valued state spaces and initially unknown state transition probabilities. The continuous state space is addressed by Monte-Carlo approximation; the unknown transitions are addressed by model learning and prediction, for which we apply Gaussian process regression with iterated forecasting. In a number of well-known continuous control tasks we examine the dynamics induced by empowerment and include an application to exploration and online model learning.
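
For the small, discrete, known-transition setting that the abstract says earlier work was limited to, empowerment is usually computed as the channel capacity from actions to sensed successor states. The sketch below uses the standard Blahut-Arimoto iteration for that discrete case only; it is not the Monte-Carlo and Gaussian process method the paper develops. The function name and the `channel[a][s]` layout are assumptions made for this example.

```python
import math


def blahut_arimoto(channel, tol=1e-10, max_iter=1000):
    """Capacity (in bits) of a discrete channel, plus the maximizing input distribution.

    channel[a][s] = p(s | a): probability of sensing successor state s after
    taking action a from a fixed starting state (assumed layout).
    """
    n_actions, n_states = len(channel), len(channel[0])
    p = [1.0 / n_actions] * n_actions  # start from a uniform action distribution
    lower = 0.0
    for _ in range(max_iter):
        # Marginal over successor states induced by the current action distribution.
        q = [sum(p[a] * channel[a][s] for a in range(n_actions)) for s in range(n_states)]
        # KL divergence (in nats) of each action's outcome distribution from the marginal.
        d = [
            sum(c * math.log(c / q[s]) for s, c in enumerate(channel[a]) if c > 0)
            for a in range(n_actions)
        ]
        w = [p[a] * math.exp(d[a]) for a in range(n_actions)]
        total = sum(w)
        p = [x / total for x in w]
        # log(total) is a lower bound and max(d) an upper bound on the capacity.
        lower, upper = math.log(total), max(d)
        if upper - lower < tol:
            break
    return lower / math.log(2), p
```

Under these assumptions, two actions leading deterministically to two distinct states should give one bit, and actions with identical outcome distributions should give zero, which matches the intuition that only influence the sensors can detect counts.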

    Developing a European Psychotherapy Consortium (EPoC): Towards adopting a single-item self-report outcome measure across European countries

    Background: Complementing the development of evidence-based psychological therapies, practice-based evidence has developed from patient samples collected in routine care, addressing questions relevant to patients and practitioners, and thereby expanding our knowledge of psychological therapies and their impact. Implementation of assessments in routine care allows for timely clinical decision support and the collection of multiple practice-based data sets by addressing the needs of patients and clinicians (e.g., routine outcome monitoring) and the needs of researchers (e.g., identifying the impact of therapist variables on outcomes). Method: In this article we describe an initiative developed in Europe, through the European Chapter of the Society for Psychotherapy Research, aimed at creating a consortium that has the potential for collecting data on tens of thousands of patients per year. Results: A survey identified one of the main problems in the development of a common data set to be the heterogeneity of measures used by members (e.g., 87 different pre-post outcomes). We report on the results of the survey and the initial stage of identifying a single-item measure – the Emotional and Psychological Outcome (EPO-1) – and the process of its translation into multiple European languages. Conclusions: We conclude this first stage of the overall project by discussing the future potential of the Consortium in relation to the development of procedures that allow crosswalks of outcome measures and the creation of a task force that may be consulted when new data sets are collected, aiming for new common measures to be implemented and shared.

    RoboCup 2D Soccer Simulation League: Evaluation Challenges

    We summarise the results of the RoboCup 2D Soccer Simulation League in 2016 (Leipzig), including the main competition and the evaluation round. The evaluation round held in Leipzig confirmed the strength of the RoboCup-2015 champion (WrightEagle, i.e. WE2015) in the League, with only the eventual finalists of the 2016 competition capable of defeating WE2015. An extended, post-Leipzig, round-robin tournament which included the top 8 teams of 2016, as well as WE2015, with over 1000 games played for each pair, placed WE2015 third behind the champion team (Gliders2016) and the runner-up (HELIOS2016). This establishes WE2015 as a stable benchmark for the 2D Simulation League. We then contrast two ranking methods and suggest two options for future evaluation challenges. The first one, "The Champions Simulation League", is proposed to include 6 previous champions, directly competing against each other in a round-robin tournament, with the view to systematically trace the advancements in the League. The second proposal, "The Global Challenge", is aimed to increase the realism of the environmental conditions during the simulated games, by simulating specific features of different participating countries. Comment: 12 pages, RoboCup-2017, Nagoya, Japan, July 2017

    Discrete profile comparison using information bottleneck

    Sequence homologs are an important source of information about proteins. Amino acid profiles, representing the position-specific mutation probabilities found in homologs, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are an order of magnitude slower than sequence comparisons, making profiles impractical for large datasets. Also, because they are such a rich representation, profiles are difficult to visualize. To address these problems, we describe a method to map probabilistic profiles to a discrete alphabet while preserving most of the information in the profiles. We find an informationally optimal discretization using the Information Bottleneck approach (IB). We observe that an 80-character IB alphabet captures nearly 90% of the amino acid occurrence information found in profiles, compared to the consensus sequence's 78%. Distant homolog search with IB sequences is 88% as sensitive as with profiles compared to 61% with consensus sequences (AUC scores 0.73, 0.83, and 0.51, respectively), but like simple sequence comparison, is 30 times faster. Discrete IB encoding can therefore expand the range of sequence problems to which profile information can be applied to include batch queries over large databases like SwissProt, which were previously computationally infeasible.

    When Will Adolescents Tell Someone About Dating Violence Victimization?

    This study examined factors that influence help-seeking among a diverse sample of adolescents who experienced dating violence. A sample of 57 high school students in an urban community reported on the prevalence and characteristics of dating violence in their relationships. Someone observing a dating violence incident and a survivor’s attaching an emotional meaning to the event significantly influenced adolescents to talk to someone. When dating violence occurred in isolation, survivors were more likely to receive no support from others in the aftermath of the incident. Differences between boys’ and girls’ help-seeking and implications for dating violence intervention and prevention programming are discussed. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/90887/1/Black-Tolman-Callahan-Saunders- Weisz- 2008-When will adolescents tell someone about dating violence VAW.pd