    Extending Feynman's Formalisms for Modelling Human Joint Action Coordination

    The recently developed Life-Space-Foam approach to goal-directed human action deals with individual actor dynamics. This paper applies the model to characterize the dynamics of co-action by two or more actors. The co-action dynamics is modelled by (i) a two-term joint action (comprising a cognitive/motivational potential and a kinetic energy term), and (ii) its associated adaptive path integral, representing an infinite-dimensional neural network. The feedback adaptation loop is derived from Bernstein's concept of a sensory-corrections loop in human motor control and from Brooks' subsumption architectures in robotics. Potential applications of the proposed model in human-robot interaction research are discussed. Keywords: psycho-physics, human joint action, path integrals.
    Comment: 6 pages, LaTeX
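    As an orientation for readers unfamiliar with the formalism, the following is a minimal sketch of a generic two-term action and its Feynman path-integral transition amplitude; the symbols, the metric g_{ij}, and the form of the potential V(q) are illustrative assumptions rather than the paper's exact definitions.

        % Hedged sketch in standard Feynman notation (assumed forms, not the paper's):
        A[q] \;=\; \int_{t_0}^{t_1} \Big( \tfrac{1}{2}\, g_{ij}\, \dot q^{i} \dot q^{j} \;-\; V(q) \Big)\, dt ,
        \qquad
        \langle q_1, t_1 \,|\, q_0, t_0 \rangle \;=\; \int \mathcal{D}[q] \; e^{\, i A[q]}
        % kinetic term: coordination "motion" of the co-acting agents
        % V(q): cognitive/motivational potential
        % the paper's "adaptive" variant additionally reweights paths through the
        % feedback loop described in the abstract (our paraphrase).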

    Modelling the hepatitis B vaccination programme in prisons

    A vaccination programme offering hepatitis B (HBV) vaccine at reception into prison has been introduced in selected prisons in England and Wales, and it is anticipated that the programme will be extended over the coming years. A model has been developed to assess the potential impact of the programme on the vaccination coverage of prisoners, ex-prisoners, and injecting drug users (IDUs). Under a range of coverage scenarios, the model predicts the change over time in the vaccination status of new entrants to prison, current prisoners, and IDUs in the community. The model predicts that, at baseline, 57% of the IDU population will be vaccinated in 2012, with up to 72% vaccinated depending on the vaccination scenario implemented. These results are sensitive to the size of the IDU population in England and Wales and to the average time served by an IDU during each prison visit. IDUs who do not receive HBV vaccine in the community are at increased risk of HBV infection. The HBV vaccination programme in prisons is an effective way of vaccinating this hard-to-reach population, although vaccination coverage at prison reception must be increased to achieve this.
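    As a rough illustration of the kind of coverage projection such a model produces, the following is a minimal discrete-time sketch in which IDUs cycle through prison and can be vaccinated at reception; the entry rate, uptake, and turnover values are illustrative assumptions, not the published model or its parameters.

        import numpy as np

        # Minimal sketch (illustrative assumptions, not the published model):
        # each year a fraction of IDUs passes through prison reception, a fraction
        # of unvaccinated entrants accepts HBV vaccine, and a fraction of the IDU
        # population is replaced by new, unvaccinated individuals.
        years = np.arange(2001, 2021)
        entry_rate = 0.25              # fraction of IDUs imprisoned per year (assumed)
        uptake_at_reception = 0.6      # vaccine uptake among unvaccinated entrants (assumed)
        turnover = 0.08                # annual replacement of IDUs by new injectors (assumed)

        vaccinated = 0.0               # vaccinated fraction of the IDU population
        for year in years:
            newly_vaccinated = (1.0 - vaccinated) * entry_rate * uptake_at_reception
            vaccinated = (vaccinated + newly_vaccinated) * (1.0 - turnover)
            print(year, f"{vaccinated:.1%}")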

    Prediction with Expert Advice under Discounted Loss

    We study prediction with expert advice in the setting where losses are accumulated with some discounting, so that the impact of old losses may gradually vanish. We generalize the Aggregating Algorithm and the Aggregating Algorithm for Regression to this case, propose a suitable new variant of the exponential weights algorithm, and prove the corresponding loss bounds.
    Comment: 26 pages; expanded (2 remarks -> theorems), some misprints corrected
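    A minimal sketch of an exponential-weights forecaster with discounted losses, in the spirit of the algorithms the abstract generalizes; the geometric discount, the learning rate, and the uniform random losses are illustrative assumptions, and the paper's exact variant and loss bounds differ.

        import numpy as np

        # Exponential weights over N experts where past losses are discounted by a
        # factor gamma in (0, 1], so older losses gradually matter less.
        rng = np.random.default_rng(0)
        N, T = 5, 200
        eta, gamma = 0.5, 0.95                      # learning rate and discount (assumed)

        discounted_loss = np.zeros(N)               # discounted cumulative expert losses
        learner_loss = 0.0
        for t in range(T):
            expert_losses = rng.uniform(0.0, 1.0, size=N)   # losses revealed this round
            weights = np.exp(-eta * discounted_loss)
            weights /= weights.sum()
            learner_loss = gamma * learner_loss + weights @ expert_losses
            discounted_loss = gamma * discounted_loss + expert_losses

        print(f"learner {learner_loss:.2f} vs best expert {discounted_loss.min():.2f}")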

    Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees

    Deep Reinforcement Learning (DRL) has achieved impressive success in many applications. A key component of many DRL models is a neural network representing a Q function that estimates the expected cumulative reward following a state-action pair. The Q-function network encodes a great deal of implicit knowledge about the RL problem, but this knowledge often remains unexamined and uninterpreted. To our knowledge, this work develops the first mimic learning framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to approximate neural network predictions. An LMUT is learned using a novel on-line algorithm that is well suited to an active play setting, where the mimic learner observes an ongoing interaction between the neural net and the environment. Empirical evaluation shows that an LMUT mimics a Q function substantially better than five baseline methods. The transparent tree structure of an LMUT facilitates understanding of the network's learned knowledge through analysis of feature influence, rule extraction, and highlighting of super-pixels in image inputs.
    Comment: This paper is accepted by ECML-PKDD 201
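    A rough sketch of the mimic-learning setup in the active-play setting follows, with an ordinary regression tree standing in for an LMUT (the paper's learner is an on-line U-tree with linear models at the leaves); the toy Q function, the transition dynamics, and the tree hyperparameters are illustrative assumptions.

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        # Mimic learning sketch: watch a "Q network" act in an environment, record
        # (state, action) -> Q(state, action) pairs, then fit an interpretable tree.
        rng = np.random.default_rng(1)
        n_actions = 3

        def q_network(state, action):
            # stand-in for a trained neural Q function (illustrative)
            return -np.sum((state - action) ** 2) + 0.1 * action

        states, actions, q_values = [], [], []
        state = rng.normal(size=4)
        for _ in range(2000):
            qs = [q_network(state, a) for a in range(n_actions)]
            action = int(np.argmax(qs))                         # active play: follow the net
            states.append(state)
            actions.append(action)
            q_values.append(qs[action])
            state = 0.9 * state + rng.normal(scale=0.1, size=4)  # toy transition

        X = np.column_stack([np.array(states), np.array(actions)])
        y = np.array(q_values)
        mimic = DecisionTreeRegressor(max_depth=5).fit(X, y)
        print("mimic R^2:", round(mimic.score(X, y), 3))
        print("feature importances:", np.round(mimic.feature_importances_, 3))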

    Self-Modification of Policy and Utility Function in Rational Agents

    Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify, for example by changing its own source code. As we continue to create more and more intelligent agents, the chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning and explore situations where it fails. Our conclusion is that the possibility of self-modification is harmless if and only if the agent's value function anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.
    Comment: Artificial General Intelligence (AGI) 201
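    The following is a toy numerical illustration of the paper's central distinction, under heavily simplified assumptions of our own: an agent may replace its utility function with one that is trivially maximized, and the modification looks attractive or harmless depending on whether the future is evaluated with the current utility or with the modified one.

        # Toy illustration (our own simplification, not the paper's formal model).
        # Two one-step options:
        #   "work":        keep the current utility and earn 0.7 under it
        #   "self-modify": switch to a trivial utility that gives 1.0 for doing nothing
        u_current = {"work": 0.7, "do_nothing": 0.0}
        u_trivial = {"work": 0.0, "do_nothing": 1.0}

        def value(option, anticipate_with_current_utility):
            if option == "work":
                return u_current["work"]
            # after self-modification the future agent simply does nothing
            future_utility = u_current if anticipate_with_current_utility else u_trivial
            return future_utility["do_nothing"]

        for anticipates in (True, False):
            choice = max(("work", "self-modify"), key=lambda o: value(o, anticipates))
            print(f"evaluate future with current utility = {anticipates}: choose {choice!r}")
        # With the current utility the agent keeps working; otherwise it self-modifies.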

    Information theoretic approach to interactive learning

    The principles of statistical mechanics and information theory play an important role in learning and have inspired both theory and the design of numerous machine learning algorithms. The new aspect in this paper is a focus on integrating feedback from the learner. A quantitative approach to interactive learning and adaptive behavior is proposed, integrating model-making and decision-making into one theoretical framework. The paper follows a simple principle: the observer's world model and action policy should result in maximal predictive power at minimal complexity. Classes of optimal action policies and of optimal models are derived from an objective function that reflects this trade-off between prediction and complexity. The resulting optimal models then summarize, at different levels of abstraction, the process's causal organization in the presence of the learner's actions. A fundamental consequence of the proposed principle is that the learner's optimal action policies balance exploration and control as an emergent property. Interestingly, the explorative component is present even in the absence of policy randomness, i.e. in the optimal deterministic behavior. This is a direct result of requiring maximal predictive power in the presence of feedback.
    Comment: 6 pages
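    One concrete way to read the stated trade-off is as a scalar objective of the information-bottleneck type, predictive power minus a multiple of model complexity, each measured as a mutual information; the toy joint distributions, the trade-off parameter, and this particular form are illustrative assumptions, not the paper's derivation.

        import numpy as np

        def mutual_information(joint):
            """I(X;Y) in bits for a strictly positive joint distribution p(x, y)."""
            joint = joint / joint.sum()
            px = joint.sum(axis=1, keepdims=True)
            py = joint.sum(axis=0, keepdims=True)
            return float(np.sum(joint * np.log2(joint / (px * py))))

        # Toy joints: p(model state, next observation) and p(history, model state).
        p_predictive = np.array([[0.40, 0.10],
                                 [0.10, 0.40]])
        p_complexity = np.array([[0.30, 0.20],
                                 [0.20, 0.30]])

        lam = 0.5    # trade-off parameter (illustrative)
        objective = mutual_information(p_predictive) - lam * mutual_information(p_complexity)
        print(f"predictive power - lambda * complexity = {objective:+.3f} bits")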

    Power Law Scaling for a System of Interacting Units with Complex Internal Structure

    We study the dynamics of a system composed of interacting units, each with a complex internal structure comprising many subunits. We consider the case in which each subunit grows in a multiplicative manner. We propose a model for such systems in which the interaction among units is treated in a mean-field approximation and the interaction among subunits is nonlinear. To test the model, we identify a large database spanning 20 years and find that the model correctly predicts a variety of empirical results.
    Comment: 4 pages with 4 PostScript figures (uses RevTeX 3.1, LaTeX2e, multicol.sty, epsf.sty and rotate.sty). Submitted to PR
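    A minimal simulation sketch of the ingredients described, leaving out the paper's mean-field coupling between units: each unit consists of subunits whose sizes are multiplied by independent random factors, and the spread of the unit-level growth rate is examined as a function of unit size. All parameter values are illustrative assumptions.

        import numpy as np

        # Each unit has n subunits; every subunit's size is multiplied by an
        # independent lognormal factor per step.  We measure how the spread of the
        # unit-level growth rate shrinks as units get larger.
        rng = np.random.default_rng(2)
        for n_subunits in (4, 16, 64, 256):
            sizes = rng.lognormal(mean=0.0, sigma=1.0, size=(10_000, n_subunits))
            grown = sizes * rng.lognormal(mean=0.0, sigma=0.2, size=sizes.shape)
            growth_rates = np.log(grown.sum(axis=1) / sizes.sum(axis=1))
            print(f"{n_subunits:4d} subunits: std of growth rate = {growth_rates.std():.4f}")
        # The spread falls roughly as a power of the number of subunits, i.e.
        # sigma(growth) ~ size^(-beta) for some beta > 0 in this toy setting.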