58 research outputs found

    Towards Thompson Sampling for Complex Bayesian Reasoning

    Papers III, IV, and VI are not available as part of the dissertation due to copyright. Thompson Sampling (TS) is a state-of-the-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS are well explored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex problems beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state of the art relies on a relatively myopic perspective on the problem. These include stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, the previous approaches rely on carefully engineered rules. In brief, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS-based solution schemes. For all of the addressed problems, the results show that besides being more effective, the TS-based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.
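For readers unfamiliar with the plain bandit setting that the abstract takes as its starting point, the following is a minimal sketch of Thompson Sampling for a Bernoulli bandit with Beta(1, 1) priors. It is an illustration only, not one of the thesis's schemes; the function name `thompson_sampling`, the arm probabilities, and the seed are all assumptions made for the example.

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling; returns the pull count per arm.

    true_probs is a hypothetical list of per-arm reward probabilities,
    hidden from the learner and used only to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one sample from each arm's Beta posterior (uniform prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled reward probability is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Bayesian update of the chosen arm's posterior counts.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], rounds=2000))
```

Sampling from the posterior, rather than acting on its mean, is what lets the scheme keep exploring arms whose reward probability is still uncertain.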

    On how to learn from a stochastic teacher or a stochastic compulsive liar of unknown identity

    We consider the problem of a learning mechanism (robot, or algorithm) that learns a parameter while interacting with either a stochastic teacher or a stochastic compulsive liar. The problem is modeled as follows: the learning mechanism is trying to locate an unknown point on a real interval by interacting with a stochastic environment through a series of guesses. For each guess the environment (teacher) essentially informs the mechanism, possibly erroneously, which way it should move to reach the point. Thus, there is a non-zero probability that the feedback from the environment is erroneous. When the probability of correct response is p > 0.5, the environment is said to be Informative, and we have the case of learning from a stochastic teacher. When this probability is p < 0.5 the environment is deemed Deceptive, and is called a stochastic compulsive liar. This paper describes a novel learning strategy by which the unknown parameter can be learned in both environments. To the best of our knowledge, our results are the first reported results which are applicable to the latter scenario. Another main contribution of this paper is that the proposed scheme is shown to operate equally well even when the learning mechanism is unaware whether the environment is Informative or Deceptive. The learning strategy proposed herein, called CPL-ATS, partitions the search interval into three equi-sized sub-intervals, evaluates the location of the unknown point with respect to these sub-intervals using fast-converging ε-optimal L_RI learning automata, and prunes the search space in each iteration by eliminating at least one partition. The CPL-ATS algorithm is shown to converge provably to the unknown point to an arbitrary degree of accuracy with probability as close to unity as desired. Comprehensive experimental results confirm the fast and accurate convergence of the search for a wide range of values for the environment's feedback accuracy parameter p.
The above algorithm can be used to learn parameters for non-linear optimization techniques.
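The partition-and-prune idea in the abstract can be illustrated with a heavily simplified sketch. It is not CPL-ATS: it swaps the L_RI learning automata for plain majority voting at the two interior partition boundaries, and it assumes a known Informative environment (p > 0.5), so it does not handle the Deceptive case. All names (`noisy_direction`, `majority_direction`, `three_way_search`) and parameter values are hypothetical.

```python
import random


def noisy_direction(guess, target, p, rng):
    """Simulated environment: reports the true direction with probability p."""
    truth = "right" if target > guess else "left"
    if rng.random() < p:
        return truth
    return "left" if truth == "right" else "right"


def majority_direction(guess, target, p, rng, votes):
    """Query the same guess repeatedly and return the majority answer."""
    rights = sum(
        noisy_direction(guess, target, p, rng) == "right" for _ in range(votes)
    )
    return "right" if 2 * rights > votes else "left"


def three_way_search(target, p, tol=1e-3, votes=201, seed=0):
    """Shrink [0, 1] to one of three equal sub-intervals per iteration.

    Assumption: p > 0.5, so the majority vote is reliable. The target is
    passed in only so the environment can be simulated.
    """
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        third = (hi - lo) / 3.0
        a, b = lo + third, hi - third  # interior partition boundaries
        if majority_direction(a, target, p, rng, votes) == "left":
            hi = a          # point lies in the left sub-interval
        elif majority_direction(b, target, p, rng, votes) == "right":
            lo = b          # point lies in the right sub-interval
        else:
            lo, hi = a, b   # point lies in the middle sub-interval
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(three_way_search(target=0.9123, p=0.8))
```

Each iteration keeps one third of the interval, so the width falls geometrically; the number of votes per boundary controls how likely a wrong pruning decision is.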

    An expected utility theory that matches human performance

    Maximising expected utility has long been accepted as a valid model of rational behaviour; however, it has limited descriptive accuracy simply because, in practice, people do not always behave in the prescribed way. This is considered evidence that either people are not rational, expected utility is not an appropriate characterisation of rationality, or a combination of these. This thesis proposes that a modified form of the expected utility hypothesis is both normative, suggesting how people ought to behave, and descriptive of how they actually do behave, provided that: a) most utility has no meaning unless it is in the presence of potential competitors; b) there is uncertainty in the nature of competitors; c) statements of probability are associated with uncertainty; d) utility is marginalised over uncertainty, with framing effects providing constraints; and e) utility is sensitive to risk, which, taken with reward and uncertainty, suggests a three-dimensional representation. The first part of the thesis investigates the nature of reward in four experiments and proposes that a three-dimensional reward structure (reward, risk, and uncertainty) provides a better description of utility than reward alone. It also proposes that the semantic differential, a well-researched psychological instrument, is a representation or description of the reward structure. The second part of the thesis provides a mathematical model of a value function and a probability weighting function, testing them together against extant problem cases for decision making. It is concluded that utility, perhaps more accurately described as advantage in the present case, when construed as three dimensions and the result of a competition, provides a good explanation of many of the problem cases that are documented in the decision-making literature.

    Activity Report 2007-2008 (Tätigkeitsbericht 2007-2008)


    Secondary Students’ Narratives of Emotion Work While Engaging in Extended/Open Science Inquiry Projects

    There is growing evidence showing the significance of student emotions in influencing student engagement and achievement. However, naturalistic studies that provide insights into contextual factors that engender students’ emotion experiences and how students manage these experiences to promote the achievement of their academic goals have been sparse. This study investigated secondary students’ emotion work (i.e., attempts to change the degree or quality of emotion experiences) within a distinctive learning environment. The forty-four participants (15-17 years old) were high-achieving students in a selective, science specialist school in the Philippines, who were undertaking two-year open school science inquiry projects with links to real-world research. Students’ emotion work narratives (68 written narratives and 57 narrative interviews) were collected over a ten-month period (which included eight months of fieldwork). Data analysis focused on situations that engendered emotion work and the strategies students used. School artefacts and students’ narratives were examined for ideas about achievement that were transmitted to and apprehended by students (i.e., achievement discourses), and how these discourses were linked to students’ emotion work. Five thematic groups of situations and four families of emotion work strategies were identified. The emotiveness of the situations was heightened by discourses that associated achievement with students’ social identities and extraordinary performances. Students’ emotion work served the instrumental goals of sustaining engagement in school work, managing the impact of problematic relationships with peers and teachers, and maintaining students’ social identities. Students demonstrated agency in how they harnessed for their emotion work the resources and opportunities afforded by their social networks and by the achievement discourses.
This research underscores the role of emotion work in students’ effective functioning in a demanding learning environment with high levels of uncertainty. Its findings suggest the need for more research that explores students’ potential to shape their school experiences through emotion work.

    Learning plan networks in conversational video games

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 121-123). We look forward to a future where robots collaborate with humans in the home and workplace, and virtual agents collaborate with humans in games and training simulations. A representation of common ground for everyday scenarios is essential for these agents if they are to be effective collaborators and communicators. Effective collaborators can infer a partner's goals and predict future actions. Effective communicators can infer the meaning of utterances based on semantic context. This thesis introduces a computational cognitive model of common ground called a Plan Network. A Plan Network is a statistical model that provides representations of social roles, object affordances, and expected patterns of behavior and language. I describe a methodology for unsupervised learning of a Plan Network using a multiplayer video game, visualization of this network, and evaluation of the learned model with respect to human judgment of typical behavior. Specifically, I describe learning the Restaurant Plan Network from data collected from over 5,000 players of an online game called The Restaurant Game. By Jeffrey David Orkin. S.M.