
    Discovering Blind Spots in Reinforcement Learning

    Agents trained in simulation may make errors in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult to discover because the agent cannot predict them a priori. We propose using oracle feedback to learn a predictive model of these blind spots to reduce costly errors in real-world applications. We focus on blind spots in reinforcement learning (RL) that occur due to incomplete state representation: The agent does not have the appropriate features to represent the true state of the world and thus cannot distinguish among numerous states. We formalize the problem of discovering blind spots in RL as a noisy supervised learning problem with class imbalance. We learn models to predict blind spots in unseen regions of the state space by combining techniques for label aggregation, calibration, and supervised learning. The models take into consideration noise emerging from different forms of oracle feedback, including demonstrations and corrections. We evaluate our approach on two domains and show that it achieves higher predictive performance than baseline methods, and that the learned model can be used to selectively query an oracle at execution time to prevent errors. We also empirically analyze the biases of various feedback types and how they influence the discovery of blind spots. Comment: To appear at AAMAS 201
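
The abstract above mentions aggregating noisy oracle labels and then querying the oracle selectively at execution time. A minimal sketch of that idea follows; it is not the paper's method. The vote-fraction aggregation, the 0.5 threshold, and all names (`aggregate_labels`, `should_query_oracle`, the states `s1` and `s2`) are assumptions made for illustration, and the calibration and supervised-learning steps are omitted.

```python
from collections import defaultdict


def aggregate_labels(observations):
    """Turn noisy 0/1 blind-spot labels into a per-state vote fraction.

    observations: iterable of (state, label) pairs, label 1 = "blind spot".
    This simple averaging is an assumed stand-in for label aggregation.
    """
    totals = defaultdict(lambda: [0, 0])  # state -> [sum of labels, count]
    for state, label in observations:
        totals[state][0] += label
        totals[state][1] += 1
    return {state: s / n for state, (s, n) in totals.items()}


def should_query_oracle(blind_spot_prob, threshold=0.5):
    """Ask the oracle only when the estimated blind-spot probability is high.

    The threshold value is hypothetical.
    """
    return blind_spot_prob >= threshold


# Hypothetical noisy feedback for two observed states.
observations = [("s1", 1), ("s1", 1), ("s1", 0), ("s2", 0), ("s2", 0)]
probs = aggregate_labels(observations)
queries = {state: should_query_oracle(p) for state, p in probs.items()}
```

Here `s1` receives two of three blind-spot votes and is flagged for an oracle query, while `s2` is not. A learned model would replace the lookup table so that unseen states can be scored as well.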

    Shifting the ways prospective teachers frame and notice student mathematical thinking: from deficits to strengths

    Noticing the strengths in students’ mathematical thinking is a critical skill that teachers need to develop, but it can be challenging due to the prevalence of deficit-based thinking in mathematics education. To address this challenge, a teacher education course was designed to encourage prospective teachers to engage in critical reflection on their own and others’ framings of students’ thinking and shift their focus towards noticing students’ strengths. The study analyzed written responses from the prospective teachers, collected at the beginning and end of the course, to investigate their framing and noticing of students’ mathematical thinking. The analysis focused on the aspects of students’ thinking that the prospective teachers paid attention to, the stances they took when interpreting students’ thinking, and the instructional moves they proposed in response to their thinking. Furthermore, the study established a spectrum of deficit-based and strength-based framings on students’ mathematical thinking. This spectrum allowed for the identification of each participant’s written noticing responses within a range of possibilities, contributing to a more nuanced understanding of the changes in teachers’ framing and noticing of students’ thinking over time

    Preparing Undergraduate Music Majors to Teach Beginning Instrumentalists: The Effects of Self-Evaluation, Teacher Observation, and Performance-Oriented Instructional Approaches on Teacher Behaviors and Pupil Responses.

    The purpose of this study was to investigate the effects of three approaches to training preservice instrumental music teachers (N = 22) for initial teaching experiences involving beginning instrumentalists (N = 22). The three approaches (one involving intensive self-evaluation activities, a second focusing on observation of experienced instrumental music teachers, and a third evidencing a performance orientation) were administered as a four-week treatment phase in an undergraduate brass techniques course. Primarily, this study was designed to answer the question: Did instructional approach differentially affect teacher behavior across two private lessons? Teacher (subject) and pupil behaviors were documented and categorized according to various aspects of subject/pupil activity, subject verbalizations, successful/unsuccessful performance trials, and subjects' secondary instrument (trumpet or trombone) performance competency. In addition, subject and pupil post-treatment attitudes were assessed. Following the treatment phase, subjects taught two lessons to beginning band pupils. Forty-four lessons (totaling more than 1,000 minutes and averaging roughly 24 minutes) were videotaped and analyzed. Certain lesson activities were timed using the behavioral observation computer application, SCRIBE. Results indicated that the self-evaluation group engaged their pupils in performance activity 44.76% of the time, which was significantly more than the teacher observation and performance orientation groups. Using verbatim transcripts of lessons, subject verbalizations were labeled as academic information, direction-giving, information-gathering, or off-task remarks. Pupil responses were categorized as successful, unsuccessful, or no response. Overall, subjects used academic verbalizations three times more than they used direction verbalizations. When pupil responses were preceded by subject verbalizations that were subject-matter rich, pupils were more likely to respond successfully than when verbalizations were subject-matter neutral, as in direction-giving (p < .0001). There were no treatment group differences with regard to subject verbalization and pupil responses. Subjects' ability to perform on the secondary instruments studied during treatment was determined by three independent judges. Results indicated no significant differences among treatment groups or between major instruments (brass and non-brass). Further, regardless of treatment, subjects' attitudes toward treatment were overwhelmingly positive

    A novel training and collaboration integrated framework for human-agent teleoperation.

    Human operators tend to experience increasing physical and mental workloads when performing teleoperation tasks in uncertain and dynamic environments. In addition, their performance is influenced by subjective factors, potentially leading to operational errors or task failure. Although agent-based methods offer a promising solution to these problems, human experience and intelligence remain necessary in teleoperation scenarios. In this paper, an integrated framework based on truncated quantile critics reinforcement learning is proposed for human-agent teleoperation that encompasses training, assessment and agent-based arbitration. The proposed framework provides an expert training agent and a bilateral training and cooperation process to realize the co-optimization of agent and human. It can provide efficient and quantifiable training feedback. Experiments have been conducted to train subjects with the developed algorithm. The performances of human-human and human-agent cooperation modes are also compared. The results show that, with the assistance of an agent, subjects can complete reaching and pick-and-place tasks in a shorter operational time, with a higher success rate and less workload than in human-human cooperation

    Metareasoning about propagators for constraint satisfaction

    Given the breadth of constraint satisfaction problems (CSPs) and the wide variety of CSP solvers, it is often very difficult to determine a priori which solving method is best suited to a problem. This work explores the use of machine learning to predict which solving method will be most effective for a given problem. We use four different problem sets to determine the CSP attributes that can be used to determine which solving method should be applied. After choosing an appropriate set of attributes, we determine how well J48 decision trees can predict which solving method to apply. Furthermore, we take a cost-sensitive approach such that problem instances where there is a great difference in runtime between algorithms are emphasized. We also attempt to use information gained on one class of problems to inform decisions about a second class of problems. Finally, we show that the additional costs of deciding which method to apply are outweighed by the time savings compared to applying the same solving method to all problem instances
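
The cost-sensitive idea in the abstract above (emphasizing instances where solver runtimes differ most) can be sketched as follows. This is an illustrative assumption, not the authors' procedure: the weight is taken to be the runtime gap between the fastest and slowest solver, the solver names `A` and `B`, the `density` attribute, and the hand-written threshold rule (a stand-in for a learned decision tree) are all hypothetical, and the overhead of computing attributes is ignored.

```python
def label_and_weight(runtimes):
    """Return (fastest solver, weight) for one problem instance.

    runtimes: dict mapping solver name -> runtime in seconds.
    The weight is the gap between slowest and fastest solver, so instances
    where the choice matters most count more during training.
    """
    best = min(runtimes, key=runtimes.get)
    weight = max(runtimes.values()) - min(runtimes.values())
    return best, weight


def total_runtime(instances, choose):
    """Sum the runtime incurred when `choose` picks a solver per instance."""
    return sum(inst["runtimes"][choose(inst)] for inst in instances)


# Hypothetical instances with one made-up attribute and measured runtimes.
instances = [
    {"features": {"density": 0.2}, "runtimes": {"A": 1.0, "B": 9.0}},
    {"features": {"density": 0.8}, "runtimes": {"A": 12.0, "B": 2.0}},
    {"features": {"density": 0.5}, "runtimes": {"A": 3.0, "B": 3.5}},
]

# Weighted training targets that a cost-sensitive learner could consume.
training = [label_and_weight(inst["runtimes"]) for inst in instances]


def rule(inst):
    """Hand-written stand-in for a learned tree: one threshold on density."""
    return "A" if inst["features"]["density"] < 0.6 else "B"


always_a = total_runtime(instances, lambda inst: "A")
with_rule = total_runtime(instances, rule)
```

On this toy data the third instance gets a small weight because either solver is nearly as good, and the per-instance rule yields a lower total runtime than always applying solver `A`.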