Discovering Blind Spots in Reinforcement Learning
Agents trained in simulation may make errors in the real world due to
mismatches between training and execution environments. These mistakes can be
dangerous and difficult to discover because the agent cannot predict them a
priori. We propose using oracle feedback to learn a predictive model of these
blind spots to reduce costly errors in real-world applications. We focus on
blind spots in reinforcement learning (RL) that occur due to incomplete state
representation: The agent does not have the appropriate features to represent
the true state of the world and thus cannot distinguish among numerous states.
We formalize the problem of discovering blind spots in RL as a noisy supervised
learning problem with class imbalance. We learn models to predict blind spots
in unseen regions of the state space by combining techniques for label
aggregation, calibration, and supervised learning. The models take into
consideration noise emerging from different forms of oracle feedback, including
demonstrations and corrections. We evaluate our approach on two domains and
show that it achieves higher predictive performance than baseline methods, and
that the learned model can be used to selectively query an oracle at execution
time to prevent errors. We also empirically analyze the biases of various
feedback types and how they influence the discovery of blind spots. Comment: To appear at AAMAS 201
Shifting the ways prospective teachers frame and notice student mathematical thinking: from deficits to strengths
Noticing the strengths in students' mathematical thinking is a critical skill that teachers need to develop, but it can be challenging due to the prevalence of deficit-based thinking in mathematics education. To address this challenge, a teacher education course was designed to encourage prospective teachers to engage in critical reflection on their own and others' framings of students' thinking and shift their focus towards noticing students' strengths. The study analyzed written responses from the prospective teachers, collected at the beginning and end of the course, to investigate their framing and noticing of students' mathematical thinking. The analysis focused on the aspects of students' thinking that the prospective teachers paid attention to, the stances they took when interpreting students' thinking, and the instructional moves they proposed in response to their thinking. Furthermore, the study established a spectrum of deficit-based and strength-based framings on students' mathematical thinking. This spectrum allowed for the identification of each participant's written noticing responses within a range of possibilities, contributing to a more nuanced understanding of the changes in teachers' framing and noticing of students' thinking over time.
Preparing Undergraduate Music Majors to Teach Beginning Instrumentalists: The Effects of Self-Evaluation, Teacher Observation, and Performance-Oriented Instructional Approaches on Teacher Behaviors and Pupil Responses.
The purpose of this study was to investigate the effects of three approaches to training preservice instrumental music teachers (N = 22) for initial teaching experiences involving beginning instrumentalists (N = 22). The three approaches (one involving intensive self-evaluation activities, a second focusing on observation of experienced instrumental music teachers, and a third evidencing a performance orientation) were administered as a four-week treatment phase in an undergraduate brass techniques course. Primarily, this study was designed to answer the question: Did instructional approach differentially affect teacher behavior across two private lessons? Teacher (subject) and pupil behaviors were documented and categorized according to various aspects of subject/pupil activity, subject verbalizations, successful/unsuccessful performance trials, and subjects' secondary instrument (trumpet or trombone) performance competency. In addition, subject and pupil post-treatment attitudes were assessed. Following the treatment phase, subjects taught two lessons to beginning band pupils. Forty-four lessons (totaling more than 1,000 minutes and averaging roughly 24 minutes) were videotaped and analyzed. Certain lesson activities were timed using the behavioral observation computer application, SCRIBE. Results indicated that the self-evaluation group engaged their pupils in performance activity 44.76% of the time, which was significantly more than the teacher observation and performance orientation groups. Using verbatim transcripts of lessons, subject verbalizations were labeled as academic information, direction-giving, information-gathering, or off-task remarks. Pupil responses were categorized as successful, unsuccessful, or no response. Overall, subjects used academic verbalizations three times more often than they used direction verbalizations.
When pupil responses were preceded by subject verbalizations that were subject-matter rich, pupils were more likely to respond successfully than when verbalizations were subject-matter neutral, as in direction-giving (p < .0001). There were no treatment group differences with regard to subject verbalization and pupil responses. Subjects' ability to perform on the secondary instruments studied during treatment was determined by three independent judges. Results indicated no significant differences among treatment groups or between major instruments (brass and non-brass). Further, regardless of treatment, subjects' attitudes toward treatment were overwhelmingly positive.
A novel training and collaboration integrated framework for human-agent teleoperation.
Human operators tend to experience increasing physical and mental workloads when performing teleoperation tasks in uncertain and dynamic environments. In addition, their performance is influenced by subjective factors, potentially leading to operational errors or task failure. Although agent-based methods offer a promising solution to these problems, human experience and intelligence remain necessary in teleoperation scenarios. In this paper, an integrated framework based on truncated quantile critics reinforcement learning is proposed for human-agent teleoperation; it encompasses training, assessment, and agent-based arbitration. The proposed framework provides an expert training agent and a bilateral training and cooperation process to realize the co-optimization of agent and human. It can provide efficient and quantifiable training feedback. Experiments were conducted to train subjects with the developed algorithm. The performance of human-human and human-agent cooperation modes is also compared. The results show that, with the assistance of an agent, subjects can complete reaching and pick-and-place tasks in a shorter operational time, with a higher success rate and less workload than in human-human cooperation.
Metareasoning about propagators for constraint satisfaction
Given the breadth of constraint satisfaction problems (CSPs) and the wide variety of CSP solvers, it is often very difficult to determine a priori which solving method is best suited to a problem. This work explores the use of machine learning to predict which solving method will be most effective for a given problem. We use four different problem sets to identify the CSP attributes that can be used to decide which solving method should be applied. After choosing an appropriate set of attributes, we determine how well J48 decision trees can predict which solving method to apply. Furthermore, we take a cost-sensitive approach that emphasizes problem instances where there is a great difference in runtime between algorithms. We also attempt to use information gained on one class of problems to inform decisions about a second class of problems. Finally, we show that the additional costs of deciding which method to apply are outweighed by the time savings compared to applying the same solving method to all problem instances.