32 research outputs found

    Towards Interpretable Explanations for Transfer Learning in Sequential Tasks

    People increasingly rely on machine learning (ML) to make intelligent decisions. However, ML results are often difficult to interpret, and the algorithms do not support interaction to solicit clarification or explanation. In this paper, we highlight an emerging research area of interpretable explanations for transfer learning in sequential tasks, in which an agent must explain how it learns a new task given prior, common knowledge. The goal is to enhance a user's ability to trust and use the system output and to enable iterative feedback for improving the system. We review prior work in probabilistic systems, sequential decision-making, interpretable explanations, transfer learning, and interactive machine learning, and identify an intersection that deserves further research focus. We believe that developing adaptive, transparent learning models will build the foundation for better human-machine systems in applications for elder care, education, and health care.

    Discovering Blind Spots in Reinforcement Learning

    Agents trained in simulation may make errors in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult to discover because the agent cannot predict them a priori. We propose using oracle feedback to learn a predictive model of these blind spots to reduce costly errors in real-world applications. We focus on blind spots in reinforcement learning (RL) that occur due to incomplete state representation: the agent does not have the appropriate features to represent the true state of the world and thus cannot distinguish among numerous states. We formalize the problem of discovering blind spots in RL as a noisy supervised learning problem with class imbalance. We learn models to predict blind spots in unseen regions of the state space by combining techniques for label aggregation, calibration, and supervised learning. The models take into consideration noise emerging from different forms of oracle feedback, including demonstrations and corrections. We evaluate our approach on two domains and show that it achieves higher predictive performance than baseline methods, and that the learned model can be used to selectively query an oracle at execution time to prevent errors. We also empirically analyze the biases of various feedback types and how they influence the discovery of blind spots. (To appear at AAMAS 2018.)
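The pipeline this abstract describes, aggregating noisy oracle labels into blind-spot estimates and then querying the oracle selectively at execution time, can be sketched as a toy illustration. This is not the paper's implementation; all names, weights, and the threshold are made up for the example:

```python
# Toy sketch: weighted label aggregation over noisy oracle feedback,
# followed by a simple query-the-oracle rule at execution time.
from collections import defaultdict

def aggregate_labels(feedback):
    """feedback: list of (state, is_blind, weight) tuples, where weight
    reflects how much we trust that feedback source (e.g. corrections
    might be weighted higher than demonstrations).
    Returns a per-state blind-spot probability via weighted voting."""
    votes = defaultdict(lambda: [0.0, 0.0])  # state -> [blind_weight, total_weight]
    for state, is_blind, weight in feedback:
        votes[state][0] += weight * is_blind
        votes[state][1] += weight
    return {s: blind / total for s, (blind, total) in votes.items()}

def should_query_oracle(state, blind_probs, threshold=0.5):
    # Query the oracle when the estimated blind-spot risk is high;
    # unseen states default to a risk of 0.0 here for simplicity.
    return blind_probs.get(state, 0.0) >= threshold

feedback = [
    ("s1", 1, 1.0), ("s1", 1, 0.5), ("s1", 0, 1.0),  # conflicting signals
    ("s2", 0, 1.0), ("s2", 0, 1.0),                  # consistently safe
]
probs = aggregate_labels(feedback)
```

The paper additionally calibrates a supervised learner over state features so the estimates generalize to unseen regions; this sketch only covers the per-state voting step.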

    Sirtuins in Adipose Tissue Metabolism

    Obesity, a complex metabolic disorder linked to the development of several diseases, is characterized by both hypertrophy and hyperplasia of adipocytes. While white adipose tissue (WAT) is an energy storage site, brown adipose tissue (BAT) activation generates heat from nutrients by non-shivering thermogenesis. The mammalian orthologues of silencing information regulator 2 (Sir2), which was recognized as a regulator of life span in S. cerevisiae, comprise seven sirtuins: NAD+-dependent protein deacetylases distributed across different subcellular compartments. Sirtuins, particularly Sirt1, have emerged as important nutrient sensors and regulators of metabolism. Sirt1 has been shown to play a role in retarding the expansion of WAT while stimulating both differentiation and activation of brown adipose tissue, as well as browning of WAT. This chapter focuses on the role of sirtuins in adipose tissue biology, their implications in obesity, and their potential as therapeutic targets.

    Workflow-Guided Response Generation for Task-Oriented Dialogue

    Task-oriented dialogue (TOD) systems aim to achieve specific goals through interactive dialogue. Such tasks usually involve following specific workflows, i.e. executing a sequence of actions in a particular order. While prior work has focused on supervised learning methods to condition on past actions, they do not explicitly optimize for compliance with a desired workflow. In this paper, we propose a novel framework based on reinforcement learning (RL) to generate dialogue responses that are aligned with a given workflow. Our framework consists of ComplianceScorer, a metric designed to evaluate how well a generated response executes the specified action, combined with an RL optimization process that utilizes an interactive sampling technique. We evaluate our approach on two TOD datasets, the Action-Based Conversations Dataset (ABCD) (Chen et al., 2021a) and MultiWOZ 2.2 (Zang et al., 2020), on a range of automated and human evaluation metrics. Our findings indicate that our RL-based framework outperforms baselines and is effective at generating responses that comply with the intended workflows while being expressed in a natural and fluent manner.
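The compliance idea can be illustrated with a rough stand-in. The paper's ComplianceScorer is a learned metric and its RL process optimizes the generator; the keyword-overlap scorer and best-of-n selection below are only a hypothetical simplification of scoring candidates against the workflow action they should execute:

```python
# Illustrative stand-in for workflow-compliance scoring (not the paper's
# learned ComplianceScorer): score each candidate response by how many
# of the action's keywords it realizes, then keep the best candidate.
def compliance_score(response, action_keywords):
    """Fraction of the workflow action's keywords present in the response."""
    tokens = set(response.lower().split())
    hits = sum(1 for kw in action_keywords if kw in tokens)
    return hits / len(action_keywords)

def best_of_n(candidates, action_keywords):
    # A sampling-and-reranking step: a crude proxy for using the score
    # as an RL reward signal during generation.
    return max(candidates, key=lambda r: compliance_score(r, action_keywords))

action = ["verify", "identity"]  # hypothetical workflow action keywords
candidates = ["hello there", "let me verify your identity"]
```

In the actual framework the score would serve as a reward for updating the response generator, rather than merely reranking fixed samples.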

    Robot affordance learning with human interaction

    Many of the tasks robots currently perform are trivial and already programmed into their systems. Enabling a robot to adapt its learning dynamically, through both human interaction and independent exploration, can be helpful to various groups of people, such as elders needing assistance, students getting an education, and doctors needing medical assistants. In my research, the goal was to compare two different ways of learning, independent and human-guided, when a robot learned object affordances. This research can give insight into how robots can take advantage of human help to improve their learning. My main goal was to have Simon independently learn affordances of different objects in the environment and to see how learning benefited when the robot had human guidance. I first had Simon perform several actions on objects, and I recorded the state of the object prior to the action, the actual action Simon performed, and the state of the object afterwards. The robot then used this data to predict the affordances of future objects. Once the robot was able to learn about its environment independently, I included human guidance in a second condition by presenting the objects in the way humans naturally would, based on previous work. The benefit of this human guidance was that the examples were much more balanced in terms of positive and negative examples, leading to a more effective classifier. To test this, an experiment was conducted in which Simon performed slide and grasp actions on 5 different objects. There were two conditions for each action: systematic, in which Simon tried all possible configurations, and human-guided, in which the examples were more balanced. Results showed that the human-guided condition resulted in slightly more accurate predictions than the independent, systematic condition.
    Thomaz, Andrea - Faculty Mentor; Isbell, Charles - Committee Member/Second Reader
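The data the abstract describes, triples of pre-action state, action, and post-action state, lends itself to a simple frequency-based affordance predictor. The sketch below is a toy illustration under that assumption, not the thesis code; state and action names are invented:

```python
# Toy affordance model learned from (pre_state, action, outcome) records,
# as collected in the described experiment (names are illustrative).
from collections import defaultdict

class AffordanceModel:
    def __init__(self):
        # (pre_state, action) -> [success_count, total_count]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, pre_state, action, succeeded):
        key = (pre_state, action)
        self.counts[key][0] += int(succeeded)
        self.counts[key][1] += 1

    def predict(self, pre_state, action):
        """Estimated probability the action succeeds from this state."""
        succ, total = self.counts[(pre_state, action)]
        return succ / total if total else 0.5  # unseen pair -> uncertain

model = AffordanceModel()
model.record("upright", "grasp", True)
model.record("upright", "grasp", True)
model.record("upright", "grasp", False)
```

A frequency model like this makes the class-balance point concrete: if almost all recorded examples are negative, the estimates are dominated by failures, which is why the balanced human-guided examples yielded a more effective classifier.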

    Perturbation training for human-robot teams

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. Cataloged from PDF version of thesis. Includes bibliographical references (pages 63-67).
    Today, robots are often deployed to work separately from people. Combining the strengths of humans and robots, however, can potentially lead to a stronger joint team. To achieve fluid human-robot collaboration, these teams must train to reach high team performance and flexibility on new tasks. This requires a computational model that supports the human in learning and adapting to new situations. In this work, we design and evaluate a computational learning model that enables a human-robot team to co-develop joint strategies for performing novel tasks that require coordination. The joint strategies are learned through "perturbation training," a human team-training strategy that requires practicing variations of a given task to help the team generalize to new variants of that task. Our Adaptive Perturbation Training (AdaPT) algorithm is a hybrid of transfer learning and reinforcement learning techniques and extends the Policy Reuse in Q-Learning (PRQL) algorithm to learn more quickly in new task variants. We empirically validate this advantage of AdaPT over PRQL through computational simulations. We then augment AdaPT with a co-learning framework and a computational bi-directional communication protocol so that the robot can work with a person in live interactions. These three features constitute our human-robot perturbation training model. We conducted human subject experiments to show proof-of-concept that our model enables a robot to draw from its library of prior experiences in a way that leads to high team performance. We compare our algorithm with a standard reinforcement learning algorithm, Q-learning, and find that AdaPT-trained teams achieved significantly higher reward on novel test tasks than Q-learning teams. This indicates that the robot's algorithm, rather than just the human's experience of perturbations, is key to achieving high team performance. We also show that our algorithm does not sacrifice performance on the base task after training on perturbations. Finally, we demonstrate that human-robot training in a simulation environment using AdaPT produced effective team performance with an embodied robot partner.
    by Ramya Ramakrishnan. S.M.
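The policy-reuse idea that PRQL (and by extension AdaPT) builds on can be sketched minimally: keep a library of policies from past task variants, track how much reward each yields on the new variant, and reuse the better-transferring ones more often. This is a simplified illustration of the selection mechanism only, not the thesis algorithm; class and variable names are invented:

```python
import math
import random

# Minimal PRQL-style policy-library sketch: softmax selection over the
# average gain each past policy has produced on the new task variant.
class PolicyLibrary:
    def __init__(self, policies, temperature=1.0):
        self.policies = policies
        self.gains = [0.0] * len(policies)  # running average return per policy
        self.uses = [0] * len(policies)
        self.tau = temperature

    def choose(self):
        # Higher-gain policies get exponentially more probability mass,
        # so transfer concentrates on the most useful prior experience.
        prefs = [math.exp(self.tau * g) for g in self.gains]
        total = sum(prefs)
        r, acc = random.random() * total, 0.0
        for i, p in enumerate(prefs):
            acc += p
            if r <= acc:
                return i
        return len(prefs) - 1

    def update(self, i, episode_return):
        # Incremental running-average update of policy i's gain.
        self.uses[i] += 1
        self.gains[i] += (episode_return - self.gains[i]) / self.uses[i]

library = PolicyLibrary(["policy_task_A", "policy_task_B"])
```

In the full algorithm, each episode also continues learning a fresh Q-function for the new variant; the library only biases exploration toward promising prior strategies.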

    Error discovery through human-artificial intelligence collaboration

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 181-202).
    While there has been a recent rise in increasingly effective human-AI teams in areas such as autonomous driving, manufacturing, and robotics, many catastrophic failures still occur. Understanding the cause(s) of these errors is crucial for reducing and fixing them. One source of error is an agent's or human's limited view of the world, which means their representations are insufficient for acting safely. For example, self-driving cars may have limited sensing that causes them not to recognize rare vehicle types, like emergency vehicles. This thesis focuses on identifying errors that occur due to deficiencies in agent and human representations. In the first part, we develop an approach that uses human feedback to identify agent errors that occur due to an agent's limited state representation, meaning that the agent cannot observe all features of the world. Experiments show that, using our model, an agent discovers error regions and is able to query for human help intelligently to safely act in the real world. In the second part, we focus on determining the cause of human errors as either occurring due to the human's flawed observation of the world or due to other factors, such as noise or insufficient training. We present a generative model that approximates the human's decision-making process and show that we can infer the latent error sources with a limited amount of human demonstration data. In the final thesis component, we tackle the setting where both an agent and a human have rich perception but, due to selective attention, each only focuses on a subset of features. When deploying these learned policies, important features in the real world may be ignored because the simulator did not accurately model all regions of the real world. Our approach is able to identify scenarios in which an agent should transfer control to a human who may be better suited to act, leading to safe joint execution in the world.
    by Ramya Ramakrishnan. Ph.D.
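The second thesis part, attributing human errors to flawed observation versus other factors, can be caricatured as a likelihood comparison between two hypotheses about the demonstration data. The thesis uses a richer generative model of the human's decision process; the two fixed error rates and the uniform prior below are made-up simplifications:

```python
# Illustrative sketch (not the thesis model): attribute observed human
# errors to "flawed observation" vs "decision noise" by comparing the
# likelihood of a demonstration trace under two simple hypotheses.
def likelihood(demos, error_rate):
    """demos: list of booleans, True meaning the human erred on that step."""
    p = 1.0
    for erred in demos:
        p *= error_rate if erred else (1.0 - error_rate)
    return p

def infer_error_source(demos, p_obs=0.4, p_noise=0.1):
    # Hypothesis 1: flawed observation, so errors occur often (rate p_obs).
    # Hypothesis 2: plain noise, so errors are rare (rate p_noise).
    # With a uniform prior over the two, the MAP source is the one with
    # the higher data likelihood.
    l_obs = likelihood(demos, p_obs)
    l_noise = likelihood(demos, p_noise)
    return "observation" if l_obs > l_noise else "noise"
```

The key point the sketch preserves is that latent error sources become distinguishable from demonstration data alone once each source implies a different distribution over observed behavior.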

    Depression, anxiety, and bodily pain independently predict poor sleep quality among adult women attending a primary health center of Puducherry, India

    Background: Sleep disorders and mental health problems are common diagnoses in primary care settings. The objective of this study was to estimate the magnitude of poor sleep, depression, and anxiety through opportunistic screening and to identify the independent predictors of poor sleep quality among female participants. Materials and Methods: A hospital-based study was conducted in the outpatient department (OPD) of an urban primary health center of Puducherry. Patients and accompanying healthy attendants ≥ 18 years of age who visited the OPD for any reason were included. Those with serious acute illness or previously diagnosed mental illness, pregnant women, and women in the postpartum period (up to 6 weeks) were excluded. Systematic random sampling was used to select the participants. A semi-structured questionnaire was used to collect sociodemographic and clinical details, along with the Pittsburgh Sleep Quality Index (PSQI) and the Hospital Anxiety and Depression Scale. Height and weight were also measured. Results: A total of 301 participants were recruited. Mean age of the participants was 49.4 (standard deviation 15.2) years. The magnitudes of poor sleep (PSQI score > 5), abnormal anxiety, and abnormal depression were 118 (39.2%), 60 (19.9%), and 28 (9.3%), respectively. Multivariate logistic regression analysis showed that history of pain [odds ratio (OR) 3.2 (1.6–6.5), P = 0.001], abnormal anxiety [OR 2.5 (1.2–5.6), P = 0.021], and abnormal depression [OR 4.3 (1.4–13.2), P = 0.01] independently predicted poor sleep quality among females. Conclusion: OPD-based opportunistic screening for sleep and mental health problems should be routinely conducted by primary care and family physicians.

    An Expertise Recommender using Web Mining

    This report explored techniques to mine the web pages of scientists to extract information regarding their expertise, build expertise chains and referral webs, and semi-automatically combine this information with directory information services to create a recommender system that permits query by expertise. The approach included experimenting with existing techniques reported in the research literature in the recent past, adapting them as needed. In addition, software tools were developed to capture and use this information.
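The query-by-expertise idea can be sketched with a toy term-frequency index over scientists' page text. This is an illustrative simplification, not the report's system; names and data are invented, and a real system would add the referral-web and directory-integration steps the report describes:

```python
# Toy expertise index: mine each person's page text into term counts,
# then answer query-by-expertise by ranking people on the query term.
from collections import Counter

def build_index(pages):
    """pages: dict mapping person name -> text of their web pages.
    Returns name -> Counter of lowercased term frequencies."""
    return {name: Counter(text.lower().split()) for name, text in pages.items()}

def recommend(index, topic, k=2):
    # Rank people by how often the topic term appears on their pages;
    # Counter returns 0 for absent terms, so non-experts sort last.
    ranked = sorted(index, key=lambda name: index[name][topic.lower()],
                    reverse=True)
    return ranked[:k]

pages = {
    "alice": "web mining and expertise modeling for web search",
    "bob": "robotics control and manipulation",
}
index = build_index(pages)
```

Chaining recommendations (asking a recommended expert's collaborators in turn) is one simple way the "referral webs" mentioned above could be layered on top of such an index.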