    Sequential learning without feedback

    In many security and healthcare systems, a sequence of features/sensors/tests is used for detection and diagnosis. Each test outputs a prediction of the latent state and carries an inherent cost. Our objective is to learn strategies for selecting tests that optimize both accuracy and cost. Unfortunately, it is often impossible to acquire in-situ ground-truth annotations, and we are left with the problem of unsupervised sensor selection (USS). We pose USS as a version of the stochastic partial monitoring problem with an unusual reward structure (even noisy annotations are unavailable). Unsurprisingly, no learner can achieve sublinear regret without further assumptions. To this end we propose the notion of weak dominance, a condition on the joint probability distribution of test outputs and the latent state which says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate as well. We empirically verify that weak dominance holds on real datasets and prove that it is a maximal condition for achieving sublinear regret. We reduce USS to a special case of the multi-armed bandit problem with side information and develop polynomial-time algorithms that achieve sublinear regret.
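    Illustrative sketch: the following is a minimal, hypothetical rendering of the kind of cascade selection rule the bandit reduction above suggests, not the paper's exact algorithm. It assumes, for simplicity, that every test's output is observed each round so pairwise disagreements can be estimated; under weak dominance, a test's disagreement with the last (most accurate) test upper-bounds its extra error, so no ground-truth labels are needed. All names and costs are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                                 # tests in the cascade, cheapest first
costs = np.array([0.10, 0.30, 0.60])  # hypothetical cumulative test costs

n = 0                    # rounds observed so far
dis_sum = np.zeros(K)    # running sums of 1{output_i != output_K}


def select_test(t):
    """Choose where to stop in the cascade. Under weak dominance, test i's
    disagreement with the final test K upper-bounds its extra error, so we
    optimistically minimize cost + disagreement proxy (UCB style)."""
    if n == 0:
        return K - 1                  # run the full cascade once to start
    conf = np.sqrt(np.log(t + 1) / (2 * n))
    proxy = np.maximum(dis_sum / n - conf, 0.0)   # optimistic error proxy
    return int(np.argmin(costs + proxy))


def update(outputs):
    """outputs[i] is test i's binary prediction this round; note that no
    ground-truth label is ever used -- only inter-test disagreement."""
    global n
    n += 1
    dis_sum += (outputs != outputs[-1]).astype(float)


for t in range(1000):
    chosen = select_test(t)
    outputs = rng.integers(0, 2, size=K)  # placeholder test predictions
    update(outputs)
```

    In the paper's actual partial-monitoring setting, stopping at test i reveals only the outputs of tests 1..i, which is what makes the problem a bandit with side information rather than a full-feedback one.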

    Methodological issues in using sequential representations in the teaching of writing

    This study looks at a specific application of Ainsworth's conceptual framework for learning with multiple representations in the context of student-generated multiple sequential graphic organizers for a process-writing task. Process writing refers to writing that consists of multiple drafts: it may involve re-writing without feedback, or re-writing based on feedback, where the teacher or peers comment on the original draft and the students then revise their writing accordingly. The objective was to explore how knowledge of students' cognitive processes when using multiple organizers can inform the teaching of writing. The literature review analyzes the interaction of the design, function, and task components of the framework, culminating in instructional approaches for using multiple organizers in classes with students of different writing abilities. Extended implications for designers of concept-mapping tools based on these approaches are provided.

    Distinguishing Emergent and Sequential Processes by Learning Emergent Second-Order Features

    Emergent processes can roughly be defined as processes that self-arise from interactions without centralized control. People hold many robust misconceptions when explaining emergent-process concepts such as natural selection and diffusion, because they lack a proper categorical representation of emergent processes and often misclassify them into the more familiar category of sequential processes. The two kinds of processes can be distinguished by their second-order features, which describe how one interaction relates to another. This study investigated whether teaching emergent second-order features can help people categorize new processes more correctly; it also compared different instructional methods for teaching those features. The prediction was that learning emergent features should help more than learning sequential features, because what most people lack is the representation of emergent processes. Results confirmed this: participants who generated emergent features and received correct features as feedback were better at distinguishing the two kinds of processes than participants who rewrote second-order sequential features. Another finding was that participants who generated emergent features and then read correct features as feedback distinguished the processes better than participants who only attempted to generate the emergent features without feedback. Finally, switching the order of instruction, by teaching emergent features and then asking participants to explain the difference between emergent and sequential features, resulted in a learning gain equivalent to that of the experimental group that received feedback. These results show that teaching emergent second-order features helps people categorize processes and demonstrate the most efficient way to teach them.
    Dissertation/Thesis: Masters Thesis, Psychology, 201

    The dissociable effects of reward on sequential motor behavior

    Reward has consistently been shown to enhance motor behavior; however, its beneficial effects appear to be largely unspecific. For example, reward is associated with both rapid and training-dependent improvements in performance, and a mechanistic account of these effects is currently lacking. Here we tested the hypothesis that these distinct reward-based improvements are driven by dissociable reward types: monetary incentive and performance feedback. Whereas performance feedback provides information on how well a motor task has been completed (knowledge of performance), monetary incentive increases the motivation to perform optimally without providing a performance-based learning signal. Using a novel sequential reaching task, experiment 1 showed that groups who received monetary incentive rapidly improved movement times (MTs), whereas only groups given correct performance-based feedback showed learning-related improvements. Importantly, pairing both maximized MT gains and accelerated movement fusion, an optimization process during which neighboring sequential movements blend together to form singular actions. Experiment 2 served as a replication and showed that fusion enhanced performance speed while also improving movement efficiency through increased smoothness. Finally, experiment 3 showed that these improvements persist for 24 h even without reward availability. This highlights the dissociable impact of monetary incentive and performance feedback, with their combination maximizing performance gains and leading to stable improvements in the speed and efficiency of sequential actions. NEW & NOTEWORTHY: Our work provides a mechanistic framework for how reward influences motor behavior. Specifically, we show that rapid improvements in speed and accuracy are driven by reward presented in the form of money, whereas knowledge of performance through performance feedback leads to training-based improvements. Importantly, combining both maximized performance gains and improved movement quality through fusion, an optimization process during which sequential movements blend into a single action.

    Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback

    Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Sequential decision-making tasks that demand expansive exploration require either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective in-the-loop guidance to direct the exploration process, but prior methods for leveraging this guidance require constant, synchronous, high-quality human feedback, which is expensive and impractical to obtain. In this work, we present Human Guided Exploration (HuGE), a technique that uses low-quality feedback from non-expert users that may be sporadic, asynchronous, and noisy. HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification. The key idea is to bifurcate human feedback and policy learning: human feedback steers exploration, while self-supervised learning from the exploration data yields unbiased policies. This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses. HuGE learns a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, the paradigm scales to learning directly on real-world robots, using occasional, asynchronous feedback from human supervisors.
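    Illustrative sketch: a minimal, hypothetical rendering of the bifurcation described above (all component names are made up). Noisy human comparisons train only a goal-proximity ranker that steers which visited state to explore from; the final policy is trained purely by self-supervised, hindsight-relabeled goal-conditioned behavioral cloning on the exploration data, so feedback noise cannot bias it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProximityNet(nn.Module):
    """Scores how close a state looks to the goal. Trained only from noisy,
    asynchronous human comparisons; never used as an RL reward."""

    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)


def comparison_loss(f, s_better, s_worse):
    # Bradley-Terry loss on one human judgment: "s_better looks closer
    # to the goal than s_worse". Tolerant of sporadic, noisy labels.
    return F.softplus(f(s_worse) - f(s_better)).mean()


def pick_frontier_state(f, visited_states):
    # Human feedback steers exploration only: expand from the visited
    # state the ranker currently scores as closest to the goal.
    with torch.no_grad():
        return visited_states[f(visited_states).argmax()]


def bc_loss(policy, states, actions, hindsight_goals):
    # Policy learning is self-supervised: goal-conditioned behavioral
    # cloning on trajectories relabeled with the goals actually reached,
    # so the learned policy stays unbiased by human error.
    pred = policy(torch.cat([states, hindsight_goals], dim=-1))
    return F.mse_loss(pred, actions)
```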

    Meta-Learning with Adaptive Weighted Loss for Imbalanced Cold-Start Recommendation

    Sequential recommenders have made great strides in capturing a user's preferences. Nevertheless, cold-start recommendation remains a fundamental challenge, as it typically involves only limited user-item interactions for personalization. Recently, gradient-based meta-learning approaches have emerged in the sequential recommendation field owing to their fast adaptation and ease of integration. These meta-learning algorithms formulate cold-start recommendation as a few-shot learning problem in which each user is represented as a task to be adapted to. While meta-learning algorithms generally assume that task-wise samples are evenly distributed over classes or values, user-item interactions in real-world applications do not conform to such a distribution (e.g., users watch favorite videos multiple times and leave only positive ratings without any negative ones). Consequently, imbalanced user feedback, which accounts for the majority of task training data, may dominate the user adaptation process and prevent meta-learning algorithms from learning meaningful meta-knowledge for personalized recommendations. To alleviate this limitation, we propose a novel sequential recommendation framework based on gradient-based meta-learning that captures the imbalanced rating distribution of each user and computes an adaptive loss for user-specific learning. Our work is the first to tackle the impact of imbalanced ratings in cold-start sequential recommendation scenarios. Extensive experiments on real-world datasets demonstrate the effectiveness of our framework.
    Comment: Accepted by CIKM 202
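    Illustrative sketch: one plausible, hypothetical form of the adaptive weighted loss (the paper's framework computes its own user-specific weighting; the names here are made up). Over-represented rating values in a user's support set are down-weighted by inverse frequency so that imbalanced feedback does not dominate a MAML-style inner-loop adaptation.

```python
import torch
import torch.nn.functional as F


def adaptive_weights(ratings, num_classes=5):
    """Per-sample inverse-frequency weights: rating values that dominate
    this user's support set get smaller weight (ratings are integers in
    1..num_classes)."""
    counts = torch.bincount(ratings - 1, minlength=num_classes).float()
    w = (1.0 / counts.clamp(min=1.0))[ratings - 1]  # absent classes unused
    return w / w.mean()                  # normalize to mean 1 over samples


def inner_adapt(model, items, ratings, lr=0.01):
    """One MAML-style inner step on a single user's support set, with the
    element-wise loss reweighted by the adaptive weights."""
    preds = model(items).squeeze(-1)
    per_sample = F.mse_loss(preds, ratings.float(), reduction="none")
    loss = (adaptive_weights(ratings) * per_sample).mean()
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    # Return fast weights for evaluating the meta (query-set) loss.
    return [p - lr * g for p, g in zip(model.parameters(), grads)]
```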