206,915 research outputs found
Sequential learning without feedback
In many security and healthcare systems, a sequence of features/sensors/tests is used for detection and diagnosis. Each test outputs a prediction of the latent state and carries inherent costs. Our objective is to learn strategies for selecting tests that optimize accuracy and costs. Unfortunately, it is often impossible to acquire in-situ ground-truth annotations, and we are left with the problem of unsupervised sensor selection (USS). We pose USS as a version of the stochastic partial monitoring problem with an unusual reward structure (even noisy annotations are unavailable). Unsurprisingly, no learner can achieve sublinear regret without further assumptions. To this end we propose the notion of weak dominance, a condition on the joint probability distribution of test outputs and the latent state which says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate as well. We empirically verify that weak dominance holds on real datasets and prove that it is a maximal condition for achieving sublinear regret. We reduce USS to a special case of the multi-armed bandit problem with side information and develop polynomial-time algorithms that achieve sublinear regret.
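The weak-dominance condition lends itself to a quick empirical check on a labeled validation set. The sketch below is an illustrative assumption, not the paper's exact formulation: it reads the condition as "the later test is usually correct whenever the earlier one is" and estimates the corresponding conditional accuracies.

```python
import numpy as np

def conditional_accuracies(preds, labels):
    """Estimate P(test i+1 correct | test i correct) for each adjacent
    pair of tests, as one empirical reading of weak dominance.

    preds  : (n_samples, n_tests) array of per-test predictions
    labels : (n_samples,) array of latent states (validation only)
    """
    correct = preds == labels[:, None]          # (n, k) correctness matrix
    out = []
    for i in range(correct.shape[1] - 1):
        mask = correct[:, i]                    # examples where test i is right
        out.append(correct[mask, i + 1].mean() if mask.any() else float("nan"))
    return out

# Toy check: the second test is right on 2 of the 3 examples
# where the first test is right.
probs = conditional_accuracies(
    np.array([[1, 1], [1, 1], [0, 1], [1, 0]]),
    np.array([1, 1, 1, 1]),
)
```

Values close to 1 for every adjacent pair would be consistent with weak dominance on that dataset; the paper's formal condition and threshold may differ.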
Methodological issues in using sequential representations in the teaching of writing
This study looks at a specific application of Ainsworth's conceptual framework for learning with multiple representations in the context of using multiple student-generated sequential graphic organizers for a process-writing task. Process writing refers to writing that consists of multiple drafts. It may be a process of re-writing without feedback, or re-writing based on feedback, where the teacher or peers provide feedback on the original draft and the students then revise their writing accordingly. The objective was to explore how knowledge of students' cognitive processes when using multiple organizers can inform the teaching of writing. The literature review analyzes the interaction of the design, function, and task components of the framework, culminating in instructional approaches for using multiple organizers in classes with students of different writing abilities. Extended implications for designers of concept-mapping tools based on these approaches are provided.
Distinguishing Emergent and Sequential Processes by Learning Emergent Second-Order Features
Emergent processes can roughly be defined as processes that self-arise from interactions without centralized control. People hold many robust misconceptions when explaining emergent process concepts such as natural selection and diffusion. This is because they lack a proper categorical representation of emergent processes and often misclassify these processes into the more familiar category of sequential processes. The two kinds of processes can be distinguished by their second-order features, which describe how one interaction relates to another. This study investigated whether teaching emergent second-order features can help people more correctly categorize new processes; it also compared different instructional methods for teaching these features. The prediction was that learning emergent features should help more than learning sequential features, because what most people lack is the representation of emergent processes. Results confirmed this: participants who generated emergent features and received correct features as feedback were better at distinguishing the two kinds of processes than participants who rewrote second-order sequential features. Another finding was that participants who generated emergent features followed by reading correct features as feedback did better at distinguishing the processes than participants who only attempted to generate the emergent features without feedback. Finally, switching the order of instruction, by teaching emergent features and then asking participants to explain the difference between emergent and sequential features, resulted in a learning gain equivalent to that of the experimental group that received feedback. These results show that teaching emergent second-order features helps people categorize processes and demonstrate the most efficient way to teach them. Dissertation/Thesis: Masters Thesis, Psychology 201
The dissociable effects of reward on sequential motor behavior
Reward has consistently been shown to enhance motor behavior; however, its beneficial effects appear to be largely unspecific. For example, reward is associated with both rapid and training-dependent improvements in performance, with a mechanistic account of these effects currently lacking. Here we tested the hypothesis that these distinct reward-based improvements are driven by dissociable reward types: monetary incentive and performance feedback. Whereas performance feedback provides information on how well a motor task has been completed (knowledge of performance), monetary incentive increases the motivation to perform optimally without providing a performance-based learning signal. Using a novel sequential reaching task, experiment 1 showed that groups who received monetary incentive rapidly improved movement times (MTs). In contrast, only groups given correct performance-based feedback showed learning-related improvements. Importantly, pairing the two maximized MT performance gains and accelerated movement fusion, an optimization process during which neighboring sequential movements blend together to form singular actions. Results from experiment 2 served as a replication and showed that fusion enhanced performance speed while also improving movement efficiency through increased smoothness. Finally, experiment 3 showed that these improvements in performance persist for 24 h even without reward availability. This highlights the dissociable impact of monetary incentive and performance feedback, whose combination maximizes performance gains and leads to stable improvements in the speed and efficiency of sequential actions. NEW & NOTEWORTHY Our work provides a mechanistic framework for how reward influences motor behavior.
Specifically, we show that rapid improvements in speed and accuracy are driven by reward presented in the form of money, whereas knowledge of performance through performance feedback leads to training-based improvements. Importantly, combining both maximized performance gains and led to improvements in movement quality through fusion, an optimization process during which sequential movements blend into a single action.
Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback
Exploration and reward specification are fundamental and intertwined
challenges for reinforcement learning. Solving sequential decision-making tasks
that demand expansive exploration requires either carefully designed reward
functions or novelty-seeking exploration bonuses. Human supervisors
can provide effective guidance in the loop to direct the exploration process,
but prior methods to leverage this guidance require constant synchronous
high-quality human feedback, which is expensive and impractical to obtain. In
this work, we present a technique called Human Guided Exploration (HuGE), which
uses low-quality feedback from non-expert users that may be sporadic,
asynchronous, and noisy. HuGE guides exploration for reinforcement learning not
only in simulation but also in the real world, all without meticulous reward
specification. The key concept involves bifurcating human feedback and policy
learning: human feedback steers exploration, while self-supervised learning
from the exploration data yields unbiased policies. This procedure can leverage
noisy, asynchronous human feedback to learn policies with no hand-crafted
reward design or exploration bonuses. HuGE is able to learn a variety of
challenging multi-stage robotic navigation and manipulation tasks in simulation
using crowdsourced feedback from non-expert users. Moreover, this paradigm can
be scaled to learning directly on real-world robots, using occasional,
asynchronous feedback from human supervisors.
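The bifurcation the abstract describes can be sketched in a toy form. Everything below — the 1-D chain, the noisy comparison oracle, the tournament selection — is an illustrative assumption, not the HuGE implementation: noisy human comparisons only steer which frontier state to explore from, while the policy is fit by hindsight relabeling of the collected trajectories.

```python
import random

GOAL = 10  # toy goal on a 1-D integer chain

def human_prefers(a, b, err=0.3):
    """Sporadic, low-quality comparison: which state looks closer to the
    goal? Wrong with probability `err`."""
    better, worse = (a, b) if abs(GOAL - a) < abs(GOAL - b) else (b, a)
    return better if random.random() >= err else worse

def select_frontier(visited, rounds=5):
    """Human feedback only steers exploration: a small noisy tournament
    picks a promising visited state to explore from."""
    states = list(visited)
    best = random.choice(states)
    for _ in range(rounds):
        best = human_prefers(best, random.choice(states))
    return best

def hindsight_relabel(traj, policy):
    """Self-supervised policy learning: for each transition, record the
    action that reached the next state, treating that state as the goal.
    The labels come from the data itself, so human noise never biases them."""
    for s, s2 in zip(traj, traj[1:]):
        policy[(s, s2)] = s2 - s
    return policy

def run(iters=200, walk=5):
    random.seed(0)
    visited, policy = {0}, {}
    for _ in range(iters):
        s = select_frontier(visited)
        traj = [s]
        for _ in range(walk):               # short random walk outward
            s += random.choice([-1, 1])
            traj.append(s)
        visited.update(traj)
        hindsight_relabel(traj, policy)
    return visited, policy
```

Even with a 30% comparison error rate, exploration drifts toward the goal, while the learned goal-conditioned actions remain exact because they are relabeled from the data rather than from the human signal.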
Meta-Learning with Adaptive Weighted Loss for Imbalanced Cold-Start Recommendation
Sequential recommenders have made great strides in capturing a user's
preferences. Nevertheless, cold-start recommendation remains a fundamental
challenge, as it typically involves limited user-item interactions for
personalization. Recently, gradient-based meta-learning approaches have emerged
in the sequential recommendation field due to their fast adaptation and
easy-to-integrate abilities. The meta-learning algorithms formulate the
cold-start recommendation as a few-shot learning problem, where each user is
represented as a task to be adapted. While meta-learning algorithms generally
assume that task-wise samples are evenly distributed over classes or values,
user-item interactions in real-world applications do not conform to such a
distribution (e.g., watching favorite videos multiple times, leaving only
positive ratings without any negative ones). Consequently, imbalanced user
feedback, which accounts for the majority of task training data, may dominate
the user adaptation process and prevent meta-learning algorithms from learning
meaningful meta-knowledge for personalized recommendations. To alleviate this
limitation, we propose a novel sequential recommendation framework based on
gradient-based meta-learning that captures the imbalanced rating distribution
of each user and computes adaptive loss for user-specific learning. Our work is
the first to tackle the impact of imbalanced ratings in cold-start sequential
recommendation scenarios. Through extensive experiments conducted on real-world
datasets, we demonstrate the effectiveness of our framework. Comment: Accepted by CIKM 202
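An imbalance-aware task loss of the kind the abstract describes can be sketched as follows. Inverse-frequency weighting is an illustrative assumption here — the paper's actual adaptive weighting may be learned or derived differently — but it shows the core idea: rare feedback values (e.g. the few negative ratings) are up-weighted so they are not drowned out during per-user adaptation.

```python
from collections import Counter

def adaptive_weighted_loss(preds, ratings):
    """Toy per-user (per-task) loss: weight each sample's squared error by
    the inverse frequency of its rating value within this user's data."""
    counts = Counter(ratings)
    n = len(ratings)
    total = 0.0
    for p, r in zip(preds, ratings):
        w = n / (len(counts) * counts[r])   # inverse-frequency weight
        total += w * (p - r) ** 2
    return total / n
```

With ratings `[5, 5, 1]`, the lone negative rating gets twice the weight of each positive one, so a unit error on it costs twice as much in the task loss.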