Deep Task-specific Bottom Representation Network for Multi-Task Recommendation
Neural multi-task learning (MTL) has achieved significant improvements and has
been successfully applied to recommender systems (RS). Recent deep MTL methods
for RS (e.g., MMoE, PLE) focus on designing soft gating-based parameter-sharing
networks that implicitly learn a generalized representation for each task.
However, MTL methods may suffer from performance degradation when dealing with
conflicting tasks, as negative transfer can occur on the task-shared bottom
representation. This reduces the capacity of MTL methods to capture
task-specific characteristics, ultimately impeding their effectiveness and
hindering their ability to generalize well across all tasks.
In this paper, we focus on the bottom representation learning of MTL in RS and
propose the Deep Task-specific Bottom Representation Network (DTRN) to
alleviate the negative transfer problem. DTRN obtains task-specific bottom
representation explicitly by making each task have its own representation
learning network in the bottom representation modeling stage. Specifically, it
extracts the user's interests from multiple types of behavior sequences for
each task through the parameter-efficient hypernetwork. To further obtain the
dedicated representation for each task, DTRN refines the representation of each
feature by employing a SENet-like network for each task. The two proposed
modules can achieve the purpose of getting task-specific bottom representation
to relieve tasks' mutual interference. Moreover, the proposed DTRN is flexible
to combine with existing MTL methods. Experiments on one public dataset and one
industrial dataset demonstrate the effectiveness of the proposed DTRN.
Comment: CIKM'2
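As a rough illustration of the SENet-like per-task feature refinement described above, the sketch below re-weights feature embeddings with a squeeze-and-excitation bottleneck. This is a minimal NumPy sketch, not the paper's DTRN implementation; the weight shapes and the mean-pooling "squeeze" are assumptions for illustration.

```python
import numpy as np

def senet_reweight(feature_embs, w1, w2):
    """SENet-style gating: squeeze each feature embedding to a scalar,
    pass it through a two-layer bottleneck, and rescale the embeddings.
    Each task would hold its own (w1, w2) to get a task-specific view."""
    # feature_embs: (num_features, emb_dim)
    z = feature_embs.mean(axis=1)        # squeeze: one scalar per feature
    h = np.maximum(0.0, w1 @ z)          # excitation, hidden layer (ReLU)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))  # per-feature weights in (0, 1)
    return feature_embs * s[:, None]     # rescale each feature embedding

rng = np.random.default_rng(0)
embs = rng.normal(size=(8, 16))          # 8 features, 16-dim embeddings
w1 = rng.normal(size=(4, 8))             # bottleneck: 8 -> 4
w2 = rng.normal(size=(8, 4))             # restore:    4 -> 8
out = senet_reweight(embs, w1, w2)
print(out.shape)                         # (8, 16)
```

Because the gate values lie in (0, 1), each feature embedding is attenuated rather than amplified, which is one common way such re-weighting suppresses features that are irrelevant to a given task.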
Learning Stable Task Sequences from Demonstration with Linear Parameter Varying Systems and Hidden Markov Models
The problem of acquiring multiple tasks from demonstration is typically divided into two sequential processes: (1) the segmentation or identification of different subgoals/subtasks and (2) a separate learning process that parameterizes a control policy for each subtask. As a result, segmentation criteria typically neglect the characteristics of control policies and rely instead on simplified models. This paper aims for a single model capable of learning sequences of complex time-independent control policies that provide robust and stable behavior. To this end, we first present a novel and efficient approach to learning goal-oriented, time-independent motion models by estimating both attractor and dynamic behavior from data, guaranteeing stability using linear parameter varying (LPV) systems. This method enables learning complex task sequences with hidden Markov models (HMMs), where each state/subtask is given by a stable LPV system and where transitions are most likely around the corresponding attractor. We study the dynamics of the HMM-LPV model and propose a motion generation method that guarantees the stability of task sequences. We validate our approach in two sets of demonstrated human motions.
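The stability idea behind such LPV systems can be sketched numerically: if every component matrix A_k has a negative-definite symmetric part, any convex mixture of them drives the state toward the attractor x*. The sketch below is an assumption-laden toy (random 2-D matrices, fixed mixing weights, Euler integration), not the paper's learned model.

```python
import numpy as np

def make_stable(A, margin=0.5):
    """Project a matrix so that A + A^T is negative definite, a sufficient
    condition for global convergence of dx/dt = A (x - x_star)."""
    S = 0.5 * (A + A.T)                      # symmetric part
    w, V = np.linalg.eigh(S)
    w = np.minimum(w, -margin)               # clamp eigenvalues below zero
    return V @ np.diag(w) @ V.T + 0.5 * (A - A.T)

def lpv_step(x, x_star, As, gammas, dt=0.01):
    """One Euler step of the LPV mixture dx/dt = sum_k gamma_k A_k (x - x*).
    Nonnegative mixing weights preserve the negative-definite symmetric part."""
    A = sum(g * Ak for g, Ak in zip(gammas, As))
    return x + dt * A @ (x - x_star)

rng = np.random.default_rng(1)
As = [make_stable(rng.normal(size=(2, 2))) for _ in range(2)]
x_star = np.zeros(2)                         # attractor of this subtask
x = np.array([1.0, -1.0])
for _ in range(2000):
    x = lpv_step(x, x_star, As, gammas=[0.5, 0.5])
print(np.linalg.norm(x))                     # trajectory has contracted toward x_star
```

In the HMM-LPV setting described above, each hidden state would carry its own attractor and matrices, and transitions would fire once the state nears the current attractor.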
Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays
During sleep and awake rest, the hippocampus replays sequences of place cells
that have been activated during prior experiences. These have been interpreted
as a memory consolidation process, but recent results suggest a possible
interpretation in terms of reinforcement learning. The Dyna reinforcement
learning algorithms use off-line replays to improve learning. Under a limited
replay budget, a prioritized sweeping approach, which requires a model of the
transitions to the predecessors, can be used to improve performance. We
investigate whether such algorithms can explain the experimentally observed
replays. We propose a neural network version of prioritized sweeping
Q-learning, for which we developed a growing multiple expert algorithm, able to
cope with multiple predecessors. The resulting architecture improves the
learning of simulated agents confronted with a navigation task. We predict
that, in animals, learning the world model should occur during rest periods,
and that the corresponding replays should be shuffled.
Comment: Living Machines 2018 (Paris, France)
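To make the prioritized-sweeping idea concrete, here is a tabular toy sketch: updates are ordered by a priority queue, and a predecessor model propagates large value changes backwards. This is a deterministic, tabular illustration under assumed toy dynamics, not the paper's neural growing-multiple-expert architecture.

```python
import heapq
from collections import defaultdict

def prioritized_sweeping(R, T, n_states, n_actions, gamma=0.95,
                         alpha=1.0, theta=1e-4, n_updates=1000):
    """Tabular prioritized sweeping on a known deterministic model.
    T[s][a] -> next state, R[s][a] -> reward. The predecessor model is
    derived from T so that large value changes sweep backwards."""
    Q = defaultdict(float)
    preds = defaultdict(set)                 # predecessor model: s' -> {(s, a)}
    for s in range(n_states):
        for a in range(n_actions):
            preds[T[s][a]].add((s, a))
    pq = []                                  # max-heap via negated priority
    for s in range(n_states):
        for a in range(n_actions):
            heapq.heappush(pq, (-abs(R[s][a]), s, a))
    for _ in range(n_updates):
        if not pq:
            break
        _, s, a = heapq.heappop(pq)
        s2 = T[s][a]
        target = R[s][a] + gamma * max(Q[(s2, b)] for b in range(n_actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        for sp, ap in preds[s]:              # sweep backwards to predecessors
            p = abs(R[sp][ap]
                    + gamma * max(Q[(s, b)] for b in range(n_actions))
                    - Q[(sp, ap)])
            if p > theta:                    # only re-queue meaningful changes
                heapq.heappush(pq, (-p, sp, ap))
    return Q

# toy chain MDP: states 0..3, a single "right" action; reward on reaching 3
T = [[1], [2], [3], [3]]
R = [[0], [0], [1], [0]]
Q = prioritized_sweeping(R, T, n_states=4, n_actions=1)
print(round(Q[(0, 0)], 4))  # 0.9025 = gamma**2
```

The backward sweep is the point of contact with the abstract's prediction: value updates flow from a rewarded state to its predecessors, which is why replays driven by such an algorithm need not follow the forward order of experience.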