Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
In recent years, data-driven reinforcement learning (RL), also known as
offline RL, has gained significant attention. However, the role of data
sampling techniques in offline RL has been overlooked despite its potential to
enhance online RL performance. Recent research suggests that applying sampling
techniques directly to individual state transitions does not consistently improve
performance in offline RL. Therefore, in this study, we propose a memory
technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling
perspective to trajectories for more comprehensive information extraction from
limited data. TR enhances learning efficiency by sampling trajectories
backward, making better use of subsequent-state information. Building
on TR, we introduce a weighted critic target to avoid sampling unseen actions
in offline training, and Prioritized Trajectory Replay (PTR), which enables more
efficient trajectory sampling, prioritized by various trajectory priority
metrics. We demonstrate the benefits of integrating TR and PTR with existing
offline RL algorithms on D4RL. In summary, our research emphasizes the
significance of trajectory-based data sampling techniques in enhancing the
efficiency and performance of offline RL algorithms.
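The trajectory-level sampling idea can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `PTRBuffer` name, the single scalar priority per trajectory, and the choice of priority metric are all assumptions.

```python
import random
from collections import namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class PTRBuffer:
    """Hypothetical trajectory-level replay buffer (illustrative only)."""

    def __init__(self):
        self.trajectories = []   # each entry is a list of Transition
        self.priorities = []     # one priority value per trajectory

    def add(self, trajectory, priority):
        # Priority could be, e.g., the trajectory return or TD-error statistics;
        # the abstract mentions "various trajectory priority metrics".
        self.trajectories.append(trajectory)
        self.priorities.append(priority)

    def sample_backward(self):
        # Pick a trajectory with probability proportional to its priority,
        # then yield its transitions from last to first so that freshly
        # updated values of later states can inform earlier ones.
        idx = random.choices(range(len(self.trajectories)),
                             weights=self.priorities, k=1)[0]
        for transition in reversed(self.trajectories[idx]):
            yield transition
```

Sampling whole trajectories, rather than independent transitions, is what lets the backward pass propagate subsequent-state information within a single update sweep.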
Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective
We investigate the problem of learning with noisy labels in real-world
annotation scenarios, where noise can be categorized into two types: factual
noise and ambiguity noise. To better distinguish these noise types and utilize
their semantics, we propose a novel sample selection-based approach for noisy
label learning, called Proto-semi. Proto-semi initially divides all samples
into confident and unconfident datasets via a warm-up stage. By leveraging the
confident dataset, prototype vectors are constructed to capture class
characteristics. Subsequently, the distances between the unconfident samples
and the prototype vectors are calculated to facilitate noise classification.
Based on these distances, the labels are either corrected or retained,
resulting in the refinement of the confident and unconfident datasets. Finally,
we introduce a semi-supervised learning method to enhance training. Empirical
evaluations on a real-world annotated dataset substantiate the robustness of
Proto-semi in handling the problem of learning from noisy labels. Meanwhile,
the prototype-based repartitioning strategy is shown to be effective in
mitigating the adverse impact of label noise. Our code and data are available
at https://github.com/fuxiAIlab/ProtoSemi.
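The prototype-and-distance step described above might look roughly like this. The mean-feature prototypes, Euclidean distance, and fixed threshold are assumptions made for illustration; the paper's exact correction rule may differ.

```python
import numpy as np

def build_prototypes(features, labels, num_classes):
    # Prototype per class: mean feature vector of the confident samples.
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def relabel_unconfident(features, labels, prototypes, threshold):
    # Distance from each unconfident sample to every class prototype.
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    # Samples close to some prototype get their label corrected to that
    # class; the rest retain their original (possibly noisy) label.
    close_enough = dists.min(axis=1) < threshold
    corrected = np.where(close_enough, nearest, labels)
    return corrected, close_enough
```

The boolean mask returned alongside the corrected labels is what would drive the refinement of the confident/unconfident partition before the semi-supervised stage.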
Examining the Effect of Pre-training on Time Series Classification
Although the pre-training followed by fine-tuning paradigm is used
extensively in many fields, there is still some controversy surrounding the
impact of pre-training on the fine-tuning process. Currently, experimental
findings based on text and image data lack consensus. To delve deeper into the
unsupervised pre-training followed by fine-tuning paradigm, we have extended
previous research to a new modality: time series. In this study, we conducted a
thorough examination of 150 classification datasets derived from the Univariate
Time Series (UTS) and Multivariate Time Series (MTS) benchmarks. Our analysis
reveals several key conclusions. (i) Pre-training can only help improve the
optimization process for models that fit the data poorly, rather than those
that fit the data well. (ii) Pre-training does not exhibit the effect of
regularization when given sufficient training time. (iii) Pre-training can only
speed up convergence if the model has sufficient ability to fit the data. (iv)
Adding more pre-training data does not improve generalization, but it can
strengthen the advantage of pre-training on the original data volume, such as
faster convergence. (v) While both the pre-training task and the model
structure determine the effectiveness of the paradigm on a given dataset, the
model structure plays a more significant role.
Reinforcement Learning Experience Reuse with Policy Residual Representation
Experience reuse is key to sample-efficient reinforcement learning. One of
the critical issues is how the experience is represented and stored.
Previously, experience has been stored in the form of features, individual
models, or an average model, each lying at a different granularity. However,
new tasks may require experience across multiple granularities. In this paper,
we propose the policy residual representation (PRR) network, which can extract
and store multiple levels of experience. The PRR network is trained on a set of
tasks with a multi-level architecture, where a module in each level corresponds
to a subset of the tasks. Therefore, the PRR network represents the experience
in a spectrum-like way. When training on a new task, PRR can provide different
levels of experience to accelerate learning. We experiment with the PRR
network on a set of grid world navigation tasks, locomotion tasks, and fighting
tasks in a video game. The results show that the PRR network leads to better
reuse of experience and thus outperforms some state-of-the-art approaches.Comment: Conference version appears in IJCAI 201
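A toy sketch of the spectrum-like composition, under the assumption that the levels combine additively over simple linear modules; the actual PRR is a trained multi-level network, and `ResidualPolicy` and its shapes are illustrative only.

```python
import numpy as np

class ResidualPolicy:
    """Toy additive residual policy: level 0 is shared across all tasks,
    deeper levels correspond to ever-smaller task subsets (assumed layout)."""

    def __init__(self, state_dim, num_actions, num_levels, seed=0):
        rng = np.random.default_rng(seed)
        # One linear module per granularity level.
        self.weights = [rng.standard_normal((state_dim, num_actions))
                        for _ in range(num_levels)]

    def preferences(self, state, up_to_level=None):
        # Action preferences are the sum of residual modules; including more
        # levels adds finer-grained, more task-specific experience.
        levels = self.weights if up_to_level is None else self.weights[:up_to_level]
        return sum(state @ w for w in levels)
```

On a new task, one could initialize from the coarse shared levels only (`up_to_level=1`) and let the finer residuals adapt, which mirrors the idea of reusing experience at multiple granularities.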
Towards Long-term Annotators: A Supervised Label Aggregation Baseline
Relying on crowdsourced workers, data crowdsourcing platforms are able to
efficiently provide vast amounts of labeled data. Due to the variability in the
annotation quality of crowd workers, modern techniques resort to redundant
annotations and subsequent label aggregation to infer true labels. However,
these methods require model updating during inference, posing challenges in
real-world implementation. Meanwhile, in recent years, many data labeling tasks
have begun to require skilled and experienced annotators, leading to an
increasing demand for long-term annotators. These annotators could leave
substantial historical annotation records on the crowdsourcing platforms, which
can benefit label aggregation but have been ignored by previous works. Hence, in
this paper, we propose a novel label aggregation technique, which does not need
any model updating during inference and can extensively explore the historical
annotation records. We call it SuperLA, a Supervised Label Aggregation method.
Inside this model, we design three types of input features and a
straightforward neural network structure to merge all the information together
and subsequently produce aggregated labels. Based on comparison experiments
conducted on 22 public datasets and 11 baseline methods, we find that SuperLA
not only outperforms all those baselines in inference performance but also
offers significant advantages in terms of efficiency.
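This is not SuperLA itself, but a minimal stand-in for the core idea: aggregation that exploits historical annotation records and needs no model updates at inference time. Here each worker's vote is simply weighted by a hypothetical historical accuracy, whereas SuperLA feeds three feature types into a trained neural network.

```python
import numpy as np

def aggregate(votes, history_acc, num_classes):
    """Weighted vote for one item.

    votes:       dict mapping worker_id -> label given to this item
    history_acc: dict mapping worker_id -> historical accuracy (assumed known)
    """
    scores = np.zeros(num_classes)
    for worker, label in votes.items():
        # Unknown workers fall back to a neutral weight of 0.5.
        scores[label] += history_acc.get(worker, 0.5)
    return int(scores.argmax())
```

Because the per-worker weights are computed once from historical records, inference is a single forward pass per item, which is the efficiency property the abstract emphasizes.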
Just adjust one prompt: Enhancing in-context dialogue scoring via constructing the optimal subgraph of demonstrations and prompts
DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video
For few-shot learning, it is still a critical challenge to realize
photo-realistic face visually dubbing on high-resolution videos. Previous works
fail to generate high-fidelity dubbing results. To address the above problem,
this paper proposes a Deformation Inpainting Network (DINet) for
high-resolution face visually dubbing. Different from previous works relying on
multiple up-sample layers to directly generate pixels from latent embeddings,
DINet performs spatial deformation on feature maps of reference images to
better preserve high-frequency textural details. Specifically, DINet consists
of one deformation part and one inpainting part. In the first part, five
reference facial images adaptively perform spatial deformation to create
deformed feature maps encoding mouth shapes at each frame, in order to align
with the input driving audio and also the head poses of the input source
images. In the second part, to produce face visually dubbing, a feature decoder
is responsible for adaptively incorporating mouth movements from the deformed
feature maps and other attributes (i.e., head pose and upper facial expression)
from the source feature maps together. Finally, DINet achieves face visually
dubbing with rich textural details. We conduct qualitative and quantitative
comparisons to validate our DINet on high-resolution videos. The experimental
results show that our method outperforms state-of-the-art works.