Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
Unsupervised pre-training methods utilizing large and diverse datasets have
achieved tremendous success across a range of domains. Recent work has
investigated such unsupervised pre-training methods for model-based
reinforcement learning (MBRL) but is limited to domain-specific or simulated
data. In this paper, we study the problem of pre-training world models with
abundant in-the-wild videos for efficient learning of downstream visual control
tasks. However, in-the-wild videos are complicated by various contextual
factors, such as intricate backgrounds and textured appearance, which preclude
a world model from extracting shared world knowledge to generalize better. To
tackle this issue, we introduce Contextualized World Models (ContextWM) that
explicitly model both the context and dynamics to overcome the complexity and
diversity of in-the-wild videos and facilitate knowledge transfer between
distinct scenes. Specifically, a contextualized extension of the latent
dynamics model is realized by incorporating a context encoder that retains
contextual information and empowers the image decoder, which allows the
latent dynamics model to concentrate on essential temporal variations. Our
experiments show that in-the-wild video pre-training equipped with ContextWM
can significantly improve the sample efficiency of MBRL in various domains,
including robotic manipulation, locomotion, and autonomous driving.
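The context/dynamics split described above can be illustrated with a minimal numpy sketch. Random, untrained projections stand in for the learned encoders and decoder, so this is only an illustration of the architecture's data flow, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, DIM_C, DIM_Z = 8, 8, 16, 4  # frame size, context dim, dynamics dim

# Fixed random projections stand in for learned networks (hypothetical weights).
Wc = rng.standard_normal((DIM_C, H * W)) / np.sqrt(H * W)   # context encoder
Wz = rng.standard_normal((DIM_Z, H * W)) / np.sqrt(H * W)   # observation encoder
A = rng.standard_normal((DIM_Z, DIM_Z)) / np.sqrt(DIM_Z)    # latent dynamics
Wd = rng.standard_normal((H * W, DIM_C + DIM_Z)) / np.sqrt(DIM_C + DIM_Z)  # decoder

def encode_context(frame):
    """Static context code, computed once per video (backgrounds, textures)."""
    return Wc @ frame.ravel()

def rollout(first_frame, steps):
    """Roll out the latent dynamics: the small state z carries temporal
    variation only, while the context code c is injected at decoding time."""
    c = encode_context(first_frame)
    z = Wz @ first_frame.ravel()
    frames = []
    for _ in range(steps):
        z = np.tanh(A @ z)                          # dynamics act on z alone
        frames.append(Wd @ np.concatenate([c, z]))  # decoder sees context + state
    return frames

video = rng.standard_normal((H, W))
preds = rollout(video, steps=3)
```

Because the decoder, not the dynamics model, consumes the context code, the recurrent state is free to model only what changes over time.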
Research progress on the development of pennycress (Thlaspi arvense L.) as a new seed oil crop: a review
Compared with other crops, pennycress (Thlaspi arvense L.) is a niche emerging oil crop. In recent years, research on pennycress has expanded in several directions. Pennycress belongs to the Brassicaceae family and was introduced from Eurasia to North America. It is found worldwide both as a cultivated plant and as a weed. In this paper, we review recent studies on the advantages of pennycress as a supplementary model plant to Arabidopsis thaliana, oil and protein extraction technology, seed composition analysis based on metabolomics, germplasm resource development, growth and ecological impact, abiotic stress, and optimization strategies for fatty acid extraction. The main research directions proposed for the future are as follows: (1) assemble the pennycress genome to complete its genomic data, (2) optimize the process for extracting pennycress-derived biodiesel, (3) analyze the molecular mechanism of the fatty acid synthesis pathway in pennycress, and (4) characterize the functions of key genes underlying its responses to various stress conditions.
Adaptive Graphical Model Network for 2D Handpose Estimation
In this paper, we propose a new architecture called Adaptive Graphical Model
Network (AGMN) to tackle the task of 2D hand pose estimation from a monocular
RGB image. The AGMN consists of two branches of deep convolutional neural
networks for calculating unary and pairwise potential functions, followed by a
graphical model inference module for integrating unary and pairwise potentials.
Unlike existing architectures proposed to combine DCNNs with graphical models,
our AGMN is novel in that the parameters of its graphical model are conditioned
on and fully adaptive to individual input images. Experiments show that our
approach outperforms the state-of-the-art method for 2D hand keypoint
estimation by a notable margin on two public datasets.
Comment: 30th British Machine Vision Conference (BMVC)
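The inference step that combines unary and pairwise potentials can be sketched with max-product (Viterbi) message passing. For brevity this toy example uses a chain of keypoints rather than the hand's tree structure, and random score tables stand in for the two DCNN branches:

```python
import numpy as np

def chain_map_inference(unary, pairwise):
    """Max-product (Viterbi) inference on a chain of keypoints.

    unary:    (K, S) scores for K keypoints over S candidate locations
              (stands in for the unary-branch DCNN output).
    pairwise: (K-1, S, S) compatibilities between neighbouring keypoints
              (stands in for the input-conditioned pairwise branch in AGMN).
    Returns the MAP assignment: one location index per keypoint.
    """
    K, S = unary.shape
    score = unary[0].copy()
    back = np.zeros((K - 1, S), dtype=int)
    for k in range(1, K):
        cand = score[:, None] + pairwise[k - 1]  # (S, S) combined scores
        back[k - 1] = np.argmax(cand, axis=0)    # best predecessor per state
        score = unary[k] + np.max(cand, axis=0)
    path = [int(np.argmax(score))]               # backtrack from the best end state
    for k in range(K - 2, -1, -1):
        path.append(int(back[k, path[-1]]))
    return path[::-1]
```

AGMN's novelty is that the `pairwise` tables are predicted per image rather than learned as fixed parameters; here they are simply passed in as arrays.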
Tensor-based Intrinsic Subspace Representation Learning for Multi-view Clustering
As a hot research topic, many multi-view clustering approaches have been
proposed over the past few years. Nevertheless, most existing algorithms merely
take the consensus information among different views into consideration for
clustering. In practice, this may hinder multi-view clustering performance in
real-life applications, since different views usually exhibit diverse
statistical properties. To address this problem, we propose a novel
Tensor-based Intrinsic Subspace Representation Learning (TISRL) method for
multi-view clustering in this paper. Concretely, a rank-preserving
decomposition is first proposed to effectively handle the diverse statistical
information contained in different views. Then, to achieve the intrinsic
subspace representation, a low-rank tensor constraint based on the
tensor-singular value decomposition is also utilized in our method. In this
way, view-specific information is fully investigated by the rank-preserving
decomposition, and the high-order correlations of multi-view data are mined by
the low-rank tensor constraint. The objective function can be optimized by an
augmented Lagrangian multiplier based alternating direction minimization
algorithm. Experimental results on nine commonly used real-world multi-view
datasets illustrate the superiority of TISRL.
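The low-rank tensor constraint based on the tensor-singular value decomposition is typically enforced through a proximal step: FFT along the third mode, soft-thresholding of each frontal slice's singular values, then inverse FFT. A numpy sketch of that step (an illustration of the standard t-SVD shrinkage operator, not the paper's exact solver):

```python
import numpy as np

def tsvd_shrink(T, tau):
    """Singular value thresholding in the t-SVD sense.

    FFT along the third mode, shrink the singular values of each frontal
    slice by tau, then inverse FFT. This is the proximal operator of the
    tubal nuclear norm used by low-rank tensor constraints of this kind.
    """
    Tf = np.fft.fft(T, axis=2)
    out = np.empty_like(Tf)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        out[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.real(np.fft.ifft(out, axis=2))

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4, 3))
T_low = tsvd_shrink(T, tau=0.5)   # low-tubal-rank approximation of T
```

Inside an ADMM loop, this operator updates the tensor variable while the rank-preserving decomposition handles the view-specific terms.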
Learning a Deep Color Difference Metric for Photographic Images
Most well-established and widely used color difference (CD) metrics are
handcrafted and subject-calibrated against uniformly colored patches, which do
not generalize well to photographic images characterized by natural scene
complexities. Constructing CD formulae for photographic images is still an
active research topic in imaging/illumination, vision science, and color
science communities. In this paper, we aim to learn a deep CD metric for
photographic images with four desirable properties. First, it well aligns with
the observations in vision science that color and form are linked inextricably
in visual cortical processing. Second, it is a proper metric in the
mathematical sense. Third, it computes accurate CDs between photographic
images, differing mainly in color appearances. Fourth, it is robust to mild
geometric distortions (e.g., translation or due to parallax), which are often
present in photographic images of the same scene captured by different digital
cameras. We show that all these properties can be satisfied at once by learning
a multi-scale autoregressive normalizing flow for feature transform, followed
by the Euclidean distance, which is linearly proportional to the human
perceptual CD. Quantitative and qualitative experiments on the large-scale SPCD
dataset demonstrate the promise of the learned CD metric.
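The "proper metric" property follows from the construction itself: an injective feature transform composed with the Euclidean distance automatically satisfies identity of indiscernibles, symmetry, and the triangle inequality. A toy sketch with a fixed linear map standing in for the learned normalizing-flow transform:

```python
import numpy as np

rng = np.random.default_rng(1)
# A well-conditioned (hence injective) linear map stands in for the learned
# multi-scale normalizing-flow feature transform (hypothetical stand-in).
M = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)

def cd(x, y):
    """Color difference as Euclidean distance in the transformed space.

    Injective transform + Euclidean distance => identity of indiscernibles,
    symmetry, and the triangle inequality all hold, i.e. a proper metric.
    """
    return float(np.linalg.norm(M @ x - M @ y))

a, b, c = rng.standard_normal((3, 6))
```

A normalizing flow is invertible by design, which is exactly what guarantees injectivity in the real model.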
Light Field Diffusion for Single-View Novel View Synthesis
Single-view novel view synthesis, the task of generating images from new
viewpoints based on a single reference image, is an important but challenging
task in computer vision. Recently, Denoising Diffusion Probabilistic Model
(DDPM) has become popular in this area due to its strong ability to generate
high-fidelity images. However, current diffusion-based methods directly rely on
camera pose matrices as viewing conditions, globally and implicitly introducing
3D constraints. These methods may suffer from inconsistency among generated
images from different perspectives, especially in regions with intricate
textures and structures. In this work, we present Light Field Diffusion (LFD),
a conditional diffusion-based model for single-view novel view synthesis.
Unlike previous methods that employ camera pose matrices, LFD transforms the
camera view information into light field encoding and combines it with the
reference image. This design introduces local pixel-wise constraints within the
diffusion models, thereby encouraging better multi-view consistency.
Experiments on several datasets show that our LFD can efficiently generate
high-fidelity images and maintain better 3D consistency even in intricate
regions. Our method can generate images with higher quality than NeRF-based
models, and we obtain sample quality similar to other diffusion-based models
but with only one-third of the model size.
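A light field encoding of a camera view amounts to a per-pixel ray map: each pixel is described by a ray origin and direction rather than by a single global pose matrix. A numpy sketch of such an encoding for a pinhole camera (in the spirit of LFD's pixel-wise conditioning; the paper's exact parameterization may differ):

```python
import numpy as np

def light_field_encoding(Kmat, R, c, H, W):
    """Per-pixel ray map (origin + direction) for a pinhole camera.

    Kmat: (3, 3) intrinsics; R: world-from-camera rotation; c: camera centre.
    Returns an (H, W, 6) array: camera centre repeated per pixel, followed by
    unit ray directions - a pixel-wise encoding of the camera view.
    """
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # homogeneous pixels
    dirs = (pix @ np.linalg.inv(Kmat).T) @ R.T            # rays in world frame
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)  # unit length
    origins = np.broadcast_to(c, dirs.shape)
    return np.concatenate([origins, dirs], axis=-1)

# Identity pose: all rays originate at the origin and point forward.
enc = light_field_encoding(np.eye(3), np.eye(3), np.zeros(3), 4, 4)
```

Concatenating such a map with the reference image gives the diffusion model a spatially aligned, per-pixel view condition instead of a global one.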