256 research outputs found

Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning

Author: Deng Chaoyi
Long Mingsheng
Ma Haoyu
Wu Jialong
Publication date: 29/05/2023

Unsupervised pre-training methods utilizing large and diverse datasets have achieved tremendous success across a range of domains. Recent work has investigated such unsupervised pre-training methods for model-based reinforcement learning (MBRL) but is limited to domain-specific or simulated data. In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of downstream visual control tasks. However, in-the-wild videos are complicated by various contextual factors, such as intricate backgrounds and textured appearance, which precludes a world model from extracting shared world knowledge to generalize better. To tackle this issue, we introduce Contextualized World Models (ContextWM) that explicitly model both the context and dynamics to overcome the complexity and diversity of in-the-wild videos and facilitate knowledge transfer between distinct scenes. Specifically, a contextualized extension of the latent dynamics model is realized by incorporating a context encoder to retain contextual information and empower the image decoder, which allows the latent dynamics model to concentrate on essential temporal variations. Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of MBRL in various domains, including robotic manipulation, locomotion, and autonomous driving.

arXiv.org e-Print Archive

Walls have ears: Eavesdropping user behaviors via graphics-interrupt-based side channel

Author: GAO Debin
JIA Chunfu
MA Haoyu
TIAN Jianwen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2020

Institutional Knowledge at Singapore Management University

Research progress on the development of pennycress (Thlaspi arvense L.) as a new seed oil crop: a review

Author: Haoyu Wang
Jianyu Ma
Yuhong Zhang
Publication venue: Frontiers Media S.A.
Publication date: 01/11/2023

Compared with other crops, pennycress (Thlaspi arvense L.) is a niche emerging oil crop. In recent years, research on pennycress has expanded in various directions. Pennycress belongs to the Brassicaceae family and was introduced from Eurasia to North America. It is found worldwide as both a cultivated plant and a weed. In this paper, we review recent studies on the advantages of pennycress as a supplementary model plant to Arabidopsis thaliana, oil and protein extraction technology, metabolomics-based seed composition analysis, germplasm resource development, growth and ecological impact research, abiotic stress, fatty acid extraction optimization strategies, and other aspects. The main research directions proposed for the future are as follows: (1) assemble the genome of pennycress to complete its entire genome data, (2) optimize the extraction process of pennycress as biodiesel, (3) analyze the molecular mechanism of the fatty acid synthesis pathway in pennycress, and (4) characterize the functions of key genes corresponding to various adversity conditions of pennycress.

Directory of Open Access Journals

Adaptive Graphical Model Network for 2D Handpose Estimation

Author: Chen Yifei
Kong Deying
Ma Haoyu
Xie Xiaohui
Yan Xiangyi
Publication date: 18/09/2019

In this paper, we propose a new architecture called Adaptive Graphical Model Network (AGMN) to tackle the task of 2D hand pose estimation from a monocular RGB image. The AGMN consists of two branches of deep convolutional neural networks for calculating unary and pairwise potential functions, followed by a graphical model inference module for integrating unary and pairwise potentials. Unlike existing architectures proposed to combine DCNNs with graphical models, our AGMN is novel in that the parameters of its graphical model are conditioned on and fully adaptive to individual input images. Experiments show that our approach outperforms the state-of-the-art method used in 2D hand keypoints estimation by a notable margin on two public datasets.

Comment: 30th British Machine Vision Conference (BMVC

arXiv.org e-Print Archive
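
The AGMN abstract above describes an inference module that integrates unary and pairwise potentials. As a rough illustration of that general idea only, the following sketch runs exact sum-product inference on a small chain of discrete variables. It is not the authors' implementation: the chain topology, the tiny state space, the function names (`chain_marginals`, `brute_force_marginals`), and all numbers are assumptions made for this example, and the image-conditioned networks that would produce the potentials are omitted.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Exact per-node marginals on a chain via sum-product message passing.

    unary[k][s]       : non-negative score of node k being in state s
    pairwise[k][s][t] : non-negative score of (node k = s, node k+1 = t)
    """
    n, s = len(unary), len(unary[0])
    fwd = [[1.0] * s for _ in range(n)]  # messages arriving from the left
    bwd = [[1.0] * s for _ in range(n)]  # messages arriving from the right
    for k in range(1, n):
        for t in range(s):
            fwd[k][t] = sum(
                fwd[k - 1][a] * unary[k - 1][a] * pairwise[k - 1][a][t]
                for a in range(s)
            )
    for k in range(n - 2, -1, -1):
        for a in range(s):
            bwd[k][a] = sum(
                pairwise[k][a][t] * unary[k + 1][t] * bwd[k + 1][t]
                for t in range(s)
            )
    marginals = []
    for k in range(n):
        belief = [fwd[k][a] * unary[k][a] * bwd[k][a] for a in range(s)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Reference marginals obtained by enumerating every joint assignment."""
    n, s = len(unary), len(unary[0])
    totals = [[0.0] * s for _ in range(n)]
    for states in product(range(s), repeat=n):
        weight = 1.0
        for k in range(n):
            weight *= unary[k][states[k]]
        for k in range(n - 1):
            weight *= pairwise[k][states[k]][states[k + 1]]
        for k in range(n):
            totals[k][states[k]] += weight
    z = sum(totals[0])
    return [[v / z for v in row] for row in totals]


# Made-up potentials: three nodes, two states each, neighbours prefer to agree.
example_unary = [[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]]
agree = [[0.9, 0.1], [0.1, 0.9]]
example_pairwise = [agree, agree]
example_marginals = chain_marginals(example_unary, example_pairwise)
```

In this toy setting the message-passing result can be checked against brute-force enumeration, which is only feasible because the example is tiny.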

Tensor-based Intrinsic Subspace Representation Learning for Multi-view Clustering

Author: Li Zhongyu
Ma Shuangxun
Tang Haoyu
Zheng Qinghai
Zhu Jihua
Publication date: 12/11/2020

Multi-view clustering is a hot research topic, and many approaches have been proposed over the past few years. Nevertheless, most existing algorithms merely take the consensus information among different views into consideration for clustering. This may hinder multi-view clustering performance in real-life applications, since different views usually have diverse statistical properties. To address this problem, we propose a novel Tensor-based Intrinsic Subspace Representation Learning (TISRL) method for multi-view clustering in this paper. Concretely, a rank-preserving decomposition is first proposed to deal effectively with the diverse statistical information contained in different views. Then, to achieve the intrinsic subspace representation, a low-rank tensor constraint based on the tensor singular value decomposition is also utilized in our method. The specific information contained in different views is fully investigated by the rank-preserving decomposition, and the high-order correlations of multi-view data are mined by the low-rank tensor constraint. The objective function can be optimized by an alternating direction minimization algorithm based on the augmented Lagrangian multiplier. Experimental results on nine commonly used real-world multi-view datasets illustrate the superiority of TISRL.

arXiv.org e-Print Archive

Learning a Deep Color Difference Metric for Photographic Images

Author: Chen Haoyu
Ma Kede
Sun Qilin
Wang Zhihua
Yang Yang
Publication date: 27/03/2023

Most well-established and widely used color difference (CD) metrics are handcrafted and subject-calibrated against uniformly colored patches, which do not generalize well to photographic images characterized by natural scene complexities. Constructing CD formulae for photographic images is still an active research topic in imaging/illumination, vision science, and color science communities. In this paper, we aim to learn a deep CD metric for photographic images with four desirable properties. First, it aligns well with the observations in vision science that color and form are linked inextricably in visual cortical processing. Second, it is a proper metric in the mathematical sense. Third, it computes accurate CDs between photographic images, differing mainly in color appearances. Fourth, it is robust to mild geometric distortions (e.g., translation or due to parallax), which are often present in photographic images of the same scene captured by different digital cameras. We show that all these properties can be satisfied at once by learning a multi-scale autoregressive normalizing flow for feature transform, followed by the Euclidean distance which is linearly proportional to the human perceptual CD. Quantitative and qualitative experiments on the large-scale SPCD dataset demonstrate the promise of the learned CD metric.

arXiv.org e-Print Archive
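
The color difference abstract above states that a feature transform followed by the Euclidean distance yields a proper metric. The sketch below illustrates why that construction works in general: if the transform is injective, the Euclidean distance between transformed inputs is zero only for identical inputs, and symmetry and the triangle inequality are inherited from the Euclidean norm. The transform here is a hand-written, hypothetical stand-in operating on single RGB triples, not the learned normalizing flow from the paper, and the names `feature_transform` and `color_difference` are assumptions for this example.

```python
import math


def feature_transform(color):
    """Hypothetical injective map standing in for a learned invertible flow."""
    r, g, b = color
    # Upper-triangular linear mixing with unit diagonal, hence invertible.
    mixed = (r + 0.5 * g, g + 0.5 * b, b)
    # v + tanh(v) is strictly increasing, so each coordinate stays injective.
    return tuple(v + math.tanh(v) for v in mixed)


def color_difference(x, y):
    """Euclidean distance between the transformed inputs."""
    return math.dist(feature_transform(x), feature_transform(y))


# Made-up RGB triples in [0, 1], used only to exercise the metric axioms.
red = (0.9, 0.1, 0.1)
orange = (0.9, 0.5, 0.1)
blue = (0.1, 0.1, 0.9)
```

The assertions one would check on such a construction are identity (`color_difference(red, red) == 0`), symmetry, and the triangle inequality across any three inputs.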

Light Field Diffusion for Single-View Novel View Synthesis

Author: Han Kun
Ma Haoyu
Sun Shanlin
Xie Xiaohui
Xiong Yifeng
Publication date: 22/09/2023

Single-view novel view synthesis, the task of generating images from new viewpoints based on a single reference image, is an important but challenging task in computer vision. Recently, the Denoising Diffusion Probabilistic Model (DDPM) has become popular in this area due to its strong ability to generate high-fidelity images. However, current diffusion-based methods directly rely on camera pose matrices as viewing conditions, globally and implicitly introducing 3D constraints. These methods may suffer from inconsistency among generated images from different perspectives, especially in regions with intricate textures and structures. In this work, we present Light Field Diffusion (LFD), a conditional diffusion-based model for single-view novel view synthesis. Unlike previous methods that employ camera pose matrices, LFD transforms the camera view information into light field encoding and combines it with the reference image. This design introduces local pixel-wise constraints within the diffusion models, thereby encouraging better multi-view consistency. Experiments on several datasets show that our LFD can efficiently generate high-fidelity images and maintain better 3D consistency even in intricate regions. Our method can generate images with higher quality than NeRF-based models, and we obtain sample quality similar to other diffusion-based models but with only one-third of the model size.

arXiv.org e-Print Archive