
VFAS-Grasp: Closed Loop Grasping with Visual Feedback and Adaptive Sampling

Authors: Huh Jinwook, Isler Volkan, Piacenza Pedro, Yuan Jiacheng
Publication date: 27/10/2023

We consider the problem of closed-loop robotic grasping and present a novel planner that uses Visual Feedback and an uncertainty-aware Adaptive Sampling strategy (VFAS) to close the loop. At each iteration, our method, VFAS-Grasp, builds a set of candidate grasps by generating random perturbations of a seed grasp. The candidates are then scored using a novel metric that combines a learned grasp-quality estimate, the uncertainty of that estimate, and the distance from the seed proposal, which promotes temporal consistency. Additionally, we present two mechanisms to improve the efficiency of our sampling strategy: we dynamically scale the size of the sampling region and the number of samples in it based on past grasp scores, and we leverage a motion vector field estimator to shift the center of the sampling region. We demonstrate that our algorithm can run in real time (20 Hz) and can improve grasp performance in static scenes by refining the initial grasp proposal. We also show that it enables grasping of slow-moving objects, such as those encountered during human-to-robot handover.

arXiv.org e-Print Archive
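The sample-score-select loop described in the VFAS-Grasp abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grasp parameterization (a plain tuple of floats), the toy quality function standing in for the learned estimator, the weights `alpha` and `beta`, and the shrink/grow rule in `adapt_region` are all hypothetical choices, and the motion-vector-field shift of the sampling center is omitted.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical grasp parameterization, e.g. (x, y, z) or (x, y, z, yaw).
Grasp = Tuple[float, ...]
# Stand-in for a learned estimator: returns (predicted quality, uncertainty).
QualityFn = Callable[[Grasp], Tuple[float, float]]


def sample_candidates(seed: Grasp, radius: float, n: int,
                      rng: random.Random) -> List[Grasp]:
    """Seed plus n random perturbations drawn uniformly from a box of half-width radius."""
    perturbed = [tuple(s + rng.uniform(-radius, radius) for s in seed)
                 for _ in range(n)]
    return [seed] + perturbed


def vfas_score(g: Grasp, seed: Grasp, quality: QualityFn,
               alpha: float, beta: float) -> float:
    """Predicted quality, penalized by its uncertainty and by distance from the seed."""
    mean, std = quality(g)
    return mean - alpha * std - beta * math.dist(g, seed)


def confident_quality(g: Grasp, quality: QualityFn, alpha: float) -> float:
    """Uncertainty-penalized quality without the temporal-consistency term."""
    mean, std = quality(g)
    return mean - alpha * std


def adapt_region(radius: float, n: int, improved: bool) -> Tuple[float, int]:
    """Assumed rule: shrink region and sample count after an improvement, grow otherwise."""
    if improved:
        return max(1e-3, radius * 0.8), max(8, int(n * 0.8))
    return min(0.1, radius * 1.25), min(64, int(n * 1.25))


def vfas_loop(seed: Grasp, quality: QualityFn, steps: int = 20,
              radius: float = 0.05, n: int = 16, alpha: float = 1.0,
              beta: float = 0.5, rng_seed: int = 0) -> List[Grasp]:
    """Run the closed loop for a fixed number of iterations; return the grasp history."""
    rng = random.Random(rng_seed)
    history = [seed]
    for _ in range(steps):
        candidates = sample_candidates(seed, radius, n, rng)
        best = max(candidates,
                   key=lambda g, anchor=seed: vfas_score(g, anchor, quality, alpha, beta))
        improved = (confident_quality(best, quality, alpha)
                    > confident_quality(seed, quality, alpha))
        radius, n = adapt_region(radius, n, improved)
        seed = best
        history.append(seed)
    return history


def toy_quality(g: Grasp) -> Tuple[float, float]:
    """Toy estimator peaked at a fixed target; uncertainty grows with distance."""
    d = math.dist(g, (0.3, -0.2, 0.1))
    return math.exp(-d * d / 0.02), 0.05 * d
```

For example, `vfas_loop((0.2, -0.1, 0.15), toy_quality)` returns the sequence of selected grasps. Because the seed is always among the candidates and the distance penalty is non-negative, the uncertainty-penalized quality of the selected grasp never decreases between iterations in this sketch; the distance term only limits how far the grasp may jump per step.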

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

Authors: Chen Jiacheng, Chen Kai, Lin Dahua, Liu Yuan, Zhang Songyang
Publication date: 24/03/2023

Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT. However, subsequent works have complicated the framework with new auxiliary tasks or extra pre-trained models, inevitably increasing computational overhead. This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction, examining the input image patches and the reconstruction target, and highlights two critical but previously overlooked bottlenecks. Based on this analysis, we propose a remarkably simple and effective method, PixMIM, that consists of two strategies: 1) filtering the high-frequency components from the reconstruction target to de-emphasize the network's focus on texture-rich details, and 2) adopting a conservative data transform strategy to alleviate the problem of missing foreground in MIM training. PixMIM can be easily integrated into most existing pixel-based MIM approaches (i.e., those using raw images as the reconstruction target) with negligible additional computation. Without bells and whistles, our method consistently improves three MIM approaches, MAE, ConvMAE, and LSMAE, across various downstream tasks. We believe this effective plug-and-play method will serve as a strong baseline for self-supervised learning and provide insights for future improvements of the MIM framework. Code and models are available at https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/pixmim.
Comment: Update code link and add additional results

arXiv.org e-Print Archive
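PixMIM's first strategy, removing high-frequency components from the reconstruction target, can be sketched as an ideal low-pass filter in the 2D Fourier domain. This is an illustrative NumPy sketch, not the released mmselfsup code: the hard circular cutoff and the `radius` parameter are assumptions, and the paper's actual filter and cutoff may differ. The helper `masked_mse` is a hypothetical MAE-style loss restricted to hidden pixels.

```python
import numpy as np


def low_pass_target(img: np.ndarray, radius: float) -> np.ndarray:
    """Keep only spatial frequencies within `radius` (in frequency-index units).

    img: float array of shape (H, W) or (H, W, C); filtering is applied per channel.
    """
    h, w = img.shape[:2]
    # Integer frequency indices in NumPy's unshifted FFT order (0, 1, ..., -1).
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    mask = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) <= radius
    if img.ndim == 3:
        mask = mask[:, :, None]  # broadcast the same mask over channels
    spectrum = np.fft.fft2(img, axes=(0, 1))
    return np.real(np.fft.ifft2(spectrum * mask, axes=(0, 1)))


def masked_mse(pred: np.ndarray, target: np.ndarray, visible: np.ndarray) -> float:
    """Mean squared error over pixels that were hidden from the encoder."""
    hidden = ~visible
    return float(np.mean((pred[hidden] - target[hidden]) ** 2))
```

In this sketch the model's prediction would be compared against `low_pass_target(img, radius)` instead of the raw image, so fine texture no longer contributes to the loss; the DC component (overall brightness) always passes through the filter.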

HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image

Authors: Chavan-Dafle Nikhil, Choi Hongsuk, Isler Volkan, Park Hyunsoo, Yuan Jiacheng
Publication date: 14/09/2023

This paper presents a method to learn a hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image. Both inference and training-data generation for 3D hand-object scene reconstruction are challenging due to the depth ambiguity of a single image and occlusions by the hand and the object. We turn this challenge into an opportunity by using the hand shape to constrain the possible relative configurations of the hand and object geometry. We design a generalizable implicit function, HandNeRF, that explicitly encodes the correlation between 3D hand shape features and 2D object features to predict the hand and object scene geometry. With experiments on real-world datasets, we show that HandNeRF reconstructs hand-object scenes with novel grasp configurations more accurately than comparable methods. Moreover, we demonstrate that object reconstruction from HandNeRF enables more accurate execution of downstream tasks, such as grasping for robotic hand-over.
Comment: 9 pages, 4 tables, 7 figures

arXiv.org e-Print Archive

Touch and Go: Learning from Human-Collected Vision and Touch

Authors: Ma Chenyang, Owens Andrew, Yang Fengyu, Yuan Wenzhen, Zhang Jiacheng, Zhu Jing
Publication date: 29/11/2022

The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world. We propose a dataset with paired visual and tactile data called Touch and Go, in which human data collectors probe objects in natural environments using tactile sensors while simultaneously recording egocentric video. In contrast to previous efforts, which have largely been confined to lab settings or simulated environments, our dataset spans a large number of "in the wild" objects and scenes. To demonstrate our dataset's effectiveness, we successfully apply it to a variety of tasks: 1) self-supervised visuo-tactile feature learning, 2) tactile-driven image stylization, i.e., making the visual appearance of an object more consistent with a given tactile signal, and 3) predicting future frames of a tactile signal from visuo-tactile inputs.
Comment: Accepted by NeurIPS 2022 Track on Datasets and Benchmarks

arXiv.org e-Print Archive

RF-Transformer: A Unified Backscatter Radio Hardware Abstraction

Authors: Guo Xiuzhen, He Yuan, Liu Yunhao, Shangguan Longfei, Yu Zihao, Zhang Jiacheng
Publication date: 29/09/2022

This paper presents RF-Transformer, a unified backscatter radio hardware abstraction that allows a low-power IoT device to communicate directly with heterogeneous wireless receivers at minimal power consumption. Unlike existing backscatter systems, which are tailored to a specific wireless communication protocol, RF-Transformer provides a programmable interface to the microcontroller, allowing IoT devices to synthesize different types of protocol-compliant backscatter signals with radically different PHY-layer designs. To show the efficacy of our design, we implement a PCB prototype of RF-Transformer in the 2.4 GHz ISM band and showcase its capability to generate standard ZigBee, Bluetooth, LoRa, and Wi-Fi 802.11b/g/n/ac packets. Our extensive field studies show that RF-Transformer achieves 23.8 Mbps, 247.1 Kbps, 986.5 Kbps, and 27.3 Kbps throughput when generating standard Wi-Fi, ZigBee, Bluetooth, and LoRa signals, respectively, while consuming 7.6-74.2× less power than their active counterparts. Our ASIC simulation based on the 65-nm CMOS process shows that the power gain of RF-Transformer can further grow to 92-678×. We further integrate RF-Transformer with pressure sensors and present a case study on detecting foot traffic density in hallways. Our 7-day case studies demonstrate that RF-Transformer can reliably transmit sensor data to a commodity gateway by synthesizing LoRa packets on top of Wi-Fi signals. Our experimental results also verify the compatibility of RF-Transformer with commodity receivers. Code and hardware schematics can be found at: https://github.com/LeFsCC/RF-Transformer

arXiv.org e-Print Archive

Robust Unstructured Knowledge Access in Conversational Dialogue with ASR Errors

Authors: Liao Tinglong, Tam Yik-Cheung, Wang Zecheng, Xu Jiacheng, Yuan Shuhan, Zou Jiakai
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/11/2022

The performance of spoken language understanding (SLU) can be degraded by automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, then jointly self-correcting the errors and minimizing the target classification loss. In the proposed error simulator, we leverage confusion networks generated from an ASR decoder without human transcriptions to produce a variety of error patterns for model training. We evaluate our approach on the DSTC10 challenge, which targets knowledge-grounded task-oriented conversational dialogues with ASR errors. Experimental results show the effectiveness of our approach, which significantly boosts the knowledge-seeking turn detection (KTD) F1 from 0.9433 to 0.9904. Knowledge cluster classification is boosted from 0.7924 to 0.9333 in Recall@1. After knowledge document re-ranking, our approach shows significant improvement in all knowledge selection metrics on the test set: from 0.7358 to 0.7806 in Recall@1, from 0.8301 to 0.9333 in Recall@5, and from 0.7798 to 0.8460 in MRR@5. In the recent DSTC10 evaluation, our approach demonstrates significant improvement in knowledge selection, boosting Recall@1 from 0.495 to 0.7144 compared to the official baseline. Our source code is released on GitHub: https://github.com/yctam/dstc10_track2_task2.git.
Comment: 7 pages, 2 figures. Accepted at ICASSP 202

arXiv.org e-Print Archive
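The error-simulation step can be approximated as follows, assuming the ASR confusion networks have already been reduced to a per-word table of competing hypotheses with posterior weights, where an empty string stands for an epsilon arc (a deletion). This is a simplified Python sketch rather than the released dstc10_track2_task2 code; the table format, the `rate` parameter, and the helper names are hypothetical. It produces (noisy, clean) pairs, the kind of supervision a joint self-correction and classification objective needs.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical table distilled from ASR confusion networks:
# reference word -> competing hypotheses in its bin with posterior weights.
# An empty string represents an epsilon arc (the word is deleted).
ConfusionTable = Dict[str, List[Tuple[str, float]]]


def corrupt(tokens: Sequence[str], table: ConfusionTable, rate: float,
            rng: random.Random) -> List[str]:
    """Replace each word that has confusable alternatives with probability `rate`."""
    noisy: List[str] = []
    for tok in tokens:
        alternatives = table.get(tok)
        if alternatives and rng.random() < rate:
            words, weights = zip(*alternatives)
            # Sample a competitor in proportion to its (hypothetical) posterior.
            choice = rng.choices(words, weights=weights, k=1)[0]
            if choice:  # an empty choice drops the word
                noisy.append(choice)
        else:
            noisy.append(tok)
    return noisy


def make_training_pair(text: str, table: ConfusionTable, rate: float = 0.3,
                       rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Return (corrupted input, clean target) for self-correction training."""
    rng = rng or random.Random()
    return " ".join(corrupt(text.split(), table, rate, rng)), text
```

Because words absent from the table are never touched, the error patterns stay limited to confusions the decoder actually produced, which matches the abstract's motivation of deriving errors from confusion networks rather than from uniform random noise.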

Saiyan: Design and Implementation of a Low-power Demodulator for LoRa Backscatter Systems

Authors: Guo Xiuzhen, He Yuan, Jiang Haotian, Jing Nan, Liu Yunhao, Shangguan Longfei, Zhang Jiacheng
Publication date: 30/09/2022

The radio range of backscatter systems continues to grow as new wireless communication primitives are invented. Nevertheless, both the bit error rate and the packet loss rate of backscatter signals increase rapidly with radio range, necessitating cooperation between the access point and the backscatter tags through a feedback loop. Unfortunately, the low-power nature of backscatter tags limits their ability to demodulate feedback signals from a remote access point, so such a feedback loop breaks down at long range. This paper presents Saiyan, an ultra-low-power demodulator for long-range LoRa backscatter systems. With Saiyan, a backscatter tag can demodulate feedback signals from a remote access point with moderate power consumption and then immediately retransmit a packet when packet loss occurs. Moreover, Saiyan enables rate adaptation and channel hopping, two PHY-layer operations that are important to channel efficiency yet unavailable on long-range backscatter systems. We prototype Saiyan on a two-layer PCB and evaluate its performance in different environments. Results show that Saiyan achieves a 5× gain in demodulation range compared with state-of-the-art systems. Our ASIC simulation shows that the power consumption of Saiyan is around 93.2 µW. Code and hardware schematics can be found at: https://github.com/ZangJac/Saiyan

arXiv.org e-Print Archive

Vertical migration from surface soils to groundwater and source appointment of polycyclic aromatic hydrocarbons in epikarst spring systems, southwest China

Authors: Lan Jiacheng, Pu Junbing, Sun Yuchuan, Xie Zhenglan, Xing Baoshan, Yang Hong, Yuan Daoxian, Zhang Siyu
Publication venue: Elsevier BV
Publication date: 01/01/2019

Understanding the transfer of polycyclic aromatic hydrocarbons (PAHs) in karst terrain is of great importance for their ecological risk assessment; however, the impact of the vertical transfer of soil PAHs on groundwater in karst systems is largely unknown. Here, the vertical distribution and seasonal variation of 16 PAHs in the soils and water of 4 epikarst spring catchments in Southwest China were investigated. Total PAH concentrations ranged from 61 to 3285 ng/g in the soils and from 341 to 4969 ng/L in the spring water. The vertical distribution of PAHs in soils varied with ring number and with the altitude of the catchment. PAH concentrations were linearly related to total organic carbon (TOC) at different depths in the catchments located 563-783 m above sea level (A.S.L.). However, no correlation with TOC was observed in the high-altitude catchment (2090 m A.S.L.), because the large water flux led to fast migration of the 2-3-ring PAHs in the soils. The PAHs in soils and springs were mainly derived from the combustion of grass, wood, and coal, closely related to the primary fossil fuels used in this area. This study demonstrates that groundwater in the karst terrains of Southwest China is heavily polluted by PAHs due to their vertical transfer from the surface soils, and that effective protection is urgently needed.

Central Archive at the University of Reading

SSD-MonoDETR: Supervised Scale-aware Deformable Transformer for Monocular 3D Object Detection

Authors: Fu Haolong, He Xuan, Li Zhiyong, Lin Jiacheng, Wang Meng, Yang Fan, Yang Kailun, Yuan Jin
Publication date: 02/06/2023

Transformer-based methods have recently demonstrated superior performance in monocular 3D object detection, which aims to predict 3D attributes from a single 2D image. Most existing transformer-based methods leverage both visual and depth representations to explore valuable query points on objects, and the quality of the learned query points has a great impact on detection accuracy. Unfortunately, existing unsupervised attention mechanisms in transformers are prone to generating low-quality query features due to inaccurate receptive fields, especially on hard objects. To tackle this problem, this paper proposes a novel Supervised Scale-aware Deformable Attention (SSDA) for monocular 3D object detection. Specifically, SSDA presets several masks with different scales and uses depth and visual features to adaptively learn a scale-aware filter for object query augmentation. By imposing scale awareness, SSDA can accurately predict the receptive field of an object query to support robust query feature generation. In addition, SSDA is trained with a Weighted Scale Matching (WSM) loss that supervises scale prediction and yields more confident results than unsupervised attention mechanisms. Extensive experiments on the KITTI benchmark demonstrate that SSDA significantly improves detection accuracy, especially on moderate and hard objects, yielding state-of-the-art performance compared with existing approaches. Our code will be made publicly available at https://github.com/mikasa3lili/SSD-MonoDETR.

arXiv.org e-Print Archive
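The idea of presetting masks at several scales and learning a weighting over them can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: square masks centered on the query location, a softmax over per-scale logits (which in SSDA would be predicted from depth and visual features), and a normalized weighted sum of masks as the resulting filter. It does not reproduce the SSDA architecture or the WSM loss, and the function names are hypothetical.

```python
import numpy as np


def preset_masks(size: int, half_widths) -> np.ndarray:
    """Stack of (K, size, size) binary square masks centred on the patch centre."""
    c = size // 2
    ys, xs = np.mgrid[0:size, 0:size]
    cheb = np.maximum(np.abs(ys - c), np.abs(xs - c))  # Chebyshev distance to centre
    return np.stack([(cheb <= s).astype(float) for s in half_widths])


def scale_aware_filter(logits: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Softmax-weighted mix of the preset masks, normalized to sum to 1."""
    w = np.exp(logits - logits.max())  # numerically stable softmax
    w /= w.sum()
    filt = np.tensordot(w, masks, axes=1)  # (K,) x (K, S, S) -> (S, S)
    return filt / filt.sum()


def pool_query_feature(features: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """Weighted pooling of a (C, size, size) feature patch into a C-dim query feature."""
    return np.tensordot(features, filt, axes=([1, 2], [0, 1]))
```

Shifting logit mass toward a wider mask spreads the pooled query feature over a larger neighborhood, while a dominant small-scale logit concentrates it on the query location; this is the intuition behind predicting a receptive field per object query.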