VFAS-Grasp: Closed Loop Grasping with Visual Feedback and Adaptive Sampling
We consider the problem of closed-loop robotic grasping and present a novel
planner which uses Visual Feedback and an uncertainty-aware Adaptive Sampling
strategy (VFAS) to close the loop. At each iteration, our method VFAS-Grasp
builds a set of candidate grasps by generating random perturbations of a seed
grasp. The candidates are then scored using a novel metric which combines a
learned grasp-quality estimator, the uncertainty in the estimate and the
distance from the seed proposal to promote temporal consistency. Additionally,
we present two mechanisms to improve the efficiency of our sampling strategy:
We dynamically scale the sampling region size and number of samples in it based
on past grasp scores. We also leverage a motion vector field estimator to shift
the center of our sampling region. We demonstrate that our algorithm can run in
real time (20 Hz) and is capable of improving grasp performance for static
scenes by refining the initial grasp proposal. We also show that it can enable
grasping of slow-moving objects, such as those encountered during
human-to-robot handover.
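As a rough illustration of the scoring and refinement loop described above, the sketch below combines a learned quality estimate, its uncertainty, and the distance to the seed grasp into a single candidate score; the weights, the Gaussian perturbation model, and the quality_fn interface are assumptions made for the example, not details from the paper.

```python
import numpy as np

def score_candidates(quality, uncertainty, dist_to_seed, w_unc=1.0, w_dist=0.5):
    # Hypothetical weighted combination of the three terms named in the
    # abstract; the actual weighting used by VFAS-Grasp is not specified here.
    return quality - w_unc * uncertainty - w_dist * dist_to_seed

def refine_grasp(seed, quality_fn, n_samples=64, sigma=0.02):
    """One closed-loop iteration: perturb the seed, score, keep the best."""
    # seed: 1-D grasp pose vector; perturb it with additive Gaussian noise.
    candidates = seed + sigma * np.random.randn(n_samples, seed.shape[0])
    quality, uncertainty = quality_fn(candidates)        # learned estimator
    dist = np.linalg.norm(candidates - seed, axis=1)     # temporal consistency
    scores = score_candidates(quality, uncertainty, dist)
    return candidates[np.argmax(scores)]
```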
PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling
Masked Image Modeling (MIM) has achieved promising progress with the advent
of Masked Autoencoders (MAE) and BEiT. However, subsequent works have
complicated the framework with new auxiliary tasks or extra pre-trained models,
inevitably increasing computational overhead. This paper undertakes a
fundamental analysis of MIM from the perspective of pixel reconstruction, which
examines the input image patches and reconstruction target, and highlights two
critical but previously overlooked bottlenecks. Based on this analysis, we
propose a remarkably simple and effective method, PixMIM, that entails
two strategies: 1) filtering the high-frequency components from the
reconstruction target to de-emphasize the network's focus on texture-rich
details and 2) adopting a conservative data transform strategy to alleviate the
problem of missing foreground in MIM training. PixMIM can be easily
integrated into most existing pixel-based MIM approaches (i.e., using raw images
as reconstruction target) with negligible additional computation. Without bells
and whistles, our method consistently improves three MIM approaches, MAE,
ConvMAE, and LSMAE, across various downstream tasks. We believe this effective
plug-and-play method will serve as a strong baseline for self-supervised
learning and provide insights for future improvements of the MIM framework.
Code and models are available at
https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/pixmim.
Comment: Update code link and add additional result
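As a concrete, hedged illustration of the first strategy, the sketch below removes high-frequency components from the reconstruction target with an FFT-based ideal low-pass filter; the filter shape and cutoff are assumptions for illustration and may differ from the filtering actually used by PixMIM.

```python
import torch

def lowpass_reconstruction_target(img, cutoff=0.25):
    # img: (B, C, H, W) float tensor. The circular ideal low-pass filter and
    # the cutoff are illustrative choices, not necessarily PixMIM's settings.
    B, C, H, W = img.shape
    freq = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    ys = torch.arange(H, dtype=img.dtype).view(-1, 1) - H / 2
    xs = torch.arange(W, dtype=img.dtype).view(1, -1) - W / 2
    radius = torch.sqrt(ys ** 2 + xs ** 2)
    mask = (radius <= cutoff * min(H, W) / 2).to(img.dtype)
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    # The low-frequency image becomes the target of the pixel reconstruction loss.
    return filtered.real
```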
HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image
This paper presents a method to learn a hand-object interaction prior for
reconstructing a 3D hand-object scene from a single RGB image. The inference as
well as training-data generation for 3D hand-object scene reconstruction is
challenging due to the depth ambiguity of a single image and occlusions by the
hand and object. We turn this challenge into an opportunity by utilizing the
hand shape to constrain the possible relative configuration of the hand and
object geometry. We design a generalizable implicit function, HandNeRF, that
explicitly encodes the correlation of the 3D hand shape features and 2D object
features to predict the hand and object scene geometry. With experiments on
real-world datasets, we show that HandNeRF is able to reconstruct hand-object
scenes of novel grasp configurations more accurately than comparable methods.
Moreover, we demonstrate that object reconstruction from HandNeRF ensures more
accurate execution of a downstream task, such as grasping for robotic handover.
Comment: 9 pages, 4 tables, 7 figures
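The following is a toy sketch of a generalizable implicit function conditioned on both hand and object features; HandNeRF explicitly encodes the correlation between 3D hand-shape features and 2D object features, whereas this illustration simply concatenates per-point features, so it should be read as a schematic rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class ImplicitHandObjectField(nn.Module):
    """Toy implicit function conditioned on hand and object features."""

    def __init__(self, hand_dim=64, obj_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + hand_dim + obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # density + RGB per query point
        )

    def forward(self, xyz, hand_feat, obj_feat):
        # xyz: (N, 3) query points; hand_feat/obj_feat: per-point features
        # sampled from the 3D hand shape and the 2D object feature map.
        h = torch.cat([xyz, hand_feat, obj_feat], dim=-1)
        out = self.mlp(h)
        density, rgb = out[..., :1], torch.sigmoid(out[..., 1:])
        return density, rgb
```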
Touch and Go: Learning from Human-Collected Vision and Touch
The ability to associate touch with sight is essential for tasks that require
physically interacting with objects in the world. We propose a dataset with
paired visual and tactile data called Touch and Go, in which human data
collectors probe objects in natural environments using tactile sensors, while
simultaneously recording egocentric video. In contrast to previous efforts,
which have largely been confined to lab settings or simulated environments, our
dataset spans a large number of "in the wild" objects and scenes. To
demonstrate our dataset's effectiveness, we successfully apply it to a variety
of tasks: 1) self-supervised visuo-tactile feature learning, 2) tactile-driven
image stylization, i.e., making the visual appearance of an object more
consistent with a given tactile signal, and 3) predicting future frames of a
tactile signal from visuo-tactile inputs.
Comment: Accepted by NeurIPS 2022 Track on Datasets and Benchmarks
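For the first task, a standard recipe is to contrast paired visual and tactile embeddings; the symmetric InfoNCE sketch below is a generic formulation and may differ from the objective actually used on Touch and Go.

```python
import torch
import torch.nn.functional as F

def visuo_tactile_infonce(img_emb, touch_emb, temperature=0.07):
    """Contrastive loss pairing each video frame with its tactile reading."""
    img_emb = F.normalize(img_emb, dim=-1)      # (B, D)
    touch_emb = F.normalize(touch_emb, dim=-1)  # (B, D)
    logits = img_emb @ touch_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric objective: vision-to-touch and touch-to-vision retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```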
RF-Transformer: A Unified Backscatter Radio Hardware Abstraction
This paper presents RF-Transformer, a unified backscatter radio hardware
abstraction that allows a low-power IoT device to directly communicate with
heterogeneous wireless receivers at the minimum power consumption. Unlike
existing backscatter systems that are tailored to a specific wireless
communication protocol, RF-Transformer provides a programmable interface to the
micro-controller, allowing IoT devices to synthesize different types of
protocol-compliant backscatter signals sharing radically different PHY-layer
designs. To show the efficacy of our design, we implement a PCB prototype of
RF-Transformer on the 2.4 GHz ISM band and showcase its capability of generating
standard ZigBee, Bluetooth, LoRa, and Wi-Fi 802.11b/g/n/ac packets. Our
extensive field studies show that RF-Transformer achieves 23.8 Mbps, 247.1
Kbps, 986.5 Kbps, and 27.3 Kbps throughput when generating standard Wi-Fi,
ZigBee, Bluetooth, and LoRa signals while consuming 7.6-74.2x less power than
their active counterparts. Our ASIC simulation based on the 65-nm CMOS process
shows that the power gain of RF-Transformer can further grow to 92-678x. We
further integrate RF-Transformer with pressure sensors and present a case study
on detecting foot traffic density in hallways. Our 7-day case studies
demonstrate that RF-Transformer can reliably transmit sensor data to a commodity
gateway by synthesizing LoRa packets on top of Wi-Fi signals. Our experimental
results also verify the compatibility of RF-Transformer with commodity
receivers. Code and hardware schematics can be found at:
https://github.com/LeFsCC/RF-Transformer
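A host-side way to picture the programmable interface is a single entry point that dispatches to per-protocol encoders; the sketch below is purely illustrative, with hypothetical names and placeholder encoders rather than RF-Transformer's actual register map or firmware API.

```python
from dataclasses import dataclass

@dataclass
class BackscatterConfig:
    protocol: str   # "zigbee" | "ble" | "lora" | "wifi" (illustrative set)
    channel: int
    payload: bytes

def synthesize_packet(cfg: BackscatterConfig):
    """Return a baseband symbol sequence for the selected protocol (stub)."""
    # Placeholder per-protocol encoders; a real implementation would emit
    # protocol-compliant PHY symbols for the backscatter front end.
    encoders = {
        "zigbee": lambda p: list(p),
        "ble":    lambda p: list(p),
        "lora":   lambda p: list(p),
        "wifi":   lambda p: list(p),
    }
    if cfg.protocol not in encoders:
        raise ValueError(f"unsupported protocol: {cfg.protocol}")
    return encoders[cfg.protocol](cfg.payload)

# Example: an IoT node emitting sensor data as a LoRa-style packet.
symbols = synthesize_packet(BackscatterConfig("lora", channel=5, payload=b"\x01\x02"))
```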
Robust Unstructured Knowledge Access in Conversational Dialogue with ASR Errors
Performance of spoken language understanding (SLU) can be degraded by
automatic speech recognition (ASR) errors. We propose a novel approach to
improve SLU robustness by randomly corrupting clean training text with an ASR
error simulator, followed by self-correcting the errors and minimizing the
target classification loss in a joint manner. In the proposed error simulator,
we leverage confusion networks generated from an ASR decoder without human
transcriptions to generate a variety of error patterns for model training. We
evaluate our approach on the DSTC10 challenge targeting knowledge-grounded
task-oriented conversational dialogues with ASR errors. Experimental results
show the effectiveness of our proposed approach, boosting the knowledge-seeking
turn detection (KTD) F1 significantly from 0.9433 to 0.9904. Knowledge cluster
classification is boosted from 0.7924 to 0.9333 in Recall@1. After knowledge
document re-ranking, our approach shows significant improvement in all
knowledge selection metrics, from 0.7358 to 0.7806 in Recall@1, from 0.8301 to
0.9333 in Recall@5, and from 0.7798 to 0.8460 in MRR@5 on the test set. In the
recent DSTC10 evaluation, our approach demonstrates significant improvement in
knowledge selection, boosting Recall@1 from 0.495 to 0.7144 compared to the
official baseline. Our source code is released on GitHub at
https://github.com/yctam/dstc10_track2_task2.git.
Comment: 7 pages, 2 figures. Accepted at ICASSP 202
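To make the training recipe concrete, the sketch below corrupts clean text with word-level confusions and notes the joint objective; in the paper the confusable alternatives come from ASR confusion networks, and the dictionary, corruption probability, and loss weighting here are illustrative assumptions.

```python
import random

def corrupt_with_confusions(tokens, confusions, p=0.15, seed=None):
    """Randomly corrupt clean training text with ASR-style errors."""
    # `confusions` maps a word to plausible misrecognitions; in the paper
    # these alternatives are derived from ASR confusion networks.
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        if tok in confusions and rng.random() < p:
            noisy.append(rng.choice(confusions[tok]))
        else:
            noisy.append(tok)
    return noisy

# Training is joint: a self-correction head reconstructs the clean tokens from
# the noisy ones while the task head minimizes the target classification loss,
# e.g. loss = correction_loss + classification_loss.
clean = "i would like a hotel near the station".split()
noisy = corrupt_with_confusions(clean, {"hotel": ["motel", "total"]}, p=0.5, seed=0)
```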
Saiyan: Design and Implementation of a Low-power Demodulator for LoRa Backscatter Systems
The radio range of backscatter systems continues growing as new wireless
communication primitives are continuously invented. Nevertheless, both the bit
error rate and the packet loss rate of backscatter signals increase rapidly
with the radio range, thereby necessitating the cooperation between the access
point and the backscatter tags through a feedback loop. Unfortunately, the
low-power nature of backscatter tags limits their ability to demodulate
feedback signals from a remote access point, making such cooperation
impractical at long range. This paper presents Saiyan, an ultra-low-power demodulator for
long-range LoRa backscatter systems. With Saiyan, a backscatter tag can
demodulate feedback signals from a remote access point with moderate power
consumption and then perform an immediate packet retransmission in the presence
of packet loss. Moreover, Saiyan enables rate adaptation and channel hopping, two
PHY-layer operations that are important to channel efficiency yet unavailable
on long-range backscatter systems. We prototype Saiyan on a two-layer PCB
and evaluate its performance in different environments. Results show that
Saiyan achieves a 5x gain in demodulation range compared with
state-of-the-art systems. Our ASIC simulation shows that the power consumption
of Saiyan is around 93.2 uW. Code and hardware schematics can be found at:
https://github.com/ZangJac/Saiyan
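One way to picture what an on-tag feedback demodulator enables is the tag-side retransmission loop sketched below; the function names and feedback fields are hypothetical placeholders, not Saiyan's actual firmware interface.

```python
def transmit_with_feedback(packet, backscatter_tx, demodulate_feedback,
                           max_retries=3):
    """Tag-side loop: transmit, listen for feedback, retransmit on loss."""
    rate, channel = 0, 0
    for _ in range(max_retries + 1):
        backscatter_tx(packet, rate=rate, channel=channel)
        fb = demodulate_feedback()       # low-power downlink demodulation
        if fb is None:                   # no feedback decoded -> retry
            continue
        if fb.get("ack"):
            return True
        # Apply rate-adaptation / channel-hopping commands before retrying.
        rate = fb.get("rate", rate)
        channel = fb.get("channel", channel)
    return False
```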
Vertical migration from surface soils to groundwater and source apportionment of polycyclic aromatic hydrocarbons in epikarst spring systems, southwest China
Understanding the transfer process of polycyclic aromatic hydrocarbons (PAHs) in karst terrain is of great importance to their ecological risk assessment; however, the impact of the vertical transfer of soil PAHs on groundwater is largely unknown in karst systems. Here, the vertical distribution and seasonal variation of 16 PAHs in the soils and water of 4 epikarst spring catchments in Southwest China were investigated. The total concentration of the PAHs ranged within 61-3285 ng/g in the soils and 341-4969 ng/L in the spring water. The vertical distribution of the PAHs in soils varied with ring number and the altitude of the catchment. PAH concentrations were linearly related to the total organic carbon (TOC) at different depths in the catchments at 563-783 m above sea level (a.s.l.). However, no correlation with TOC was observed in the high-altitude catchment (2090 m a.s.l.), because the large water flux led to fast migration of the 2-3 ring PAHs in soils. The PAHs in soils and springs were mainly derived from the combustion of grass/wood/coal, closely related to the primary fossil fuels used in this area. This study demonstrates that the groundwater was heavily polluted by PAHs in the karst terrains of Southwest China due to the vertical transfer of PAHs from the surface soils, and that effective protection is urgently needed.
SSD-MonoDETR: Supervised Scale-aware Deformable Transformer for Monocular 3D Object Detection
Transformer-based methods have demonstrated superior performance for
monocular 3D object detection recently, which aims at predicting 3D attributes
from a single 2D image. Most existing transformer-based methods leverage both
visual and depth representations to explore valuable query points on objects,
and the quality of the learned query points has a great impact on detection
accuracy. Unfortunately, existing unsupervised attention mechanisms in
transformers are prone to generating low-quality query features due to inaccurate
receptive fields, especially on hard objects. To tackle this problem, this
paper proposes a novel Supervised Scale-aware Deformable Attention (SSDA) for
monocular 3D object detection. Specifically, SSDA presets several masks with
different scales and utilizes depth and visual features to adaptively learn a
scale-aware filter for object query augmentation. By imposing scale awareness,
SSDA can accurately predict the receptive field of an object query to
support robust query feature generation. In addition, SSDA is equipped with
a Weighted Scale Matching (WSM) loss to supervise scale prediction, which
yields more confident results than unsupervised attention
mechanisms. Extensive experiments on the KITTI benchmark demonstrate that SSDA
significantly improves the detection accuracy, especially on moderate and hard
objects, yielding state-of-the-art performance as compared to the existing
approaches. Our code will be made publicly available at
https://github.com/mikasa3lili/SSD-MonoDETR.
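A toy rendering of the scale-selection idea follows: preset scales are weighted per object query from visual and depth features, and a simplified matching loss supervises the predicted weights. The real SSDA operates inside deformable attention, and its WSM loss is weighted rather than the plain NLL used here, so this is a schematic only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareWeighting(nn.Module):
    """Toy version of the scale-selection idea in SSDA."""

    def __init__(self, dim=256, num_scales=4):
        super().__init__()
        # Predicts a weight per preset scale from visual + depth query features.
        self.scale_head = nn.Linear(2 * dim, num_scales)

    def forward(self, vis_feat, depth_feat, scale_feats, gt_scale=None):
        # vis_feat, depth_feat: (Q, dim); scale_feats: (Q, num_scales, dim)
        w = F.softmax(self.scale_head(torch.cat([vis_feat, depth_feat], -1)), -1)
        fused = (w.unsqueeze(-1) * scale_feats).sum(dim=1)  # scale-aware query
        # Simplified supervision of the scale prediction (stand-in for WSM).
        wsm_loss = (F.nll_loss(torch.log(w + 1e-8), gt_scale)
                    if gt_scale is not None else None)
        return fused, wsm_loss
```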