178 research outputs found
Towards Free Data Selection with General-Purpose Models
A desirable data selection algorithm can efficiently choose the most
informative samples to maximize the utility of limited annotation budgets.
However, current approaches, represented by active learning methods, typically
follow a cumbersome pipeline that iterates the time-consuming model training
and batch data selection repeatedly. In this paper, we challenge this status
quo by designing a distinct data selection pipeline that utilizes existing
general-purpose models to select data from various datasets with a single-pass
inference without the need for additional training or supervision. A novel free
data selection (FreeSel) method is proposed following this new pipeline.
Specifically, we define semantic patterns extracted from intermediate features
of the general-purpose model to capture subtle local information in each image.
We then enable the selection of all data samples in a single pass through
distance-based sampling at the fine-grained semantic pattern level. FreeSel
bypasses the heavy batch selection process, achieving a significant efficiency
improvement and running 530x faster than existing active learning methods.
Extensive experiments verify the effectiveness of FreeSel on various computer
vision tasks. Our code is available at https://github.com/yichen928/FreeSel.
Comment: accepted by NeurIPS 202
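The single-pass, distance-based selection idea can be illustrated with a greedy farthest-point heuristic over fixed features from a frozen model. This is only a sketch under simplifying assumptions: the function and variable names are ours, and FreeSel actually samples at the level of fine-grained semantic patterns extracted from intermediate features, not one vector per image.

```python
import numpy as np

def farthest_point_select(features, budget, seed=0):
    """Greedy distance-based sampling: repeatedly pick the sample farthest
    from the currently selected set (a k-center-style heuristic)."""
    rng = np.random.default_rng(seed)
    n = len(features)
    selected = [int(rng.integers(n))]
    # distance of every sample to its nearest selected sample
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dists))
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# toy demo: 100 "images" with 16-dim features from a frozen general-purpose model
feats = np.random.default_rng(1).normal(size=(100, 16))
picked = farthest_point_select(feats, budget=10)
```

Because the features come from a single forward pass of a pretrained model, the entire annotation budget is spent in one shot, with no retraining loop between selection rounds.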
Hedgehog Spin-vortex Crystal Antiferromagnetic Quantum Criticality in CaK(Fe1-xNix)4As4 Revealed by NMR
Two ordering states, antiferromagnetism and nematicity, have been observed in
most iron-based superconductors (SCs). In contrast to those SCs, the newly
discovered SC CaK(Fe1-xNix)4As4 exhibits an antiferromagnetic
(AFM) state, called the hedgehog spin-vortex crystal (SVC) structure, without nematic
order, providing an opportunity to investigate the relationship
between spin fluctuations and SC without any effects of nematic fluctuations.
Our 75As nuclear magnetic resonance studies on
CaK(Fe1-xNix)4As4 (0 ≤ x ≤ 0.049) revealed that
CaKFe4As4 is located close to a hidden hedgehog SVC AFM quantum-critical
point (QCP). The magnetic QCP without nematicity in
CaK(Fe1-xNix)4As4 highlights the close connection between spin
fluctuations and superconductivity in iron-based SCs. The advantage of
stoichiometric composition also makes CaKFe4As4 an ideal platform for
further detailed investigation of the relationship between magnetic QCP and
superconductivity in iron-based SCs without disorder effects.
Comment: 6 pages, 5 figures, accepted for publication in Phys. Rev. Let
LGDN: Language-Guided Denoising Network for Video-Language Modeling
Video-language modeling has attracted much attention with the rapid growth of
web videos. Most existing methods assume that the video frames and text
description are semantically correlated, and focus on video-language modeling
at video level. However, this hypothesis often fails for two reasons: (1) With
the rich semantics of video contents, it is difficult to cover all frames with
a single video-level description; (2) A raw video typically has
noisy/meaningless information (e.g., scenery shot, transition or teaser).
Although a number of recent works deploy attention mechanisms to alleviate this
problem, the irrelevant/noisy information still makes it very difficult to
address. To overcome this challenge, we propose an efficient and effective
model, termed Language-Guided Denoising Network (LGDN), for video-language
modeling. Different from most existing methods that utilize all extracted video
frames, LGDN dynamically filters out the misaligned or redundant frames under
the language supervision and obtains only 2--4 salient frames per video for
cross-modal token-level alignment. Extensive experiments on five public
datasets show that our LGDN outperforms the state of the art by large margins.
We also provide a detailed ablation study to reveal the critical importance of
solving the noise issue, in the hope of inspiring future video-language work.
Comment: Accepted by NeurIPS202
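The frame-filtering step can be sketched as scoring every frame against the text and keeping only the top-k (2-4) salient frames. The names below are ours and the scoring is plain cosine similarity; LGDN itself learns this relevance under language supervision rather than using raw embedding similarity.

```python
import numpy as np

def select_salient_frames(frame_embs, text_emb, k=4):
    """Score each frame by cosine similarity to the text embedding and keep
    only the top-k frames for fine-grained cross-modal alignment."""
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    scores = f @ t
    keep = np.argsort(scores)[::-1][:k]   # indices of the k best-aligned frames
    return np.sort(keep), scores

rng = np.random.default_rng(0)
frames = rng.normal(size=(32, 64))   # 32 candidate frame embeddings
text = rng.normal(size=64)           # caption embedding
kept, scores = select_salient_frames(frames, text, k=4)
```

Downstream token-level alignment then runs on only the kept frames, which is where the efficiency gain over attending to all 32 frames comes from.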
EC^2: Emergent Communication for Embodied Control
Embodied control requires agents to leverage multi-modal pre-training to
quickly learn how to act in new environments, where video demonstrations
contain visual and motion details needed for low-level perception and control,
and language instructions support generalization with abstract, symbolic
structures. While recent approaches apply contrastive learning to force
alignment between the two modalities, we hypothesize that better modeling of their
complementary differences can lead to more holistic representations for
downstream adaptation. To this end, we propose Emergent Communication for
Embodied Control (EC^2), a novel scheme to pre-train video-language
representations for few-shot embodied control. The key idea is to learn an
unsupervised "language" of videos via emergent communication, which bridges the
semantics of video details and structures of natural language. We learn
embodied representations of video trajectories, emergent language, and natural
language using a language model, which is then used to finetune a lightweight
policy network for downstream control. Through extensive experiments in
Metaworld and Franka Kitchen embodied benchmarks, EC^2 is shown to consistently
outperform previous contrastive learning methods for both videos and texts as
task inputs. Further ablations confirm the importance of the emergent language,
which is beneficial for both video and language learning, and significantly
superior to using pre-trained video captions. We also present a quantitative
and qualitative analysis of the emergent language and discuss future directions
toward better understanding and leveraging emergent communication in embodied
tasks.
Comment: Published in CVPR202
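One concrete way to picture an emergent "language" of videos is to discretize video segment features into tokens from a finite vocabulary. The sketch below uses simple nearest-neighbor vector quantization with a random codebook; in EC^2 the vocabulary is instead learned end-to-end through a communication game, so everything here (names, codebook, sizes) is purely illustrative.

```python
import numpy as np

def emit_message(video_feats, codebook):
    """Map each video-segment feature to its nearest codebook entry,
    producing a discrete token sequence -- an emergent 'language' of videos."""
    d = ((video_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)  # one token id per segment

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 8))   # 32-word emergent vocabulary
segments = rng.normal(size=(6, 8))    # 6 temporal segments of one video
tokens = emit_message(segments, codebook)
```

The resulting token sequence has the symbolic, compositional shape of natural language, which is what lets a single language model consume video trajectories, emergent tokens, and instructions in one representation space.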
Distributed Fixed-Time Control for Leader-Steered Rigid Shape Formation with Prescribed Performance
Building on the principles of rigid-body kinematics, a novel framework for a multi-robot network is proposed to form and maintain an invariant rigid geometric shape. Unlike consensus-based formation control, this approach can perform both translational and rotational movements of the formation geometry, ensuring that the entire formation motion remains consistent with the leader. To achieve the target formation shape and motion, a distributed control protocol for multiple Euler-Lagrange robotic vehicles subject to nonholonomic constraints is developed. The proposed protocol includes a novel prescribed performance control (PPC) algorithm that addresses the second-order dynamics of the robotic vehicles by employing a combination of a nonsingular sliding manifold and an adaptive law. Finally, the effectiveness of the proposed formation framework and control protocol is demonstrated through numerical simulations and practical experiments with a team of four robotic vehicles.
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow
A major challenge for video semantic segmentation is the lack of labeled
data. In most benchmark datasets, only one frame of a video clip is annotated,
which makes most supervised methods fail to utilize information from the rest
of the frames. To exploit the spatio-temporal information in videos, many
previous works use pre-computed optical flows, which encode the temporal
consistency to improve the video segmentation. However, the video segmentation
and optical flow estimation are still considered as two separate tasks. In this
paper, we propose a novel framework for joint video semantic segmentation and
optical flow estimation. Semantic segmentation brings semantic information to
handle occlusion for more robust optical flow estimation, while the
non-occluded optical flow provides accurate pixel-level temporal
correspondences to guarantee the temporal consistency of the segmentation.
Moreover, our framework is able to utilize both labeled and unlabeled frames in
the video through joint training, while no additional calculation is required
in inference. Extensive experiments show that the proposed model makes the
video semantic segmentation and optical flow estimation benefit from each other
and outperforms existing methods under the same settings in both tasks.Comment: Published in AAAI 202
Adaptive Sliding Mode Fault Tolerant Control for Autonomous Vehicle With Unknown Actuator Parameters and Saturated Tire Force Based on the Center of Percussion
With consideration of tire force saturation in vehicle motions, a novel path-following controller is developed for autonomous vehicles with unknown-bound disturbances and unknown actuator parameters. An adaptive sliding-mode fault-tolerant control (ASM-FTC) strategy is designed to stabilize the path-following errors without any information of disturbance boundaries, actuator fault boundaries and steering ratio from the steering wheel to the front wheels. By selecting the distance from the center of gravity to the center of percussion as the preview length, the effects of the lateral rear-tire force are decoupled and cancelled out, and then the preview error, which represents the path-following performance, can be only commanded by the front-tire force. To further address the issue of unknown tire-road friction limits, a modified ASM-FTC strategy is presented to improve the path-following performance as the lateral tire force is saturated. Simulation results show that the modified ASM-FTC controller demonstrates superior tracking performance over the normal ASM-FTC while the autonomous vehicle follows desired paths
Doubly Robust Self-Training
Self-training is an important technique for solving semi-supervised learning
problems. It leverages unlabeled data by generating pseudo-labels and combining
them with a limited labeled dataset for training. The effectiveness of
self-training heavily relies on the accuracy of these pseudo-labels. In this
paper, we introduce doubly robust self-training, a novel semi-supervised
algorithm that provably balances between two extremes. When the pseudo-labels
are entirely incorrect, our method reduces to a training process solely using
labeled data. Conversely, when the pseudo-labels are completely accurate, our
method transforms into a training process utilizing all pseudo-labeled data and
labeled data, thus increasing the effective sample size. Through empirical
evaluations on both the ImageNet dataset for image classification and the
nuScenes autonomous driving dataset for 3D object detection, we demonstrate the
superiority of the doubly robust loss over the standard self-training baseline
- …