
    Towards Free Data Selection with General-Purpose Models

    A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. However, current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly iterates between time-consuming model training and batch data selection. In this paper, we challenge this status quo by designing a distinct data selection pipeline that utilizes existing general-purpose models to select data from various datasets with a single-pass inference, without the need for additional training or supervision. A novel free data selection (FreeSel) method is proposed following this new pipeline. Specifically, we define semantic patterns extracted from intermediate features of the general-purpose model to capture subtle local information in each image. We then enable the selection of all data samples in a single pass through distance-based sampling at the fine-grained semantic pattern level. FreeSel bypasses the heavy batch selection process, achieving a significant efficiency improvement and running 530x faster than existing active learning methods. Extensive experiments verify the effectiveness of FreeSel on various computer vision tasks. Our code is available at https://github.com/yichen928/FreeSel. Comment: accepted by NeurIPS 2023
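
    As a rough illustration of the single-pass, distance-based selection idea, the sketch below applies farthest-point-style sampling to pooled per-image feature vectors. The function name and the one-vector-per-image simplification are assumptions for illustration; FreeSel itself samples at the level of multiple fine-grained semantic patterns per image.

    import numpy as np

    def select_by_distance(features: np.ndarray, budget: int) -> list[int]:
        """Greedy farthest-point selection; features is (N, D), one row per image."""
        selected = [0]  # arbitrary seed sample
        # distance from every sample to the currently selected set
        dists = np.linalg.norm(features - features[0], axis=1)
        while len(selected) < budget:
            idx = int(dists.argmax())  # pick the sample farthest from the set
            selected.append(idx)
            dists = np.minimum(dists, np.linalg.norm(features - features[idx], axis=1))
        return selected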

    Hedgehog Spin-vortex Crystal Antiferromagnetic Quantum Criticality in CaK(Fe1-xNix)4As4 Revealed by NMR

    Two ordered states, antiferromagnetism and nematicity, have been observed in most iron-based superconductors (SCs). In contrast to those SCs, the newly discovered SC CaK(Fe1-xNix)4As4 exhibits an antiferromagnetic (AFM) state, called the hedgehog spin-vortex crystal (SVC) structure, without nematic order, providing an opportunity to investigate the relationship between spin fluctuations and superconductivity without any effects of nematic fluctuations. Our 75As nuclear magnetic resonance studies on CaK(Fe1-xNix)4As4 (0 ≤ x ≤ 0.049) revealed that CaKFe4As4 is located close to a hidden hedgehog SVC AFM quantum-critical point (QCP). The magnetic QCP without nematicity in CaK(Fe1-xNix)4As4 highlights the close connection between spin fluctuations and superconductivity in iron-based SCs. The advantage of stoichiometric composition also makes CaKFe4As4 an ideal platform for further detailed investigation of the relationship between the magnetic QCP and superconductivity in iron-based SCs without disorder effects. Comment: 6 pages, 5 figures, accepted for publication in Phys. Rev. Lett.

    LGDN: Language-Guided Denoising Network for Video-Language Modeling

    Video-language modeling has attracted much attention with the rapid growth of web videos. Most existing methods assume that the video frames and the text description are semantically correlated and focus on video-language modeling at the video level. However, this hypothesis often fails for two reasons: (1) with the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) a raw video typically contains noisy/meaningless information (e.g., scenery shots, transitions, or teasers). Although a number of recent works deploy attention mechanisms to alleviate this problem, the irrelevant/noisy information still makes it very difficult to address. To overcome this challenge, we propose an efficient and effective model, termed Language-Guided Denoising Network (LGDN), for video-language modeling. Unlike most existing methods that utilize all extracted video frames, LGDN dynamically filters out misaligned or redundant frames under language supervision and obtains only 2--4 salient frames per video for cross-modal token-level alignment. Extensive experiments on five public datasets show that our LGDN outperforms the state of the art by large margins. We also provide a detailed ablation study to reveal the critical importance of solving the noise issue, in the hope of inspiring future video-language work. Comment: accepted by NeurIPS 2022
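
    A minimal sketch of the keep-only-salient-frames idea, assuming precomputed frame and caption embeddings from some video-text encoder: score each frame against the text and keep the top-k. LGDN learns its relevance scoring end to end, so the plain cosine similarity here is a hypothetical stand-in, not the paper's mechanism.

    import torch
    import torch.nn.functional as F

    def select_salient_frames(frame_emb: torch.Tensor,  # (T, D) per-frame features
                              text_emb: torch.Tensor,   # (D,) caption feature
                              k: int = 4) -> torch.Tensor:
        scores = F.cosine_similarity(frame_emb, text_emb.unsqueeze(0), dim=-1)
        keep = scores.topk(min(k, frame_emb.size(0))).indices  # top-k frame ids
        return frame_emb[keep.sort().values]  # restore temporal order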

    EC^2: Emergent Communication for Embodied Control

    Embodied control requires agents to leverage multi-modal pre-training to quickly learn how to act in new environments, where video demonstrations contain visual and motion details needed for low-level perception and control, and language instructions support generalization with abstract, symbolic structures. While recent approaches apply contrastive learning to force alignment between the two modalities, we hypothesize that better modeling their complementary differences can lead to more holistic representations for downstream adaptation. To this end, we propose Emergent Communication for Embodied Control (EC^2), a novel scheme to pre-train video-language representations for few-shot embodied control. The key idea is to learn an unsupervised "language" of videos via emergent communication, which bridges the semantics of video details and the structures of natural language. We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control. Through extensive experiments on the Metaworld and Franka Kitchen embodied benchmarks, EC^2 is shown to consistently outperform previous contrastive learning methods for both videos and texts as task inputs. Further ablations confirm the importance of the emergent language, which is beneficial for both video and language learning and significantly superior to using pre-trained video captions. We also present a quantitative and qualitative analysis of the emergent language and discuss future directions toward better understanding and leveraging emergent communication in embodied tasks. Comment: published in CVPR 2023
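
    The abstract leaves the architecture unspecified, so the following is only a generic referential-game sketch of emergent communication with a Gumbel-softmax message bottleneck, a common construction assumed here rather than taken from the paper: a speaker encodes a video feature into discrete symbols, and a listener must match the message back to its video.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB, MSG_LEN, DIM = 32, 8, 256  # illustrative sizes

    class Speaker(nn.Module):
        def __init__(self):
            super().__init__()
            self.to_logits = nn.Linear(DIM, MSG_LEN * VOCAB)

        def forward(self, video_feat):  # (B, DIM)
            logits = self.to_logits(video_feat).view(-1, MSG_LEN, VOCAB)
            # discrete, differentiable message over the emergent vocabulary
            return F.gumbel_softmax(logits, tau=1.0, hard=True)

    class Listener(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Linear(VOCAB, DIM)

        def forward(self, message):  # (B, MSG_LEN, VOCAB)
            return self.embed(message).mean(dim=1)  # (B, DIM)

    def referential_loss(video_feat, speaker, listener):
        msg_emb = listener(speaker(video_feat))  # decode the message
        logits = msg_emb @ video_feat.t()        # match messages to videos
        target = torch.arange(video_feat.size(0))
        return F.cross_entropy(logits, target)   # InfoNCE-style game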

    Distributed Fixed-Time Control for Leader-Steered Rigid Shape Formation with Prescribed Performance

    Drawing on the principles of rigid-body kinematics, a novel framework for a multi-robot network is proposed to form and maintain an invariant rigid geometric shape. Unlike consensus-based formation, this approach can perform both translational and rotational movements of the formation geometry, ensuring that the entire formation motion remains consistent with the leader. To achieve the target formation shape and motion, a distributed control protocol is developed for multiple Euler-Lagrange robotic vehicles subject to nonholonomic constraints. The proposed protocol includes a novel prescribed performance control (PPC) algorithm that addresses the second-order dynamics of the robotic vehicles by combining a nonsingular sliding manifold with an adaptive law. Finally, the effectiveness of the proposed formation framework and control protocol is demonstrated through numerical simulations and practical experiments with a team of four robotic vehicles.
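
    The rigid-shape idea reduces to a simple kinematic relation: each follower's target is a fixed body-frame offset rotated and translated with the leader's pose, so the whole formation translates and rotates as one rigid body. The sketch below shows only this geometric relation, not the paper's distributed fixed-time PPC law.

    import numpy as np

    def follower_target(leader_pos: np.ndarray, leader_yaw: float,
                        offset_body: np.ndarray) -> np.ndarray:
        """leader_pos: (2,); offset_body: (2,) fixed offset in the leader's frame."""
        c, s = np.cos(leader_yaw), np.sin(leader_yaw)
        R = np.array([[c, -s], [s, c]])      # planar rotation of the shape
        return leader_pos + R @ offset_body  # target preserves the rigid shape

    # Example: a follower 1 m behind and 1 m to the left of the leader.
    print(follower_target(np.array([5.0, 2.0]), np.pi / 2, np.array([-1.0, 1.0])))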

    Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

    A major challenge for video semantic segmentation is the lack of labeled data. In most benchmark datasets, only one frame per video clip is annotated, which prevents most supervised methods from utilizing information in the remaining frames. To exploit the spatio-temporal information in videos, many previous works use pre-computed optical flow, which encodes temporal consistency, to improve video segmentation. However, video segmentation and optical flow estimation are still treated as two separate tasks. In this paper, we propose a novel framework for joint video semantic segmentation and optical flow estimation. Semantic segmentation brings semantic information that helps handle occlusion, yielding more robust optical flow estimation, while the non-occluded optical flow provides accurate pixel-level temporal correspondences that guarantee the temporal consistency of the segmentation. Moreover, our framework is able to utilize both labeled and unlabeled frames in a video through joint training, while no additional computation is required at inference. Extensive experiments show that the proposed model makes video semantic segmentation and optical flow estimation benefit from each other, outperforming existing methods under the same settings on both tasks. Comment: published in AAAI 2020
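
    The coupling can be pictured as a warping consistency term: the previous frame's segmentation is warped to the current frame with the estimated flow, and disagreement is penalized on non-occluded pixels. The sketch below is a generic version of such a loss with assumed tensor layouts, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def warp_with_flow(x, flow):
        """x: (B,C,H,W); flow: (B,2,H,W) backward flow in pixels, channel 0 = x."""
        B, _, H, W = x.shape
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        base = torch.stack((xs, ys)).float().to(x.device)  # (2,H,W) pixel grid
        coords = base.unsqueeze(0) + flow                  # sampling coordinates
        grid = torch.stack((2 * coords[:, 0] / (W - 1) - 1,  # normalize to [-1,1]
                            2 * coords[:, 1] / (H - 1) - 1), dim=-1)
        return F.grid_sample(x, grid, align_corners=True)

    def consistency_loss(seg_t, seg_prev, flow, non_occluded):
        """Penalize segmentation disagreement; non_occluded: (B,H,W) in {0,1}."""
        warped = warp_with_flow(seg_prev, flow)
        diff = (seg_t.softmax(1) - warped.softmax(1)).abs().sum(1)  # (B,H,W)
        return (diff * non_occluded).mean()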

    Adaptive Sliding Mode Fault Tolerant Control for Autonomous Vehicle With Unknown Actuator Parameters and Saturated Tire Force Based on the Center of Percussion

    Taking tire force saturation in vehicle motions into consideration, a novel path-following controller is developed for autonomous vehicles with unknown-bound disturbances and unknown actuator parameters. An adaptive sliding-mode fault-tolerant control (ASM-FTC) strategy is designed to stabilize the path-following errors without any information about the disturbance bounds, the actuator fault bounds, or the steering ratio from the steering wheel to the front wheels. By selecting the distance from the center of gravity to the center of percussion as the preview length, the effects of the lateral rear-tire force are decoupled and cancelled out, so that the preview error, which represents the path-following performance, can be commanded solely by the front-tire force. To further address the issue of unknown tire-road friction limits, a modified ASM-FTC strategy is presented to improve the path-following performance when the lateral tire force is saturated. Simulation results show that the modified ASM-FTC controller achieves superior tracking performance over the nominal ASM-FTC while the autonomous vehicle follows desired paths.
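
    The center-of-percussion choice can be sketched as follows: for a planar bicycle model, the center of percussion lies a distance I_z / (m * l_r) ahead of the center of gravity (l_r being the CG-to-rear-axle distance), where a lateral rear-tire force produces no lateral acceleration, so a sliding variable built on the preview-point error is driven by the front tire alone. The symbols and the simple first-order surface below are illustrative assumptions, not the paper's exact control law.

    import numpy as np

    def cop_preview_length(I_z: float, m: float, l_r: float) -> float:
        """CG-to-center-of-percussion distance for a planar bicycle model."""
        return I_z / (m * l_r)

    def preview_error(e_y: float, e_psi: float, d_cop: float) -> float:
        """Lateral error at the preview point ahead of the CG."""
        return e_y + d_cop * np.sin(e_psi)

    def sliding_surface(e_p: float, e_p_dot: float, lam: float = 2.0) -> float:
        """First-order sliding variable s = de_p/dt + lam * e_p."""
        return e_p_dot + lam * e_p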

    Doubly Robust Self-Training

    Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly robust self-training, a novel semi-supervised algorithm that provably balances between two extremes. When the pseudo-labels are entirely incorrect, our method reduces to a training process that uses only the labeled data. Conversely, when the pseudo-labels are completely accurate, our method becomes a training process that uses all pseudo-labeled and labeled data, thus increasing the effective sample size. Through empirical evaluations on both the ImageNet dataset for image classification and the nuScenes autonomous driving dataset for 3D object detection, we demonstrate the superiority of the doubly robust loss over the standard self-training baseline.
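
    The balancing behavior described above suggests a loss of the standard doubly robust form, sketched below with PyTorch cross-entropy; the paper's exact weighting may differ. The two pseudo-label terms cancel in expectation when the pseudo-labels are uninformative, leaving the labeled term, while the two labeled-subset terms cancel when the pseudo-labels are exact, leaving training on all N pseudo-labeled samples.

    import torch
    import torch.nn.functional as F

    def doubly_robust_loss(logits_all, pseudo_all,         # all N samples
                           logits_lab, pseudo_lab, y_lab   # n labeled samples
                           ) -> torch.Tensor:
        loss_pseudo_all = F.cross_entropy(logits_all, pseudo_all)  # mean over N
        loss_pseudo_lab = F.cross_entropy(logits_lab, pseudo_lab)  # mean over n
        loss_label_lab = F.cross_entropy(logits_lab, y_lab)        # mean over n
        return loss_pseudo_all - loss_pseudo_lab + loss_label_lab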