134 research outputs found

    Dolphins: Multimodal Language Model for Driving

    In pursuit of fully autonomous vehicles (AVs) capable of navigating complex real-world scenarios with human-like understanding and responsiveness, we introduce Dolphins, a novel vision-language model designed to acquire human-like abilities as a conversational driving assistant. Dolphins processes multimodal inputs comprising video (or image) data, text instructions, and historical control signals to generate informed outputs corresponding to the provided instructions. Building upon the open-source pretrained vision-language model OpenFlamingo, we first enhance Dolphins's reasoning capabilities through a Grounded Chain of Thought (GCoT) process. We then tailor Dolphins to the driving domain by constructing driving-specific instruction data and conducting instruction tuning. Using the BDD-X dataset, we design and consolidate four distinct AV tasks into Dolphins to foster a holistic understanding of intricate driving scenarios. As a result, the distinctive features of Dolphins fall along two dimensions: (1) the ability to provide a comprehensive understanding of complex and long-tailed open-world driving scenarios and solve a spectrum of AV tasks, and (2) the emergence of human-like capabilities, including gradient-free instant adaptation via in-context learning and error recovery via reflection.
    Comment: The project page is available at https://vlm-driver.github.io

    Improved OOD Generalization via Conditional Invariant Regularizer

    Recently, generalization on out-of-distribution (OOD) data with correlation shift has attracted great attention. Correlation shift is caused by spurious attributes that correlate with the class label, since the correlation between them may differ between training and test data. For this problem, we show that models that are conditionally independent of the spurious attributes, given the class label, are OOD generalizable. Based on this, we propose a metric, Conditional Spurious Variation (CSV), which controls the OOD generalization error, to measure such conditional independence. To improve OOD generalization, we regularize the training process with the proposed CSV. Under mild assumptions, our training objective can be formulated as a nonconvex-concave mini-max problem. An algorithm with a provable convergence rate is proposed to solve the problem. Extensive empirical results verify our algorithm's efficacy in improving OOD generalization.

    Cache-Enabled in Cooperative Cognitive Radio Networks for Transmission Performance

    The proliferation of mobile devices that support the acceleration of data services (especially smartphones) has resulted in a dramatic increase in mobile traffic. Mobile data has also increased exponentially, already exceeding the throughput of the backhaul. To improve spectrum utilization and increase mobile network traffic, we study the cooperation between primary and secondary networks via content caching. We consider that the secondary base station assists the primary user by pre-caching some popular primary contents. Thus, the secondary base station can obtain more licensed bandwidth to serve its own user. We mainly focus on the time delay from the backhaul link to the secondary base station. First, in terms of the content caching and transmission strategies, we provide a cooperation scheme to maximize the secondary user's effective data transmission rate under the constraint of the primary user's target rate. Then, we investigate the impact of the caching allocation and prove that the formulated problem is concave with regard to the caching capacity allocation for any given power allocation. Furthermore, we obtain the joint caching and power allocation with an efficient bisection search algorithm. Finally, our results show that the content caching cooperation scheme can achieve a significant performance gain for the primary and secondary systems over traditional two-hop relay cooperation without caching.
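
    The bisection step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a one-dimensional concave objective whose derivative is available; the toy rate function, its parameters `a`, `b`, `total`, and the helper names are hypothetical and are not the formulation from the paper.

```python
def bisect_concave_max(grad, lo, hi, tol=1e-9, max_iter=200):
    """Maximize a concave function on [lo, hi] by bisecting on the sign
    of its derivative `grad` (which is non-increasing for a concave function)."""
    if grad(lo) <= 0:      # objective already decreasing at the left end
        return lo
    if grad(hi) >= 0:      # objective still increasing at the right end
        return hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:  # maximizer lies to the right of mid
            lo = mid
        else:              # maximizer lies at or to the left of mid
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def toy_rate_grad(c, a=2.0, b=1.0, total=1.0):
    """Derivative of the hypothetical toy objective
    f(c) = log(1 + a*c) + log(1 + b*(total - c)), where c is the share of
    cache capacity given to one content class (illustrative only)."""
    return a / (1.0 + a * c) - b / (1.0 + b * (total - c))


best_c = bisect_concave_max(toy_rate_grad, 0.0, 1.0)
print(round(best_c, 6))
```

    Because the derivative of a concave function changes sign at most once, each iteration halves the interval that contains the maximizer, so the search needs only on the order of log2((hi - lo) / tol) derivative evaluations.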

    Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022

    In this report, we present our approach and empirical results for applying masked autoencoders to two egocentric video understanding tasks of the Ego4D Challenge 2022, namely Object State Change Classification and PNR Temporal Localization. As team TheSSVL, we placed 2nd in both tasks. Our code will be made available.
    Comment: 5 pages

    DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models

    Dataset sanitization is a widely adopted proactive defense against poisoning-based backdoor attacks, aimed at filtering out and removing poisoned samples from training datasets. However, existing methods have shown limited efficacy in countering ever-evolving trigger functions and often lead to considerable degradation of benign accuracy. In this paper, we propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets. We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones. Specifically, with multiple iterations of the forward and reverse process, we extract intermediary images and their predicted labels for each sample in the original dataset. Then, we identify anomalous samples based on the presence of label transitions in the intermediary images, detect the target label by quantifying distribution discrepancy, select their purified images considering pixel and feature distance, and determine their ground-truth labels by training a benign model. Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy, surpassing the performance of baseline defense methods.
    Comment: Accepted by AAAI202
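
    The anomaly-identification step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: a "label transition" is taken to mean that the predicted label changes between consecutive intermediary images, and the function names and sample data are hypothetical rather than taken from the DataElixir implementation.

```python
from typing import List, Sequence


def has_label_transition(predicted: Sequence[int]) -> bool:
    """Return True if the predicted label changes anywhere along the
    sequence of intermediary images (assumed definition of a transition)."""
    return any(a != b for a, b in zip(predicted, predicted[1:]))


def flag_anomalous(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Return the indices of samples whose intermediary predictions
    contain at least one label transition."""
    return [i for i, preds in enumerate(predictions) if has_label_transition(preds)]


# Hypothetical predicted labels over three purification iterations per sample.
predictions = [
    [3, 3, 3],  # stable prediction: treated as benign
    [3, 5, 5],  # label changes once purification starts: flagged
    [7, 7, 7],  # stable prediction: treated as benign
]
flagged = flag_anomalous(predictions)
print(flagged)
```

    The later steps named in the abstract (target-label detection, purified-image selection, and relabeling with a benign model) are not covered by this sketch.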