Dolphins: Multimodal Language Model for Driving
The quest for fully autonomous vehicles (AVs) demands systems capable of
navigating complex real-world scenarios with human-like understanding and
responsiveness. In this paper, we introduce Dolphins, a novel vision-language
model designed to acquire human-like abilities as a conversational driving
assistant. Dolphins is
adept at processing multimodal inputs comprising video (or image) data, text
instructions, and historical control signals to generate informed outputs
corresponding to the provided instructions. Building upon the open-source
pretrained vision-language model OpenFlamingo, we first enhance Dolphins'
reasoning capabilities through an innovative Grounded Chain of Thought (GCoT)
process. We then tailor Dolphins to the driving domain by constructing
driving-specific instruction data and conducting instruction tuning. Using the
BDD-X dataset, we design four distinct AV tasks and consolidate them into
Dolphins to foster a holistic understanding of intricate driving scenarios. As
a result, the distinctive features of Dolphins span two dimensions: (1) the
ability to comprehensively understand complex and long-tailed open-world
driving scenarios and to solve a spectrum of AV tasks, and (2) the emergence
of human-like capabilities, including gradient-free instant adaptation via
in-context learning and error recovery via reflection.
Comment: The project page is available at https://vlm-driver.github.io
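Dolphins takes video, text instructions, and historical control signals as input and adapts through in-context learning. As a minimal sketch, assuming an OpenFlamingo-style interleaved prompt, the code below shows how one such sample and a few-shot demonstration might be serialized into a single text prompt. The `<image>` and `<|endofchunk|>` markers follow OpenFlamingo's interleaved format; `DrivingExample`, `format_controls`, `build_prompt`, and the control-signal layout are hypothetical and not taken from the Dolphins paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# OpenFlamingo marks image positions with "<image>" and closes each
# image-text chunk with "<|endofchunk|>". Whether Dolphins uses exactly
# this template is an assumption of this sketch.
IMAGE_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class DrivingExample:
    """A hypothetical instruction sample; field names are illustrative."""
    instruction: str
    # Past (speed, steering angle) readings; units and sign convention are assumed.
    control_history: List[Tuple[float, float]] = field(default_factory=list)
    answer: str = ""  # empty for the query the model should complete


def format_controls(history: List[Tuple[float, float]]) -> str:
    """Serialize historical control signals as plain text."""
    return "; ".join(f"speed={s:.1f}, steer={a:.1f}" for s, a in history)


def build_prompt(demos: List[DrivingExample], query: DrivingExample) -> str:
    """Interleave in-context demonstrations with a final query.

    Each example gets one image placeholder (its video frames are assumed to
    be encoded separately by the vision encoder), its control history, and
    its instruction; completed demonstrations also carry their answer.
    """
    chunks = []
    for ex in demos + [query]:
        text = (f"{IMAGE_TOKEN}Controls: {format_controls(ex.control_history)}. "
                f"Question: {ex.instruction} Answer:")
        if ex.answer:  # only demonstrations are answered and closed off
            text += f" {ex.answer}{END_OF_CHUNK}"
        chunks.append(text)
    return "".join(chunks)


# Usage: one demonstration followed by an unanswered query.
demo = DrivingExample("What is the ego car doing?", [(5.0, 0.0)],
                      "It is driving straight.")
query = DrivingExample("Why does the car slow down?", [(8.0, 0.0), (6.5, -1.2)])
prompt = build_prompt([demo], query)
```

Leaving the query's answer open means the model's continuation after `Answer:` is the response, which is how gradient-free in-context adaptation is typically driven.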
Improved OOD Generalization via Conditional Invariant Regularizer
Recently, generalization on out-of-distribution (OOD) data with correlation
shift has attracted great attention. Correlation shift is caused by spurious
attributes that correlate with the class label, because the correlation
between them may differ between training and test data. For such a problem, we
show that models that are conditionally independent of the spurious
attributes given the class label are OOD generalizable. Based on this, we
propose a metric, Conditional Spurious Variation (CSV), which controls the OOD
generalization error, to measure such conditional independence. To improve
OOD generalization, we regularize the training process with the proposed CSV.
Under mild assumptions, our training objective can be formulated as a
nonconvex-concave mini-max problem. An algorithm with a provable convergence
rate is proposed to solve the problem. Extensive empirical results verify our
algorithm's efficacy in improving OOD generalization.
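The abstract does not define CSV, so the following is only a hypothetical proxy for the idea it describes: once the class label is fixed, a model's loss should not depend on the spurious attribute. For each class, the sketch measures the spread of the mean loss across spurious-attribute groups and adds the worst spread to the empirical risk. `csv_proxy`, `regularized_loss`, and the weight `lam` are illustrative names; the paper's exact metric and its mini-max reformulation are not reproduced here.

```python
import numpy as np


def csv_proxy(losses, labels, spurious) -> float:
    """Hypothetical proxy for how much the loss depends on a spurious
    attribute once the class label is fixed (not the paper's exact CSV).

    For each class y: group that class's per-sample losses by spurious
    attribute value, average each group, and take the spread (max - min)
    of the group means. Return the largest spread over all classes.
    0 means the mean loss never varies with the attribute within a class.
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    spurious = np.asarray(spurious)
    worst = 0.0
    for y in np.unique(labels):
        in_class = labels == y
        # Only attribute values present in this class, so no group is empty.
        group_means = [losses[in_class & (spurious == z)].mean()
                       for z in np.unique(spurious[in_class])]
        worst = max(worst, max(group_means) - min(group_means))
    return float(worst)


def regularized_loss(losses, labels, spurious, lam: float = 1.0) -> float:
    """Empirical risk plus lam times the proxy penalty (lam is a hypothetical weight)."""
    return float(np.mean(losses)) + lam * csv_proxy(losses, labels, spurious)


# Usage: class 1's group means (0.2 vs 0.9) differ most, so the penalty is about 0.7.
penalty = csv_proxy([0.1, 0.3, 0.2, 0.9], labels=[0, 0, 1, 1], spurious=[0, 1, 0, 1])
```

This version only evaluates the penalty; training with it would require computing the same quantity in a differentiable framework.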
Cache-Enabled in Cooperative Cognitive Radio Networks for Transmission Performance
The proliferation of mobile devices that support accelerated data services (especially smartphones) has caused a dramatic increase in mobile traffic. Mobile data has grown exponentially and already exceeds the throughput of the backhaul. To improve spectrum utilization and accommodate growing mobile traffic, we study cooperation between primary and secondary networks via content caching. We consider a secondary base station that assists the primary user by pre-caching some popular primary content; in return, the secondary base station obtains more licensed bandwidth to serve its own user. We mainly focus on the time delay on the backhaul link to the secondary base station. First, in terms of content caching and transmission strategies, we provide a cooperation scheme that maximizes the secondary user's effective data transmission rate subject to the primary user's target rate. Then, we investigate the impact of the caching allocation and prove that the formulated problem is concave in the caching capacity allocation for any given power allocation. Furthermore, we obtain the joint caching and power allocation with an efficient bisection search algorithm. Finally, our results show that the content caching cooperation scheme achieves a significant performance gain for both the primary and secondary systems over traditional two-hop relay cooperation without caching.
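Concavity is what makes bisection applicable: the derivative of a concave function is non-increasing, so the optimum sits where the derivative changes sign. A minimal sketch of that one-dimensional step follows, assuming a differentiable objective; the toy rate function `ln(1 + a*c) + ln(1 + b*(capacity - c))` and all names are hypothetical stand-ins, not the paper's system model or its joint caching and power algorithm.

```python
from typing import Callable


def bisection_maximize(dfdx: Callable[[float], float], lo: float, hi: float,
                       tol: float = 1e-9, max_iter: int = 200) -> float:
    """Maximize a concave, differentiable f on [lo, hi] given its derivative.

    Concavity makes f' non-increasing, so the maximizer is the point where
    f' changes sign, or an endpoint if it never does.
    """
    if dfdx(lo) <= 0:   # f never increases: left endpoint is optimal
        return lo
    if dfdx(hi) >= 0:   # f never decreases: right endpoint is optimal
        return hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if dfdx(mid) > 0:
            lo = mid    # still climbing: optimum lies to the right
        else:
            hi = mid    # descending or flat: optimum lies at or to the left
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def toy_rate_derivative(c: float, a: float, b: float, capacity: float) -> float:
    """Derivative of the hypothetical toy rate ln(1 + a*c) + ln(1 + b*(capacity - c)),
    where c is the cache capacity assigned to one content class."""
    return a / (1 + a * c) - b / (1 + b * (capacity - c))


# Usage: the closed-form optimum (a - b + a*b*capacity) / (2*a*b) is 2.25 here.
c_star = bisection_maximize(lambda c: toy_rate_derivative(c, 2.0, 1.0, 4.0), 0.0, 4.0)
```

Each iteration halves the search interval, so reaching a tolerance of `tol` on an interval of width `W` takes about `log2(W / tol)` derivative evaluations.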
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022
In this report, we present our approach and empirical results of applying
masked autoencoders to two egocentric video understanding tasks of the Ego4D
Challenge 2022, namely Object State Change Classification and PNR Temporal
Localization. As team TheSSVL, we placed 2nd in both tasks. Our code will be
made available.
Comment: 5 pages
DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models
Dataset sanitization is a widely adopted proactive defense against
poisoning-based backdoor attacks, aimed at filtering out and removing poisoned
samples from training datasets. However, existing methods have shown limited
efficacy in countering ever-evolving trigger functions and often lead to
considerable degradation of benign accuracy. In this paper, we propose
DataElixir, a novel sanitization approach tailored to purify poisoned datasets.
We leverage diffusion models to eliminate trigger features and restore benign
features, thereby turning the poisoned samples into benign ones. Specifically,
with multiple iterations of the forward and reverse processes, we extract
intermediary images and their predicted labels for each sample in the original
dataset. Then, we identify anomalous samples by the presence of label
transitions among their intermediary images, detect the target label by
quantifying distribution discrepancy, select their purified images considering
pixel and feature distance, and determine their ground-truth labels by training
a benign model. Experiments conducted on 9 popular attacks demonstrate that
DataElixir effectively mitigates various complex attacks with minimal impact
on benign accuracy, surpassing the performance of baseline defense methods.
Comment: Accepted by AAAI202
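The label-transition step can be illustrated with a small sketch. It assumes each sample's predicted labels are recorded in order, starting with the original image and followed by the intermediary images produced by repeated forward (noising) and reverse (denoising) passes. Samples whose prediction changes are flagged as anomalous. For the target label, the sketch swaps in a deliberately simple heuristic (the most common initial prediction among flagged samples) in place of the paper's distribution-discrepancy test; all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def has_label_transition(preds: Sequence[int]) -> bool:
    """True if the predicted label changes anywhere along a sample's
    sequence of intermediary (noised-then-denoised) images."""
    return any(a != b for a, b in zip(preds, preds[1:]))


def flag_anomalous(trajectories: Dict[int, List[int]]) -> List[int]:
    """Return the ids of samples whose predictions show a label transition."""
    return sorted(i for i, preds in trajectories.items() if has_label_transition(preds))


def guess_target_label(trajectories: Dict[int, List[int]],
                       flagged: List[int]) -> Optional[int]:
    """Simplified stand-in for target-label detection (not the paper's
    distribution-discrepancy test): the most common initial prediction among
    flagged samples, on the intuition that triggered samples start out
    classified as the attacker's target and drift away once the trigger is
    washed out."""
    counts = Counter(trajectories[i][0] for i in flagged)
    return counts.most_common(1)[0][0] if counts else None


# Usage: sample id -> predicted labels, starting with the original image.
trajectories = {0: [3, 3, 3], 1: [7, 2, 2], 2: [7, 7, 5], 3: [1, 1, 1], 4: [7, 4, 4]}
flagged = flag_anomalous(trajectories)
target = guess_target_label(trajectories, flagged)
```

Here samples 1, 2, and 4 change labels and all start as class 7, so 7 is the suspected target; benign samples 0 and 3 keep a stable prediction.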
- …