88 research outputs found

On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

Author: Cui Peng
Liu Jiashuo
Namkoong Hongseok
Wang Tianyu
Publication date: 11/07/2023

Different distribution shifts require different algorithmic and operational interventions. Methodological research must be grounded in the specific shifts it addresses. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift, e.g., previous observations on algorithmic performance can fail to be valid when the Y|X distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that Y|X-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since Y|X-shifts are prevalent in tabular settings, we identify covariate regions that suffer the biggest Y|X-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of how distributions differ. Comment: 41 page

arXiv.org e-Print Archive
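
The distinction the abstract above draws between covariate shift and Y|X-shift can be illustrated with a toy simulation. This is a minimal sketch on synthetic data, not the WhyShift testbed or its methodology; the function names, the logistic label model, and all numeric settings are assumptions made for illustration.

```python
import numpy as np

def sample(n, x_mean, slope, rng):
    """Draw X ~ N(x_mean, 1) and Y ~ Bernoulli(sigmoid(slope * X))."""
    x = rng.normal(x_mean, 1.0, size=n)
    p = 1.0 / (1.0 + np.exp(-slope * x))
    y = (rng.random(n) < p).astype(int)
    return x, y

def accuracy(x, y):
    """Accuracy of a fixed rule (predict 1 when x > 0) chosen for the source data."""
    return float(np.mean((x > 0).astype(int) == y))

rng = np.random.default_rng(0)

# Source distribution: X centered at 0, positive relationship between X and Y.
x_src, y_src = sample(20_000, x_mean=0.0, slope=4.0, rng=rng)
# Covariate shift: P(X) moves, P(Y|X) is unchanged.
x_cov, y_cov = sample(20_000, x_mean=1.0, slope=4.0, rng=rng)
# Y|X-shift: P(X) is unchanged, the relationship between X and Y flips.
x_cond, y_cond = sample(20_000, x_mean=0.0, slope=-4.0, rng=rng)

acc_src = accuracy(x_src, y_src)
acc_cov = accuracy(x_cov, y_cov)
acc_cond = accuracy(x_cond, y_cond)

print(f"source: {acc_src:.3f}  covariate shift: {acc_cov:.3f}  Y|X shift: {acc_cond:.3f}")
```

In this toy setting the fixed rule stays accurate when only P(X) moves, but fails when P(Y|X) changes, which is why the two kinds of shift call for different interventions.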

Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

Author: Cheng Ying
Feng Rui
Liu Jinyu
Yu Jiashuo
Zhang Yuejie
Publication date: 12/07/2022

Weakly-supervised audio-visual violence detection aims to distinguish snippets containing multimodal violence events with video-level labels. Many prior works perform audio-visual integration and interaction in an early or intermediate manner, yet overlook modality heterogeneity in the weakly-supervised setting. In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate their negative impact on weakly-supervised audio-visual learning. To address these issues, we propose a modality-aware contrastive instance learning with self-distillation (MACIL-SD) strategy. Specifically, we leverage a lightweight two-stream network to generate audio and visual bags, in which unimodal background, violent, and normal instances are clustered into semi-bags in an unsupervised way. Then audio and visual violent semi-bag representations are assembled as positive pairs, and violent semi-bags are combined with background and normal instances in the opposite modality as contrastive negative pairs. Furthermore, a self-distillation module is applied to transfer unimodal visual knowledge to the audio-visual model, which alleviates noise and closes the semantic gap between unimodal and multimodal features. Experiments show that our framework outperforms previous methods with lower complexity on the large-scale XD-Violence dataset. Results also demonstrate that our proposed approach can be used as a plug-in module to enhance other networks. Code is available at https://github.com/JustinYuu/MACIL_SD. Comment: ACM MM 202

arXiv.org e-Print Archive
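
The positive/negative pairing described in the abstract above can be made concrete with a standard InfoNCE-style contrastive loss. The abstract does not state which loss MACIL-SD uses, so InfoNCE is an assumed stand-in here; the function name `info_nce`, the temperature, and the 2-D toy embeddings are hypothetical.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive, and k negatives.

    anchor, positive: arrays of shape (d,); negatives: array of shape (k, d).
    Similarities are cosine similarities scaled by 1 / temperature.
    """
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a, p, n = unit(anchor), unit(positive), unit(negatives)
    pos_logit = (a @ p) / temperature      # scalar
    neg_logits = (n @ a) / temperature     # shape (k,)
    logits = np.concatenate(([pos_logit], neg_logits))
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = logits.max()
    log_norm = m + np.log(np.exp(logits - m).sum())
    return float(log_norm - pos_logit)

# Toy embeddings (assumed): an audio "violent" representation as the anchor.
audio_violent = np.array([1.0, 0.0])
visual_violent = np.array([0.9, 0.1])            # same event, other modality
visual_other = np.array([[0.0, 1.0], [-1.0, 0.0]])  # background / normal

# Well-aligned case: the positive is close to the anchor, negatives are not.
good_loss = info_nce(audio_violent, visual_violent, visual_other)
# Misaligned case: the supposed positive is far away, a negative is close.
bad_loss = info_nce(audio_violent, visual_other[0],
                    np.stack([visual_violent, visual_other[1]]))
print(good_loss, bad_loss)
```

The loss is small when the cross-modal positive pair is already aligned and large when a negative instance sits closer to the anchor than the positive does.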

Rethinking the Evaluation Protocol of Domain Generalization

Author: Cui Peng
He Yue
Liu Jiashuo
Xu Renzhe
Yu Han
Zhang Xingxuan
Publication date: 24/05/2023

Domain generalization aims to solve the challenge of Out-of-Distribution (OOD) generalization by leveraging common knowledge learned from multiple training domains to generalize to unseen test domains. To accurately evaluate the OOD generalization ability, it is necessary to ensure that test data information is unavailable. However, the current domain generalization protocol may still leak test data information. This paper examines the potential risks of test data information leakage in two aspects of the current protocol: pretraining on ImageNet and oracle model selection. We propose that training from scratch and using multiple test domains would result in a more precise evaluation of OOD generalization ability. We also rerun the algorithms with the modified protocol and introduce a new leaderboard to encourage future research in domain generalization with a fairer comparison.

arXiv.org e-Print Archive

Distributionally Robust Learning with Stable Adversarial Training

Author: Cui Peng
Kuang Kun
Li Bo
Liu Jiashuo
Shen Zheyan
Zhou Linjun
Publication date: 29/06/2021

Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts due to the greedy adoption of all the correlations found in training data. There is an emerging literature on tackling this problem by minimizing the worst-case risk over an uncertainty set. However, existing methods mostly construct ambiguity sets by treating all variables equally regardless of the stability of their correlations with the target, resulting in an overwhelmingly large uncertainty set and low confidence of the learner. In this paper, we propose a novel Stable Adversarial Learning (SAL) algorithm that leverages heterogeneous data sources to construct a more practical uncertainty set and conduct differentiated robustness optimization, where covariates are differentiated according to the stability of their correlations with the target. We theoretically show that our method is tractable for stochastic gradient-based optimization and provide performance guarantees for our method. Empirical studies on both simulation and real datasets validate the effectiveness of our method in terms of uniformly good performance across unknown distributional shifts. Comment: arXiv admin note: substantial text overlap with arXiv:2006.0441

arXiv.org e-Print Archive
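
For readers unfamiliar with the phrase "minimizing the worst-case risk over an uncertainty set" in the abstract above, the generic distributionally robust objective can be written as follows. This is the textbook form, not SAL's specific formulation; the symbols $\theta$ (model parameters), $\ell$ (loss), and $\mathcal{P}$ (uncertainty set of distributions around the training distribution) are notation assumed here.

```latex
\min_{\theta} \; \sup_{Q \in \mathcal{P}} \;
  \mathbb{E}_{(X, Y) \sim Q} \bigl[ \ell(\theta; X, Y) \bigr]
```

Empirical risk minimization is the special case in which $\mathcal{P}$ contains only the training distribution. A larger $\mathcal{P}$ gives robustness to more shifts but a more pessimistic learner, which is the trade-off the abstract says SAL targets by treating covariates differently according to the stability of their correlation with the target.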

Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning

Author: Cheng Yi
Li Wenjie
Liang Xiaodan
Liu Bang
Liu Wenge
Wang Jiashuo
Zhao Ruihui
Zheng Yefeng
Publication date: 09/10/2022

Providing Emotional Support (ES) to soothe people in emotional distress is an essential capability in social interactions. Most existing research on building ES conversation systems considers only single-turn interactions with users, which is over-simplified. In comparison, multi-turn ES conversation systems can provide ES more effectively, but face several new technical challenges, including: (1) how to adopt appropriate support strategies to achieve the long-term dialogue goal of comforting the user's emotion; (2) how to dynamically model the user's state. In this paper, we propose a novel system, MultiESC, to address these issues. For strategy planning, drawing inspiration from the A* search algorithm, we propose lookahead heuristics to estimate the future user feedback after using particular strategies, which helps to select strategies that can lead to the best long-term effects. For user state modeling, MultiESC focuses on capturing users' subtle emotional expressions and understanding the causes of their emotions. Extensive experiments show that MultiESC significantly outperforms competitive baselines in both dialogue generation and strategy planning. Our code is available at https://github.com/lwgkzl/MultiESC. Comment: Accepted by the main conference of EMNLP 202

arXiv.org e-Print Archive
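
The A*-inspired idea in the abstract above, scoring a candidate by what is known so far plus an estimate of what follows, can be sketched in a few lines. This is not the MultiESC implementation: the strategy names, the two score tables, and the function `select_strategy` are all hypothetical, and a real system would obtain both scores from learned models.

```python
from typing import Callable, Sequence

def select_strategy(
    strategies: Sequence[str],
    history_score: Callable[[str], float],
    lookahead_score: Callable[[str], float],
) -> str:
    """Pick the strategy maximizing history-based score plus lookahead estimate.

    This mirrors the A* form f = g + h, where g scores a candidate given the
    dialogue so far and h estimates the future user feedback after using it.
    """
    return max(strategies, key=lambda s: history_score(s) + lookahead_score(s))

# Hypothetical scores for three made-up support strategies.
g = {"question": 0.5, "reflection": 0.3, "suggestion": 0.4}   # fit to history
h = {"question": 0.1, "reflection": 0.6, "suggestion": 0.2}   # estimated feedback

strategies = list(g)
greedy_choice = max(strategies, key=g.get)                    # ignores the future
lookahead_choice = select_strategy(strategies, g.get, h.get)  # uses g + h
print(greedy_choice, lookahead_choice)
```

With these made-up numbers the greedy choice and the lookahead choice differ, which is the situation in which estimating future feedback changes the plan.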

What can we learn from the 2008 financial crisis for global power decarbonization after COVID-19?

Author: Feng Kuishuang
Ge Liming
Li Jiashuo
Li Shuping
Liu Xi
Peng Xu
Shan Yuli
Sun Laixiang
Wei Wendong
Zhang Pengfei
Zhao Xu
Zuo Jian
Publication venue: 'Elsevier BV'
Publication date: 14/03/2023

University of Birmingham Research Portal

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Author: Chen Xinyuan
He Yinan
Li Kunchang
Li Yizhuo
Liu Ziwei
Luo Ping
Ma Xin
Qiao Yu
Wang Limin
Wang Yali
Wang Yaohui
Wang Yi
Yu Jiashuo
Publication date: 13/07/2023

This paper introduces InternVid, a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions totaling 4.1B words. Our core contribution is to develop a scalable approach to autonomously build a high-quality video-text dataset with large language models (LLMs), thereby showcasing its efficacy in learning video-language representation at scale. Specifically, we utilize a multi-scale approach to generate video-related descriptions. Furthermore, we introduce ViCLIP, a video-text representation learning model based on ViT-L. Learned on InternVid via contrastive learning, this model demonstrates leading zero-shot action recognition and competitive video retrieval performance. Beyond basic video understanding tasks like recognition and retrieval, our dataset and model have broad applications. They are particularly beneficial for generating interleaved video-text data for learning a video-centric dialogue system, advancing video-to-text and text-to-video generation research. These proposed resources provide a tool for researchers and practitioners interested in multimodal video understanding and generation. Comment: Data and Code: https://github.com/OpenGVLab/InternVideo/tree/main/Data/InternVi

arXiv.org e-Print Archive