180 research outputs found
Automatic Understanding of Image and Video Advertisements
There is more to images than their objective physical content: for example,
advertisements are created to persuade a viewer to take a certain action. We
propose the novel problem of automatic advertisement understanding. To enable
research on this problem, we create two datasets: an image dataset of 64,832
image ads, and a video dataset of 3,477 ads. Our data contains rich annotations
encompassing the topic and sentiment of the ads, questions and answers
describing what actions the viewer is prompted to take and the reasoning that
the ad presents to persuade the viewer ("What should I do according to this ad,
and why should I do it?"), and symbolic references ads make (e.g. a dove
symbolizes peace). We also analyze the most common persuasive strategies ads
use, and the capabilities that computer vision systems should have to
understand these strategies. We present baseline classification results for
several prediction tasks, including automatically answering questions about the
messages of the ads.Comment: To appear in CVPR 2017; data available on
http://cs.pitt.edu/~kovashka/ad
Domain Robustness in Multi-modality Learning and Visual Question Answering
Humans perceive the world via multiple modalities, as information from a single modality is usually partial and incomplete. This observation motivates the development of machine learning algorithms capable of handling multi-modal data and performing intelligent reasoning. The recent resurgence of deep learning brings both opportunities and challenges to multi-modal reasoning. On the one hand, its strong representation learning capability provides a unified approach to represent information across multiple modalities. On the other hand, properly training such models typically requires enormous data, which is not always feasible especially for the multi-modal setting.
One promising direction to mitigate the lack of data for deep learning models is to transfer knowledge (e.g., gained from solving related problems) to low-resource domains. This procedure is known as transfer learning or domain adaptation, and it has demonstrated great success in various visual and linguistic applications. However, how to effectively transfer knowledge in a multi-modality setting remains a research question. In this thesis, we choose multi-modal reasoning as our target task and aim at improving the performance of deep neural networks on low-resource domains via domain adaptation. We first briefly discuss our prior work about advertisement understanding (as a typical multi-modal reasoning problem) and share our experience from addressing the data-availability challenge. Next, we turn to visual question answering, a more general problem that involves more complicated reasoning. We evaluate mainstream VQA models and classic single-modal domain adaptation strategies and show that existing methods usually suffer significant performance degradation when directly apply to the multi-modal setting. We measure the domain gaps in different modalities and design an effective strategy to manually control domain shifts on individual modalities, which helps better understand the problem. Lastly, we present a systematic study across real datasets to answer a few fundamental questions regarding knowledge transfer in VQA, such as the sensitivity of various models towards different types of supervisions (i.e. unsupervised, self-supervised, semi-supervised, and fully supervised). We conclude by sharing the limitations and our vision for future research directions
Optimization Analysis of the Structural Design and Stability Parameters of a Rehabilitation Robot
In this paper, a lower limb rehabilitation robot, suitable for stroke patients, is designed to meet the needs of the lower limb training in a later stage of rehabilitation. The rehabilitation robot is composed of a gantry structure, a driving system, a weight support system, and a human-computer interaction system. Such a robot can assist the patients to stand and walk on the ground. Because of the weakness of the lower limbs on the affected side, stroke patients find it difficult to maintain their own body balance. The patients may fall due to a change in body posture caused by insufficient body function. Therefore, it is necessary to evaluate the stability of the rehabilitation robot after being impacted by the patient\u27s fall during use. This paper presents a method for the analysis of robot stability and develops an approximate mathematical model of the rehabilitation robot stability based on the response surface method. Optimal structural design parameters for the rehabilitation robot under impact are determined based on the response surface mathematical model. Finally, a stability experiment of the rehabilitation robot under the optimal structural parameters is performed. The experimental results demonstrate that the universal wheel maintains a close force contact with the ground, which proves the reliable stability of the robot
Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples
Backdoor attacks are serious security threats to machine learning models
where an adversary can inject poisoned samples into the training set, causing a
backdoored model which predicts poisoned samples with particular triggers to
particular target classes, while behaving normally on benign samples. In this
paper, we explore the task of purifying a backdoored model using a small clean
dataset. By establishing the connection between backdoor risk and adversarial
risk, we derive a novel upper bound for backdoor risk, which mainly captures
the risk on the shared adversarial examples (SAEs) between the backdoored model
and the purified model. This upper bound further suggests a novel bi-level
optimization problem for mitigating backdoor using adversarial training
techniques. To solve it, we propose Shared Adversarial Unlearning (SAU).
Specifically, SAU first generates SAEs, and then, unlearns the generated SAEs
such that they are either correctly classified by the purified model and/or
differently classified by the two models, such that the backdoor effect in the
backdoored model will be mitigated in the purified model. Experiments on
various benchmark datasets and network architectures show that our proposed
method achieves state-of-the-art performance for backdoor defense
VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency
The role of data in building AI systems has recently been emphasized by the
emerging concept of data-centric AI. Unfortunately, in the real-world, datasets
may contain dirty samples, such as poisoned samples from backdoor attack, noisy
labels in crowdsourcing, and even hybrids of them. The presence of such dirty
samples makes the DNNs vunerable and unreliable.Hence, it is critical to detect
dirty samples to improve the quality and realiability of dataset. Existing
detectors only focus on detecting poisoned samples or noisy labels, that are
often prone to weak generalization when dealing with dirty samples from other
domains.In this paper, we find a commonality of various dirty samples is
visual-linguistic inconsistency between images and associated labels. To
capture the semantic inconsistency between modalities, we propose versatile
data cleanser (VDC) leveraging the surpassing capabilities of multimodal large
language models (MLLM) in cross-modal alignment and reasoning.It consists of
three consecutive modules: the visual question generation module to generate
insightful questions about the image; the visual question answering module to
acquire the semantics of the visual content by answering the questions with
MLLM; followed by the visual answer evaluation module to evaluate the
inconsistency.Extensive experiments demonstrate its superior performance and
generalization to various categories and types of dirty samples.Comment: 22 pages,5 figures,17 table
- …