Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
We propose a weakly-supervised approach that takes image-sentence pairs as
input and learns to visually ground (i.e., localize) arbitrary linguistic
phrases, in the form of spatial attention masks. Specifically, the model is
trained with images and their associated image-level captions, without any
explicit region-to-phrase correspondence annotations. To this end, we introduce
an end-to-end model which learns visual groundings of phrases with two types of
carefully designed loss functions. In addition to the standard discriminative
loss, which enforces that attended image regions and phrases are consistently
encoded, we propose a novel structural loss which makes use of the parse tree
structures induced by the sentences. In particular, we ensure complementarity
among the attention masks that correspond to sibling noun phrases, and
compositionality of attention masks among the children and parent phrases, as
defined by the sentence parse tree. We validate the effectiveness of our
approach on the Microsoft COCO and Visual Genome datasets.
Comment: CVPR 201
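The two structural terms described above can be pictured with a small sketch. This is not the paper's implementation; the mask shapes, the pairwise-overlap penalty, and the max-based union are all illustrative assumptions:

```python
import numpy as np

def structural_losses(parent_mask, child_masks):
    """Illustrative version of the two structural terms: complementarity
    among sibling masks and compositionality between parent and children.
    All shapes and weightings here are assumptions, not the paper's code."""
    children = np.stack(child_masks)              # (K, H, W)
    # Complementarity: sibling noun phrases should attend to disjoint
    # regions, so penalize pairwise overlap between every sibling pair.
    comp = 0.0
    for i in range(len(child_masks)):
        for j in range(i + 1, len(child_masks)):
            comp += float(np.mean(children[i] * children[j]))
    # Compositionality: the parent phrase's mask should match the union
    # of its children's masks (approximated by an elementwise max).
    union = children.max(axis=0)
    compo = float(np.mean((parent_mask - union) ** 2))
    return comp, compo

# Toy check: two disjoint child masks whose union equals the parent,
# so both penalties should vanish.
a = np.array([[1.0, 0.0], [0.0, 0.0]])
b = np.array([[0.0, 0.0], [0.0, 1.0]])
overlap, mismatch = structural_losses(np.maximum(a, b), [a, b])
```

In a real model the masks would come from an attention module and these terms would be added to the discriminative loss.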
Visual Feature Attribution using Wasserstein GANs
Attributing the pixels of an input image to a certain category is an
important and well-studied problem in computer vision, with applications
ranging from weakly supervised localisation to understanding hidden effects in
the data. In recent years, approaches based on interpreting a previously
trained neural network classifier have become the de facto state-of-the-art and
are commonly used on medical as well as natural image datasets. In this paper,
we discuss a limitation of these approaches which may lead to only a subset of
the category specific features being detected. To address this problem we
develop a novel feature attribution technique based on Wasserstein Generative
Adversarial Networks (WGAN), which does not suffer from this limitation. We
show that our proposed method performs substantially better than the
state-of-the-art for visual attribution on a synthetic dataset and on real 3D
neuroimaging data from patients with mild cognitive impairment (MCI) and
Alzheimer's disease (AD). For AD patients the method produces compellingly
realistic disease effect maps which are very close to the observed effects.
Comment: Accepted to CVPR 201
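The core additive-map idea behind this style of attribution can be sketched in a few lines. The generator below is a hypothetical stand-in callable, not the trained critic/generator pair from the paper:

```python
import numpy as np

def visual_attribution(x, map_generator):
    """Sketch of the additive effect-map idea behind WGAN-based
    attribution: a generator proposes the change that would move a
    diseased image toward the healthy class, and that map itself is
    the attribution. `map_generator` is a hypothetical stand-in for
    the trained WGAN generator."""
    effect_map = map_generator(x)      # additive disease-effect map
    counterfactual = x + effect_map    # should resemble a healthy image
    return effect_map, counterfactual

# Toy stand-in: a "generator" that simply negates the input, producing
# an all-zero counterfactual (the critic and training loop are omitted).
x = np.array([0.2, -0.5, 1.0])
m, cf = visual_attribution(x, lambda img: -img)
```

The key design point is that the whole map is produced in one generator pass, rather than being read off a classifier's gradients.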
Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning
As advanced image manipulation techniques emerge, detecting the manipulation
becomes increasingly important. Despite the success of recent learning-based
approaches for image manipulation detection, they typically require expensive
pixel-level annotations to train, while exhibiting degraded performance when
testing on images that are differently manipulated compared with training
images. To address these limitations, we propose weakly-supervised image
manipulation detection, such that only binary image-level labels (authentic or
tampered with) are required for training. Such a weakly-supervised
setting can leverage more training images and has the potential to adapt
quickly to new manipulation techniques. To improve the generalization ability,
we propose weakly-supervised self-consistency learning (WSCL) to leverage the
weakly annotated images. Specifically, two consistency properties are learned:
multi-source consistency (MSC) and inter-patch consistency (IPC). MSC exploits
different content-agnostic information and enables cross-source learning via an
online pseudo label generation and refinement process. IPC performs global
pair-wise patch-patch relationship reasoning to discover a complete region of
manipulation. Extensive experiments validate that our WSCL, even though it
is weakly supervised, exhibits competitive performance compared with its
fully-supervised counterpart under both in-distribution and
out-of-distribution evaluations, as well as reasonable manipulation
localization ability.
Comment: Accepted to ICCV 2023, code: https://github.com/yhZhai/WSC
A text segmentation approach for automated annotation of online customer reviews, based on topic modeling
Online customer review classification and analysis have been recognized as an important problem in many domains, such as business intelligence, marketing, and e-governance. To solve this problem, a variety of machine learning methods have been developed in the past decade. Existing methods, however, either rely on human labeling or have high computing cost, or both. This makes them a poor fit for dynamic and ever-growing collections of short but semantically noisy customer review texts. In the present study, the problem of multi-topic online review clustering is addressed by generating high-quality bronze-standard labeled sets for training efficient classifier models. A novel unsupervised algorithm is developed to break reviews into sequential, semantically homogeneous segments. Segment data is then used to fine-tune a Latent Dirichlet Allocation (LDA) model obtained for the reviews, and to classify them along categories detected through topic modeling. After testing the segmentation algorithm on a benchmark text collection, it was successfully applied in a case study of tourism review classification. In all experiments conducted, the proposed approach produced results similar to or better than baseline methods. The paper critically discusses the main findings and paves the way for future work.
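The segmentation step can be sketched minimally. The real algorithm is more sophisticated; the lexical-overlap measure, the running-segment comparison, and the threshold here are all illustrative assumptions:

```python
def segment_review(sentences, threshold=0.1):
    """Hypothetical sketch of breaking a review into sequential,
    semantically homogeneous segments: start a new segment whenever a
    sentence's lexical overlap with the running segment drops below a
    threshold. Illustrative only, not the paper's algorithm."""
    def overlap(text, sentence):
        a = set(text.lower().split())
        b = set(sentence.lower().split())
        return len(a & b) / max(1, min(len(a), len(b)))

    segments, current = [], [sentences[0]]
    for s in sentences[1:]:
        if overlap(" ".join(current), s) < threshold:
            segments.append(current)   # topic shift: close the segment
            current = [s]
        else:
            current.append(s)
    segments.append(current)
    return segments

# A toy review: two sentences about the room, then one about food,
# which should be split into two segments.
review = ["the room was clean", "the room had a nice view",
          "breakfast food tasted awful"]
parts = segment_review(review)
```

Each resulting segment would then serve as a training unit for the LDA fine-tuning stage described above.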
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer
vision community because it plays a significant role in video surveillance.
Many algorithms have been proposed to handle this task. The goal of this
paper is to review existing works, whether based on traditional methods or
on deep learning networks. Firstly, we introduce the background of
pedestrian attribute recognition (PAR, for short), including the
fundamental concepts of pedestrian attributes and the corresponding
challenges. Secondly, we introduce existing benchmarks, including popular
datasets and evaluation criteria. Thirdly, we analyse the concepts of
multi-task learning and multi-label learning, and explain the relations
between these two learning paradigms and pedestrian attribute recognition.
We also review some popular network architectures which have been widely
applied in the deep learning community. Fourthly, we analyse popular
solutions for this task, such as attribute grouping, part-based methods,
\emph{etc}. Fifthly, we show some applications which take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attribute recognition. The project page of this paper can be
found at \url{https://sites.google.com/view/ahu-pedestrianattributes/}.
Comment: Check our project page for a high-resolution version of this
survey: https://sites.google.com/view/ahu-pedestrianattributes
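The multi-label formulation the survey discusses can be made concrete with a minimal sketch: each attribute gets an independent sigmoid output trained with binary cross-entropy. This is a generic illustration, not tied to any surveyed model:

```python
import numpy as np

def multilabel_bce(logits, labels):
    """Sketch of the multi-label formulation common in pedestrian
    attribute recognition: each attribute is an independent binary
    prediction trained with binary cross-entropy (illustrative only)."""
    p = 1.0 / (1.0 + np.exp(-logits))   # per-attribute probability
    eps = 1e-12                          # numerical stability
    return float(-np.mean(labels * np.log(p + eps)
                          + (1 - labels) * np.log(1 - p + eps)))

# Confident, correct predictions for two hypothetical attributes
# (e.g. "male", "backpack") should yield a near-zero loss.
loss = multilabel_bce(np.array([10.0, -10.0]), np.array([1.0, 0.0]))
```

The contrast with multi-task learning, as the survey notes, is that multi-task setups typically use separate heads or losses per task, while the multi-label view treats all attributes as one jointly predicted vector.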