Unsupervised Action Proposal Ranking through Proposal Recombination
Recently, action proposal methods have played an important role in action
recognition tasks, as they reduce the search space dramatically. Most
unsupervised action proposal methods tend to generate hundreds of action
proposals which include many noisy, inconsistent, and unranked action
proposals, while supervised action proposal methods take advantage of
predefined object detectors (e.g., human detector) to refine and score the
action proposals, but they require thousands of manual annotations to train.
Given the action proposals in a video, the goal of the proposed work is to
generate a small number of better, properly ranked action proposals. In our
approach, we first divide each action proposal into sub-proposals and then use a
dynamic-programming-based graph optimization scheme to select the optimal
combinations of sub-proposals from different proposals and assign each new
proposal a score. We propose a new unsupervised image-based actionness detector
that leverages web images, and we use its output as one of the node scores in our graph
formulation. Moreover, we capture motion information by estimating the number
of motion contours within each action proposal patch. The proposed method is fully
unsupervised: it needs neither bounding-box annotations nor video-level
labels, which is desirable given the current explosion of large-scale action
datasets. Our approach is generic and does not depend on a specific action
proposal method. We evaluate our approach on several publicly available trimmed
and untrimmed datasets and obtain better performance than several
proposal ranking methods. In addition, we demonstrate that properly ranked
proposals yield significantly better action detection than
state-of-the-art proposal-based methods.
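The recombination step can be illustrated with a minimal Viterbi-style sketch. It assumes that each of P proposals has already been split into the same T temporal segments (sub-proposals), that each sub-proposal is summarized by one box and one node score, and that the edge score between consecutive sub-proposals is their spatial IoU weighted by a hypothetical parameter `lam`. The function names and the IoU edge term are illustrative assumptions; the paper's actual node terms (actionness, motion contours) and edge terms may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recombine(boxes: List[List[Box]], scores: List[List[float]],
              lam: float = 1.0) -> Tuple[float, List[int]]:
    """Hypothetical DP over sub-proposals.

    boxes[p][t] / scores[p][t]: box and node score of segment t of proposal p.
    Returns (total score, path) where path[t] is the proposal chosen at segment t,
    i.e. the recombined proposal may switch between source proposals.
    """
    num_props, num_segs = len(boxes), len(boxes[0])
    best = [scores[p][0] for p in range(num_props)]  # best score ending at (p, 0)
    back = [[0] * num_props for _ in range(num_segs)]  # backpointers
    for t in range(1, num_segs):
        new_best = []
        for q in range(num_props):
            # Score of arriving at proposal q from each proposal p at segment t-1.
            cands = [best[p] + lam * iou(boxes[p][t - 1], boxes[q][t])
                     for p in range(num_props)]
            arg = max(range(num_props), key=lambda p: cands[p])
            back[t][q] = arg
            new_best.append(cands[arg] + scores[q][t])
        best = new_best
    # Trace the optimal path back from the best final node.
    last = max(range(num_props), key=lambda q: best[q])
    path = [last]
    for t in range(num_segs - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[last], path
```

With identical boxes everywhere the edge term is constant, so the path simply picks the highest node score per segment; with disjoint boxes, switching proposals forfeits the IoU bonus and the path tends to stay on one proposal.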
Re-Attention Transformer for Weakly Supervised Object Localization
Weakly supervised object localization is a challenging task that aims to
localize objects using only coarse annotations such as image categories. Existing
deep network approaches are mainly based on the class activation map (CAM), which
highlights discriminative local regions while ignoring the full object. In
addition, emerging transformer-based techniques consistently place too much
emphasis on the background, which impedes the identification of complete objects.
To address these issues, we present a re-attention mechanism termed token
refinement transformer (TRT) that captures object-level semantics to guide
localization. Specifically, TRT introduces a novel token priority scoring
module (TPSM) to suppress background noise while focusing on the target object.
Then, we incorporate the class activation map as a semantically aware input to
restrict the attention map to the target object. Extensive experiments on two
benchmarks show the superiority of our proposed method over existing methods
that use image category annotations.
Source code is available at
https://github.com/su-hui-zz/ReAttentionTransformer.
Comment: 11 pages, 5 figures
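The class activation map mentioned above is conventionally computed as a weighted sum of the final feature maps, using the classifier weights of the target class. The NumPy sketch below computes such a map and then shows one hypothetical way to use it to restrain an attention map: element-wise gating followed by renormalization. The function names and the fusion rule are illustrative assumptions, not TRT's actual formulation.

```python
import numpy as np


def class_activation_map(features: np.ndarray, fc_weights: np.ndarray,
                         cls: int) -> np.ndarray:
    """CAM for class `cls`.

    features: (K, H, W) final feature maps; fc_weights: (C, K) classifier weights.
    Returns an (H, W) map min-max normalized to [0, 1].
    """
    # Weighted sum over the K channels: sum_k w[cls, k] * features[k].
    cam = np.tensordot(fc_weights[cls], features, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def restrain_attention(attn: np.ndarray, cam: np.ndarray,
                       eps: float = 1e-8) -> np.ndarray:
    """Hypothetical fusion: gate an (H, W) attention map with the CAM,
    suppressing low-CAM (likely background) locations, then renormalize to sum 1."""
    fused = attn * cam
    return fused / (fused.sum() + eps)
```

For example, with two single-row feature maps and class weights `[2, 1]`, the CAM peaks where the more heavily weighted channel fires, and a uniform attention map is pushed toward those locations.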