Deep Self-Taught Learning for Weakly Supervised Object Localization
Most existing weakly supervised localization (WSL) approaches learn detectors
by finding positive bounding boxes based on features learned with image-level
supervision. However, those features do not encode spatial-location information
and usually provide poor-quality positive samples for training a detector. To
overcome this issue, we propose a deep self-taught learning approach, which
makes the detector learn object-level features that are reliable for acquiring
tight positive samples and then re-train itself on those samples.
Consequently, the detector progressively improves its detection ability and
localizes more informative positive samples. To implement such self-taught
learning, we propose a seed sample acquisition method via image-to-object
transferring and dense subgraph discovery to find reliable positive samples for
initializing the detector. An online supportive sample harvesting scheme is
further proposed to dynamically select the most confident tight positive
samples and train the detector in a mutual boosting way. To prevent the
detector from being trapped in poor optima due to overfitting, we propose a new
relative-improvement measure of predicted CNN scores to guide the self-taught
learning process. Extensive experiments on PASCAL VOC 2007 and 2012 show that
our approach outperforms the state of the art, strongly validating its
effectiveness.
Comment: Accepted as a spotlight paper at CVPR 2017.
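As an illustration of the relative-improvement idea described above, here is a minimal NumPy sketch of score-gain-based positive harvesting; the helper names, the top-k selection rule, and the toy numbers are illustrative assumptions, not the paper's actual implementation.

import numpy as np

def relative_improvement(prev_scores, curr_scores, eps=1e-8):
    """Relative gain of predicted detection scores between two self-taught rounds."""
    return (curr_scores - prev_scores) / (prev_scores + eps)

def harvest_positives(boxes, prev_scores, curr_scores, top_k=2):
    """Keep the proposals whose scores improved most (assumed selection rule)."""
    gain = relative_improvement(prev_scores, curr_scores)
    keep = np.argsort(gain)[::-1][:top_k]
    return boxes[keep]

# Toy usage: four candidate boxes scored in two consecutive training rounds.
boxes = np.array([[10, 10, 50, 50], [12, 8, 55, 48],
                  [100, 100, 140, 150], [5, 5, 20, 20]])
prev = np.array([0.30, 0.25, 0.40, 0.10])
curr = np.array([0.55, 0.45, 0.42, 0.12])
print(harvest_positives(boxes, prev, curr))  # the boxes with the largest relative gain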
STA: Spatial-Temporal Attention for Large-Scale Video-based Person Re-Identification
In this work, we propose a novel Spatial-Temporal Attention (STA) approach to
tackle the large-scale person re-identification task in videos. Different from
most existing methods, which simply compute representations of video clips
using frame-level aggregation (e.g. average pooling), the proposed STA adopts a
more effective way for producing robust clip-level feature representation.
Concretely, our STA fully exploits the discriminative parts of a target person
in both spatial and temporal dimensions, producing a 2-D attention score matrix
via inter-frame regularization that measures the importance of spatial parts
across different frames. Thus, a more robust
clip-level feature representation can be generated according to a weighted sum
operation guided by the mined 2-D attention score matrix. In this way, the
challenging cases for video-based person re-identification such as pose
variation and partial occlusion can be well tackled by the STA. We conduct
extensive experiments on two large-scale benchmarks, i.e., MARS and
DukeMTMC-VideoReID. In particular, the mAP reaches 87.7% on MARS, significantly
outperforming the state of the art by a large margin of more than 11.6%.
Comment: Accepted as a conference paper at AAAI 2019.
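To make the 2-D attention pooling concrete, below is a toy PyTorch sketch of spatial-temporal attention over per-frame part features; scoring cells by feature norm and softmax-normalizing over frames is a stand-in assumption for the paper's learned attention and inter-frame regularization, not the actual STA model.

import torch

def sta_clip_feature(part_feats):
    """Pool (T frames, P parts, D channels) features into one clip descriptor.

    Each (frame, part) cell gets a score from its feature norm; scores are
    normalized across frames per part, yielding a 2-D attention matrix used
    for a weighted sum over time.
    """
    scores = part_feats.norm(dim=-1)                      # (T, P) raw cell scores
    attn = torch.softmax(scores, dim=0)                   # normalize over frames per part
    clip = (attn.unsqueeze(-1) * part_feats).sum(dim=0)   # (P, D) weighted temporal sum
    return clip.flatten()                                 # clip-level descriptor

feats = torch.randn(8, 4, 128)          # 8 frames, 4 spatial parts, 128-D features
print(sta_clip_feature(feats).shape)    # torch.Size([512])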
Proximal Iteratively Reweighted Algorithm with Multiple Splitting for Nonconvex Sparsity Optimization
This paper proposes the Proximal Iteratively REweighted (PIRE) algorithm for
solving a general problem that covers a large body of nonconvex sparse and
structured-sparse problems. Compared with previous iterative solvers for
nonconvex sparse problems, PIRE is much more general and efficient: its
computational cost in each iteration is usually as low as that of
state-of-the-art convex solvers. We further propose the PIRE algorithm with
Parallel Splitting (PIRE-PS) and PIRE algorithm with Alternative Updating
(PIRE-AU) to handle multi-variable problems. In theory, we prove that our
proposed methods converge and that any limit point of the iterates is a
stationary point. Extensive experiments on both synthetic and real datasets
demonstrate that, compared with previous nonconvex solvers, our methods achieve
comparable learning performance while being much more efficient.
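For intuition, here is a minimal NumPy sketch of a proximal iteratively reweighted scheme on one instance of the problem family, min_x 0.5*||Ax - b||^2 + lam * sum_i |x_i|^p with 0 < p < 1; the smoothing parameter and step rule are illustrative assumptions rather than the exact PIRE formulation.

import numpy as np

def pire_lp(A, b, lam=0.1, p=0.5, iters=100, eps=0.1):
    """Each iteration linearizes the concave penalty at the current iterate,
    giving weights w_i = lam*p*(|x_i|+eps)^(p-1), then takes a proximal
    gradient step, which reduces to weighted soft-thresholding.
    eps is an assumed smoothing parameter keeping the weights finite at 0.
    """
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        w = lam * p * (np.abs(x) + eps) ** (p - 1)            # reweighted L1 weights
        z = x - A.T @ (A @ x - b) / L                         # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - w / L, 0.0)   # weighted shrinkage
    return x

# Toy usage: approximately recover a 5-sparse vector from 50 noisy measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(pire_lp(A, b)[:8], 2))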
Transferable Semi-supervised Semantic Segmentation
The performance of deep learning based semantic segmentation models heavily
depends on sufficient data with careful annotations. However, even the largest
public datasets only provide samples with pixel-level annotations for rather
limited semantic categories. Such data scarcity critically limits scalability
and applicability of semantic segmentation models in real applications. In this
paper, we propose a novel transferable semi-supervised semantic segmentation
model that can transfer the learned segmentation knowledge from a few strong
categories with pixel-level annotations to unseen weak categories with only
image-level annotations, significantly broadening the applicability of deep
segmentation models. In particular, the proposed model consists of two
complementary and learnable components: a Label transfer Network (L-Net) and a
Prediction transfer Network (P-Net). The L-Net learns to transfer segmentation
knowledge from the strong categories to images of the weak categories,
producing coarse pixel-level semantic maps by effectively exploiting the
appearance similarities shared across categories. Meanwhile, the
P-Net tailors the transferred knowledge through a carefully designed
adversarial learning strategy and produces refined segmentation results with
better details. Integrating the L-Net and P-Net achieves 96.5% and 89.4% of the
performance of the fully supervised baseline on PASCAL VOC 2012, using
pixel-level annotations for 50% and 0% of the categories, respectively. With
such a novel
transfer mechanism, our proposed model is easily generalizable to a variety of
new categories, only requiring image-level annotations, and offers appealing
scalability in real applications.
Comment: Minor update of arXiv:1711.0682
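As a structural illustration of the two-component design, here is a toy PyTorch skeleton of an L-Net producing a coarse map and a P-Net refining it; the layer shapes, the single-conv bodies, the residual refinement, and the omission of the adversarial training loop are all simplifying assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class LNet(nn.Module):
    """Maps backbone features to a coarse pixel-level semantic map."""
    def __init__(self, in_ch=256, n_classes=21):
        super().__init__()
        self.head = nn.Conv2d(in_ch, n_classes, kernel_size=1)

    def forward(self, feats):
        return self.head(feats)

class PNet(nn.Module):
    """Refines the coarse map; the adversarial objective is omitted here."""
    def __init__(self, n_classes=21):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(n_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_classes, 3, padding=1),
        )

    def forward(self, coarse):
        return coarse + self.refine(coarse)   # residual refinement of L-Net's output

feats = torch.randn(1, 256, 32, 32)           # backbone features for one image
coarse = LNet()(feats)                        # coarse transferred semantic map
refined = PNet()(coarse)                      # refined map with sharper details
print(coarse.shape, refined.shape)            # both (1, 21, 32, 32)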