2,639 research outputs found
Bounded-Distortion Metric Learning
Metric learning aims to embed one metric space into another to benefit tasks
like classification and clustering. Although a highly distorted metric space
has many degrees of freedom to fit training data, it is prone to overfitting
and numerical inaccuracy. This paper presents {\it bounded-distortion metric
learning} (BDML), a new metric learning framework which amounts to finding an
optimal Mahalanobis metric space with a bounded-distortion constraint. An
efficient solver based on the multiplicative weights update method is proposed.
Moreover, we generalize BDML to pseudo-metric learning and devise the
semidefinite relaxation and a randomized algorithm to approximately solve it.
We further provide theoretical analysis to show that distortion is a key
ingredient for stability and generalization ability of our BDML algorithm.
Extensive experiments on several benchmark datasets yield promising results.
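The abstract's central quantity can be made concrete with a small sketch: a Mahalanobis metric is the distance induced by a positive semidefinite matrix M, and its distortion relative to the Euclidean metric is the worst-case ratio of pairwise expansion to contraction. The helper below is illustrative only (it is not the paper's multiplicative-weights solver, and the function names are my own):

```python
import numpy as np

def mahalanobis(x, y, M):
    """Distance induced by a PSD matrix M: d_M(x, y) = sqrt((x-y)^T M (x-y))."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

def distortion(X, M):
    """Worst-case expansion / worst-case contraction of d_M relative to the
    Euclidean distance, over all distinct point pairs in X.
    Hypothetical helper for illustration; not the paper's algorithm."""
    ratios = []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            euc = np.linalg.norm(X[i] - X[j])
            if euc > 0:
                ratios.append(mahalanobis(X[i], X[j], M) / euc)
    return max(ratios) / min(ratios)
```

With M = I the distortion is exactly 1 (the embedding is an isometry); BDML, as described, would search for an M that fits the data while keeping this ratio bounded.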
Towards Instance-level Image-to-Image Translation
Unpaired image-to-image translation is a newly rising and challenging vision
problem that aims to learn a mapping between unaligned image pairs in diverse
domains. Recent advances in this field like MUNIT and DRIT mainly focus on
disentangling content and style/attribute from a given image first, then
directly adopting the global style to guide the model to synthesize new domain
images. However, approaches of this kind incur severe contradictions when the
target-domain images are content-rich with multiple discrepant objects. In this
paper, we present a simple yet effective instance-aware image-to-image
translation approach (INIT), which employs the fine-grained local (instance)
and global styles to the target image spatially. The proposed INIT exhibits
three important advantages: (1) the instance-level objective loss can help learn a
more accurate reconstruction and incorporate diverse attributes of objects; (2)
the styles used for target domain of local/global areas are from corresponding
spatial regions in source domain, which intuitively is a more reasonable
mapping; (3) the joint training process can benefit both fine and coarse
granularity and incorporates instance information to improve the quality of
global translation. We also collect a large-scale benchmark for the new
instance-level translation task. We observe that our synthetic images can even
benefit real-world vision tasks like generic object detection.
Comment: Accepted to CVPR 2019. Project page:
http://zhiqiangshen.com/projects/INIT/index.htm
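The idea of applying a global style to the whole image and finer instance styles to local regions can be sketched with AdaIN-style feature re-normalization. This is a simplified stand-in, not INIT's actual architecture; the function names and the box-based region API are hypothetical:

```python
import numpy as np

def adain(content, style_mean, style_std, eps=1e-5):
    """Re-normalize content features to match target style statistics
    (AdaIN-style; a simplified stand-in for learned style transfer)."""
    mu, sigma = content.mean(), content.std()
    return style_std * (content - mu) / (sigma + eps) + style_mean

def stylize_with_instances(feat, global_style, instance_styles):
    """Apply the global style everywhere, then overwrite each instance box
    with its own local style. Boxes are (y0, y1, x0, x1); hypothetical API
    illustrating the coarse-to-fine idea, not the paper's implementation."""
    out = adain(feat, *global_style)
    for (y0, y1, x0, x1), (m, s) in instance_styles:
        out[y0:y1, x0:x1] = adain(feat[y0:y1, x0:x1], m, s)
    return out
```

The point of the sketch is the spatial correspondence the abstract emphasizes: each instance region draws its style statistics from a matching region rather than from the image as a whole.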
OVSNet : Towards One-Pass Real-Time Video Object Segmentation
Video object segmentation aims at accurately segmenting the target object
regions across consecutive frames. It is technically challenging to cope
with complicated factors (e.g., shape deformation, occlusion, and objects
moving out of frame). Recent approaches have largely addressed them by using
back-and-forth re-identification and bi-directional mask propagation. However,
these methods are extremely slow and only support offline inference, so in
principle they cannot be applied in real time. Motivated by this observation,
we propose an efficient detection-based paradigm for video object
segmentation: a unified One-Pass Video Segmentation framework (OVS-Net) for modeling
spatial-temporal representation in a unified pipeline, which seamlessly
integrates object detection, object segmentation, and object re-identification.
The proposed framework lends itself to one-pass inference that effectively and
efficiently performs video object segmentation. Moreover, we propose a
mask-guided attention module for modeling the multi-scale object boundary and
multi-level feature fusion. Experiments on the challenging DAVIS 2017
benchmark demonstrate the effectiveness of the proposed framework, with
performance comparable to the state of the art and high efficiency of about
11.5 FPS, more than 5 times faster than other state-of-the-art methods and, to
our knowledge, a pioneering step towards real-time video object segmentation.
Comment: 10 pages, 6 figures
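The gist of mask-guided attention, as the abstract describes it, is to let a coarse mask softly gate the feature map so responses near the object are emphasized. A minimal sketch of that gating step, with my own function names and no claim to match the paper's module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_guided_attention(feat, mask_logits):
    """Gate a (C, H, W) feature map with soft spatial attention derived from
    per-pixel mask logits. Illustrative sketch only: high logits pass
    features through, low logits suppress them."""
    attn = sigmoid(mask_logits)      # (H, W) attention in (0, 1)
    return feat * attn[None, :, :]   # broadcast the gate over channels
```

In a full module, per the abstract's description, such gated features would then be fused across scales to sharpen the multi-scale object boundary.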
