A Dilated Inception Network for Visual Saliency Prediction
Recently, with the advent of deep convolutional neural networks (DCNN), the
improvements in visual saliency prediction research are impressive. One
possible direction to approach the next improvement is to fully characterize
the multi-scale saliency-influential factors with a computationally-friendly
module in DCNN architectures. In this work, we propose an end-to-end dilated
inception network (DINet) for visual saliency prediction. It captures
multi-scale contextual features effectively with very limited extra parameters.
Instead of utilizing parallel standard convolutions with different kernel sizes
as in the existing inception module, our proposed dilated inception module (DIM)
uses parallel dilated convolutions with different dilation rates, which
significantly reduces the computational load while enriching the diversity of
receptive fields in feature maps. Moreover, the performance of our saliency
model is further improved by using a set of linear normalization-based
probability distribution distance metrics as loss functions. As such, we can
formulate saliency prediction as a probability distribution prediction task for
global saliency inference instead of a typical pixel-wise regression problem.
Experimental results on several challenging saliency benchmark datasets
demonstrate that our DINet with the proposed loss functions can achieve
state-of-the-art performance with shorter inference time.
Comment: Accepted by IEEE Transactions on Multimedia. The source code is
available at https://github.com/ysyscool/DINe
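
To make the module concrete, here is a minimal PyTorch sketch of a
dilated-inception-style block; the branch width and dilation rates (1, 2, 4, 8)
are illustrative choices, not the paper's exact DIM configuration.

    import torch
    import torch.nn as nn

    class DilatedInceptionBlock(nn.Module):
        def __init__(self, in_ch, branch_ch, dilations=(1, 2, 4, 8)):
            super().__init__()
            # Parallel 3x3 convolutions share the kernel size but vary the
            # dilation rate, so each branch covers a different receptive
            # field for the parameter cost of one 3x3 kernel per branch.
            self.branches = nn.ModuleList([
                nn.Conv2d(in_ch, branch_ch, kernel_size=3,
                          padding=d, dilation=d)  # padding=d keeps H x W fixed
                for d in dilations
            ])
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            # Concatenate multi-scale branch outputs along the channel axis.
            return self.relu(torch.cat([b(x) for b in self.branches], dim=1))

    feats = torch.randn(1, 256, 30, 40)   # e.g. a backbone feature map
    block = DilatedInceptionBlock(256, 64)
    print(block(feats).shape)             # torch.Size([1, 256, 30, 40])
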
ZeroMesh: Zero-shot Single-view 3D Mesh Reconstruction
Single-view 3D object reconstruction is a fundamental and challenging
computer vision task that aims at recovering 3D shapes from single-view RGB
images. Most existing deep learning based reconstruction methods are trained
and evaluated on the same categories, and they cannot work well when handling
objects from novel categories that are not seen during training. To address
this issue, this paper tackles Zero-shot Single-view 3D Mesh Reconstruction,
studying model generalization on unseen categories and encouraging models to
genuinely reconstruct objects rather than retrieve memorized shapes.
Specifically, we propose an end-to-end two-stage
network, ZeroMesh, to break the category boundaries in reconstruction. Firstly,
we factorize the complicated image-to-mesh mapping into two simpler mappings,
i.e., an image-to-point mapping and a point-to-mesh mapping, where the latter
is mainly a geometric problem and less dependent on object categories. Secondly,
we devise a local feature sampling strategy in 2D and 3D feature spaces to
capture the local geometry shared across objects to enhance model
generalization. Thirdly, apart from the traditional point-to-point supervision,
we introduce a multi-view silhouette loss to supervise the surface generation
process, which provides additional regularization and further relieves the
overfitting problem. The experimental results show that our method
significantly outperforms existing works on the ShapeNet and Pix3D benchmarks
under different scenarios and various metrics, especially for novel objects.
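
As a rough illustration of the silhouette supervision, the PyTorch sketch below
averages a binary cross-entropy between rendered and ground-truth masks over
views; render_silhouette is a hypothetical stand-in for a differentiable
renderer, not the paper's interface.

    import torch
    import torch.nn.functional as F

    def multiview_silhouette_loss(verts, faces, cameras, gt_masks,
                                  render_silhouette):
        # Average binary cross-entropy between rendered and ground-truth
        # silhouettes over all supervision views; render_silhouette is
        # assumed differentiable so gradients reach the mesh vertices.
        losses = []
        for cam, gt in zip(cameras, gt_masks):
            pred = render_silhouette(verts, faces, cam).clamp(1e-6, 1 - 1e-6)
            losses.append(F.binary_cross_entropy(pred, gt))
        return torch.stack(losses).mean()
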
Towards Robust Curve Text Detection with Conditional Spatial Expansion
It is challenging to detect curve texts due to their irregular shapes and
varying sizes. In this paper, we first investigate the deficiencies of
existing curve text detection methods and then propose a novel Conditional Spatial
Expansion (CSE) mechanism to improve the performance of curve text detection.
Instead of regarding the curve text detection as a polygon regression or a
segmentation problem, we treat it as a region expansion process. Our CSE starts
with a seed arbitrarily initialized within a text region and progressively
merges neighboring regions based on local features extracted by a CNN and the
contextual information of already-merged regions. The CSE is highly parameterized and
can be seamlessly integrated into existing object detection frameworks.
Enhanced by the data-dependent CSE mechanism, our curve text detection system
provides robust instance-level text region extraction with minimal
post-processing. Analysis experiments show that our CSE can handle texts
with various shapes, sizes, and orientations, and can effectively suppress the
false positives arising from text-like textures or unrelated texts included in
the same RoI. Compared with the existing curve text detection algorithms, our
method is more robust and enjoys a simpler processing flow. It also sets a new
state-of-the-art on curve text benchmarks, with an F-score of up to 78.4.
Comment: This paper has been accepted by the IEEE International Conference on
Computer Vision and Pattern Recognition (CVPR 2019).
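
The region expansion process can be pictured as a greedy loop; the sketch below
assumes hypothetical neighbors and merge_score callables standing in for the
CNN features and merged-region context that drive the actual CSE.

    from collections import deque

    def expand_text_region(seed, neighbors, merge_score, threshold=0.5):
        """Grow one text instance from `seed` by greedy region merging.

        neighbors(r)           -> regions adjacent to region r (hypothetical)
        merge_score(r, merged) -> probability that r belongs to the instance
                                  accumulated so far (hypothetical)
        """
        merged = {seed}
        frontier = deque(neighbors(seed))
        while frontier:
            cand = frontier.popleft()
            if cand in merged:
                continue
            if merge_score(cand, merged) >= threshold:
                merged.add(cand)
                frontier.extend(neighbors(cand))  # keep expanding outward
        return merged
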
When Face Recognition Meets with Deep Learning: an Evaluation of Convolutional Neural Networks for Face Recognition
Deep learning, in particular the Convolutional Neural Network (CNN), has
recently achieved promising results in face recognition. However, it remains an
open question why CNNs work well and how to design a 'good' architecture. The
existing works tend to focus on reporting CNN architectures that work well for
face recognition rather than investigating the reason. In this work, we conduct
an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a
common ground to make our work easily reproducible. Specifically, we use the
public LFW (Labeled Faces in the Wild) database to train CNNs, unlike most existing
CNNs trained on private databases. We propose three CNN architectures which are
the first reported architectures trained using LFW data. This paper
quantitatively compares the architectures of CNNs and evaluates the effect of
different implementation choices. We identify several useful properties of
CNN-FRS. For instance, the dimensionality of the learned features can be
significantly reduced without adverse effect on face recognition accuracy. In
addition, a traditional metric learning method exploiting CNN-learned features
is evaluated. Experiments show that two crucial factors for good CNN-FRS
performance are the fusion of multiple CNNs and metric learning. To make our
work reproducible, source code and models will be made publicly available.
Comment: 7 pages, 4 figures, 7 tables
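
As a rough illustration of those two factors, the NumPy sketch below
L2-normalizes and concatenates embeddings from several CNNs and compares faces
with plain cosine similarity; the cosine step is a simple stand-in for the
metric learning evaluated in the paper, and all names are illustrative.

    import numpy as np

    def fuse_features(per_cnn_embeddings):
        # L2-normalize each CNN's embedding, then concatenate them.
        normed = [f / (np.linalg.norm(f) + 1e-12) for f in per_cnn_embeddings]
        return np.concatenate(normed)

    def same_person(feats_a, feats_b, threshold=0.5):
        # Cosine similarity on the fused features; a fixed stand-in for a
        # learned metric.
        a, b = fuse_features(feats_a), fuse_features(feats_b)
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        return cos >= threshold
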
Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization
Recent neural network based surface reconstruction methods can be roughly
divided into two categories: one warps templates explicitly and the other
represents 3D surfaces implicitly. To enjoy the advantages of both, we
propose a novel 3D representation, Neural Vector Fields (NVF), which adopts an
explicit learning process to manipulate meshes and an implicit unsigned
distance function (UDF) representation to break the barriers in resolution and
topology.
This is achieved by directly predicting the displacements from surface queries
and modeling shapes as Vector Fields, rather than relying on network
differentiation to obtain direction fields as most existing UDF-based methods
do. In this way, our approach is capable of encoding both the distance and the
direction fields so that the calculation of direction fields is
differentiation-free, circumventing the non-trivial surface extraction step.
Furthermore, building upon NVFs, we propose to incorporate two types of shape
codebooks, i.e., NVFs (Lite or Ultra), to promote cross-category reconstruction
through encoding cross-object priors. Moreover, we propose a new regularization
based on analyzing the zero-curl property of NVFs, and implement this through
the fully differentiable framework of our NVF (Ultra). We evaluate both NVFs
on four surface reconstruction scenarios: watertight vs. non-watertight
shapes, category-specific vs. category-agnostic reconstruction,
category-unseen reconstruction, and cross-domain reconstruction.
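
The core idea, predicting a displacement whose norm and normalized value give
the distance and direction fields without differentiation, can be sketched in
PyTorch as follows; the MLP layout and per-query conditioning are illustrative,
not the authors' architecture.

    import torch
    import torch.nn as nn

    class VectorFieldHead(nn.Module):
        def __init__(self, feat_dim, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3),  # displacement from query to surface
            )

        def forward(self, queries, shape_feat):
            # queries: [N, 3] points; shape_feat: [N, F] conditioning features.
            disp = self.mlp(torch.cat([queries, shape_feat], dim=-1))
            dist = disp.norm(dim=-1, keepdim=True)  # unsigned distance
            direction = disp / (dist + 1e-8)        # direction, no autograd
            return disp, dist, direction
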
Few-shot Image Generation via Style Adaptation and Content Preservation
Training a generative model with limited data (e.g., 10 images) is a very
challenging task. Many works propose to fine-tune a pre-trained GAN model.
However, this can easily result in overfitting. In other words, such models
manage to adapt the style but fail to preserve the content, where style
denotes the specific properties that define a domain while content denotes
the domain-irrelevant information that represents diversity. Recent works try
to maintain a pre-defined correspondence to preserve the content; however, the
diversity is still insufficient and the constraint may hinder style
adaptation. In this work,
we propose a paired image reconstruction approach for content preservation. We
propose to introduce an image translation module to GAN transferring, where the
module teaches the generator to separate style and content, and the generator
provides training data to the translation module in return. Qualitative and
quantitative experiments show that our method consistently surpasses the
state-of-the-art methods in the few-shot setting.
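
One plausible reading of the mutual training is sketched below in PyTorch: the
frozen source generator and the adapted generator share a latent, forming an
image pair from which the translation module learns while its reconstruction
supervises the generator's content; every name here is illustrative, not the
authors' interface.

    import torch
    import torch.nn.functional as F

    def paired_reconstruction_loss(z, g_src, g_tgt, translator):
        # The frozen source generator and the adapted generator share a
        # latent z, so their outputs form a pair with the same content in
        # two styles.
        with torch.no_grad():
            x_src = g_src(z)
        x_tgt = g_tgt(z)
        # The translator learns style transfer from the pair and, in turn,
        # its reconstruction supervises the generator to keep the content.
        return F.l1_loss(translator(x_src), x_tgt)
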
Neural Vector Fields: Implicit Representation by Explicit Learning
Deep neural networks (DNNs) are nowadays widely applied to 3D surface
reconstruction tasks, and such methods can be divided into two categories:
those that warp templates explicitly by moving vertices, and those that
represent 3D surfaces implicitly as signed or unsigned distance functions.
Taking advantage of both the advanced explicit learning process and the
powerful representation ability of implicit functions, we propose a novel 3D
representation method, Neural Vector Fields (NVF). It not only adopts the
explicit learning process to manipulate meshes directly, but also leverages the
implicit representation of unsigned distance functions (UDFs) to break the
barriers in resolution and topology. Specifically, our method first predicts
the displacements from queries towards the surface and models the shapes as
Vector Fields. Rather than relying on network differentiation to obtain
direction fields as most existing UDF-based methods do, the produced vector
fields encode both the distance and direction fields and mitigate the ambiguity
at "ridge" points, such that the calculation of direction fields is
straightforward and differentiation-free. The differentiation-free
characteristic enables us to further learn a shape codebook via Vector
Quantization, which encodes the cross-object priors, accelerates the training
procedure, and boosts model generalization on cross-category reconstruction.
The extensive experiments on surface reconstruction benchmarks indicate that
our method outperforms those state-of-the-art methods in different evaluation
scenarios including watertight vs non-watertight shapes, category-specific vs
category-agnostic reconstruction, category-unseen reconstruction, and
cross-domain reconstruction. Our code is released at
https://github.com/Wi-sc/NVF.
Comment: Accepted by CVPR 2023. Video: https://www.youtube.com/watch?v=GMXKoJfmHr
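
The zero-curl property mentioned in the companion NVF abstract can be turned
into a penalty with automatic differentiation; the PyTorch sketch below is one
plausible formulation, assuming a fully differentiable field network, and not
necessarily the paper's exact loss.

    import torch

    def zero_curl_penalty(field, x):
        # field: differentiable network mapping [N, 3] points to [N, 3]
        # displacement vectors. A field that is the gradient of a scalar
        # distance should have zero curl everywhere.
        x = x.requires_grad_(True)
        v = field(x)
        rows = [torch.autograd.grad(v[:, i].sum(), x, create_graph=True)[0]
                for i in range(3)]
        J = torch.stack(rows, dim=1)  # J[n, i, j] = d v_i / d x_j
        curl = torch.stack([J[:, 2, 1] - J[:, 1, 2],
                            J[:, 0, 2] - J[:, 2, 0],
                            J[:, 1, 0] - J[:, 0, 1]], dim=-1)
        return curl.pow(2).sum(dim=-1).mean()
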