Orthonormal Product Quantization Network for Scalable Face Image Retrieval
Recently, deep hashing with Hamming distance metric has drawn increasing
attention for face image retrieval tasks. However, its counterpart deep
quantization methods, which learn binary code representations with
dictionary-related distance metrics, have seldom been explored for the task.
This paper makes the first attempt to integrate product quantization into an
end-to-end deep learning framework for face image retrieval. Unlike prior deep
quantization methods where the codewords for quantization are learned from
data, we propose a novel scheme using predefined orthonormal vectors as
codewords, which aims to enhance the quantization informativeness and reduce
the codewords' redundancy. To make the most of the discriminative information,
we design a tailored loss function that maximizes the identity discriminability
in each quantization subspace for both the quantized and the original features.
Furthermore, an entropy-based regularization term is imposed to reduce the
quantization error. We conduct experiments on three commonly used datasets under both single-domain and cross-domain retrieval settings. The results show that the proposed method outperforms all compared deep hashing/quantization methods under both settings by a significant margin. The proposed codeword scheme consistently improves both in-domain performance and model generalization, verifying the importance of the codewords' distribution for quantization quality. Moreover, our model generalizes better than deep hashing models, indicating that it is better suited to scalable face image retrieval tasks.
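The orthonormal-codeword idea can be illustrated with a minimal product-quantization sketch in NumPy. This is a toy illustration, not the paper's end-to-end network; the subspace count, codebook size, and QR-based codebook construction below are assumptions for demonstration.

```python
import numpy as np

def orthonormal_codebooks(num_subspaces, num_codewords, sub_dim, seed=0):
    """Build one orthonormal codebook per subspace via QR decomposition
    (assumes num_codewords <= sub_dim so the columns can be orthonormal)."""
    rng = np.random.default_rng(seed)
    books = []
    for _ in range(num_subspaces):
        a = rng.standard_normal((sub_dim, num_codewords))
        q, _ = np.linalg.qr(a)   # columns of q are orthonormal
        books.append(q.T)        # shape: (num_codewords, sub_dim)
    return books

def quantize(x, books):
    """Split a feature vector into subspaces and assign each chunk to its
    nearest codeword, as in standard product quantization."""
    chunks = np.split(x, len(books))
    codes = []
    for chunk, book in zip(chunks, books):
        dists = np.linalg.norm(book - chunk, axis=1)
        codes.append(int(np.argmin(dists)))
    return codes
```

Because each codebook is orthonormal, the codewords are mutually non-redundant by construction, which is the property the abstract argues improves quantization informativeness.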
Poisson noise reduction with non-local PCA
Photon-limited imaging arises when the number of photons collected by a
sensor array is small relative to the number of detector elements. Photon
limitations are an important concern for many applications such as spectral
imaging, night vision, nuclear medicine, and astronomy. Typically a Poisson
distribution is used to model these observations, and the inherent
heteroscedasticity of the data combined with standard noise removal methods
yields significant artifacts. This paper introduces a novel denoising algorithm
for photon-limited images which combines elements of dictionary learning and
sparse patch-based representations of images. The method employs both an
adaptation of Principal Component Analysis (PCA) for Poisson noise and recently
developed sparsity-regularized convex optimization algorithms for
photon-limited images. A comprehensive empirical evaluation of the proposed
method helps characterize the performance of this approach relative to other
state-of-the-art denoising methods. The results reveal that, despite its
conceptual simplicity, Poisson PCA-based denoising appears to be highly
competitive in very low light regimes.
Comment: erratum: the image "man" is wrongly named "pepper" in the journal version.
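The patch-based PCA idea can be sketched as follows. Note this is a simplified stand-in: it uses the classical Anscombe variance-stabilizing transform with plain PCA, whereas the paper adapts PCA to the Poisson likelihood directly; the patch size and component count are illustrative assumptions.

```python
import numpy as np

def anscombe(x):
    """Variance-stabilizing transform for Poisson data."""
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

def inverse_anscombe(y):
    """Simple algebraic inverse (the exact unbiased inverse is more involved)."""
    return (y / 2.0) ** 2 - 3.0 / 8.0

def pca_denoise_patches(patches, k):
    """Denoise a stack of flattened patches by projecting onto their
    top-k principal components."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                       # top-k principal directions
    return centered @ basis.T @ basis + mean
```

In a full pipeline, similar patches would first be grouped non-locally and each group denoised separately; the sketch above shows only the per-group PCA step.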
Text Extraction From Natural Scene: Methodology And Application
With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications of image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scenes. However, scene text extraction remains a challenging problem due to the cluttered backgrounds of natural scenes and the varied patterns of scene text itself. To address these challenges, this dissertation proposes a framework for scene text extraction.
Scene text extraction in our framework is divided into two components: detection and recognition. Scene text detection locates the regions containing text in camera-captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidate text strings from the cluttered backgrounds of natural scenes. Text structural analysis is then performed to design effective structural features that distinguish text from non-text outliers among the candidate text strings. Scene text recognition transforms image-based text in the detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification over a set of character categories. We design robust and discriminative feature representations for STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of the proposed framework, which outperforms previously published methods.
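The STC prediction step, multi-class classification over character categories, can be sketched with toy pooled-gradient features and a nearest-centroid classifier. This is a simplified stand-in for the multiple descriptors, coding/pooling schemes, and learning models described above; all names and parameters are illustrative.

```python
import numpy as np

def gradient_features(patch):
    """Pool |gx| and |gy| over a 2x2 grid of cells -- a toy stand-in for the
    descriptor/coding/pooling pipeline used for STC representation."""
    gy, gx = np.gradient(patch.astype(float))
    h, w = patch.shape
    feats = []
    for g in (np.abs(gx), np.abs(gy)):
        for i in range(2):
            for j in range(2):
                feats.append(g[i * h // 2:(i + 1) * h // 2,
                               j * w // 2:(j + 1) * w // 2].mean())
    return np.array(feats)

class NearestCentroidSTC:
    """Minimal multi-class STC classifier: one centroid per character class."""
    def fit(self, feats, labels):
        labels = np.array(labels)
        self.classes_ = sorted(set(labels.tolist()))
        self.centroids_ = np.array(
            [feats[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, feats):
        d = np.linalg.norm(feats[:, None] - self.centroids_[None], axis=2)
        return [self.classes_[i] for i in d.argmin(axis=1)]
```

The pooled gradient features separate edge orientations, so a vertical-stroke character patch and a horizontal-stroke patch map to distinct centroids.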
Our proposed scene text extraction framework is applied to four scenarios: 1) reading printed labels on grocery packages for hand-held object recognition; 2) combining with car detection to localize license plates in camera-captured natural scene images; 3) reading indicative signage for assisted navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video-based natural scenes. The proposed prototype systems and associated evaluation results show that our framework is able to address these challenges in real applications.