304 research outputs found
An Universal Image Attractiveness Ranking Framework
We propose a new framework to rank image attractiveness using a novel
pairwise deep network trained with a large set of side-by-side multi-labeled
image pairs from a web image index. The judges only provide relative ranking
between two images without the need to directly assign an absolute score, or
rate any predefined image attribute, thus making the rating more intuitive and
accurate. We investigate a deep attractiveness rank net (DARN), a combination
of deep convolutional neural network and rank net, to directly learn an
attractiveness score mean and variance for each image and the underlying
criteria the judges use to label each pair. The extension of this model
(DARN-V2) is able to adapt to individual judge's personal preference. We also
show the attractiveness of search results are significantly improved by using
this attractiveness information in a real commercial search engine. We evaluate
our model against other state-of-the-art models on our side-by-side web test
data and another public aesthetic data set. With much less judgments (1M vs
50M), our model outperforms on side-by-side labeled data, and is comparable on
data labeled by absolute score.Comment: Accepted by 2019 Winter Conference on Application of Computer Vision
(WACV
Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information
Applying people detectors to unseen data is challenging since patterns distributions, such
as viewpoints, motion, poses, backgrounds, occlusions and people sizes, may significantly differ
from the ones of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt
frame by frame people detectors during runtime classification, without requiring any additional
manually labeled ground truth apart from the offline training of the detection model. Such adaptation
make use of multiple detectors mutual information, i.e., similarities and dissimilarities of detectors
estimated and agreed by pair-wise correlating their outputs. Globally, the proposed adaptation
discriminates between relevant instants in a video sequence, i.e., identifies the representative frames
for an adaptation of the system. Locally, the proposed adaptation identifies the best configuration
(i.e., detection threshold) of each detector under analysis, maximizing the mutual information to
obtain the detection threshold of each detector. The proposed coarse-to-fine approach does not
require training the detectors for each new scenario and uses standard people detector outputs, i.e.,
bounding boxes. The experimental results demonstrate that the proposed approach outperforms
state-of-the-art detectors whose optimal threshold configurations are previously determined and
fixed from offline training dataThis work has been partially supported by the Spanish government under the project TEC2014-53176-R
(HAVideo
SIMBA: scalable inversion in optical tomography using deep denoising priors
Two features desired in a three-dimensional (3D) optical tomographic image reconstruction algorithm are the ability to reduce imaging artifacts and to do fast processing of large data volumes. Traditional iterative inversion algorithms are impractical in this context due to their heavy computational and memory requirements. We propose and experimentally validate a novel scalable iterative mini-batch algorithm (SIMBA) for fast and high-quality optical tomographic imaging. SIMBA enables highquality imaging by combining two complementary information sources: the physics of the imaging system characterized by its forward model and the imaging prior characterized by a denoising deep neural net. SIMBA easily scales to very large 3D tomographic datasets by processing only a small subset of measurements at each iteration. We establish the theoretical fixedpoint convergence of SIMBA under nonexpansive denoisers for convex data-fidelity terms. We validate SIMBA on both simulated and experimentally collected intensity diffraction tomography (IDT) datasets. Our results show that SIMBA can significantly reduce the computational burden of 3D image formation without sacrificing the imaging quality.https://arxiv.org/abs/1911.13241First author draf
Graph Spectral Image Processing
Recent advent of graph signal processing (GSP) has spurred intensive studies
of signals that live naturally on irregular data kernels described by graphs
(e.g., social networks, wireless sensor networks). Though a digital image
contains pixels that reside on a regularly sampled 2D grid, if one can design
an appropriate underlying graph connecting pixels with weights that reflect the
image structure, then one can interpret the image (or image patch) as a signal
on a graph, and apply GSP tools for processing and analysis of the signal in
graph spectral domain. In this article, we overview recent graph spectral
techniques in GSP specifically for image / video processing. The topics covered
include image compression, image restoration, image filtering and image
segmentation
- …