636 research outputs found
360-degree Video Stitching for Dual-fisheye Lens Cameras Based On Rigid Moving Least Squares
Dual-fisheye lens cameras are becoming popular for 360-degree video capture,
especially for user-generated content (UGC), since they are affordable and
portable. Images generated by the dual-fisheye cameras have limited overlap and
hence require non-conventional stitching techniques to produce high-quality
360x180-degree panoramas. This paper introduces a novel method to align these
images using interpolation grids based on rigid moving least squares.
Furthermore, jitter is a critical issue that arises when image-based
stitching algorithms are applied to video. It stems from the unconstrained
movement of the stitching boundary from one frame to another. Therefore, we also
propose a new algorithm to maintain the temporal coherence of the stitching
boundary and provide jitter-free 360-degree videos. Results show that the method
proposed in this paper can produce higher quality stitched images and videos
than prior work.
Comment: Preprint version
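The alignment above builds on rigid moving least squares. As a minimal sketch (not the paper's full pipeline, which also constructs interpolation grids and handles the fisheye unwarping), the 2-D rigid MLS deformation of Schaefer et al. maps a point `v` given control points `p` and their targets `q`:

```python
import numpy as np

def rigid_mls(v, p, q, alpha=1.0, eps=1e-8):
    """Rigid moving-least-squares deformation of a point v (shape (2,)),
    given control points p (n,2) and their targets q (n,2).
    Variable names follow Schaefer et al., "Image Deformation Using
    Moving Least Squares" (2006)."""
    d2 = np.sum((p - v) ** 2, axis=1) + eps
    w = 1.0 / d2 ** alpha                      # per-control-point weights
    p_star = w @ p / w.sum()                   # weighted centroids
    q_star = w @ q / w.sum()
    ph = p - p_star                            # p-hat: centered controls
    qh = q - q_star                            # q-hat: centered targets
    vp = v - p_star

    def perp(a):                               # 90-degree rotation of a 2-vector
        return np.stack([-a[..., 1], a[..., 0]], axis=-1)

    # f_bar = sum_i qh_i A_i, with A_i built from ph_i and (v - p_star)
    f_bar = np.zeros(2)
    for wi, phi, qhi in zip(w, ph, qh):
        A = wi * np.array([[phi @ vp,          phi @ -perp(vp)],
                           [-perp(phi) @ vp,   -perp(phi) @ -perp(vp)]])
        f_bar += qhi @ A
    # a rigid transform preserves the distance |v - p_star|
    return np.linalg.norm(vp) * f_bar / (np.linalg.norm(f_bar) + eps) + q_star
```

For identity or pure-translation control points the warp reproduces the input point (plus the translation) exactly, which is a quick sanity check on an implementation.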
Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence
Aesthetic quality prediction is a challenging task in the computer vision
community because of the complex interplay with semantic contents and
photographic technologies. Recent studies on the powerful deep learning based
aesthetic quality assessment usually use a binary high-low label or a numerical
score to represent the aesthetic quality. However, a scalar representation
cannot adequately capture the underlying variety in human perception of
aesthetics. In this work, we propose to predict the aesthetic score
distribution (i.e., a score distribution vector of the ordinal basic human
ratings) using Deep Convolutional Neural Network (DCNN). Conventional DCNNs
which aim to minimize the difference between the predicted scalar numbers or
vectors and the ground truth cannot be directly used for the ordinal basic
rating distribution. Thus, a novel CNN based on the Cumulative distribution
with Jensen-Shannon divergence (CJS-CNN) is presented to predict the aesthetic
score distribution of human ratings, with a new reliability-sensitive learning
method based on the kurtosis of the score distribution, which eliminates the
requirement of the original full data of human ratings (without normalization).
Experimental results on a large-scale aesthetic dataset demonstrate the
effectiveness of our introduced CJS-CNN in this task.
Comment: AAAI Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, USA, 2-7 Feb. 2018
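The loss above applies a Jensen-Shannon-style divergence to cumulative distributions of ordinal ratings. A minimal sketch of one common cumulative-JS formulation (the paper's exact loss and its reliability-sensitive weighting may differ) is:

```python
import numpy as np

def cjs_divergence(p, q, eps=1e-12):
    """Cumulative Jensen-Shannon divergence between two discrete score
    distributions p and q (e.g. over ordinal ratings 1..10): the JS
    construction applied to the CDFs rather than the PMFs, using the
    cumulative KL divergence CKL(F||G) = sum F*log(F/G) + G - F."""
    P, Q = np.cumsum(p), np.cumsum(q)          # cumulative distributions
    M = 0.5 * (P + Q)                          # cumulative mixture

    def ckl(F, G):                             # cumulative KL divergence
        return np.sum(F * np.log((F + eps) / (G + eps)) + G - F)

    return 0.5 * ckl(P, M) + 0.5 * ckl(Q, M)
```

Unlike a plain JS divergence on the probability vectors, the cumulative form is sensitive to *where* along the ordinal scale two distributions disagree, which is why it suits ordered rating bins.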
Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information
Applying people detectors to unseen data is challenging since pattern distributions, such
as viewpoints, motion, poses, backgrounds, occlusions, and people sizes, may differ significantly
from the ones of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt
frame by frame people detectors during runtime classification, without requiring any additional
manually labeled ground truth apart from the offline training of the detection model. Such adaptation
makes use of the mutual information between multiple detectors, i.e., the similarities and
dissimilarities of the detectors, estimated by pair-wise correlation of their outputs. Globally, the proposed adaptation
discriminates between relevant instants in a video sequence, i.e., identifies the representative frames
for an adaptation of the system. Locally, the proposed adaptation identifies the best configuration
(i.e., the detection threshold) of each detector under analysis by maximizing the mutual
information. The proposed coarse-to-fine approach does not
require training the detectors for each new scenario and uses standard people detector outputs, i.e.,
bounding boxes. The experimental results demonstrate that the proposed approach outperforms
state-of-the-art detectors whose optimal threshold configurations are previously determined and
fixed from offline training data.
This work has been partially supported by the Spanish government under the project TEC2014-53176-R (HAVideo).
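The local adaptation step picks each detector's threshold by maximizing mutual information across detectors. An illustrative simplification (not the paper's exact algorithm; the function names and the fixed reference threshold are assumptions) binarizes two detectors' confidence scores on the same candidate windows and sweeps one detector's threshold:

```python
import numpy as np

def best_threshold(scores_a, scores_b, thresholds, ref_thresh=0.5):
    """Pick the detection threshold for detector A that maximizes the
    mutual information between the binarized outputs of two detectors
    evaluated on the same candidate windows. `scores_a`/`scores_b`
    are per-window confidence scores."""
    def mutual_information(x, y):
        # MI of two binary variables from their joint 2x2 histogram
        joint = np.histogram2d(x.astype(float), y.astype(float), bins=2)[0]
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)    # marginal of x
        py = pxy.sum(axis=0, keepdims=True)    # marginal of y
        nz = pxy > 0
        return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

    mi = [mutual_information(scores_a > t, scores_b > ref_thresh)
          for t in thresholds]
    return thresholds[int(np.argmax(mi))]
```

When both detectors score the same windows similarly, the MI-maximizing threshold is the one whose accept/reject decisions agree most strongly with the other detector, which is the intuition behind pair-wise correlation of detector outputs.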
Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
Video Coding for Machines (VCM) aims to compress visual signals for machine
analysis. However, existing methods only consider a few machines, neglecting
the majority. Moreover, the machine perceptual characteristics are not
effectively leveraged, leading to suboptimal compression efficiency. In this
paper, we introduce Satisfied Machine Ratio (SMR) to address these issues. SMR
statistically measures the quality of compressed images and videos for machines
by aggregating satisfaction scores from them. Each score is calculated based on
the difference in machine perceptions between original and compressed images.
Targeting image classification and object detection tasks, we build two
representative machine libraries for SMR annotation and construct a large-scale
SMR dataset to facilitate SMR studies. We then propose an SMR prediction model
based on the correlation between deep feature differences and SMR.
Furthermore, we introduce an auxiliary task to increase the prediction accuracy
by predicting the SMR difference between two images at different quality
levels. Extensive experiments demonstrate that using the SMR models
significantly improves compression performance for VCM, and the SMR models
generalize well to unseen machines, traditional and neural codecs, and
datasets. In summary, SMR enables perceptual coding for machines and advances
VCM from specificity to generality. Code is available at
\url{https://github.com/ywwynm/SMR}
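The core idea, aggregating per-machine satisfaction into a ratio, can be sketched as follows. This is a toy formulation: the satisfaction measure (cosine similarity of prediction vectors here) and the threshold `tau` are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def satisfied_machine_ratio(orig_preds, comp_preds, tau=0.9):
    """Toy sketch of the SMR idea: each machine's satisfaction score
    compares its predictions on the original vs. the compressed image;
    SMR is the fraction of machines in the library whose score clears
    the satisfaction threshold tau.
    orig_preds / comp_preds: dicts mapping machine name -> class-prob
    vector predicted on the original / compressed image."""
    scores = {}
    for name in orig_preds:
        p, q = orig_preds[name], comp_preds[name]
        # satisfaction here = cosine similarity of the two predictions
        scores[name] = float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))
    smr = float(np.mean([s >= tau for s in scores.values()]))
    return smr, scores
```

Aggregating over a large machine library is what makes SMR a statistical quality measure rather than a score tied to any single downstream model.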
Scraping social media photos posted in Kenya and elsewhere to detect and analyze food types
Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ∼30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in
Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only, and that adding analysis of captions to the image analysis yields a classifier that is 9 percentage points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019.
Accepted manuscript
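The gain from adding captions comes from combining evidence from both modalities. A minimal late-fusion sketch (the paper's actual fusion architecture is not specified here; weighted score averaging is an assumption) looks like:

```python
import numpy as np

def fused_top1(image_logits, caption_logits, w=0.5):
    """Illustrative late fusion for a multimodal classifier: combine
    per-class scores from an image branch and a caption branch with
    weight w, then take the top-1 class per sample.
    image_logits / caption_logits: arrays of shape (n_samples, n_classes)."""
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    fused = w * softmax(image_logits) + (1 - w) * softmax(caption_logits)
    return fused.argmax(axis=-1)
```

When the image branch is ambiguous between two classes, a confident caption branch breaks the tie, which mirrors why caption analysis helps most on the food types that are hard to distinguish visually.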