636 research outputs found

    360-degree Video Stitching for Dual-fisheye Lens Cameras Based On Rigid Moving Least Squares

    Full text link
    Dual-fisheye lens cameras are becoming popular for 360-degree video capture, especially for User-generated content (UGC), since they are affordable and portable. Images generated by the dual-fisheye cameras have limited overlap and hence require non-conventional stitching techniques to produce high-quality 360x180-degree panoramas. This paper introduces a novel method to align these images using interpolation grids based on rigid moving least squares. Furthermore, jitter is the critical issue arising when one applies the image-based stitching algorithms to video. It stems from the unconstrained movement of stitching boundary from one frame to another. Therefore, we also propose a new algorithm to maintain the temporal coherence of stitching boundary to provide jitter-free 360-degree videos. Results show that the method proposed in this paper can produce higher quality stitched images and videos than prior work.Comment: Preprint versio

    Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence

    Full text link
    Aesthetic quality prediction is a challenging task in the computer vision community because of the complex interplay with semantic contents and photographic technologies. Recent studies on the powerful deep learning based aesthetic quality assessment usually use a binary high-low label or a numerical score to represent the aesthetic quality. However the scalar representation cannot describe well the underlying varieties of the human perception of aesthetics. In this work, we propose to predict the aesthetic score distribution (i.e., a score distribution vector of the ordinal basic human ratings) using Deep Convolutional Neural Network (DCNN). Conventional DCNNs which aim to minimize the difference between the predicted scalar numbers or vectors and the ground truth cannot be directly used for the ordinal basic rating distribution. Thus, a novel CNN based on the Cumulative distribution with Jensen-Shannon divergence (CJS-CNN) is presented to predict the aesthetic score distribution of human ratings, with a new reliability-sensitive learning method based on the kurtosis of the score distribution, which eliminates the requirement of the original full data of human ratings (without normalization). Experimental results on large scale aesthetic dataset demonstrate the effectiveness of our introduced CJS-CNN in this task.Comment: AAAI Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, USA. 2-7 Feb. 201

    Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information

    Full text link
    Applying people detectors to unseen data is challenging since patterns distributions, such as viewpoints, motion, poses, backgrounds, occlusions and people sizes, may significantly differ from the ones of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt frame by frame people detectors during runtime classification, without requiring any additional manually labeled ground truth apart from the offline training of the detection model. Such adaptation make use of multiple detectors mutual information, i.e., similarities and dissimilarities of detectors estimated and agreed by pair-wise correlating their outputs. Globally, the proposed adaptation discriminates between relevant instants in a video sequence, i.e., identifies the representative frames for an adaptation of the system. Locally, the proposed adaptation identifies the best configuration (i.e., detection threshold) of each detector under analysis, maximizing the mutual information to obtain the detection threshold of each detector. The proposed coarse-to-fine approach does not require training the detectors for each new scenario and uses standard people detector outputs, i.e., bounding boxes. The experimental results demonstrate that the proposed approach outperforms state-of-the-art detectors whose optimal threshold configurations are previously determined and fixed from offline training dataThis work has been partially supported by the Spanish government under the project TEC2014-53176-R (HAVideo

    Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling

    Full text link
    Video Coding for Machines (VCM) aims to compress visual signals for machine analysis. However, existing methods only consider a few machines, neglecting the majority. Moreover, the machine perceptual characteristics are not effectively leveraged, leading to suboptimal compression efficiency. In this paper, we introduce Satisfied Machine Ratio (SMR) to address these issues. SMR statistically measures the quality of compressed images and videos for machines by aggregating satisfaction scores from them. Each score is calculated based on the difference in machine perceptions between original and compressed images. Targeting image classification and object detection tasks, we build two representative machine libraries for SMR annotation and construct a large-scale SMR dataset to facilitate SMR studies. We then propose an SMR prediction model based on the correlation between deep features differences and SMR. Furthermore, we introduce an auxiliary task to increase the prediction accuracy by predicting the SMR difference between two images in different quality levels. Extensive experiments demonstrate that using the SMR models significantly improves compression performance for VCM, and the SMR models generalize well to unseen machines, traditional and neural codecs, and datasets. In summary, SMR enables perceptual coding for machines and advances VCM from specificity to generality. Code is available at \url{https://github.com/ywwynm/SMR}

    Scraping social media photos posted in Kenya and elsewhere to detect and analyze food types

    Full text link
    Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ∼30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only and that adding analysis of captions to the image analysis yields a classifier that is 9 percent points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019.Accepted manuscrip
    corecore