636 research outputs found
360-degree Video Stitching for Dual-fisheye Lens Cameras Based On Rigid Moving Least Squares
Dual-fisheye lens cameras are becoming popular for 360-degree video capture,
especially for user-generated content (UGC), since they are affordable and
portable. Images generated by the dual-fisheye cameras have limited overlap and
hence require non-conventional stitching techniques to produce high-quality
360x180-degree panoramas. This paper introduces a novel method to align these
images using interpolation grids based on rigid moving least squares.
Furthermore, jitter is a critical issue that arises when image-based
stitching algorithms are applied to video. It stems from the unconstrained
movement of the stitching boundary from one frame to another. Therefore, we also
propose a new algorithm to maintain the temporal coherence of the stitching
boundary and provide jitter-free 360-degree videos. Results show that the method
proposed in this paper can produce higher quality stitched images and videos
than prior work.
Comment: Preprint version
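The alignment above builds on rigid moving least squares. As a minimal sketch (not the paper's full pipeline, which also constructs interpolation grids and handles the fisheye unwarping), the 2-D rigid MLS deformation of Schaefer et al. maps a point `v` given control points `p` and their targets `q`:

```python
import numpy as np

def rigid_mls(v, p, q, alpha=1.0, eps=1e-8):
    """Rigid moving-least-squares deformation of a point v (shape (2,)),
    given control points p (n,2) and their targets q (n,2).
    Variable names follow Schaefer et al., "Image Deformation Using
    Moving Least Squares" (2006)."""
    d2 = np.sum((p - v) ** 2, axis=1) + eps
    w = 1.0 / d2 ** alpha                      # per-control-point weights
    p_star = w @ p / w.sum()                   # weighted centroids
    q_star = w @ q / w.sum()
    ph = p - p_star                            # p-hat: centered controls
    qh = q - q_star                            # q-hat: centered targets
    vp = v - p_star

    def perp(a):                               # 90-degree rotation of a 2-vector
        return np.stack([-a[..., 1], a[..., 0]], axis=-1)

    # f_bar = sum_i qh_i A_i, with A_i built from ph_i and (v - p_star)
    f_bar = np.zeros(2)
    for wi, phi, qhi in zip(w, ph, qh):
        A = wi * np.array([[phi @ vp,          phi @ -perp(vp)],
                           [-perp(phi) @ vp,   -perp(phi) @ -perp(vp)]])
        f_bar += qhi @ A
    # a rigid transform preserves the distance |v - p_star|
    return np.linalg.norm(vp) * f_bar / (np.linalg.norm(f_bar) + eps) + q_star
```

For identity or pure-translation control points the warp reproduces the input point (plus the translation) exactly, which is a quick sanity check on an implementation.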
Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence
Aesthetic quality prediction is a challenging task in the computer vision
community because of the complex interplay with semantic contents and
photographic technologies. Recent studies on the powerful deep learning based
aesthetic quality assessment usually use a binary high-low label or a numerical
score to represent the aesthetic quality. However, a scalar representation
cannot adequately capture the underlying variety in human perception of
aesthetics. In this work, we propose to predict the aesthetic score
distribution (i.e., a score distribution vector of the ordinal basic human
ratings) using Deep Convolutional Neural Network (DCNN). Conventional DCNNs
which aim to minimize the difference between the predicted scalar numbers or
vectors and the ground truth cannot be directly used for the ordinal basic
rating distribution. Thus, a novel CNN based on the Cumulative distribution
with Jensen-Shannon divergence (CJS-CNN) is presented to predict the aesthetic
score distribution of human ratings, with a new reliability-sensitive learning
method based on the kurtosis of the score distribution, which eliminates the
requirement of the original full data of human ratings (without normalization).
Experimental results on a large-scale aesthetic dataset demonstrate the
effectiveness of our introduced CJS-CNN in this task.
Comment: AAAI Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, USA, 2-7 Feb. 2018
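The loss above applies a Jensen-Shannon-style divergence to cumulative distributions of ordinal ratings. A minimal sketch of one common cumulative-JS formulation (the paper's exact loss and its reliability-sensitive weighting may differ) is:

```python
import numpy as np

def cjs_divergence(p, q, eps=1e-12):
    """Cumulative Jensen-Shannon divergence between two discrete score
    distributions p and q (e.g. over ordinal ratings 1..10): the JS
    construction applied to the CDFs rather than the PMFs, using the
    cumulative KL divergence CKL(F||G) = sum F*log(F/G) + G - F."""
    P, Q = np.cumsum(p), np.cumsum(q)          # cumulative distributions
    M = 0.5 * (P + Q)                          # cumulative mixture

    def ckl(F, G):                             # cumulative KL divergence
        return np.sum(F * np.log((F + eps) / (G + eps)) + G - F)

    return 0.5 * ckl(P, M) + 0.5 * ckl(Q, M)
```

Unlike a plain JS divergence on the probability vectors, the cumulative form is sensitive to *where* along the ordinal scale two distributions disagree, which is why it suits ordered rating bins.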
Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information
Applying people detectors to unseen data is challenging since pattern distributions, such
as viewpoints, motion, poses, backgrounds, occlusions, and people sizes, may differ significantly
from the ones of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt
frame by frame people detectors during runtime classification, without requiring any additional
manually labeled ground truth apart from the offline training of the detection model. Such adaptation
makes use of the mutual information between multiple detectors, i.e., the similarities and
dissimilarities of the detectors, estimated by pair-wise correlation of their outputs. Globally, the proposed adaptation
discriminates between relevant instants in a video sequence, i.e., identifies the representative frames
for an adaptation of the system. Locally, the proposed adaptation identifies the best configuration
(i.e., the detection threshold) of each detector under analysis by maximizing the mutual
information. The proposed coarse-to-fine approach does not
require training the detectors for each new scenario and uses standard people detector outputs, i.e.,
bounding boxes. The experimental results demonstrate that the proposed approach outperforms
state-of-the-art detectors whose optimal threshold configurations are previously determined and
fixed from offline training data.
This work has been partially supported by the Spanish government under the project TEC2014-53176-R (HAVideo).
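The local adaptation step picks each detector's threshold by maximizing mutual information across detectors. An illustrative simplification (not the paper's exact algorithm; the function names and the fixed reference threshold are assumptions) binarizes two detectors' confidence scores on the same candidate windows and sweeps one detector's threshold:

```python
import numpy as np

def best_threshold(scores_a, scores_b, thresholds, ref_thresh=0.5):
    """Pick the detection threshold for detector A that maximizes the
    mutual information between the binarized outputs of two detectors
    evaluated on the same candidate windows. `scores_a`/`scores_b`
    are per-window confidence scores."""
    def mutual_information(x, y):
        # MI of two binary variables from their joint 2x2 histogram
        joint = np.histogram2d(x.astype(float), y.astype(float), bins=2)[0]
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)    # marginal of x
        py = pxy.sum(axis=0, keepdims=True)    # marginal of y
        nz = pxy > 0
        return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

    mi = [mutual_information(scores_a > t, scores_b > ref_thresh)
          for t in thresholds]
    return thresholds[int(np.argmax(mi))]
```

When both detectors score the same windows similarly, the MI-maximizing threshold is the one whose accept/reject decisions agree most strongly with the other detector, which is the intuition behind pair-wise correlation of detector outputs.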
Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
Video Coding for Machines (VCM) aims to compress visual signals for machine
analysis. However, existing methods only consider a few machines, neglecting
the majority. Moreover, the machine perceptual characteristics are not
effectively leveraged, leading to suboptimal compression efficiency. In this
paper, we introduce Satisfied Machine Ratio (SMR) to address these issues. SMR
statistically measures the quality of compressed images and videos for machines
by aggregating satisfaction scores from them. Each score is calculated based on
the difference in machine perceptions between original and compressed images.
Targeting image classification and object detection tasks, we build two
representative machine libraries for SMR annotation and construct a large-scale
SMR dataset to facilitate SMR studies. We then propose an SMR prediction model
based on the correlation between deep feature differences and SMR.
Furthermore, we introduce an auxiliary task to increase the prediction accuracy
by predicting the SMR difference between two images at different quality
levels. Extensive experiments demonstrate that using the SMR models
significantly improves compression performance for VCM, and the SMR models
generalize well to unseen machines, traditional and neural codecs, and
datasets. In summary, SMR enables perceptual coding for machines and advances
VCM from specificity to generality. Code is available at
\url{https://github.com/ywwynm/SMR}
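The core idea, aggregating per-machine satisfaction into a ratio, can be sketched as follows. This is a toy formulation: the satisfaction measure (cosine similarity of prediction vectors here) and the threshold `tau` are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def satisfied_machine_ratio(orig_preds, comp_preds, tau=0.9):
    """Toy sketch of the SMR idea: each machine's satisfaction score
    compares its predictions on the original vs. the compressed image;
    SMR is the fraction of machines in the library whose score clears
    the satisfaction threshold tau.
    orig_preds / comp_preds: dicts mapping machine name -> class-prob
    vector predicted on the original / compressed image."""
    scores = {}
    for name in orig_preds:
        p, q = orig_preds[name], comp_preds[name]
        # satisfaction here = cosine similarity of the two predictions
        scores[name] = float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))
    smr = float(np.mean([s >= tau for s in scores.values()]))
    return smr, scores
```

Aggregating over a large machine library is what makes SMR a statistical quality measure rather than a score tied to any single downstream model.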
Scraping social media photos posted in Kenya and elsewhere to detect and analyze food types
Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ∼30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in
Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only, and that adding analysis of captions to the image analysis yields a classifier that is 9 percentage points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019.
Accepted manuscript
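The gain from adding captions comes from combining evidence from both modalities. A minimal late-fusion sketch (the paper's actual fusion architecture is not specified here; weighted score averaging is an assumption) looks like:

```python
import numpy as np

def fused_top1(image_logits, caption_logits, w=0.5):
    """Illustrative late fusion for a multimodal classifier: combine
    per-class scores from an image branch and a caption branch with
    weight w, then take the top-1 class per sample.
    image_logits / caption_logits: arrays of shape (n_samples, n_classes)."""
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    fused = w * softmax(image_logits) + (1 - w) * softmax(caption_logits)
    return fused.argmax(axis=-1)
```

When the image branch is ambiguous between two classes, a confident caption branch breaks the tie, which mirrors why caption analysis helps most on the food types that are hard to distinguish visually.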