538 research outputs found

    An Universal Image Attractiveness Ranking Framework

    We propose a new framework for ranking image attractiveness using a novel pairwise deep network trained on a large set of side-by-side, multi-labeled image pairs from a web image index. The judges provide only a relative ranking between two images, without assigning an absolute score or rating any predefined image attribute, which makes the rating more intuitive and accurate. We investigate a deep attractiveness rank net (DARN), a combination of a deep convolutional neural network and a rank net, that directly learns an attractiveness score mean and variance for each image along with the underlying criteria the judges use to label each pair. An extension of this model (DARN-V2) adapts to an individual judge's personal preferences. We also show that the attractiveness of search results is significantly improved by using this attractiveness information in a real commercial search engine. We evaluate our model against other state-of-the-art models on our side-by-side web test data and on another public aesthetic data set. With far fewer judgments (1M vs 50M), our model outperforms them on side-by-side labeled data and is comparable on data labeled by absolute score.
    Comment: Accepted by 2019 Winter Conference on Application of Computer Vision (WACV)
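    The abstract describes learning a score mean and variance per image from pairwise judgments. A minimal sketch of how such quantities could yield a pairwise preference probability and a training loss follows. It assumes independent Gaussian scores (a Thurstone-style comparison); the function names and the exact loss are illustrative assumptions, not the formulation confirmed by the paper.

```python
import math


def prefer_prob(mu_a, var_a, mu_b, var_b):
    """P(image A is judged more attractive than image B).

    Assumption: each image's score is an independent Gaussian, so the
    score difference is Gaussian with mean mu_a - mu_b and variance
    var_a + var_b (which must be positive).
    """
    z = (mu_a - mu_b) / math.sqrt(var_a + var_b)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_loss(mu_a, var_a, mu_b, var_b, label):
    """Cross-entropy between a judge's label and the predicted preference.

    label: 1.0 if the judge preferred A, 0.0 if B, 0.5 for a tie
    (a hypothetical encoding chosen for this sketch).
    """
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prefer_prob(mu_a, var_a, mu_b, var_b), eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
```

    Under this sketch, a larger variance on either image pulls the predicted preference toward 0.5, so uncertain images are penalized less for disagreeing with a judge.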

    Coarse-to-Fine Annotation Enrichment for Semantic Segmentation Learning

    Rich, high-quality annotated data is critical for semantic segmentation learning, yet acquiring dense, pixel-wise ground truth is both labor- and time-consuming. Coarse annotations (e.g., scribbles, coarse polygons) offer an economical alternative, but training on them alone rarely yields satisfactory performance. To generate high-quality annotated data at a low time cost for accurate segmentation, we propose a novel annotation enrichment strategy that expands the existing coarse annotations of the training data to a finer scale. Extensive experiments on the Cityscapes and PASCAL VOC 2012 benchmarks show that neural networks trained with the enriched annotations from our framework yield a significant improvement over those trained with the original coarse labels. Their performance is highly competitive with that obtained using human-annotated dense annotations. The proposed method also outperforms other state-of-the-art weakly-supervised segmentation methods.
    Comment: CIKM 2018 International Conference on Information and Knowledge Management

    Simultaneous Facial Landmark Detection, Pose and Deformation Estimation under Facial Occlusion

    Facial landmark detection, head pose estimation, and facial deformation analysis are typical facial behavior analysis tasks in computer vision. Existing methods usually perform each task independently and sequentially, ignoring their interactions. To tackle this problem, we propose a unified framework for simultaneous facial landmark detection, head pose estimation, and facial deformation analysis that is robust to facial occlusion. Following a cascade procedure augmented with model-based head pose estimation, we iteratively update the facial landmark locations, facial occlusion, head pose, and facial deformation until convergence. Experimental results on benchmark databases demonstrate the effectiveness of the proposed method for simultaneous facial landmark detection, head pose estimation, and facial deformation estimation, even when the faces in the images are occluded.
    Comment: International Conference on Computer Vision and Pattern Recognition, 201

    Scraping social media photos posted in Kenya and elsewhere to detect and analyze food types

    Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts, and used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ∼30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in Kenya. We used the second dataset, KenyanFood13, to train a classifier, KenyanFTR (short for Kenyan Food Type Recognizer), to recognize 13 popular food types in Kenya. KenyanFTR is a multimodal deep neural network that identifies these 13 types of Kenyan food using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and that of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content alone, and that adding caption analysis to the image analysis yields a classifier that is 9 percentage points more accurate than one that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019.
    Accepted manuscript
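    The abstract reports results as top-1 accuracy and as a gap in percentage points. The sketch below shows how those two quantities are conventionally computed; the function names and the toy inputs in the usage note are illustrative and are not taken from the paper.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label.

    scores: list of per-class score lists, one inner list per sample.
    labels: list of integer class indices, same length as scores.
    """
    correct = 0
    for row, true_class in zip(scores, labels):
        # Index of the largest score is the top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == true_class)
    return correct / len(labels)


def percentage_point_gap(acc_a, acc_b):
    """Absolute difference between two accuracies (given as fractions in
    [0, 1]), expressed in percentage points rather than as a relative change."""
    return 100.0 * (acc_a - acc_b)
```

    For example, with made-up scores `[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]` and labels `[1, 0, 0]`, two of the three top-1 predictions match, so `top1_accuracy` returns 2/3.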