165 research outputs found

    Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels.

    The problem of estimating subjective visual properties from images and videos has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels, because human annotators are much better at ranking two images/videos (e.g. which one is more interesting) than at giving an absolute value to each of them separately. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors. They thus require a large number of pairwise labels to be collected. More importantly, as a local outlier detection method, majority voting is ineffective in identifying outliers that can cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning to rank problem, tackling both the outlier detection and learning to rank jointly. Differing from existing methods, the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. Extensive experiments on various benchmark datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives. Comment: 14 pages, accepted by IEEE TPAM

    Interestingness Prediction by Robust Learning to Rank

    Abstract. The problem of predicting image or video interestingness from their low-level feature representations has received increasing interest. As a highly subjective visual attribute, annotating the interestingness value of training data for learning a prediction model is challenging. To make the annotation less subjective and more reliable, recent studies employ crowdsourcing tools to collect pairwise comparisons, relying on majority voting to prune the annotation outliers/errors. In this paper, we propose a more principled way to identify annotation outliers by formulating the interestingness prediction task as a unified robust learning to rank problem, tackling both the outlier detection and interestingness prediction tasks jointly. Extensive experiments on both image and video interestingness benchmark datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives.

    Image Aesthetic Assessment: A Comparative Study of Hand-Crafted & Deep Learning Models


    Investigating human-perceptual properties of "shapes" using 3D shapes and 2D fonts

    Shapes are generally used to convey meaning. They are used in video games, films and other multimedia, in diverse ways. 3D shapes may be destined for virtual scenes or represent objects to be constructed in the real world. Fonts add character to an otherwise plain block of text, allowing the writer to make important points more visually prominent or distinct from other text. They can indicate the structure of a document at a glance. Rather than studying shapes through traditional geometric shape descriptors, we provide alternative methods to describe and analyse shapes through the lens of human perception. This is done via the concepts of Schelling Points and Image Specificity. Schelling Points are choices people make when they aim to match what they expect others to choose but cannot communicate with others to determine an answer. We study whole mesh selections in this setting, where Schelling Meshes are the most frequently selected shapes. The key idea behind Image Specificity is that different images evoke different descriptions, but ‘Specific’ images yield more consistent descriptions than others. We apply Specificity to 2D fonts. We show that each concept can be learned and predict them for fonts and 3D shapes, respectively, using a depth image-based convolutional neural network. Results are shown for a range of fonts and 3D shapes and we demonstrate that font Specificity and the Schelling Meshes concept are useful for visualisation, clustering, and search applications. Overall, we find that each concept represents similarities between their respective type of shape, even when there are discontinuities between the shape geometries themselves. The ‘context’ of these similarities lies in some kind of abstract or subjective meaning which is consistent among different people.

    Predicting and Modifying Memorability of Images

    Every day, we are bombarded with many photographs of faces, whether on social media, television, or smartphones. From an evolutionary perspective, faces are intended to be remembered, mainly due to survival and personal relevance. However, not all of these faces have an equal opportunity to stick in our minds. It has been shown that memorability is an intrinsic feature of an image, yet it is largely unknown what attributes make an image more memorable. In this work, we first proposed new models for predicting the memorability of face and object images. Subsequently, we proposed a fast approach to modify and control the memorability of face images. In our proposed method, we first found a hyperplane in the latent space of StyleGAN to separate high and low memorable images. We then modified the image memorability (while maintaining the identity and other facial features such as age, emotion, etc.) by moving in the positive or negative direction of this hyperplane's normal vector. We further analyzed how different layers of the StyleGAN augmented latent space contribute to face memorability. These analyses showed how each individual face attribute makes an image more or less memorable. Most importantly, we evaluated our proposed method on both real and unreal (generated) face images. The proposed method successfully modifies and controls the memorability of real human faces as well as unreal (generated) faces. Our proposed method can be employed in photograph editing applications for social media, learning aids, or advertisement purposes.

    Data analytics for image visual complexity and kinect-based videos of rehabilitation exercises

    With the recent advances in computer vision and pattern recognition, methods from these fields are successfully applied to solve problems in various domains, including health care and social sciences. In this thesis, two such problems, from different domains, are discussed. First, an application of computer vision and broader pattern recognition in physical therapy is presented. Home-based physical therapy is an essential part of the recovery process in which the patient is prescribed specific exercises in order to improve symptoms and daily functioning of the body. However, poor adherence to the prescribed exercises is a common problem. In our work, we explore methods for improving the home-based physical therapy experience. We begin by proposing DyAd, a dynamic difficulty adjustment system which captures the trajectory of the hand movement, evaluates the user's performance quantitatively and adjusts the difficulty level for the next trial of the exercise based on the performance measurements. Next, we introduce ExerciseCheck, a remote monitoring and evaluation platform for home-based physical therapy. ExerciseCheck is capable of capturing exercise information, evaluating the performance, providing therapeutic feedback to the patient and the therapist, checking the progress of the user over the course of the physical therapy, and supporting the patient throughout this period. In our experiments, Parkinson's patients have tested our system at a clinic and in their homes during their physical therapy period. Our results suggest that ExerciseCheck is a user-friendly application and can assist patients by providing motivation and guidance to ensure correct execution of the required exercises. As the second application, and within the computer vision paradigm, we focus on visual complexity, an image attribute that humans can subjectively evaluate based on the level of detail in the image.
    Visual complexity has been studied in psychophysics, cognitive science, and, more recently, computer vision, for the purposes of product design, web design, advertising, etc. We first introduce a diverse visual complexity dataset which comprises seven image categories. We collect the ground-truth scores by comparing the pairwise relationship of images and then convert the pairwise scores to absolute scores using mathematical methods. Furthermore, we propose a method to measure visual complexity that uses unsupervised information extraction from intermediate convolutional layers of deep neural networks. We derive an activation energy metric that combines convolutional layer activations to quantify visual complexity. The high correlations between ground-truth labels and computed energy scores in our experiments show the superiority of our method over previous work. Finally, as an example of the relationship between visual complexity and other image attributes, we demonstrate that, within the context of a category, visually more complex images are more memorable to human observers.