5 research outputs found

    Supporting personal photo storytelling for social albums


    Representations and representation learning for image aesthetics prediction and image enhancement

    With the continual improvements in cell phone cameras and in the connectivity of mobile devices, we have seen an exponential increase in the images that are captured, stored and shared on social media. For example, as of July 1st 2017 Instagram had over 715 million registered users, who had posted just shy of 35 billion images. This represents approximately seven- and nine-fold increases, respectively, in the number of users and photos on Instagram since 2012. Whether the images are stored on personal computers or reside on social networks (e.g. Instagram, Flickr), the sheer number of images calls for methods to determine various image properties, such as object presence or appeal, for the purpose of automatic image management and curation. One of the central problems in consumer photography is determining the aesthetic appeal of an image; it motivates us to explore questions related to understanding aesthetic preferences, image enhancement and the possibility of using such models on devices with constrained resources. In this dissertation, we present our work on representations and representation learning approaches for aesthetic inference, composition ranking and their application to image enhancement. Firstly, we discuss early representations that mainly consisted of expert features, and their potential to enhance Convolutional Neural Networks (CNNs). Secondly, we discuss the ability of resource-constrained CNNs, under different architecture choices (input size and layer depth), to solve various aesthetic inference tasks: binary classification, regression, and image cropping. We show that, if trained for fine-grained aesthetics inference, such models can rival the cropping performance of other aesthetics-based croppers; however, they fall short of models trained for composition ranking. Lastly, we discuss our work on exploring and identifying the design choices in training composition ranking functions, with the goal of using them for image composition enhancement.
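The idea of a learned composition ranking function can be illustrated with a toy pairwise ranker. This is only a sketch under stated assumptions, not the dissertation's model: the linear scorer, hinge margin, learning rate and the two hypothetical crop features are all placeholders for illustration.

```python
# Toy sketch of pairwise composition ranking (illustrative assumptions only):
# learn a linear scoring function from "crop A is composed better than crop B"
# preference pairs, using a perceptron-style hinge-loss update.

def score(w, x):
    """Composition score of a crop: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_ranker(pairs, dim, lr=0.1, epochs=50, margin=1.0):
    """pairs: list of (better_features, worse_features) crop pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            # Hinge loss: update only when the preferred crop does not
            # out-score the other crop by at least `margin`.
            if score(w, better) - score(w, worse) < margin:
                w = [wi + lr * (b - wo) for wi, b, wo in zip(w, better, worse)]
    return w

# Hypothetical two-feature crop descriptions,
# e.g. (rule-of-thirds alignment, clutter).
pairs = [([0.9, 0.2], [0.1, 0.8]),
         ([0.8, 0.3], [0.2, 0.7])]
w = train_ranker(pairs, dim=2)
assert score(w, [0.9, 0.2]) > score(w, [0.1, 0.8])
```

Once trained, such a ranking function can drive enhancement by scoring many candidate crops of an image and keeping the highest-ranked one.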

    Media aesthetics-based multimedia storytelling

    Since the earliest of times, humans have been interested in recording their life experiences, both for future reference and for storytelling purposes. This task of recording experiences --i.e., both image and video capture-- has never been as easy as it is today. The result is a digital information overload that is becoming a great concern for people who are trying to preserve their life experiences. As high-resolution digital still and video cameras become increasingly pervasive, unprecedented amounts of multimedia are being downloaded to personal hard drives and uploaded to online social networks on a daily basis. The work presented in this dissertation is a contribution to the area of multimedia organization, as well as to the automatic selection of media for storytelling purposes, which eases the human task of summarizing a collection of images or videos to be shared with other people. As opposed to some prior art in this area, we take an approach in which neither user-generated tags nor comments --which describe the photographs, either in their local or online repositories-- are taken into account, and no user interaction with the algorithms is expected. We take an image analysis approach in which both the context images --e.g. images from the online social networks to which the image stories are going to be uploaded-- and the collection images --i.e., the collection of images or videos that needs to be summarized into a story-- are analyzed using image processing algorithms. This allows us to extract relevant metadata that can be used in the summarization process. Multimedia storytellers usually follow three main steps when preparing their stories: first they choose the main story characters, then the main events to describe, and finally, from these media sub-groups, they select media based on their relevance to the story as well as on their aesthetic value. Therefore, one of the main contributions of our work has been the design of computational models --both regression based and classification based-- that correlate well with human perception of the aesthetic value of images and videos. These computational aesthetics models have been integrated into automatic selection algorithms for multimedia storytelling, which are another important contribution of our work. A human-centric approach has been used in all experiments where it was feasible, and also to assess the final summarization results; i.e., humans are always the final judges of our algorithms, either by inspecting the aesthetic quality of the media or by inspecting the final story generated by our algorithms. We are aware that a perfect automatically generated story summary is very hard to obtain, given the many subjective factors that play a role in such a creative process; rather, the presented approach should be seen as a first step in the storytelling creative process, one which removes some of the groundwork that would be tedious and time-consuming for the user. Overall, the main contributions of this work can be summarized in three points: (1) new media aesthetics models for both images and videos that correlate with human perception, (2) new scalable multimedia collection structures that ease the process of media summarization, and (3) new media selection algorithms that are optimized for multimedia storytelling purposes.
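The final selection step described above (choosing media by relevance to the story and by aesthetic value) can be sketched as a simple weighted ranking. The file names, scores and the linear combination weight `alpha` are hypothetical assumptions, not the thesis's actual selection algorithm.

```python
# Illustrative sketch of storytelling media selection (hypothetical values):
# rank candidate media by a weighted combination of story relevance and
# aesthetic value, then keep the top k for the final story.

def select_media(items, k=3, alpha=0.5):
    """items: list of (name, relevance, aesthetic) tuples with scores in
    [0, 1]; return the names of the k best under the combined score."""
    ranked = sorted(items,
                    key=lambda it: alpha * it[1] + (1 - alpha) * it[2],
                    reverse=True)
    return [name for name, _, _ in ranked[:k]]

# Hypothetical candidates: (name, relevance, aesthetic value).
photos = [("beach.jpg", 0.9, 0.4), ("cake.jpg", 0.3, 0.9),
          ("group.jpg", 0.8, 0.8), ("blur.jpg", 0.2, 0.1)]
print(select_media(photos, k=2))  # ['group.jpg', 'beach.jpg']
```

Raising `alpha` favours story relevance over aesthetics; setting it to 1.0 ignores aesthetic value entirely.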

    Organising and structuring a visual diary using visual interest point detectors

    As wearable cameras become more popular, researchers are increasingly focusing on novel applications to manage the large volume of data these devices produce. One such application is the construction of a Visual Diary from an individual's photographs. Microsoft's SenseCam, a device designed to passively record a Visual Diary covering a typical day of the user wearing the camera, is one such device. The vast quantity of images generated by these devices means that the management and organisation of these collections is not a trivial matter, and we believe this will only become more pressing as wearable cameras such as SenseCam grow in popularity. Although there is a significant volume of work in the literature in the fields of object detection and recognition and scene classification, there is little work in the area of setting detection. Furthermore, few authors have examined the issues involved in analysing extremely large image collections (like a Visual Diary) gathered over a long period of time. An algorithm developed for setting detection should be capable of clustering images captured at the same real-world locations (e.g. in the dining room at home, in front of the computer in the office, in the park, etc.). This requires the selection and implementation of suitable methods to identify visually similar backgrounds in images using their visual features. We present a number of approaches to setting detection based on the extraction of visual interest points from the images. We also analyse the performance of two of the most popular descriptors - Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). We present an implementation of a Visual Diary application and evaluate its performance via a series of user experiments. Finally, we also outline some techniques that allow the Visual Diary to automatically detect new settings, to scale as the image collection continues to grow substantially over time, and to allow the user to generate a personalised summary of their data.
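The setting-detection idea above can be illustrated with a minimal sketch: images whose local interest-point descriptors match well are grouped into the same setting. Real systems match 128-d SIFT or 64-d SURF descriptors; the 2-d toy descriptors, the distance threshold and the greedy grouping below are assumptions for illustration only, not the thesis's algorithm.

```python
# Minimal sketch of descriptor-based setting detection (toy values):
# two images belong to the same setting if enough of their local
# descriptors have a close nearest neighbour in the other image.
import math

def match_count(desc_a, desc_b, thresh=0.5):
    """Count descriptors in image A with a close neighbour in image B."""
    count = 0
    for a in desc_a:
        best = min(math.dist(a, b) for b in desc_b)
        if best < thresh:
            count += 1
    return count

def cluster_settings(images, min_matches=2):
    """Greedy clustering: an image joins the first setting whose
    representative image it matches, else it starts a new setting."""
    settings = []  # list of lists of image indices
    for i, desc in enumerate(images):
        for group in settings:
            if match_count(desc, images[group[0]]) >= min_matches:
                group.append(i)
                break
        else:
            settings.append([i])
    return settings

# Toy 2-d "descriptors": images 0 and 1 share a background, image 2 differs.
images = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    [(0.1, 0.0), (1.0, 0.9), (2.1, 0.1)],
    [(5.0, 5.0), (6.0, 6.0), (7.0, 5.0)],
]
print(cluster_settings(images))  # [[0, 1], [2]]
```

Scaling this to a real Visual Diary would replace the brute-force nearest-neighbour search with an indexed structure, since comparing every descriptor pair is quadratic in collection size.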

    Automatic estimation of the impressions conveyed by a facial photograph

    Picture selection is a time-consuming task for humans and a real challenge for machines, which have to retrieve complex and subjective information from image pixels. An automated system that infers human feelings from digital portraits would be of great help for profile picture selection, photo album creation or photo editing. In this work, several models of facial picture evaluation are defined. The first one predicts the overall aesthetic quality of a facial image by computing 15 features that encode low-level statistics in different image regions (face, eyes and mouth). Relevant features are automatically selected by a feature ranking technique, and the outputs of 4 learning algorithms are fused in order to make a robust and accurate prediction of the image quality. Results are compared with recent works, and the proposed algorithm obtains the best performance. The same pipeline is then used to evaluate the likability and competence conveyed by a facial picture, with the difference that the estimation is based on high-level attributes such as gender, age and smile. The performance of these attributes is compared with previous techniques that mostly rely on facial keypoint positions, and it is shown that it is possible to obtain predictions that are close to human perception. Finally, a combination of both models that selects a likable facial image of good aesthetic quality for a given person is described.
    With the development of digital cameras and photo-sharing websites, we spend a growing share of our time observing, selecting and sharing images, among which are a large number of facial photos. In this thesis, we set out to create a first fully automatic system that estimates the suitability of a facial photo for uses such as creating a photo album or selecting pictures for a social or professional network. To this end, we build several models that estimate the suitability of a facial photo according to its intended use. First, we adapt models for estimating the aesthetic quality of a photo to the particular case of facial photos. We show that computing 15 features describing different aspects of the image (texture, illumination, colours) in specific regions of the image (the face, the eyes, the mouth) significantly improves the accuracy of the estimates compared with state-of-the-art models. The accuracy of this model is further strengthened by selecting features suited to our problem, as well as by fusing the predictions of 4 learning algorithms. Second, we propose to enrich the automatic evaluation of a facial photo by defining estimation models for criteria such as the degree of likability or competence conveyed by the photo. These models rely on high-level attributes (presence of a smile, eye openness, facial expressions), which prove more effective than the low-level features used in the state of the art (Gabor filters, positions of facial landmarks). Finally, we fuse these models to automatically select photos of good aesthetic quality that are appropriate for a given use: photos inspiring likability to share with family, or photos conveying an impression of competence on a professional network.
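The fusion step described above, in which the outputs of 4 learning algorithms are combined into one prediction, can be sketched as a weighted average. This is a hedged stand-in for whatever fusion rule the thesis actually uses; the learner scores and weights below are hypothetical.

```python
# Hypothetical sketch of prediction fusion: combine the scores of several
# learners (e.g. 4 regressors rating the same portrait) into one estimate
# via a weighted average. Scores and weights are made-up illustration values.

def fuse(predictions, weights=None):
    """predictions: one score per learner for the same photo; weights
    default to uniform, i.e. a plain average."""
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(w * p for w, p in zip(weights, predictions))
    return total / sum(weights)

# Four hypothetical learners score the same portrait in [0, 1].
scores = [0.62, 0.70, 0.58, 0.66]
print(round(fuse(scores), 3))  # 0.64

# Giving more trust to one learner shifts the fused estimate toward it.
print(fuse([1.0, 0.0], weights=[3.0, 1.0]))  # 0.75
```

Fusing several learners this way tends to reduce the variance of any single model's errors, which is why ensembles are a common choice for subjective targets like aesthetic quality.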