Search CORE

2,261 research outputs found

Exploiting visual saliency for assessing the impact of car commercials upon viewers

Author: Díaz de María Fernando
Fernández Martínez Fernando
Fernández Torres Miguel Ángel
Garcia Faura Alvaro
González Díaz Iván
Hernández García Alejandro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2018
Field of study

Content based video indexing and retrieval (CBVIR) is a lively area of research which focuses on automating the indexing, retrieval and management of videos. This area has a wide spectrum of promising applications where assessing the impact of audiovisual productions emerges as a particularly interesting and motivating one. In this paper we present a computational model capable to predict the impact (i.e. positive or negative) upon viewers of car advertisements videos by using a set of visual saliency descriptors. Visual saliency provides information about parts of the image perceived as most important, which are instinctively targeted by humans when looking at a picture or watching a video. For this reason we propose to exploit visual information, introducing it as a new feature which reflects high-level semantics objectively, to improve the video impact categorization results. The suggested salience descriptors are inspired by the mechanisms that underlie the attentional abilities of the human visual system and organized into seven distinct families according to different measurements over the identified salient areas in the video frames, namely population, size, location, geometry, orientation, movement and photographic composition. Proposed approach starts by computing saliency maps for all the video frames, where two different visual saliency detection frameworks have been considered and evaluated: the popular graph based visual saliency (GBVS) algorithm, and a state-of-the-art DNN-based approach.This work has been partially supported by the National Grants RTC-2016-5305-7 and TEC2014-53390-P of the Spanish Ministry of Economy and Competitiveness.Publicad

Universidad Carlos III de Madrid e-Archivo

Representations and representation learning for image aesthetics prediction and image enhancement

Author: Kucer Michal
Publication venue: RIT Scholar Works
Publication date: 01/04/2020
Field of study

With the continual improvement in cell phone cameras and improvements in the connectivity of mobile devices, we have seen an exponential increase in the images that are captured, stored and shared on social media. For example, as of July 1st 2017 Instagram had over 715 million registered users which had posted just shy of 35 billion images. This represented approximately seven and nine-fold increase in the number of users and photos present on Instagram since 2012. Whether the images are stored on personal computers or reside on social networks (e.g. Instagram, Flickr), the sheer number of images calls for methods to determine various image properties, such as object presence or appeal, for the purpose of automatic image management and curation. One of the central problems in consumer photography centers around determining the aesthetic appeal of an image and motivates us to explore questions related to understanding aesthetic preferences, image enhancement and the possibility of using such models on devices with constrained resources. In this dissertation, we present our work on exploring representations and representation learning approaches for aesthetic inference, composition ranking and its application to image enhancement. Firstly, we discuss early representations that mainly consisted of expert features, and their possibility to enhance Convolutional Neural Networks (CNN). Secondly, we discuss the ability of resource-constrained CNNs, and the different architecture choices (inputs size and layer depth) in solving various aesthetic inference tasks: binary classification, regression, and image cropping. We show that if trained for solving fine-grained aesthetics inference, such models can rival the cropping performance of other aesthetics-based croppers, however they fall short in comparison to models trained for composition ranking. Lastly, we discuss our work on exploring and identifying the design choices in training composition ranking functions, with the goal of using them for image composition enhancement

RIT Scholar Works

Media aesthetics based multimedia storytelling.

Author: Obrador Espinosa Pere
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011
Field of study

Since the earliest of times, humans have been interested in recording their life experiences, for future reference and for storytelling purposes. This task of recording experiences --i.e., both image and video capture-- has never before in history been as easy as it is today. This is creating a digital information overload that is becoming a great concern for the people that are trying to preserve their life experiences. As high-resolution digital still and video cameras become increasingly pervasive, unprecedented amounts of multimedia, are being downloaded to personal hard drives, and also uploaded to online social networks on a daily basis. The work presented in this dissertation is a contribution in the area of multimedia organization, as well as automatic selection of media for storytelling purposes, which eases the human task of summarizing a collection of images or videos in order to be shared with other people. As opposed to some prior art in this area, we have taken an approach in which neither user generated tags nor comments --that describe the photographs, either in their local or on-line repositories-- are taken into account, and also no user interaction with the algorithms is expected. We take an image analysis approach where both the context images --e.g. images from online social networks to which the image stories are going to be uploaded--, and the collection images --i.e., the collection of images or videos that needs to be summarized into a story--, are analyzed using image processing algorithms. This allows us to extract relevant metadata that can be used in the summarization process. Multimedia-storytellers usually follow three main steps when preparing their stories: first they choose the main story characters, the main events to describe, and finally from these media sub-groups, they choose the media based on their relevance to the story as well as based on their aesthetic value. Therefore, one of the main contributions of our work has been the design of computational models --both regression based, as well as classification based-- that correlate well with human perception of the aesthetic value of images and videos. These computational aesthetics models have been integrated into automatic selection algorithms for multimedia storytelling, which are another important contribution of our work. A human centric approach has been used in all experiments where it was feasible, and also in order to assess the final summarization results, i.e., humans are always the final judges of our algorithms, either by inspecting the aesthetic quality of the media, or by inspecting the final story generated by our algorithms. We are aware that a perfect automatically generated story summary is very hard to obtain, given the many subjective factors that play a role in such a creative process; rather, the presented approach should be seen as a first step in the storytelling creative process which removes some of the ground work that would be tedious and time consuming for the user. Overall, the main contributions of this work can be capitalized in three: (1) new media aesthetics models for both images and videos that correlate with human perception, (2) new scalable multimedia collection structures that ease the process of media summarization, and finally, (3) new media selection algorithms that are optimized for multimedia storytelling purposes.Postprint (published version

Multimodal Computational Attention for Scene Understanding

Author: Schauerte Boris
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014
Field of study

Robotic systems have limited computational capacities. Hence, computational attention models are important to focus on specific stimuli and allow for complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models

Probabilistic framework for image understanding applications using Bayesian Networks

Author: Jaber Mustafa
Publication venue: RIT Scholar Works
Publication date: 01/12/2011
Field of study

Machine learning algorithms have been successfully utilized in various systems/devices. They have the ability to improve the usability/quality of such systems in terms of intelligent user interface, fast performance, and more importantly, high accuracy. In this research, machine learning techniques are used in the field of image understanding, which is a common research area between image analysis and computer vision, to involve higher processing level of a target image to make sense of the scene captured in it. A general probabilistic framework for image understanding where topics associated with (i) collection of images to generate a comprehensive and valid database, (ii) generation of an unbiased ground-truth for the aforesaid database, (iii) selection of classification features and elimination of the redundant ones, and (iv) usage of such information to test a new sample set, are discussed. Two research projects have been developed as examples of the general image understanding framework; identification of region(s) of interest, and image segmentation evaluation. These techniques, in addition to others, are combined in an object-oriented rendering system for printing applications. The discussion included in this doctoral dissertation explores the means for developing such a system from an image understanding/ processing aspect. It is worth noticing that this work does not aim to develop a printing system. It is only proposed to add some essential features for current printing pipelines to achieve better visual quality while printing images/photos. Hence, we assume that image regions have been successfully extracted from the printed document. These images are used as input to the proposed object-oriented rendering algorithm where methodologies for color image segmentation, region-of-interest identification and semantic features extraction are employed. Probabilistic approaches based on Bayesian statistics have been utilized to develop the proposed image understanding techniques

RIT Scholar Works

Recommended from our members

Composition-guided image acquisition

Author: Banerjee Serene
Publication venue
Publication date: 01/01/2004
Field of study

textTo make a picture more appealing, professional photographers apply a wealth of photographic composition rules, of which amateur photographers are of- ten unaware. This dissertation aims at providing in-camera feedback to the amateur photographer while taking pictures. The proposed algorithms do not depend on prior knowledge of the indoor/outdoor setting or scene, and are amenable to software implementation on fixed-point programmable digital signal processors available in digital still cameras. The key enabling step in automating photographic composition rules is to locate the main subject. Digital still image acquisition maps the 3-D world onto a 2-D picture. By using the 2-D picture alone, segmenting the main subject without prior knowledge of the scene is ill-posed. Even with prior knowledge, segmentation is often computationally intensive and error prone. This dissertation defends the idea that reliable main subject segmenta- tion without prior knowledge of scene and setting may be achieved by acquiring a single picture, in which the optical system blurs objects not in the plane of focus. After segmentation, photographic composition rules may be automated. In this context, segmentation only needs to approximately and not precisely locate the main subject. In this dissertation, I combine optical and digital image processing to perform the segmentation of the main subject without prior knowledge of the scene. In particular, I propose to acquire a picture in which the main subject is in focus, and the shutter aperture is fully open. The lens optics will blur any object not in the plane of focus. For the acquired picture, I develop a computationally simple one-pass algorithm to segment the main subject. The post segmentation objective is to automate selected photographic composition rules. The algorithms can either be applied on the picture taken with the objects not in the plane of focus blurred, or on a user-intended picture with the same focal length settings. This way, in-camera feedback can be provided to the amateur photographer, in the form of alternate compositions of the same scene. I automate three photographic composition rules: (1) placement of the main subject obeying the rule-of-thirds, (2) background blurring to simulate the main subject being in motion or decrease the depth-of-field of the picture, and (3) merger detection and mitigation when equally focused main subject and background objects merge as one object. The primary contributions of the dissertation are in digital still image processing. The first is the automation of segmentation of the main subject in a single still picture assisted by optical pre-processing. The second is the automation of main subject placement, artistic background blur, and merger detection and mitigation to try to improve photographic composition.Electrical and Computer Engineerin

Texas ScholarWorks

Computational Media Aesthetics for Media Synthesis

Author: XIANG YANGYANG
Publication venue
Publication date: 15/08/2013
Field of study

Ph.DDOCTOR OF PHILOSOPH