
    Collecting, Analyzing and Predicting Socially-Driven Image Interestingness

    Interestingness has recently become an emerging concept for visual content assessment. However, understanding and predicting image interestingness remains challenging, as its judgment is highly subjective and usually context-dependent. In addition, existing datasets are too small for in-depth analysis. To push forward research on this topic, this paper describes and publicly releases a large-scale interestingness dataset (images and their associated metadata). We then propose computational models based on deep learning to predict image interestingness, and show that exploiting relevant contextual information derived from social metadata can greatly improve the prediction results. Finally, we discuss some key findings and potential research directions for this emerging topic.
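The abstract's key claim is that fusing social metadata with a visual score improves interestingness prediction. A minimal sketch of one such fusion, assuming illustrative features (log-scaled view and comment counts) and hand-picked weights that are not from the paper:

```python
# Hypothetical late-fusion sketch: combine a visual interestingness score
# with social-metadata features. Feature choices and weights are
# illustrative assumptions, not the paper's model.
import math

def social_features(views: int, comments: int) -> list:
    # Log-scale the raw counts so heavy-tailed metadata stays comparable.
    return [math.log1p(views), math.log1p(comments)]

def fused_interestingness(visual_score: float, views: int, comments: int,
                          weights=(0.6, 0.25, 0.15)) -> float:
    feats = [visual_score] + social_features(views, comments)
    z = sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))  # squash to [0, 1]
```

In practice the paper trains deep models end to end; the point here is only that the social signal enters as extra input features alongside the visual one.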

    Image Memorability Prediction with Vision Transformers

    Behavioral studies have shown that the memorability of images is similar across groups of people, suggesting that memorability is a function of the intrinsic properties of images and is unrelated to people's individual experiences and traits. Deep learning networks can be trained on such properties and used to predict memorability in new data sets. Convolutional neural networks (CNN) pioneered image memorability prediction, but more recently developed vision transformer (ViT) models may have the potential to yield even better predictions. In this paper, we present ViTMem, a new memorability model based on ViT, and compare its memorability predictions with those of state-of-the-art CNN-derived models. Results showed that ViTMem performed on par with or better than state-of-the-art models on all data sets. Additional semantic-level analyses revealed that ViTMem is particularly sensitive to the semantic content that drives memorability in images. We conclude that ViTMem provides a new step forward, and propose that ViT-derived models can replace CNNs for computational prediction of image memorability. Researchers, educators, advertisers, visual designers and other interested parties can leverage the model to improve the memorability of their image material.

    Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels

    The problem of estimating subjective visual properties from image and video has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels, because human annotators are much better at ranking two images/videos (e.g. which one is more interesting) than at giving an absolute value to each of them separately. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors, and thus require a large amount of pairwise labels to be collected. More importantly, as a local outlier detection method, majority voting is ineffective at identifying outliers that cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning-to-rank problem, tackling outlier detection and learning to rank jointly. Differing from existing methods, the proposed method integrates local pairwise comparison labels to minimise a cost that corresponds to global inconsistency of ranking order. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. Extensive experiments on various benchmark datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives. (14 pages; accepted by IEEE TPAMI.)
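The core idea, that pairwise labels induce a global ranking cost, can be illustrated with a plain Bradley–Terry-style logistic loss. This is a simplified sketch of the learning-to-rank component only, without the paper's robust outlier-detection term:

```python
# Illustrative pairwise ranking loss (not the paper's robust formulation):
# each crowdsourced label "winner beats loser" penalises score assignments
# that disagree with it, so all pairs jointly constrain a global ranking.
import math

def pairwise_rank_loss(scores: dict, pairs: list) -> float:
    """scores maps item -> real-valued rank score; pairs holds
    (winner, loser) comparison labels from the crowd."""
    loss = 0.0
    for winner, loser in pairs:
        margin = scores[winner] - scores[loser]
        loss += math.log1p(math.exp(-margin))  # logistic loss on the margin
    return loss / len(pairs)
```

A label that contradicts the global ordering produces a large per-pair loss, which is exactly the signal a global (rather than per-pair majority-vote) method can exploit to flag outliers.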

    ResMem-Net: memory based deep CNN for image memorability estimation

    Image memorability is a very hard problem in image processing due to its subjective nature. However, with the introduction of deep learning and the wide availability of data and GPUs, great strides have been made in predicting the memorability of an image. In this paper, we propose a novel deep learning architecture called ResMem-Net, a hybrid of LSTM and CNN that uses information from the hidden layers of the CNN to compute the memorability score of an image. The intermediate layers are important for predicting the output because they contain information about the intrinsic properties of the image. The proposed architecture automatically learns visual emotions and saliency, as shown by the heatmaps generated using the GradRAM technique. We have also used the heatmaps and results to analyze and answer one of the most important questions in image memorability: “What makes an image memorable?”. The model is trained and evaluated on the publicly available Large-scale Image Memorability dataset (LaMem) from MIT. The results show that the model achieves a rank correlation of 0.679 and a mean squared error of 0.011, which is better than the current state-of-the-art models and close to human consistency (p = 0.68). The proposed architecture also has a significantly lower number of parameters than the state-of-the-art architectures, making it memory-efficient and suitable for production.
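The hybrid design described above, pooling features from several hidden CNN layers and feeding them as a sequence to an LSTM, can be sketched roughly as below. The layer count, feature sizes, and pooling scheme are assumptions for illustration, not ResMem-Net's actual configuration:

```python
# Rough sketch of the CNN + LSTM idea: each conv stage's intermediate
# output is pooled to a vector, the LSTM reads the resulting sequence,
# and a linear head regresses a scalar memorability score.
import torch
import torch.nn as nn

class HiddenLayerLSTM(nn.Module):
    def __init__(self, feat_dim=64, hidden=32):
        super().__init__()
        # Three conv stages whose intermediate outputs all feed the LSTM.
        self.stages = nn.ModuleList([
            nn.Conv2d(3, feat_dim, 3, padding=1),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1),
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)  # each stage -> one feature vector
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)     # scalar memorability score

    def forward(self, x):
        seq = []
        for stage in self.stages:
            x = torch.relu(stage(x))
            seq.append(self.pool(x).flatten(1))       # (B, feat_dim)
        out, _ = self.lstm(torch.stack(seq, dim=1))   # (B, 3, hidden)
        return torch.sigmoid(self.head(out[:, -1]))   # (B, 1) in [0, 1]
```

The point of the LSTM is to let early-layer (low-level) and late-layer (semantic) evidence jointly shape the final score, rather than regressing from the last layer alone.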

    Automatic Enhancement of Image Memorability (Automatsko povećanje pamtljivosti slika)

    The dissertation considers the problem of automatically increasing image memorability. The problem-solving approach is based on the editing-by-applying-filters paradigm. Given an arbitrary input image, the proposed deep learning model automatically retrieves a set of “style seeds”, i.e., a set of style images which, applied to the input image through a neural style transfer algorithm, provide the highest increase in memorability. We show the effectiveness of the approach experimentally, with both a quantitative evaluation and a user study.
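The "style seeds" retrieval step amounts to ranking candidate styles by the memorability gain that applying them would produce. A toy sketch of that selection loop, where `transfer` and `predict_mem` are stand-ins for a real neural style transfer model and a memorability predictor:

```python
# Toy sketch of style-seed selection: rank candidate style images by the
# predicted memorability gain of a style-transfer step. The transfer and
# scoring callables are hypothetical stand-ins, not the dissertation's models.
def select_style_seeds(image, styles, transfer, predict_mem, k=3):
    base = predict_mem(image)
    ranked = sorted(styles,
                    key=lambda s: predict_mem(transfer(image, s)) - base,
                    reverse=True)
    return ranked[:k]  # the k styles with the highest predicted gain
```

In the real system the gain is predicted directly by a learned model rather than by running the (expensive) transfer for every candidate, but the ranking objective is the same.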

    A computational approach to the art of visual storytelling

    For millennia, humanity has been using images to tell stories. In modern society, these visual narratives take center stage in many different contexts, from illustrated children’s books to news media and comic books. They leverage the power of compounding various images in sequence to present compelling and informative narratives in an immediate and impactful manner. Creating them involves many criteria, from the quality of the individual images to how they synergize with one another. With the rise of the Internet, visual content with which to create these visual storylines is now abundant. In areas such as news media, where visual storylines are regularly used to depict news stories, this has both advantages and disadvantages. Although content might be available online to create a visual storyline, filtering the massive amounts of existing images for high-quality, relevant ones is a hard and time-consuming task. Furthermore, combining these images into visually and semantically cohesive narratives is a highly skillful process, and one that takes time. As a first step to help solve this problem, this thesis brings state-of-the-art computational methodologies to the age-old tradition of creating visual storylines. Leveraging these methodologies, we define a three-part architecture to help with the creation of visual storylines in the context of news media, using social media content. To ensure the quality of the storylines from a human-perception point of view, we deploy methods for filtering and ranking images according to news quality standards, resort to multimedia retrieval techniques to find relevant content, and propose a machine learning based approach to organize visual content into cohesive and appealing visual narratives.
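The three-part architecture, filter by quality, retrieve by relevance, then organize into a cohesive sequence, can be sketched as a simple pipeline. All thresholds, scoring callables, and the greedy ordering below are illustrative assumptions, not the thesis's actual components:

```python
# Hypothetical three-stage pipeline mirroring the described architecture:
# 1. filter images by a quality score, 2. keep the most relevant ones,
# 3. greedily chain them so adjacent images are maximally cohesive.
def build_storyline(images, quality, relevance, cohesion, q_min=0.5, top_n=5):
    pool = [im for im in images if quality(im) >= q_min]       # 1. filter
    pool = sorted(pool, key=relevance, reverse=True)[:top_n]   # 2. retrieve
    story = [pool.pop(0)]                                      # 3. organize:
    while pool:                                                # pick the image
        nxt = max(pool, key=lambda im: cohesion(story[-1], im))  # most cohesive
        pool.remove(nxt)                                         # with the last
        story.append(nxt)
    return story
```

Greedy chaining is only one of many possible sequencing strategies; the thesis's learned approach would replace this step.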