440 research outputs found

    AMNet: Memorability Estimation with Attention

    In this paper we present the design and evaluation of an end-to-end trainable deep neural network with a visual attention mechanism for memorability estimation in still images. We analyze the suitability of transfer learning of deep models from image classification to the memorability task. We further study the impact of the attention mechanism on memorability estimation and evaluate our network on the SUN Memorability and LaMem datasets. Our network outperforms the existing state-of-the-art models on both datasets in terms of Spearman's rank correlation as well as mean squared error, closely matching human consistency.
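
    The two evaluation metrics named in this abstract, Spearman's rank correlation and mean squared error, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' evaluation code; the function names and the toy scores are assumptions made for the example.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(predicted, observed):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(predicted), average_ranks(observed)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


def mean_squared_error(predicted, observed):
    """Average squared difference between predicted and observed scores."""
    return sum((a - b) ** 2 for a, b in zip(predicted, observed)) / len(predicted)


# Toy scores invented for illustration only (not taken from any dataset).
predicted = [0.62, 0.81, 0.74, 0.55]
observed = [0.60, 0.85, 0.70, 0.58]
rho = spearman_rho(predicted, observed)
mse = mean_squared_error(predicted, observed)
```

    Rank correlation rewards getting the ordering of images right, while mean squared error penalises deviations in the predicted score itself, which is why the two are reported together.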

    Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview

    This paper provides an overview of some of the most relevant deep learning approaches to pattern extraction and recognition in the visual arts, particularly painting and drawing. Recent advances in deep learning and computer vision, coupled with the growing availability of large digitized visual art collections, have opened new opportunities for computer science researchers to assist the art community with automatic tools to analyse and further understand visual arts. Among other benefits, a deeper understanding of visual arts has the potential to make them more accessible to a wider population, ultimately supporting the spread of culture.

    ResMem-Net: memory based deep CNN for image memorability estimation

    Image memorability is a very hard problem in image processing due to its subjective nature. However, with the introduction of deep learning and the wide availability of data and GPUs, great strides have been made in predicting the memorability of an image. In this paper, we propose a novel deep learning architecture called ResMem-Net, a hybrid of LSTM and CNN that uses information from the hidden layers of the CNN to compute the memorability score of an image. The intermediate layers are important for predicting the output because they contain information about the intrinsic properties of the image. The proposed architecture automatically learns visual emotions and saliency, as shown by the heatmaps generated using the GradRAM technique. We have also used the heatmaps and results to analyze and answer one of the most important questions in image memorability: “What makes an image memorable?” The model is trained and evaluated using the publicly available Large-scale Image Memorability dataset (LaMem) from MIT. The results show that the model achieves a rank correlation of 0.679 and a mean squared error of 0.011, which is better than the current state-of-the-art models and is close to human consistency (ρ = 0.68). The proposed architecture also has significantly fewer parameters than the state-of-the-art architecture, making it memory efficient and suitable for production.

    Predicting and Modifying Memorability of Images

    Every day, we are bombarded with many photographs of faces, whether on social media, television, or smartphones. From an evolutionary perspective, faces are intended to be remembered, mainly due to survival and personal relevance. However, not all of these faces have an equal opportunity to stick in our minds. It has been shown that memorability is an intrinsic feature of an image, yet it is largely unknown what attributes make an image more memorable. In this work, we first proposed new models for predicting the memorability of face and object images. Subsequently, we proposed a fast approach to modify and control the memorability of face images. In our proposed method, we first found a hyperplane in the latent space of StyleGAN to separate high- and low-memorability images. We then modified the image memorability (while maintaining the identity and other facial features such as age, emotion, etc.) by moving in the positive or negative direction of this hyperplane's normal vector. We further analyzed how different layers of the StyleGAN augmented latent space contribute to face memorability. These analyses showed how each individual face attribute makes an image more or less memorable. Most importantly, we evaluated our proposed method on both real and unreal (generated) face images. The proposed method successfully modifies and controls the memorability of real human faces as well as unreal (generated) faces. Our proposed method can be employed in photograph editing applications for social media, learning aids, or advertisement purposes.
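
    The editing step described in this abstract, moving a latent code along the normal vector of a separating hyperplane, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the hyperplane weights `w` and offset `b` are assumed to come from some already-trained linear classifier, the latent code is a plain list of floats rather than a real StyleGAN latent, and all function names are hypothetical.

```python
from math import sqrt


def unit_normal(w):
    """Normalise the hyperplane weight vector w to unit length."""
    norm = sqrt(sum(c * c for c in w))
    if norm == 0:
        raise ValueError("hyperplane normal must be non-zero")
    return [c / norm for c in w]


def signed_distance(latent, w, b):
    """Signed distance of a latent code from the hyperplane w.z + b = 0."""
    norm = sqrt(sum(c * c for c in w))
    return (sum(z * c for z, c in zip(latent, w)) + b) / norm


def shift_latent(latent, w, alpha):
    """Move the latent code by alpha units along the unit normal.

    Positive alpha moves toward the side the classifier scores as
    positive; negative alpha moves toward the other side.
    """
    n = unit_normal(w)
    return [z + alpha * c for z, c in zip(latent, n)]


# Toy 2-D example (a real latent space has far more dimensions).
w, b = [3.0, 4.0], 0.0          # assumed hyperplane parameters
z = [0.0, 0.0]                  # assumed latent code lying on the hyperplane
z_more = shift_latent(z, w, 2.0)    # step toward the positive side
z_less = shift_latent(z, w, -2.0)   # step toward the negative side
```

    Because the step is taken along a unit vector, `alpha` directly equals the change in signed distance from the hyperplane, which makes the edit strength easy to control.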

    Investigating human-perceptual properties of "shapes" using 3D shapes and 2D fonts

    Shapes are generally used to convey meaning. They are used in video games, films and other multimedia, in diverse ways. 3D shapes may be destined for virtual scenes or represent objects to be constructed in the real world. Fonts add character to an otherwise plain block of text, allowing the writer to make important points more visually prominent or distinct from other text. They can indicate the structure of a document at a glance. Rather than studying shapes through traditional geometric shape descriptors, we provide alternative methods to describe and analyse shapes through the lens of human perception. This is done via the concepts of Schelling Points and Image Specificity. Schelling Points are choices people make when they aim to match what they expect others to choose but cannot communicate with others to determine an answer. We study whole mesh selections in this setting, where Schelling Meshes are the most frequently selected shapes. The key idea behind Image Specificity is that different images evoke different descriptions, but ‘Specific’ images yield more consistent descriptions than others. We apply Specificity to 2D fonts. We show that each concept can be learned and predict them for fonts and 3D shapes, respectively, using a depth image-based convolutional neural network. Results are shown for a range of fonts and 3D shapes, and we demonstrate that font Specificity and the Schelling Meshes concept are useful for visualisation, clustering, and search applications. Overall, we find that each concept represents similarities between their respective type of shape, even when there are discontinuities between the shape geometries themselves. The ‘context’ of these similarities lies in some kind of abstract or subjective meaning which is consistent among different people.

    Data analytics for image visual complexity and kinect-based videos of rehabilitation exercises

    With the recent advances in computer vision and pattern recognition, methods from these fields are successfully applied to solve problems in various domains, including health care and social sciences. In this thesis, two such problems, from different domains, are discussed. First, an application of computer vision and broader pattern recognition in physical therapy is presented. Home-based physical therapy is an essential part of the recovery process in which the patient is prescribed specific exercises in order to improve symptoms and daily functioning of the body. However, poor adherence to the prescribed exercises is a common problem. In our work, we explore methods for improving the home-based physical therapy experience. We begin by proposing DyAd, a dynamic difficulty adjustment system which captures the trajectory of the hand movement, evaluates the user's performance quantitatively and adjusts the difficulty level for the next trial of the exercise based on the performance measurements. Next, we introduce ExerciseCheck, a remote monitoring and evaluation platform for home-based physical therapy. ExerciseCheck is capable of capturing exercise information, evaluating the performance, providing therapeutic feedback to the patient and the therapist, checking the progress of the user over the course of the physical therapy, and supporting the patient throughout this period. In our experiments, patients with Parkinson's disease tested our system at a clinic and in their homes during their physical therapy period. Our results suggest that ExerciseCheck is a user-friendly application and can assist patients by providing motivation and guidance to ensure correct execution of the required exercises. As the second application, and within the computer vision paradigm, we focus on visual complexity, an image attribute that humans can subjectively evaluate based on the level of detail in the image.
    Visual complexity has been studied in psychophysics, cognitive science, and, more recently, computer vision, for the purposes of product design, web design, advertising, etc. We first introduce a diverse visual complexity dataset which comprises seven image categories. We collect the ground-truth scores by comparing the pairwise relationship of images and then convert the pairwise scores to absolute scores using mathematical methods. Furthermore, we propose a method to measure visual complexity that uses unsupervised information extraction from intermediate convolutional layers of deep neural networks. We derive an activation energy metric that combines convolutional layer activations to quantify visual complexity. The high correlations between ground-truth labels and computed energy scores in our experiments show the superiority of our method over previous work. Finally, as an example of the relationship between visual complexity and other image attributes, we demonstrate that, within the context of a category, visually more complex images are more memorable to human observers.

    Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels

    The problem of estimating subjective visual properties from image and video has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels because human annotators are much better at ranking two images/videos (e.g. which one is more interesting) than giving an absolute value to each of them separately. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors. They thus require a large number of pairwise labels to be collected. More importantly, as a local outlier detection method, majority voting is ineffective in identifying outliers that can cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning-to-rank problem, tackling both the outlier detection and learning to rank jointly. Differing from existing methods, the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. Extensive experiments on various benchmark datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives. Comment: 14 pages, accepted by IEEE TPAM

    Recall and post-trip evaluation of tourist destinations: the effects of travel order

    Samira Zare explored the role of heuristic biases involved in the recall and evaluation of tourist destinations. She found that the first and last cities in a sequence are recalled and evaluated better than the middle destinations. She provided a foundation for future studies about order effects in tourism and hospitality.

    Biopsychosocial Assessment and Ergonomics Intervention for Sustainable Living: A Case Study on Flats

    This study proposes an ergonomics-based approach for those who live in small housing units (known as flats) in Indonesia. With regard to human capabilities and limitations, this research shows how the basic needs of human beings are captured and analyzed, followed by proposed designs of facilities and living standards in small housing units. Ninety participants took part in the study through in-depth interviews and face-to-face questionnaires. As a result, modifications to critical facilities (such as a multifunction ironing workstation, bed furniture, and a clothesline) were proposed and validated through usability testing. Overall, it is hoped that the proposed designs will support biopsychosocial needs and sustainability.

    Automatsko povećanje pamtljivosti slika (Automatic increase of image memorability)

    The dissertation considers the problem of automatically increasing image memorability. The problem-solving approach is based on the editing-by-applying-filters paradigm. Given an arbitrary input image, the proposed deep learning model is able to automatically retrieve a set of “style seeds”, i.e., a set of style images which, applied to the input image through a neural style transfer algorithm, provide the highest increase in memorability. We show the effectiveness of the approach with experiments, performing both a quantitative evaluation and a user study.