
    What image features guide lightness perception?

    Lightness constancy is the ability to perceive black and white surface colors under a wide range of lighting conditions. This fundamental visual ability is not well understood, and current theories differ greatly on what image features are important for lightness perception. Here we measured classification images for human observers and four models of lightness perception to determine which image regions influenced lightness judgments. The models were a high-pass-filter model, an oriented difference-of-Gaussians model, an anchoring model, and an atmospheric-link-function model. Human and model observers viewed three variants of the argyle illusion (Adelson, 1993) and judged which of two test patches appeared lighter. Classification images showed that human lightness judgments were based on local, anisotropic stimulus regions that were bounded by regions of uniform lighting. The atmospheric-link-function and anchoring models predicted the lightness illusion perceived by human observers, but the high-pass-filter and oriented-difference-of-Gaussians models did not. Furthermore, all four models produced classification images that were qualitatively different from those of human observers, meaning that the model lightness judgments were guided by different image regions than human lightness judgments. These experiments provide a new test of models of lightness perception, and show that human observers' lightness computations can be highly local, as in low-level models, and nevertheless depend strongly on lighting boundaries, as suggested by midlevel models.
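    The classification-image technique used here has a simple computational core: random pixel noise is added to the stimulus on each trial, and the noise fields are averaged separately by response. Below is a minimal NumPy sketch of the standard two-alternative estimator; the array names and the synthetic data are illustrative, not the paper's actual stimuli or estimator.

```python
import numpy as np

def classification_image(noise_fields, responses):
    """Estimate a classification image from noisy 2AFC trials.

    noise_fields : (n_trials, H, W) array of the noise added on each trial
    responses    : (n_trials,) boolean array, True where the observer judged
                   the test patch 'lighter'
    Returns the mean noise on 'lighter' trials minus the mean noise on
    'darker' trials; pixels with large magnitude influenced the judgment.
    """
    noise_fields = np.asarray(noise_fields, dtype=float)
    responses = np.asarray(responses, dtype=bool)
    return (noise_fields[responses].mean(axis=0)
            - noise_fields[~responses].mean(axis=0))

# Example with synthetic data: 1000 trials of 64x64 noise, random responses.
rng = np.random.default_rng(0)
noise = rng.normal(size=(1000, 64, 64))
resp = rng.random(1000) < 0.5
ci = classification_image(noise, resp)
```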

    Audio Caption: Listen and Tell

    An increasing amount of research has shed light on machine perception of audio events, most of it concerning detection and classification tasks. However, human-like perception of audio scenes involves not only detecting and classifying audio sounds, but also summarizing the relationships between different audio events. Comparable research, such as image captioning, has been conducted, yet the audio field remains largely unexplored. This paper introduces a manually annotated dataset for audio captioning. The purpose is to automatically generate natural sentences for audio scene description and to bridge the gap between machine perception of audio and of images. The whole dataset is labelled in Mandarin, and translated English annotations are also included. A baseline encoder-decoder model is provided for both English and Mandarin. Similar BLEU scores are obtained for both languages: the model can generate understandable, data-related captions from the dataset.Comment: accepted by ICASSP 2019
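    The baseline is described only as an encoder-decoder model. As a rough illustration of that family, here is a minimal GRU encoder-decoder captioner in PyTorch; the log-mel input, layer sizes, and teacher-forced decoding are assumptions, not the authors' reported architecture.

```python
import torch
import torch.nn as nn

class AudioCaptioner(nn.Module):
    """GRU encoder over log-mel frames, GRU decoder over caption tokens."""
    def __init__(self, n_mels=64, vocab_size=5000, hidden=256, embed=128):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed)
        self.decoder = nn.GRU(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, mel, tokens):
        # mel: (B, T_audio, n_mels); tokens: (B, T_text) shifted-right captions
        _, h = self.encoder(mel)         # h: (1, B, hidden) summarises the clip
        dec_out, _ = self.decoder(self.embed(tokens), h)
        return self.out(dec_out)         # (B, T_text, vocab) next-token logits

model = AudioCaptioner()
mel = torch.randn(2, 500, 64)            # two clips of 500 frames (hypothetical)
tokens = torch.randint(0, 5000, (2, 12))
logits = model(mel, tokens)              # train with cross-entropy vs. next token
```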

    Development of a perception oriented texture-based image retrieval system for wallpapers.

    Due to advances in computer technology, large image collections have been digitised and archived in computers. Image management systems have therefore been developed to retrieve relevant images. Because of the limitations of text-based image retrieval systems, Content-Based Image Retrieval (CBIR) systems have been developed. A CBIR system usually extracts global or local colour, shape and texture content from an image to form a feature vector that is used to index the image. A plethora of methods has been developed to extract these features; however, very little work in the literature examines how closely each method matches human perception. This research aims to develop a human-perception-oriented content-based image retrieval system for the Museum of Domestic Design & Architecture (MoDA) wallpaper images. Since texture has been widely regarded as the main feature for these images and applied in CBIR systems, psychophysical experiments were conducted to study the way humans perceive texture and to evaluate five popular computational models for texture representation: Grey Level Co-occurrence Matrices (GLCM), the Multi-Resolution Simultaneous Auto-Regressive (MRSAR) model, the Fourier Transform (FT), the Wavelet Transform (WT) and the Gabor Transform (GT). Analysis of the experimental results found that people consider directionality and regularity more important aspects of texture than coarseness. Unexpectedly, none of the five models appeared to represent human perception of texture very well. It was therefore concluded that classification is needed before retrieval to improve retrieval performance, and a new classification algorithm based on directionality and regularity was developed for wallpaper images. Experimental results showed that the algorithm worked effectively, and the evaluation experiments confirmed the necessity of the classification step in developing a CBIR system for the MoDA collections.
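    Of the five texture models evaluated, GLCM is the most easily sketched. The snippet below uses scikit-image to build co-occurrence matrices at four orientations and read off per-angle statistics, one plausible way to expose the directionality cues the experiments found important; the quantisation level and the chosen properties are assumptions, not the thesis's exact settings.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_img, levels=32):
    """Directionally resolved GLCM statistics for one greyscale image.

    Quantise to `levels` grey levels, build co-occurrence matrices at four
    orientations, and return per-angle statistics. Comparing a statistic
    across angles gives a crude directionality cue.
    """
    img = (gray_img.astype(float) / gray_img.max() * (levels - 1)).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(img, distances=[1], angles=angles,
                        levels=levels, symmetric=True, normed=True)
    return {prop: graycoprops(glcm, prop).ravel()   # one value per angle
            for prop in ("contrast", "correlation", "homogeneity")}

rng = np.random.default_rng(1)
feats = glcm_features(rng.integers(0, 256, size=(128, 128)))
```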

    Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis

    User preference profiling is an important task in modern online social networks (OSNs). With the proliferation of image-centric social platforms, such as Pinterest, visual content has become one of the most informative data streams for understanding user preferences. Traditional approaches usually treat visual content analysis as a general classification problem in which one or more labels are assigned to each image. Although such an approach simplifies the process of image analysis, it misses the rich context and visual cues that play an important role in people's perception of images. In this paper, we explore the possibility of learning a user's latent visual preferences directly from image content. We propose a distance metric learning method based on deep Convolutional Neural Networks (CNNs) to extract similarity information directly from visual content, and use the derived distance metric to mine individual users' fine-grained visual preferences. Through preliminary experiments on data from 5,790 Pinterest users, we show that even for images within the same category, each user possesses distinct, individually identifiable visual preferences that are consistent over their lifetime. Our results underscore the untapped potential of finer-grained visual preference profiling for understanding users.Comment: 2015 IEEE 15th International Conference on Data Mining Workshops
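    The paper learns a distance metric with a deep CNN; a common way to instantiate that idea is a triplet loss over image embeddings, sketched below in PyTorch with a toy backbone. The network dimensions and the way triplets are formed are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn

# Backbone mapping an image to an embedding; any CNN works here.
embedder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 64),
)
# Pull anchor/positive embeddings together, push negatives apart.
loss_fn = nn.TripletMarginLoss(margin=1.0)

anchor   = torch.randn(8, 3, 64, 64)   # images a user engaged with
positive = torch.randn(8, 3, 64, 64)   # visually similar images
negative = torch.randn(8, 3, 64, 64)   # dissimilar images
loss = loss_fn(embedder(anchor), embedder(positive), embedder(negative))
loss.backward()                        # one training step's gradients
```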

    Colour based semantic image segmentation and classification for unmanned ground operations

    To aid an automatic taxiing system for unmanned aircraft, this paper presents a colour-based method for semantic segmentation and image classification in an aerodrome environment, with the intention of using the classification output to aid navigation and collision avoidance. Based on previous work, this machine vision system uses semantic segmentation to interpret the scene. Following an initial superpixel-based segmentation procedure, a colour-based Bayesian network classifier is trained and used to semantically classify each segmented cluster. The HSV colourspace is adopted because it is close to the way humans perceive colour, and each of its channels shows significant differentiation between classes. Luminance is used to identify surface lines on the taxiway, and this is fused with the colour classification to give improved results. The performance of the proposed colour-based classifier is tested in a real aerodrome, demonstrating that it outperforms a previously developed texture-only method.
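    A rough sketch of the pipeline's shape, assuming scikit-image for superpixels and HSV conversion: segment, describe each cluster by its mean HSV colour, and classify clusters. GaussianNB here is a simple stand-in for the paper's Bayesian network classifier, and the luminance-based line detection and fusion steps are omitted.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2hsv
from sklearn.naive_bayes import GaussianNB

def cluster_hsv_features(rgb_img, n_segments=200):
    """Segment into superpixels and describe each by its mean HSV colour."""
    labels = slic(rgb_img, n_segments=n_segments, start_label=0)
    hsv = rgb2hsv(rgb_img)
    feats = np.stack([hsv[labels == k].mean(axis=0)
                      for k in np.unique(labels)])
    return labels, feats

# Toy image and labels; a real system would train on annotated aerodrome scenes.
rng = np.random.default_rng(2)
img = rng.random((120, 160, 3))
labels, feats = cluster_hsv_features(img)
classes = rng.integers(0, 3, size=len(feats))   # e.g. taxiway / grass / sky
clf = GaussianNB().fit(feats, classes)          # stand-in for the Bayesian network
pred = clf.predict(feats)                       # one class per superpixel
```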