Measuring and Predicting Importance of Objects in Our Visual World


Associating keywords with images automatically is an approachable and useful goal for visual recognition researchers. Keywords are distinctive and informative objects. We argue that keywords need to be sorted by 'importance', which we define as the probability of being mentioned first by an observer. We propose a method for measuring the `importance' of words using the object labels that multiple human observers give an everyday scene photograph. We model object naming as drawing balls from an urn, and fit this model to estimate `importance'; this combines order and frequency, enabling precise prediction under limited human labeling. We explore the relationship between the importance of an object in a particular image and the area, centrality, and saliency of the corresponding image patches. Furthermore, our data shows that many words are associated with even simple environments, and that few frequently appearing objects are shared across environments

    Similar works