2,458 research outputs found

    Ambient Sound Provides Supervision for Visual Learning

    Full text link
    The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.Comment: ECCV 201

    Image retrieval with hierarchical matching pursuit

    Full text link
    A novel representation of images for image retrieval is introduced in this paper, by using a new type of feature with remarkable discriminative power. Despite the multi-scale nature of objects, most existing models perform feature extraction on a fixed scale, which will inevitably degrade the performance of the whole system. Motivated by this, we introduce a hierarchical sparse coding architecture for image retrieval to explore multi-scale cues. Sparse codes extracted on lower layers are transmitted to higher layers recursively. With this mechanism, cues from different scales are fused. Experiments on the Holidays dataset show that the proposed method achieves an excellent retrieval performance with a small code length.Comment: 5 pages, 6 figures, conferenc

    Salient object subitizing

    Full text link
    We study the problem of salient object subitizing, i.e. predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing image dataset of about 14 K everyday images which are annotated using an online crowdsourcing marketplace. We show that using an end-to-end trained convolutional neural network (CNN) model, we achieve prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also provides significantly better than chance performance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval.This research was supported in part by US NSF Grants 0910908 and 1029430, and gifts from Adobe and NVIDIA. (0910908 - US NSF; 1029430 - US NSF)https://arxiv.org/abs/1607.07525https://arxiv.org/pdf/1607.07525.pdfAccepted manuscrip
    • …
    corecore