43 research outputs found

    Contextual models for object detection using boosted random fields

    We seek to both detect and segment objects in images. To exploit both local image data and contextual information, we introduce Boosted Random Fields (BRFs), which use boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.

    Learning Aligned Cross-Modal Representations from Weakly Aligned Data

    People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize cross-modal scenes well, they also learn an intermediate representation not aligned across modalities, which is undesirable for cross-modal transfer applications. We present methods to regularize cross-modal convolutional neural networks so that they have a shared representation that is agnostic of the modality. Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval. Moreover, our visualizations suggest that units emerge in the shared representation that tend to activate on consistent concepts independently of the modality. Comment: Conference paper at CVPR 201

    SIFT and color feature fusion using localized maximum-margin learning for scene classification

    The 3rd International Conference on Machine Vision (ICMV 2010), Hong Kong, China, 28-30 December 2010. In Proceedings of 3rd ICMV, 2010, p. 56-6

    Are you going to the party: depends, who else is coming? [Learning hidden group dynamics via conditional latent tree models]

    Scalable probabilistic modeling and prediction in high-dimensional multivariate time series is a challenging problem, particularly for systems with hidden sources of dependence and/or homogeneity. Examples of such problems include dynamic social networks with co-evolving nodes and edges and dynamic student learning in online courses. Here, we address these problems through the discovery of hierarchical latent groups. We introduce a family of Conditional Latent Tree Models (CLTM), in which tree-structured latent variables incorporate the unknown groups. The latent tree itself is conditioned on observed covariates such as seasonality, historical activity, and node attributes. We propose a statistically efficient framework for learning both the hierarchical tree structure and the parameters of the CLTM. We demonstrate competitive performance on multiple real-world datasets from different domains. These include a dataset on students' attempts at answering questions in a psychology MOOC, Twitter users participating in an emergency management discussion and interacting with one another, and windsurfers interacting on a beach in Southern California. In addition, our modeling framework provides valuable and interpretable information about the hidden group structures and their effect on the evolution of the time series.

    Scene categorization with multi-scale category-specific visual words

    IS&T/SPIE Conference on Intelligent Robots and Computer Vision XXVI: Algorithms and Techniques. In this paper, we propose a scene categorization method based on multi-scale category-specific visual words. The proposed method quantizes visual words in a multi-scale manner, which combines the global-feature-based and local-feature-based scene categorization approaches into a uniform framework. Unlike traditional visual word creation methods, which quantize visual words from the whole set of training images without considering their categories, we form visual words from the training images grouped by category and then collate the visual words from the different categories to form the final codebook. This category-specific strategy provides us with more discriminative visual words for scene categorization. Based on the codebook, we compile a feature vector that encodes the presence of different visual words to represent a given image. An SVM classifier with a linear kernel is then employed to select the features and classify the images. The proposed method is evaluated on two scene classification datasets of 6,447 images altogether using 10-fold cross-validation. The results show that the classification accuracy is improved significantly compared with methods using traditional visual words. The proposed method is also comparable to the best results published in the previous literature in terms of classification accuracy and has the advantage of simplicity. © 2009 SPIE-IS&T.
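
    The category-specific codebook idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes local descriptors have already been extracted as numeric tuples, swaps in plain k-means as the quantizer, and omits the multi-scale step and the linear SVM. The names `kmeans`, `build_codebook`, `encode_presence`, and the toy data are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed quantizer); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # requires k <= len(points)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids

def build_codebook(descriptors_by_category, words_per_category):
    """Quantize each category separately, then collate the words into one codebook."""
    codebook = []
    for category in sorted(descriptors_by_category):
        codebook.extend(kmeans(descriptors_by_category[category], words_per_category))
    return codebook

def encode_presence(image_descriptors, codebook):
    """Binary vector: 1 where a visual word is the nearest word to some descriptor."""
    present = [0] * len(codebook)
    for d in image_descriptors:
        nearest = min(range(len(codebook)), key=lambda i: sq_dist(d, codebook[i]))
        present[nearest] = 1
    return present

# Toy 2-D "descriptors" for two hypothetical scene categories.
toy = {
    "coast": [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)],
    "forest": [(5.0, 5.1), (5.1, 5.0), (9.0, 9.1), (9.1, 9.0)],
}
codebook = build_codebook(toy, words_per_category=2)
vector = encode_presence([(0.05, 0.05), (9.0, 9.0)], codebook)
```

    In a fuller pipeline, vectors like `vector` would be computed for every training image and passed to a linear classifier; that step is left out here to keep the sketch dependency-free.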