43 research outputs found

Contextual models for object detection using boosted random fields

Authors: Freeman William T.; Murphy Kevin P.; Torralba Antonio
Publication date: 01/01/2004

We seek to both detect and segment objects in images. To exploit both local image data and contextual information, we introduce Boosted Random Fields (BRFs), which use boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.

CiteSeerX

DSpace@MIT

Learning Aligned Cross-Modal Representations from Weakly Aligned Data

Authors: Aytar Yusuf; Castrejon Lluis; Pirsiavash Hamed; Torralba Antonio; Vondrick Carl
Publication date: 01/06/2016

People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize cross-modal scenes well, they also learn an intermediate representation not aligned across modalities, which is undesirable for cross-modal transfer applications. We present methods to regularize cross-modal convolutional neural networks so that they have a shared representation that is agnostic of the modality. Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval. Moreover, our visualizations suggest that units emerge in the shared representation that tend to activate on consistent concepts independently of the modality. Comment: Conference paper at CVPR 201

arXiv.org e-Print Archive

DSpace@MIT

Crossref

SIFT and color feature fusion using localized maximum-margin learning for scene classification

Authors: Qin J; Yung NHC
Publication venue: IEEE
Publication date: 01/01/2010

The 3rd International Conference on Machine Vision (ICMV 2010), Hong Kong, China, 28-30 December 2010. In Proceedings of 3rd ICMV, 2010, p. 56-6

HKU Scholars Hub

Are you going to the party: depends, who else is coming? [Learning hidden group dynamics via conditional latent tree models]

Authors: Anandkumar Animashree; Arabshahi Forough; Butts Carter T.; Fitzhugh Sean M.; Huang Furong
Publication date: 01/11/2015

Scalable probabilistic modeling and prediction in high-dimensional multivariate time series is a challenging problem, particularly for systems with hidden sources of dependence and/or homogeneity. Examples of such problems include dynamic social networks with co-evolving nodes and edges and dynamic student learning in online courses. Here, we address these problems through the discovery of hierarchical latent groups. We introduce a family of Conditional Latent Tree Models (CLTM), in which tree-structured latent variables incorporate the unknown groups. The latent tree itself is conditioned on observed covariates such as seasonality, historical activity, and node attributes. We propose a statistically efficient framework for learning both the hierarchical tree structure and the parameters of the CLTM. We demonstrate competitive performance on multiple real-world datasets from different domains. These include a dataset on students' attempts at answering questions in a psychology MOOC, Twitter users participating in an emergency management discussion and interacting with one another, and windsurfers interacting on a beach in Southern California. In addition, our modeling framework provides valuable and interpretable information about the hidden group structures and their effect on the evolution of the time series.

arXiv.org e-Print Archive

Crossref

Caltech Authors

Scene categorization with multi-scale category-specific visual words

Authors: Qin J; Yung NHC
Publication venue: SPIE-Intl Soc Optical Eng
Publication date: 01/01/2009

IS&T/SPIE Conference on Intelligent Robots and Computer Vision XXVI: Algorithms and Techniques. In this paper, we propose a scene categorization method based on multi-scale category-specific visual words. The proposed method quantizes visual words in a multi-scale manner, which combines the global-feature-based and local-feature-based scene categorization approaches into a uniform framework. Unlike traditional visual word creation methods, which quantize visual words from the whole set of training images without considering their categories, we form visual words from the training images grouped by category and then collate the visual words from the different categories to form the final codebook. This category-specific strategy provides us with more discriminative visual words for scene categorization. Based on the codebook, we compile a feature vector that encodes the presence of different visual words to represent a given image. An SVM classifier with a linear kernel is then employed to select the features and classify the images. The proposed method is evaluated on two scene classification datasets of 6,447 images altogether using 10-fold cross-validation. The results show that the classification accuracy is improved significantly compared with methods using traditional visual words. The proposed method is also comparable to the best results published in the previous literature in terms of classification accuracy, and has the advantage of simplicity. © 2009 SPIE-IS&T.
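The category-specific codebook idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain k-means as the quantizer, uses toy 2-D points in place of real local descriptors, and omits the multi-scale step and the linear SVM. The names `kmeans`, `build_codebook`, and `presence_vector` are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed quantizer); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def build_codebook(descriptors_by_category, words_per_category):
    """Cluster each category's descriptors separately, then concatenate the words."""
    codebook = []
    for category in sorted(descriptors_by_category):
        descriptors = descriptors_by_category[category]
        codebook.extend(kmeans(descriptors, words_per_category))
    return codebook


def presence_vector(image_descriptors, codebook):
    """Binary vector: 1 where some descriptor's nearest word is that entry."""
    vec = [0] * len(codebook)
    for d in image_descriptors:
        nearest = min(range(len(codebook)), key=lambda i: dist(d, codebook[i]))
        vec[nearest] = 1
    return vec


# Toy usage: two hypothetical scene categories with 2-D "descriptors".
training = {
    "coast": [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]],
    "forest": [[5.0, 5.0], [5.1, 5.0], [6.0, 6.0], [6.1, 6.0]],
}
codebook = build_codebook(training, words_per_category=2)
print(len(codebook), presence_vector([[5.05, 5.0]], codebook))
```

Because each category is clustered on its own, every codebook entry belongs to exactly one category. The resulting presence vectors would then be passed to a linear classifier, a step this sketch leaves out.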

HKU Scholars Hub