
    A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

    Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice for dealing with multimodal data, such as in image annotation tasks. Another popular approach to modeling multimodal data is through deep neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as in simultaneous image classification and annotation. First, we propose SupDocNADE, a supervised extension of DocNADE that increases the discriminative power of the learned hidden topic features, and show how to employ it to learn a joint representation from image visual words, annotation words and class label information. We test our model on the LabelMe and UIUC-Sports data sets and show that it compares favorably to other topic models. Second, we propose a deep extension of our model and provide an efficient way of training the deep model. Experimental results show that our deep model outperforms its shallow version and reaches state-of-the-art performance on the Multimedia Information Retrieval (MIR) Flickr data set. Comment: 24 pages, 10 figures. A version has been accepted by TPAMI on Aug 4th, 2015. Added a footnote about how to train the model in practice in Section 5.1. arXiv admin note: substantial text overlap with arXiv:1305.530
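DocNADE, the model this work builds on, scores a document by predicting each word from the words before it, with all conditionals sharing one hidden layer whose pre-activation is updated by adding one weight column per observed word. The sketch below is a minimal illustration under stated assumptions, not the authors' SupDocNADE: the parameter names `W`, `V`, `b`, `c` and the sigmoid hidden units are assumed, and a full softmax over the vocabulary is used for clarity, whereas efficient implementations typically use a tree-structured softmax instead. The supervised label term and the deep extension are not shown.

```python
import numpy as np


def docnade_log_likelihood(doc, W, V, b, c):
    """Log-likelihood of one document under a DocNADE-style model (sketch).

    doc: word indices in a fixed order (e.g. visual and annotation words)
    W:   (H, K) input weights, one column per vocabulary word
    V:   (K, H) output weights
    b:   (K,) output biases;  c: (H,) hidden biases
    A full softmax is used here for clarity (assumption).
    """
    acc = np.array(c, dtype=np.float64)  # running pre-activation c + sum_{k<i} W[:, v_k]
    log_p = 0.0
    for w in doc:
        h = 1.0 / (1.0 + np.exp(-acc))   # hidden layer conditioned on the preceding words
        logits = b + V @ h
        logits = logits - logits.max()   # shift for numerical stability
        log_p += logits[w] - np.log(np.exp(logits).sum())
        acc += W[:, w]                   # reuse the sum: add the current word's column
    return float(log_p)
```

Reusing the running sum `acc` keeps the hidden-layer work for a document of length D at O(D·H) in total, instead of O(D²·H) if each conditional recomputed its sum from scratch.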

    The ImageCLEF 2013 Plant Identification Task

    The ImageCLEF plant identification task provides a testbed for the system-oriented evaluation of plant identification of about 250 species of trees and herbaceous plants, based on detailed views of leaves, flowers, fruits, stems and bark, or on some entire views of the plants. Two types of image content are considered: SheetAsBackground, which contains only leaves in front of a generally uniform white background, and NaturalBackground, which contains the 5 kinds of detailed views taken under unconstrained conditions, directly photographed on the plant. The main originality of this data is that it was specifically built through a citizen science initiative conducted by Tela Botanica, a French social network of amateur and expert botanists. This brings the task closer to the conditions of a real-world application. This overview presents in more detail the resources and assessments of the task, summarizes the retrieval approaches employed by the participating groups, and provides an analysis of the main evaluation results. With twelve groups from nine countries and a total of thirty-three runs submitted, involving distinct and original methods, this third-year task confirms the image retrieval community's interest in biodiversity and botany, and highlights further challenging studies in plant identification.

    Estimating snow cover from publicly available images

    In this paper we study the problem of estimating snow cover in mountainous regions, that is, the spatial extent of the earth's surface covered by snow. We argue that publicly available visual content, in the form of user-generated photographs and image feeds from outdoor webcams, can both be leveraged as additional measurement sources, complementing existing ground, satellite and airborne sensor data. To this end, we describe two content acquisition and processing pipelines tailored to these sources, addressing the specific challenges posed by each of them, e.g., identifying the mountain peaks, filtering out images taken in bad weather conditions, and handling varying illumination conditions. The final outcome is summarized in a snow cover index, which indicates, for a specific mountain and day of the year, the fraction of the visible area covered by snow, possibly at different elevations. We created a manually labelled dataset to assess the accuracy of the image snow-covered area estimation, achieving 90.0% precision at 91.1% recall. In addition, we show that the snow cover index captures seasonal trends related to air temperature. Comment: submitted to IEEE Transactions on Multimedi
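Once a per-pixel snow mask exists for the visible part of a mountain, the snow cover index described above reduces to a ratio of pixel counts. The following minimal sketch assumes the upstream pipeline has already produced three aligned arrays: a boolean snow mask, a boolean mask of the visible mountain area, and a per-pixel elevation map. The function names and elevation bands are hypothetical. It also includes pixel-level precision and recall, one plausible way (assumed here) to score estimates against a manually labelled mask.

```python
import numpy as np


def snow_cover_fraction(snow_mask, visible_mask):
    """Fraction of visible pixels labelled as snow (NaN if nothing is visible)."""
    visible = visible_mask.sum()
    if visible == 0:
        return float("nan")
    return float((snow_mask & visible_mask).sum() / visible)


def snow_cover_by_elevation(snow_mask, visible_mask, elevation, band_edges):
    """Snow cover fraction per elevation band [lo, hi) (hypothetical banding)."""
    fractions = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = visible_mask & (elevation >= lo) & (elevation < hi)
        fractions.append(snow_cover_fraction(snow_mask, band))
    return fractions


def precision_recall(pred, truth):
    """Pixel-level precision and recall of a predicted snow mask."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)
```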

    Joint Inference in Weakly-Annotated Image Datasets via Dense Correspondence

    We present a principled framework for inferring pixel labels in weakly-annotated image datasets. Most previous example-based approaches to computer vision rely on a large corpus of densely labeled images. However, for large, modern image datasets, such labels are expensive to obtain and are often unavailable. We establish a large-scale graphical model spanning all labeled and unlabeled images, then solve it to infer pixel labels jointly for all images in the dataset while enforcing consistent annotations over similar visual patterns. This model requires significantly less labeled data and helps resolve ambiguities by propagating inferred annotations from images with stronger local visual evidence to images with weaker local evidence. We apply our proposed framework to two computer vision problems, namely image annotation with semantic segmentation, and object discovery and co-segmentation (segmenting multiple images containing a common object). Extensive numerical evaluations and comparisons show that our method consistently outperforms the state of the art in automatic annotation and semantic labeling, while requiring significantly less labeled data. In contrast to previous co-segmentation techniques, our method manages to discover and segment objects well even in the presence of substantial amounts of noise images (images not containing the common object), as is typical for datasets collected from Internet search.
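The idea of propagating annotations from nodes with strong evidence to nodes with weak evidence can be illustrated with a generic label-spreading iteration on a similarity graph. This is a swapped-in simplification, not the paper's graphical model or its inference procedure: the choice of nodes (e.g., pixels or patches pooled across all images), the affinity matrix, `alpha`, and the iteration count are all assumptions.

```python
import numpy as np


def propagate_labels(affinity, labels, num_classes, num_iters=100, alpha=0.9):
    """Generic label spreading on a similarity graph (illustrative sketch).

    affinity: (N, N) symmetric, non-negative similarities between nodes
    labels:   length-N int array, class index for labelled nodes, -1 otherwise
    Returns the predicted class for every node.
    """
    n = affinity.shape[0]
    seeds = np.zeros((n, num_classes))
    idx = np.flatnonzero(labels >= 0)
    seeds[idx, labels[idx]] = 1.0            # one-hot beliefs for labelled nodes
    degree = affinity.sum(axis=1, keepdims=True)
    transition = affinity / np.maximum(degree, 1e-12)  # row-normalised; guards isolated nodes
    beliefs = seeds.copy()
    for _ in range(num_iters):
        # Average neighbours' beliefs, then pull back toward the original seeds.
        beliefs = alpha * (transition @ beliefs) + (1.0 - alpha) * seeds
    return beliefs.argmax(axis=1)
```

Unlabelled nodes end up with the label that reaches them most strongly through the graph, while the `(1 - alpha)` term keeps pulling labelled nodes back toward their seeds.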

    Contour Detection-based Discovery of Mid-level Discriminative Patches for Scene Classification

    Feature extraction and representation are key steps in scene classification. In this paper, a contour detection-based mid-level feature learning method is proposed for scene classification. First, a sketch tokens-based contour detection scheme is proposed to initialize seed blocks for learning mid-level patches: the patches with more contour pixels are selected as seed blocks. This procedure is shown to be helpful for scene classification. Next, the seed blocks are employed to train an exemplar SVM to discover other similar occurrences, and an entropy-rank criterion is used to mine the discriminative patches. Finally, scene categories are identified by matching the discriminative patches against test images. Extensive experiments on the MIT Indoor-67 dataset, the 15-scene dataset and the UIUC-Sports dataset show that the proposed approach outperforms other state-of-the-art counterparts.
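The seed-selection step can be sketched as ranking image blocks by how many contour pixels they contain. This sketch assumes a binary contour map (for example, a thresholded contour-probability map) and a non-overlapping grid of square blocks; `block_size` and `top_k` are hypothetical parameters, and the exemplar-SVM and entropy-rank stages are not shown.

```python
import numpy as np


def select_seed_blocks(contour_map, block_size, top_k):
    """Return (row, col) offsets of the top_k blocks with the most contour pixels.

    contour_map: 2-D boolean array, True where a contour pixel was detected.
    Blocks are laid out on a non-overlapping grid (assumption).
    """
    h, w = contour_map.shape
    scores = []
    for r in range(0, h - block_size + 1, block_size):
        for c in range(0, w - block_size + 1, block_size):
            count = int(contour_map[r:r + block_size, c:c + block_size].sum())
            scores.append((count, r, c))
    # Stable sort: blocks with equal counts keep their raster-scan order.
    scores.sort(key=lambda s: s[0], reverse=True)
    return [(r, c) for _, r, c in scores[:top_k]]
```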

    The Wetting of Leaf Surfaces and Its Ecological Significances

    Leaf wettability, which indicates the affinity of leaf surfaces for water, is a common phenomenon in plants from a wide variety of habitats. The contact angle (θ) of water on leaves, measured at the interface of gas, solid and liquid, is an index of surface wettability. Leaves are termed as “super-hydrophilic” if θ 110°, the leaves are classified as being non-wettable, while θ > 130° denotes highly non-wettable and θ > 150° super-hydrophobic leaves. Both internal and external factors can influence leaf wettability. The chemical composition and structure of leaf surfaces are internal causes, but the external environment can also influence wettability by affecting the structure and composition of the surface. The main internal factors affecting leaf wettability include the content and microstructure of the epidermal wax, the number, size and pattern of trichomes, stomatal density, the shape of epidermal cells, and leaf water status. Leaf contact angles increased with increasing leaf wax content. However, studies have shown that contact angles depend more on the complexity of the wax structure than on its absolute amount. For trichomes, there are three types of interaction between trichomes and water droplets: (1) low trichome density: no apparent influence of trichomes on the location of surface moisture, droplet formation and retention; (2) medium trichome density: trichomes appear to circle surface moisture into patches; (3) high trichome density: trichomes appear to hold water droplets above the trichomes. In some cases, a higher stomatal density was accompanied by a higher contact angle, while for some species no significant correlation between contact angle and stomatal density was observed. Regarding the effect of epidermal cells on leaf wettability, it is generally considered that the combination of a dense layer of surface wax and convex epidermal cells is what creates a hydrophobic leaf surface.
However, the influence of leaf water content on the contact angle of water droplets differs among leaf surfaces: in some cases contact angles increased with decreasing leaf water content, while in others the contact angle remained constant across different leaf water contents.
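The contact-angle classes above can be written as a simple threshold rule. Only the bounds that survive in the text (110°, 130° and 150°) are encoded; reading 110° as a strict lower bound for non-wettable leaves is an assumption, and the classes below 110° (including super-hydrophilic) are deliberately not distinguished because their thresholds are missing from this text.

```python
def classify_leaf_wettability(theta_deg):
    """Map a water contact angle (degrees) to the wettability classes above.

    Only the 110°, 130° and 150° thresholds are encoded; treating 110° as a
    strict lower bound for "non-wettable" is an assumption.
    """
    if not 0.0 <= theta_deg <= 180.0:
        raise ValueError("contact angle must lie in [0, 180] degrees")
    if theta_deg > 150.0:
        return "super-hydrophobic"
    if theta_deg > 130.0:
        return "highly non-wettable"
    if theta_deg > 110.0:
        return "non-wettable"
    return "not classified here (θ ≤ 110°)"
```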

    A Tree-Based Context Model for Object Recognition

    There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. A context model can rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit of context models has been limited because most of the previous methods were tested on datasets with only a few object categories, in which most images contain one or two object categories. In this paper, we introduce a new dataset with images that contain many instances of different object categories, and propose an efficient model that captures the contextual information among more than a hundred object categories using a tree structure. Our model incorporates global image features, dependencies between object categories, and outputs of local detectors into one probabilistic framework. We demonstrate that our context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories. In addition, our model can be applied to scene understanding tasks that local detectors alone cannot solve, such as detecting objects out of context or querying for the most typical and the least typical scenes in a dataset. This research was partially funded by Shell International Exploration and Production Inc., by Army Research Office under award W911NF-06-1-0076, by NSF Career Award (ISI 0747120), and by the Air Force Office of Scientific Research under Award No. FA9550-06-1-0324. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the Air Force
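The abstract does not say how the tree over object categories is obtained. One standard option, assumed here rather than taken from the abstract, is the Chow-Liu procedure: compute the empirical mutual information between every pair of binary presence variables and keep a maximum-weight spanning tree. The sketch uses Kruskal's algorithm with a union-find structure; the input format is hypothetical, and the global image features and detector outputs that the full model also incorporates are omitted.

```python
import math
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (nats) between two binary presence vectors."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            joint = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            px = sum(1 for xi in x if xi == a) / n
            py = sum(1 for yi in y if yi == b) / n
            if joint > 0:
                mi += joint * math.log(joint / (px * py))
    return mi


def chow_liu_tree(presence):
    """presence: dict mapping category -> list of 0/1 flags, one per image.

    Returns the edges of a maximum-weight spanning tree on pairwise MI.
    """
    cats = sorted(presence)
    edges = sorted(
        ((mutual_information(presence[a], presence[b]), a, b)
         for a, b in combinations(cats, 2)),
        reverse=True,  # strongest dependencies first (Kruskal on max weight)
    )
    parent = {c: c for c in cats}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    tree = []
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            tree.append((a, b))
    return tree
```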

    Tiny images

    The human visual system is remarkably tolerant to degradations in image resolution: in a scene recognition task, human performance is similar whether 32×32 color images or multi-megapixel images are used. With small images, even object recognition and segmentation are performed robustly by the visual system, despite the object being unrecognizable in isolation. Motivated by these observations, we explore the space of 32×32 images using a database of 10^8 color images of 32×32 pixels gathered from the Internet using image search engines. Each image is loosely labeled with one of the 70,399 non-abstract nouns in English, as listed in the WordNet lexical database. Hence the image database represents a dense sampling of all object categories and scenes. With this dataset, we use nearest neighbor methods to perform object recognition across the 10^8 images.
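Nearest-neighbor recognition on tiny images can be sketched as comparing raw pixels and letting the closest database images vote on a label. This minimal version assumes a sum of squared differences on raw 32×32×3 pixels and a majority vote over `k` neighbors (`k` is a hypothetical parameter). The brute-force scan shown here would likely be far too slow for 10^8 images, where some form of compact index or approximate search would be needed.

```python
from collections import Counter

import numpy as np


def knn_label(query, database, labels, k=5):
    """Label a 32x32x3 query image by majority vote of its k nearest neighbors.

    query:    (32, 32, 3) array
    database: (N, 32, 32, 3) array
    labels:   sequence of N labels aligned with database
    """
    # Cast to float so uint8 pixel differences cannot wrap around.
    q = query.reshape(-1).astype(np.float64)
    db = database.reshape(len(database), -1).astype(np.float64)
    dists = ((db - q) ** 2).sum(axis=1)  # sum of squared differences per image
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```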