
    A Thousand Words in a Scene

    This paper presents a novel approach to visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like "bag-of-visterms" representation (a histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether analogies between discrete scene representations and text documents exist, and (3) whether unsupervised, latent-space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multi-class scene classification tasks using a 9,500-image data set, that the bag-of-visterms representation consistently outperforms classical scene classification approaches. On other data sets we show that our approach competes with or outperforms other recent, more complex methods. We also show that Probabilistic Latent Semantic Analysis (PLSA) generates a compact scene representation that is discriminative for accurate classification and more robust than the bag-of-visterms representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such a representation useful for browsing image collections.
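The bag-of-visterms construction described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: random vectors stand in for real local invariant descriptors, and a naive k-means replaces a tuned quantizer.

```python
# Toy bag-of-visterms sketch: quantize local descriptors into a visual
# vocabulary with k-means, then represent an image as a histogram of
# visual-word counts. All data here is synthetic and illustrative.
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Quantize pooled descriptors into k cluster centers (visual words)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_visterms(image_descriptors, centers):
    """Histogram of quantized local features for one image."""
    d = ((image_descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    words = d.argmin(axis=1)
    return np.bincount(words, minlength=len(centers))

rng = np.random.default_rng(1)
all_desc = rng.normal(size=(200, 8))   # pooled training descriptors
vocab = build_vocabulary(all_desc, k=10)
hist = bag_of_visterms(rng.normal(size=(30, 8)), vocab)
print(hist.sum())  # 30: one visterm per local descriptor
```

The resulting histogram plays the role of a word-count vector for a text document, which is what makes text modeling methods such as PLSA applicable.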

    Learning the structure of image collections with latent aspect models

    The approach to indexing an image collection depends on the type of data to organize. Satellite images are likely to be searched with latitude and longitude coordinates, medical images are often searched with an image example that serves as a visual query, and personal image collections are generally browsed by event. A more general retrieval scenario is based on the use of textual keywords to search for images containing a specific object, or representing a given scene type. This requires the manual annotation of each image in the collection to allow for the retrieval of relevant visual information based on a text query. This time-consuming and subjective process is the current price to pay for a reliable and convenient text-based image search. This dissertation investigates the use of probabilistic models to assist the automatic organization of image collections, attempting to link the visual content of digital images with a potential textual description. Relying on robust, patch-based image representations that have proven to capture a variety of visual content, our work proposes to model images as mixtures of "latent aspects". These latent aspects are defined by multinomial distributions that capture patch co-occurrence information observed in the collection. An image is not represented by the direct count of its constituent elements, but as a mixture of latent aspects that can be estimated with principled, generative unsupervised learning methods. An aspect-based image representation therefore incorporates contextual information from the whole collection that can be exploited. This emerging concept is explored for several fundamental tasks related to image retrieval - namely classification, clustering, segmentation, and annotation - in what represents one of the first coherent and comprehensive studies of the subject.
We first investigate the possibility of classifying images based on their estimated aspect mixture weights, interpreting latent aspect modeling as an unsupervised feature extraction process. Several image categorization tasks are considered, where images are classified based on the objects present or according to their global scene type. We demonstrate that the concept of latent aspects makes it possible to take advantage of non-labeled data to infer a robust image representation that achieves a higher classification performance than the original patch-based representation. Second, further exploring the concept, we show that aspects can correspond to an interesting soft clustering of an image collection that can serve as a browsing structure. Images can be ranked given an aspect, illustrating the corresponding co-occurrence context visually. Third, we derive a principled method that relies on latent aspects to classify image patches into different categories, producing an image segmentation based on the resulting spatial class densities. We finally propose to model images and their captions with a single aspect model, merging the co-occurrence contexts of the visual and textual modalities in different ways. Once a model has been learned, the distribution of words given an unseen image is inferred from its visual representation and serves as a textual index. Overall, we demonstrate with extensive experiments that the co-occurrence context captured by latent aspects is suitable for the above tasks, making it a promising approach to multimedia indexing.

    A Knowledge-based Approach for Creating Detailed Landscape Representations by Fusing GIS Data Collections with Associated Uncertainty

    Geographic Information Systems (GIS) data for a region comes in different types and is collected from different sources, such as digitized aerial color imagery, elevation data consisting of terrain height at different points in that region, and feature data consisting of geometric information and properties about entities above or below the ground in that region. Merging GIS data and understanding the real-world information present explicitly or implicitly in that data is a challenging task. This is often done manually by domain experts because of their superior ability to efficiently recognize patterns and to combine, reason about, and relate information. When a detailed digital representation of the region is to be created, domain experts are required to make best-guess decisions about each object. For example, a human would create representations of entities by collectively looking at the data layers, noting even elements that are not visible, like a covered overpass or an underwater tunnel of a certain width and length. Such detailed representations are needed for use by processes like visualization or 3D modeling in applications used by the military, simulation, earth-science, and gaming communities. Many of these applications are increasingly using digitally synthesized visuals and require detailed digital 3D representations to be generated quickly after acquiring the necessary initial data. Our main thesis, and a significant research contribution of this work, is that this task of creating detailed representations can be automated to a very large extent using a methodology that first fuses all available GIS data sources into knowledge base (KB) assertions (instances) representing real-world objects, using a subprocess called GIS2KB. Then, using reasoning, implicit information is inferred to define detailed 3D entity representations with a geometry definition engine called KB2Scene.
Semantic Web technologies are used as the semantic inferencing system and are extended with a data extraction framework. This framework enables the extraction of implicit property information using data and image analysis techniques. The data extraction framework supports extraction of spatial relationship values and attribution of uncertainties to inferred details. Uncertainty is recorded per property and used under Zadeh fuzzy semantics to compute a resulting uncertainty for inferred assertional axioms. This is achieved by another major contribution of our research, a unique extension of the KB ABox Realization service using KB explanation services. Previous semantics-based research in this domain has concentrated more on improving represented details through the addition of artifacts like lights, signage, crosswalks, etc. Previous attempts regarding uncertainty in assertions use a modified reasoner expressivity and calculus. Our work differs in that separating formal knowledge from data processing allows the fusion of heterogeneous data sources that share the same context. Imprecision is modeled through uncertainty on assertions without defining a new expressivity, as long as KB explanation services are available for the expressivity used. We also believe that in our use case this simplifies uncertainty calculations. The uncertainties are then available for user decisions at output. We show that the process of creating 3D visuals from GIS data sources can be made more automated, modular, and verifiable, and that the knowledge base instances are available for other applications to use as part of a common knowledge base. We define our method's components, discuss advantages and limitations, and show sample results for the transportation domain.
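The uncertainty combination under Zadeh fuzzy semantics can be illustrated with a toy example: the certainty of an inferred assertion is the fuzzy conjunction (minimum) of the certainties of the assertions in its explanation set. The assertion names and certainty values below are hypothetical, chosen only to show the calculation.

```python
# Toy Zadeh-style uncertainty propagation: an inferred assertional axiom
# inherits the minimum certainty among its supporting assertions, since
# fuzzy AND under Zadeh semantics is the min operator.
def inferred_certainty(explanation):
    """Fuzzy-AND (min) over the certainties of supporting assertions."""
    return min(explanation.values())

# Hypothetical explanation set for an inferred "isOverpass" assertion.
support = {
    "isBridge(road_17)": 0.9,       # e.g., from feature-data analysis
    "crossesRoad(road_17)": 0.7,    # e.g., from image analysis
}
print(inferred_certainty(support))  # 0.7
```

Keeping the certainty per property, as described above, means this combination can be recomputed whenever an explanation set changes, without touching the reasoner's calculus.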

    A picture is worth a thousand words: content-based image retrieval techniques

    In my dissertation I investigate techniques for improving the state of the art in content-based image retrieval. To place my work into context, I highlight the current trends and challenges in my field by analyzing over 200 recent articles. Next, I propose a novel paradigm called "artificial imagination", which gives the retrieval system the power to imagine and think along with the user in terms of what she is looking for. I then introduce a new user interface for visualizing and exploring image collections, empowering the user to navigate large collections based on her own needs and preferences, while simultaneously providing her with an accurate sense of what the database has to offer. In the later chapters I present work dealing with millions of images and focus in particular on high-performance techniques that minimize memory and computational use for both near-duplicate image detection and web search. Finally, I show early work on a scene completion-based image retrieval engine, which synthesizes realistic imagery that matches what the user has in mind.
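The near-duplicate detection setting mentioned above can be illustrated with a tiny average-hash: each image is reduced to a compact bit string, and near-duplicates are pairs within a small Hamming distance. This is a generic stand-in for the high-performance techniques the dissertation describes, not its actual method; the images below are synthetic.

```python
# Toy average-hash for near-duplicate detection: downsample a grayscale
# image to an 8x8 grid of block means, threshold at the global mean, and
# compare the resulting 64-bit signatures by Hamming distance.
import numpy as np

def average_hash(gray, size=8):
    """64-bit signature: block means thresholded at their overall mean."""
    h, w = gray.shape
    trimmed = gray[:h - h % size, :w - w % size]
    blocks = trimmed.reshape(size, trimmed.shape[0] // size,
                             size, trimmed.shape[1] // size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).flatten()

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return int((a != b).sum())

rng = np.random.default_rng(0)
img = rng.random((64, 64))
noisy = img + rng.normal(scale=0.01, size=img.shape)  # near-duplicate
other = rng.random((64, 64))                          # unrelated image
h0, h1, h2 = average_hash(img), average_hash(noisy), average_hash(other)
print(hamming(h0, h1), hamming(h0, h2))  # near-duplicate is far closer
```

Because the signature is only 64 bits, millions of images fit in memory and candidate pairs can be filtered with cheap bit operations, which is the kind of memory/computation trade-off the chapter is concerned with.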