2,320 research outputs found

    Automatic indoor/outdoor scene classification

    The advent and wide acceptance of digital imaging technology has motivated an upsurge in research focused on managing the ever-growing number of digital images. Current research in image manipulation represents a general shift in the field of computer vision from traditional image analysis based on low-level features (e.g. color and texture) to semantic scene understanding based on high-level features (e.g. grass and sky). One particular area of investigation is scene categorization, where the organization of a large number of images is treated as a classification problem. Generally, the classification involves mapping a set of traditional low-level features to semantically meaningful categories, such as indoor and outdoor scenes, using a classifier engine. Successful indoor/outdoor scene categorization is beneficial to a number of image manipulation applications, as indoor and outdoor scenes are among the most general scene types. In content-based image retrieval, for example, a query for a scene containing a sunset can be restricted to images in the database pre-categorized as outdoor scenes. Also, in image enhancement, categorizing a scene as indoor vs. outdoor can lead to improved color balancing and tone reproduction. Prior research in scene classification has shown that high-level information can, in fact, be inferred from low-level image features. Classification rates of roughly 90% have been reported using low-level features to distinguish indoor scenes from outdoor scenes. However, these high classification rates are often achieved using computationally expensive, high-dimensional feature sets, limiting the practical implementation of such systems. To address this problem, the work presented here extracts a low-complexity, low-dimensional feature set in a variety of configurations. Due to their excellent generalization performance, Support Vector Machines (SVMs) were used to manage the tradeoff between reduced dimensionality and classification accuracy. It was determined that features extracted from image subblocks, as opposed to the full image, can yield better classification rates when combined in a second stage. In particular, applying SVMs in two stages led to an indoor/outdoor classification accuracy of 90.2% on a large database of consumer photographs provided by Kodak. Finally, it was also shown that low-level and semantic features can be integrated efficiently using Bayesian networks for increased accuracy. Specifically, integrating grass and sky semantic features with color and texture low-level features increased the indoor/outdoor classification rate to 92.8% on the same database of images.
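
    To make the two-stage idea concrete, here is a minimal sketch of how subblock features might feed first-stage SVMs whose scores a second-stage SVM then combines. The 4x4 grid, the mean/std color statistics, and the scikit-learn classifiers are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of two-stage SVM indoor/outdoor classification, assuming
# scikit-learn; mean/std color statistics stand in for the low-level color
# and texture features described in the abstract.
import numpy as np
from sklearn.svm import SVC

def subblock_features(image, grid=(4, 4)):
    """Per-subblock mean and std of each RGB channel (a simple low-level
    stand-in for the color/texture features described above)."""
    h, w, _ = image.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            feats.append(np.concatenate([block.mean(axis=(0, 1)),
                                         block.std(axis=(0, 1))]))
    return np.array(feats)  # shape: (n_blocks, 6)

def train_two_stage(images, labels, grid=(4, 4)):
    """Stage 1: one SVM per subblock position; stage 2: an SVM over the
    vector of stage-1 indoor/outdoor scores."""
    block_feats = np.array([subblock_features(im, grid) for im in images])
    n_blocks = grid[0] * grid[1]
    stage1 = [SVC(probability=True).fit(block_feats[:, b, :], labels)
              for b in range(n_blocks)]
    scores = np.column_stack([clf.predict_proba(block_feats[:, b, :])[:, 1]
                              for b, clf in enumerate(stage1)])
    stage2 = SVC().fit(scores, labels)
    return stage1, stage2
```

    At prediction time the same path is followed: per-block features are scored by the stage-1 SVMs and the resulting score vector is passed to the stage-2 SVM.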

    Detecting the presence of large buildings in natural images

    This paper addresses the issue of classification of low-level features into high-level semantic concepts for the purpose of semantic annotation of consumer photographs. We adopt a multi-scale approach that relies on edge detection to extract an edge orientation-based feature description of the image, and apply an SVM learning technique to infer the presence of a dominant building object in a general-purpose collection of digital photographs. The approach exploits prior knowledge of the image context through the assumption that all input images are "outdoor", i.e. that indoor/outdoor classification (the context determination stage) has already been performed. The proposed approach is validated on a diverse dataset of 1720 images and its performance is compared with that of the MPEG-7 edge histogram descriptor.
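
    As an illustration of the kind of descriptor involved, the sketch below computes a magnitude-weighted edge orientation histogram at several scales; the Gaussian smoothing, Sobel gradients, bin count, and scale set are assumptions rather than the paper's exact recipe.

```python
# A sketch of a multi-scale edge orientation-histogram descriptor, assuming
# SciPy; the resulting vector would be fed to an SVM classifier as in the
# paper's building-detection setup.
import numpy as np
from scipy import ndimage

def edge_orientation_histogram(gray, n_bins=8, scales=(1, 2, 4)):
    """Magnitude-weighted histogram of edge orientations at each scale."""
    feats = []
    for s in scales:
        smoothed = ndimage.gaussian_filter(gray.astype(float), sigma=s)
        gx = ndimage.sobel(smoothed, axis=1)
        gy = ndimage.sobel(smoothed, axis=0)
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % np.pi  # orientation in [0, pi)
        hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi),
                               weights=mag)
        feats.append(hist / (hist.sum() + 1e-9))  # normalize per scale
    return np.concatenate(feats)  # n_scales * n_bins features
```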

    Indoor Outdoor Scene Classification in Digital Images

    In this paper, we present a method to classify real-world digital images into indoor and outdoor scenes. The indoor class consists of four groups: bedroom, kitchen, laboratory and library. The outdoor class consists of four groups: landscape, roads, buildings and garden. The application targets a real-time system and uses a dedicated dataset. Input images are pre-processed by converting them to gray-scale and re-sizing them to 128x128 pixels. The pre-processed images are passed through Gabor filters, whose transfer functions are pre-computed and applied in the Fourier domain. The filter responses are then used for GIST feature extraction, and the images are classified using a kNN classifier. Most existing techniques have been based on the use of texture and color space features. To date, we have achieved 80% accuracy in image classification.
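
    A rough sketch of this pipeline is given below: a 128x128 gray-scale image is filtered with a Gabor bank via FFT-based convolution (standing in for the pre-computed Fourier-domain transfer functions), the response magnitudes are block-averaged into a GIST-style descriptor, and a kNN classifier is trained on top. The filter-bank parameters and the 4x4 grid are assumptions.

```python
# A rough sketch of the gray-scale -> Gabor -> GIST -> kNN pipeline, assuming
# scikit-image, SciPy and scikit-learn; parameter choices are illustrative.
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel
from sklearn.neighbors import KNeighborsClassifier

def gist_descriptor(gray128, n_orient=8, freqs=(0.1, 0.2, 0.3), grid=4):
    """Block-averaged Gabor response magnitudes, GIST-style."""
    feats = []
    for f in freqs:
        for o in range(n_orient):
            k = gabor_kernel(frequency=f, theta=o * np.pi / n_orient)
            # FFT-based convolution plays the role of the pre-computed
            # Fourier-domain transfer functions mentioned in the abstract.
            resp = np.abs(fftconvolve(gray128, k, mode="same"))
            h, w = resp.shape
            bh, bw = h // grid, w // grid
            # Average the response magnitude over a grid x grid layout.
            for i in range(grid):
                for j in range(grid):
                    feats.append(resp[i*bh:(i+1)*bh, j*bw:(j+1)*bw].mean())
    return np.array(feats)  # len(freqs) * n_orient * grid * grid values

# With X holding descriptors for the training images and y their labels:
# knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
```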

    Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion

    Semantic concepts, such as faces, buildings, and other real world objects, are the most preferred instrument that humans use to navigate through and retrieve visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential if ease of access and use is to be ensured. Classification of images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, people/no-people, etc., allows us to obtain the semantic labels without the full knowledge of all objects in the scene. Inferring the presence of high-level semantic concepts from low-level visual features is a research topic that has been attracting a significant amount of interest lately. However, the power of low-level visual features alone has been shown to be limited when faced with the task of semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion, or combination of information from different modalities, has been identified as one possible way of overcoming the limitations of single-mode approaches. In the field of digital photography, the incorporation of readily available camera metadata, i.e. information about the image capture conditions stored in the EXIF header of each image, along with GPS information, offers a way to move towards a better understanding of the imaged scene. In this thesis we focus on detection of semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusion of low-level visual features with selected camera metadata, using a Support Vector Machine as an integration device, affects the performance of the building detector in a genuine personal photo collection. We implemented two approaches to detection of buildings that combine content-based and context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata. An outdoor detection rate of 85.6% was obtained using camera metadata only. The first approach to building detection, based on simple edge orientation-based features extracted at three different scales, has been tested on a dataset of 1720 outdoor images, with a classification accuracy of 88.22%. The second approach integrates the edge orientation-based features with the camera metadata-based features, both at the feature and at the decision level. The fusion approaches have been evaluated using an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual features-only approach by 2-3% on average regardless of the operating point chosen, while all the performance measures are approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures.
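
    The early (feature-level) fusion described here can be sketched as a simple concatenation of visual and EXIF-derived feature vectors before training a single SVM; the particular EXIF tags and the Pillow/scikit-learn plumbing below are illustrative assumptions, not the thesis's exact metadata feature set.

```python
# A hedged sketch of early (feature-level) fusion of visual features with
# camera metadata, assuming Pillow and scikit-learn; the chosen EXIF tags are
# plausible capture-condition cues, not necessarily those used in the thesis.
import numpy as np
from PIL import Image, ExifTags
from sklearn.svm import SVC

TAG_IDS = {name: tag for tag, name in ExifTags.TAGS.items()}

def metadata_features(path):
    """Exposure time, aperture, flash and focal length from the EXIF header."""
    exif = Image.open(path).getexif()
    tags = dict(exif)
    tags.update(exif.get_ifd(0x8769))  # Exif sub-IFD holds capture settings

    def get(name, default=0.0):
        try:
            return float(tags.get(TAG_IDS[name], default))
        except (TypeError, ValueError):
            return default

    return np.array([get("ExposureTime"), get("FNumber"),
                     get("Flash"), get("FocalLength")])

def train_early_fusion(visual_feats, image_paths, labels):
    """Concatenate visual and metadata features, then train a single SVM."""
    meta = np.array([metadata_features(p) for p in image_paths])
    fused = np.hstack([visual_feats, meta])  # feature-level fusion
    return SVC().fit(fused, labels)
```

    Late (decision-level) fusion would instead train separate classifiers on the visual and metadata vectors and combine their outputs.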

    Image orientation detection using LBP-based features and logistic regression

    Gianluigi Ciocca; Claudio Cusano; Raimondo Schettini

    Fireground location understanding by semantic linking of visual objects and building information models

    This paper presents an outline for improved localization and situational awareness in fire emergency situations based on semantic technology and computer vision techniques. The novelty of our methodology lies in the semantic linking of video object recognition results from visual and thermal cameras with Building Information Models (BIM). The current limitations and possibilities of certain building information streams in the context of fire safety or fire incident management are addressed in this paper. Furthermore, our data management tools match higher-level semantic metadata descriptors of BIM with deep-learning based visual object recognition and classification networks. Based on these matches, estimates of camera, object and event positions can be generated in the BIM model, transforming it from a static source of information into a rich, dynamic data provider. Previous work has already investigated the possibility of linking BIM and low-cost point sensors for fireground understanding, but these approaches did not take into account the benefits of video analysis and recent developments in semantics and feature learning research. Finally, the strengths of the proposed approach compared to the state of the art are its (semi-)automatic workflow, its generic and modular setup, and its multi-modal strategy, which make it possible to automatically create situational awareness, improve localization and facilitate overall fire understanding.
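
    The semantic linking step can be illustrated with a small, purely hypothetical data-structure sketch in which detections from a recognition network are matched to BIM elements carrying the same semantic label; none of the names below come from the paper.

```python
# An illustrative sketch (not the paper's implementation) of semantic linking:
# confident detections are matched to BIM elements with the same semantic
# label, yielding candidate positions in the building model.
from dataclasses import dataclass

@dataclass
class BIMElement:
    element_id: str
    semantic_label: str  # e.g. "door", "fire_extinguisher" (hypothetical)
    position: tuple      # (x, y, z) in the BIM coordinate frame

@dataclass
class Detection:
    label: str           # class name from the recognition network
    confidence: float

def link_detections(detections, bim_elements, min_conf=0.5):
    """Match each confident detection to BIM elements sharing its label."""
    index = {}
    for el in bim_elements:
        index.setdefault(el.semantic_label, []).append(el)
    links = []
    for det in detections:
        if det.confidence >= min_conf:
            for el in index.get(det.label, []):
                links.append((det, el))  # candidate object/camera location
    return links
```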