2,420 research outputs found

    Detecting the presence of large buildings in natural images

    This paper addresses the classification of low-level features into high-level semantic concepts for the purpose of semantic annotation of consumer photographs. We adopt a multi-scale approach that relies on edge detection to extract an edge orientation-based feature description of the image, and apply an SVM learning technique to infer the presence of a dominant building object in a general-purpose collection of digital photographs. The approach exploits prior knowledge of the image context through an assumption that all input images are “outdoor”, i.e. indoor/outdoor classification (the context determination stage) has already been performed. The proposed approach is validated on a diverse dataset of 1720 images and its performance is compared with that of the MPEG-7 edge histogram descriptor.

    Ridgelet-based signature for natural image classification

    This paper presents an approach to grouping natural scenes into semantically meaningful categories. The proposed approach exploits the statistics of natural scenes to define relevant image categories. A ridgelet-based signature is used to represent images. This signature is used by a support vector classifier, which is well suited to high-dimensional features, resulting in an effective recognition system. To illustrate the potential of the approach, several binary classification experiments (e.g. city/landscape or indoor/outdoor) are conducted on databases of natural scenes.

    Pre-classification for automatic image orientation

    In this paper, we propose a novel method for automatic orientation of digital images. The approach is based on exploiting the properties of local statistics of natural scenes. In this way, we address some of the difficulties encountered in previous work in this area. The main contribution of this paper is to introduce a pre-classification step into carefully defined categories in order to simplify subsequent orientation detection. The proposed algorithm was tested on 9068 images and compared to the existing state of the art in the area. Results show a significant improvement over previous work.

    Image metadata estimation using independent component analysis and regression

    In this paper, we describe an approach to camera metadata estimation using regression based on Independent Component Analysis (ICA). Semantic scene classification of images using camera metadata related to capture conditions has had some success in the past. However, different makes and models of camera capture different types of metadata, and this severely hampers the application of this kind of approach in real systems that consist of photos captured by many different users. We propose to address this issue by using regression to predict the missing metadata from observed data, thereby providing more complete (and hence more useful) metadata for the entire image corpus. The proposed method uses an ICA-based approach to regression.

    Learning midlevel image features for natural scene and texture classification

    This paper deals with the coding of natural scenes in order to extract semantic information. We present a new scheme to project natural scenes onto a basis in which each dimension encodes statistically independent information. Basis extraction is performed by independent component analysis (ICA) applied to image patches culled from natural scenes. The study of the resulting coding units (coding filters) extracted from well-chosen categories of images shows that they adapt and respond selectively to discriminant features in natural scenes. Given this basis, we define global and local image signatures relying on the maximal activity of filters on the input image. Locally, the construction of the signature takes into account the spatial distribution of the maximal responses within the image. We propose a criterion to reduce the size of the space of representation for faster computation. The proposed approach is tested in the context of texture classification (111 classes), as well as natural scene classification (11 categories, 2037 images). Under a common protocol, other commonly used descriptors reach at most 47.7% accuracy on average, while our method achieves up to 63.8%. We show that this advantage does not depend on the size of the signature and demonstrate the efficiency of the proposed criterion to select ICA filters and reduce the dimensio

    Relating visual and semantic image descriptors

    This paper addresses the automatic analysis of visual content and the extraction of metadata beyond pure visual descriptors. Two approaches are described: Automatic Image Annotation (AIA) and Confidence Clustering (CC). AIA attempts to automatically classify images based on two binary classifiers and is designed for the consumer electronics domain. In contrast, the CC approach does not attempt to assign a unique label to images but rather to organise the database based on concepts.

    The aceToolbox: low-level audiovisual feature extraction for retrieval and classification

    In this paper we present an overview of the aceToolbox, a software platform developed within the aceMedia project that provides global and local low-level feature extraction from audio-visual content. The toolbox is based on the MPEG-7 eXperimental Model (XM), with extensions to provide descriptor extraction from arbitrarily shaped image segments, thereby supporting local descriptors reflecting real image content. We describe the architecture of the toolbox and provide an overview of the descriptors supported to date. We also briefly describe the segmentation algorithm provided. We then demonstrate the usefulness of the toolbox in the context of two different content processing scenarios: similarity-based retrieval in large collections and scene-level classification of still images.

    Exploiting context information to aid landmark detection in SenseCam images

    In this paper, we describe an approach designed to exploit context information in order to aid the detection of landmark images from a large collection of photographs. The photographs were generated using Microsoft’s SenseCam, a device designed to passively record a visual diary and cover a typical day of the user wearing the camera. The proliferation of digital photos, along with the associated problems of managing and organising these collections, provides the background motivation for this work. We believe more ubiquitous cameras, such as SenseCam, will become the norm in the future, and the management of the volume of data generated by such devices is a key issue. The goal of the work reported here is to use context information to assist in the detection of landmark images or sequences of images from the thousands of photos taken daily by SenseCam. We will achieve this by analysing the images using low-level MPEG-7 features along with metadata provided by SenseCam, followed by simple clustering to identify the landmark images.

    Fusing MPEG-7 visual descriptors for image classification

    This paper proposes three content-based image classification techniques based on fusing various low-level MPEG-7 visual descriptors. Fusion is necessary because the descriptors would otherwise be incompatible and inappropriate to include directly in, e.g., a Euclidean distance. Three approaches are described: a “merging” fusion combined with an SVM classifier, a back-propagation fusion combined with a KNN classifier, and a Fuzzy-ART neurofuzzy network. In the last case, fuzzy rules can be extracted in an effort to bridge the “semantic gap” between the low-level descriptors and the high-level semantics of an image. All networks were evaluated using content from the repository of the aceMedia project, and more specifically in a beach/urban scene classification problem.

    Coherent segmentation of video into syntactic regions

    In this paper we report on our work in realising an approach to video shot matching which involves automatically segmenting video into abstract intertwined shapes in such a way that there is temporal coherency. These shapes, representing approximations of objects and background regions, can then be matched, giving fine-grained shot-to-shot matching. The main contributions of the paper are, firstly, the extension of our segmentation algorithm for still images to spatial segmentation in video and, secondly, the introduction of a measurement of temporal coherency of the spatial segmentation. The latter allows us to quantitatively demonstrate the effectiveness of our approach on real video data.