
    An Exploration of MPEG-7 Shape Descriptors

    The Multimedia Content Description Interface (ISO/IEC 15938), commonly known as MPEG-7, became a standard in September 2001. Unlike its predecessors, MPEG-7 standardizes multimedia metadata description. By providing robust descriptors and an effective system for storing them, MPEG-7 is designed to provide a means of navigating audio-visual content. In particular, MPEG-7 provides two two-dimensional shape descriptors, the Angular Radial Transform (ART) and Curvature Scale Space (CSS), for use in image and video annotation and retrieval. Field Programmable Gate Arrays (FPGAs) have a very general structure and are made up of programmable switches that allow the end user, rather than the manufacturer, to configure the device for whatever design their application requires. This flexibility has led to the use of FPGAs for prototyping and implementing circuit designs, as well as to their suggested use in reconfigurable computing. For this work, an FPGA-based ART extractor was designed and simulated for a Xilinx Virtex-E XCV300e in order to provide a speedup over software-based extraction. The design is capable of processing over 694,400 pixels a minute; it utilizes 99% of the FPGA's logical resources and operates at a clock rate of 25 MHz. Along with the proposed design, the MPEG-7 shape descriptors were evaluated on how well they retrieved similar objects and how well these retrievals matched what a human would expect. Results showed that the majority of the retrievals made using the MPEG-7 shape descriptors were visually acceptable, although even the human results showed high variance. Finally, this thesis briefly explored the potential of the ART descriptor for optical character recognition (OCR) in the context of image retrieval from databases. It was demonstrated that the ART has potential for use in OCR; however, there is still research to be performed in this area.
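The ART mentioned above projects a shape image onto a small bank of complex basis functions over the unit disk and keeps the coefficient magnitudes. A minimal numpy sketch of that projection (the standard uses 3 radial and 12 angular orders; this is an illustration, not the thesis's FPGA design or the MPEG-7 XM reference code):

```python
import numpy as np

def art_descriptor(img, n_max=3, m_max=12):
    """Angular Radial Transform magnitudes for a 2D image.

    Projects the image onto V_nm(rho, theta) = R_n(rho) * exp(j*m*theta)
    over the unit disk, where R_0 = 1 and R_n = 2*cos(pi*n*rho) for n > 0,
    then normalises the magnitudes by |F_00|.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # map the pixel grid onto [-1, 1] x [-1, 1], centred on the image
    x = (2.0 * xs - (w - 1)) / (w - 1)
    y = (2.0 * ys - (h - 1)) / (h - 1)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0                   # integrate over the unit disk only
    coeffs = np.zeros((n_max, m_max), dtype=complex)
    for n in range(n_max):
        radial = np.ones_like(rho) if n == 0 else 2.0 * np.cos(np.pi * n * rho)
        for m in range(m_max):
            basis = radial * np.exp(-1j * m * theta)
            coeffs[n, m] = np.sum(img[inside] * basis[inside])
    mags = np.abs(coeffs)
    return mags / mags[0, 0]              # F_00 normalisation -> scale invariance

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0                   # a filled square as a toy shape
desc = art_descriptor(img)
print(desc.shape)                         # (3, 12)
```

The magnitude-only representation discards phase, which is what makes the descriptor rotation invariant.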

    The aceToolbox: low-level audiovisual feature extraction for retrieval and classification

    In this paper we present an overview of a software platform that has been developed within the aceMedia project, termed the aceToolbox, that provides global and local low-level feature extraction from audio-visual content. The toolbox is based on the MPEG-7 eXperimental Model (XM), with extensions to provide descriptor extraction from arbitrarily shaped image segments, thereby supporting local descriptors reflecting real image content. We describe the architecture of the toolbox as well as providing an overview of the descriptors supported to date. We also briefly describe the segmentation algorithm provided. We then demonstrate the usefulness of the toolbox in the context of two different content processing scenarios: similarity-based retrieval in large collections and scene-level classification of still images.
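The key extension described here is computing a descriptor over an arbitrarily shaped segment rather than a rectangular region. A simple stand-in illustrating the idea (a masked RGB histogram; illustrative only, not one of the MPEG-7 XM descriptors the toolbox actually implements):

```python
import numpy as np

def masked_colour_histogram(img, mask, bins=8):
    """Local descriptor restricted to an arbitrarily shaped segment.

    Only pixels selected by the boolean mask contribute, so the signature
    reflects the segment's real content rather than its bounding box.
    """
    pixels = img[mask]                          # (N, 3) pixels inside the segment
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist = hist.ravel().astype(float)
    return hist / hist.sum()                    # normalise to a distribution

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(16, 16, 3))
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True                         # an arbitrary segment mask
print(masked_colour_histogram(img, mask).shape) # (512,)
```

Any descriptor that is a sum or average over pixels can be localised the same way by restricting the accumulation to the mask.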

    Techniques for effective and efficient fire detection from social media images

    Social media could provide valuable information to support decision making in crisis management, such as in accidents, explosions and fires. However, much of the data from social media are images, which are uploaded at a rate that makes it impossible for human beings to analyze them. Despite the many works on image analysis, there are no studies of fire detection on social media images. To fill this gap, we propose the use and evaluation of a broad set of content-based image retrieval and classification techniques for fire detection. Our main contributions are: (i) the development of the Fast-Fire Detection method (FFDnR), which combines feature extractor and evaluation functions to support instance-based learning, (ii) the construction of an annotated set of images with ground-truth depicting fire occurrences -- the FlickrFire dataset, and (iii) the evaluation of 36 efficient image descriptors for fire detection. Using real data from Flickr, our results showed that FFDnR was able to achieve a precision for fire detection comparable to that of human annotators. Therefore, our work shall provide a solid basis for further developments on monitoring images from social media.
    Comment: 12 pages, Proceedings of the International Conference on Enterprise Information Systems. Specifically: Marcos Bedo, Gustavo Blanco, Willian Oliveira, Mirela Cazzolato, Alceu Costa, Jose Rodrigues, Agma Traina, Caetano Traina, 2015, Techniques for effective and efficient fire detection from social media images, ICEIS, 34-4
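Instance-based learning with an image descriptor, as in contribution (i), can be sketched as a nearest-neighbour vote over training descriptors. The colour histogram and k-NN below are a generic illustration under assumed toy data, not the FFDnR method or any of the paper's 36 descriptors:

```python
import numpy as np

def colour_histogram(img, bins=4):
    """Flattened RGB histogram; one possible fire-sensitive descriptor."""
    h, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                          range=((0, 256),) * 3)
    v = h.ravel().astype(float)
    return v / v.sum()

def knn_predict(query, train_feats, train_labels, k=3):
    """Instance-based learning: label a query image by majority vote of its
    k nearest training descriptors under L1 (Manhattan) distance."""
    d = np.abs(train_feats - query).sum(axis=1)
    nearest = np.argsort(d)[:k]
    return int(round(train_labels[nearest].mean()))   # 1 = fire, 0 = no fire

rng = np.random.default_rng(0)
fire = rng.integers(150, 256, size=(5, 8, 8, 3))      # bright, warm patches
none = rng.integers(0, 100, size=(5, 8, 8, 3))        # dark patches
feats = np.array([colour_histogram(p) for p in np.concatenate([fire, none])])
labels = np.array([1] * 5 + [0] * 5)
query = colour_histogram(rng.integers(150, 256, size=(8, 8, 3)))
print(knn_predict(query, feats, labels))              # 1
```

The appeal of instance-based learning here is that adding new annotated images requires no retraining, only appending descriptors to the reference set.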

    Learning midlevel image features for natural scene and texture classification

    This paper deals with coding of natural scenes in order to extract semantic information. We present a new scheme to project natural scenes onto a basis in which each dimension encodes statistically independent information. Basis extraction is performed by independent component analysis (ICA) applied to image patches culled from natural scenes. The study of the resulting coding units (coding filters) extracted from well-chosen categories of images shows that they adapt and respond selectively to discriminant features in natural scenes. Given this basis, we define global and local image signatures relying on the maximal activity of filters on the input image. Locally, the construction of the signature takes into account the spatial distribution of the maximal responses within the image. We propose a criterion to reduce the size of the space of representation for faster computation. The proposed approach is tested in the context of texture classification (111 classes), as well as natural scene classification (11 categories, 2037 images). Using a common protocol, the other commonly used descriptors achieve at most 47.7% accuracy on average, while our method reaches up to 63.8%. We show that this advantage does not depend on the size of the signature, and we demonstrate the efficiency of the proposed criterion to select ICA filters and reduce the dimension.
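A signature built from maximal filter activity can be sketched as winner-take-all coding over patches: project each patch onto the filter bank, record which filter responds most strongly, and histogram the winners. This is a simplified illustration with a random stand-in for the learned ICA basis, not the paper's exact construction:

```python
import numpy as np

def patch_winners(img, filters, patch=8):
    """Project each non-overlapping patch onto the filter bank and record
    the index of the maximally responding filter (winner-take-all)."""
    h, w = img.shape
    winners = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = img[i:i + patch, j:j + patch].ravel()
            winners.append(int(np.argmax(np.abs(filters @ p))))
    return np.array(winners)

def global_signature(winners, n_filters):
    """Global signature: normalised histogram of winning filter indices.
    A local variant would keep one such histogram per spatial cell."""
    sig = np.bincount(winners, minlength=n_filters).astype(float)
    return sig / sig.sum()

rng = np.random.default_rng(1)
filters = rng.standard_normal((16, 64))   # stand-in for an ICA basis (16 filters)
img = rng.standard_normal((32, 32))
winners = patch_winners(img, filters)
print(global_signature(winners, 16).shape)   # (16,)
```

The dimension-reduction criterion in the paper would correspond here to keeping only a subset of the filter rows before computing responses.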

    Efficient contour-based shape representation and matching

    This paper presents an efficient method for calculating the similarity between 2D closed shape contours. The proposed algorithm is invariant to translation, scale change and rotation. It can be used for database retrieval or for detecting regions with a particular shape in video sequences. The proposed algorithm is suitable for real-time applications. In the first stage of the algorithm, an ordered sequence of contour points approximating the shapes is extracted from the input binary images. The contours are translation- and scale-normalized, and small sets of the most likely starting points for both shapes are extracted. In the second stage, the starting points from both shapes are assigned into pairs and rotation alignment is performed. The dissimilarity measure is based on the geometrical distances between corresponding contour points. A fast sub-optimal method for solving the correspondence problem between contour points from two shapes is proposed. The dissimilarity measure is calculated for each pair of starting points, and the lowest dissimilarity is taken as the final dissimilarity between the two shapes. Three different experiments are carried out using the proposed approach: letter recognition using a web camera, our own simulation of Part B of the MPEG-7 core experiment "CE-Shape1" and detection of characters in cartoon video sequences. Results indicate that the proposed dissimilarity measure is aligned with human intuition.
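The pipeline above (normalise, try candidate starting points, align rotation, take the minimum point-to-point distance) can be sketched compactly by representing contour points as complex numbers, where the optimal rotation for a fixed correspondence has a closed form. This is a generic sketch of the idea, not the authors' sub-optimal correspondence algorithm (which prunes starting points rather than trying all shifts):

```python
import numpy as np

def normalise(pts):
    """Centre the contour (translation invariance) and divide by the RMS
    radius (scale invariance), using complex-number coordinates."""
    z = pts[:, 0] + 1j * pts[:, 1]
    z = z - z.mean()
    return z / np.sqrt((np.abs(z) ** 2).mean())

def dissimilarity(a, b):
    """Minimum mean point distance over all starting-point shifts of b.

    For each cyclic shift (candidate starting-point pairing), the rotation
    angle minimising sum |z_a - e^{j*phi} z_b| is the phase of
    sum(z_a * conj(z_b)), so alignment is solved in closed form."""
    za, zb = normalise(np.asarray(a, float)), normalise(np.asarray(b, float))
    best = np.inf
    for s in range(len(zb)):
        zs = np.roll(zb, s)
        phi = np.angle(np.sum(za * np.conj(zs)))   # optimal rotation angle
        best = min(best, np.abs(za - np.exp(1j * phi) * zs).mean())
    return best

t = np.linspace(0, 2 * np.pi, 32, endpoint=False)
shape = np.c_[np.cos(t), np.sin(t)] * (1 + 0.3 * np.cos(4 * t))[:, None]
rotated = shape @ np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation
print(round(dissimilarity(shape, rotated), 6))         # 0.0
```

Restricting the shifts to a few likely starting points, as the paper does, reduces the cost from O(n^2) to near-linear in the contour length.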