    Hausdorff-Distance Enhanced Matching of Scale Invariant Feature Transform Descriptors in Context of Image Querying

    Reliable and effective matching of visual descriptors is a key step for many vision applications, e.g. image retrieval. In this paper, we propose to integrate Hausdorff-distance matching with our pairing algorithm in order to obtain a robust yet computationally efficient process for matching feature descriptors in image-to-image querying on standard datasets. For this purpose, Scale Invariant Feature Transform (SIFT) descriptors have been matched using the presented algorithm, followed by the computation of our related similarity measure. This approach has shown excellent performance in both retrieval accuracy and speed.
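
    As a rough illustration only (the abstract does not spell out the authors' pairing algorithm or similarity measure), the sketch below compares two sets of SIFT descriptors with a symmetric Hausdorff distance, using OpenCV for feature extraction; the distance formulation and the file names are assumptions, not the paper's exact method.

    # Minimal sketch: Hausdorff-style matching of SIFT descriptor sets.
    import cv2
    import numpy as np
    from scipy.spatial.distance import cdist

    def sift_descriptors(path):
        """Detect keypoints and compute 128-D SIFT descriptors for one image."""
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        sift = cv2.SIFT_create()
        _, desc = sift.detectAndCompute(img, None)
        return desc  # shape: (n_keypoints, 128)

    def hausdorff_distance(desc_a, desc_b):
        """Symmetric Hausdorff distance between two descriptor sets;
        smaller values mean the images are more similar."""
        d = cdist(desc_a, desc_b)        # pairwise Euclidean distances
        h_ab = d.min(axis=1).max()       # directed distance A -> B
        h_ba = d.min(axis=0).max()       # directed distance B -> A
        return max(h_ab, h_ba)

    # Usage (hypothetical paths): rank database images by increasing distance.
    # query = sift_descriptors("query.jpg")
    # scores = {p: hausdorff_distance(query, sift_descriptors(p)) for p in db_paths}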

    A histogram-based approach for object-based query-by-shape-and-color in image and video databases

    Considering the fact that querying by low-level object features is essential in image and video data, an efficient approach for querying and retrieval by shape and color is proposed. The approach employs three specialized histograms (i.e., distance, angle, and color histograms) to store feature-based information extracted from objects, where the objects may come from images or video frames. The proposed histogram-based approach is used as a component in the query-by-feature subsystem of a video database management system. The color and shape information is handled together to enrich the querying capabilities for content-based retrieval. The retrieval effectiveness and the robustness of the proposed approach are evaluated via performance experiments. © 2005 Elsevier Ltd. All rights reserved.
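
    As a rough illustration of the three object histograms described above, the sketch below computes distance, angle, and color histograms for a single segmented object; the bin counts, the OpenCV-style hue range, and the histogram-intersection similarity are assumptions rather than the paper's exact construction.

    # Minimal sketch: distance, angle, and color histograms of a segmented object.
    import numpy as np

    def normalized_hist(values, bins, value_range):
        """Histogram normalized to sum to 1."""
        hist, _ = np.histogram(values, bins=bins, range=value_range)
        return hist / max(hist.sum(), 1)

    def object_histograms(mask, hsv_image, n_bins=36):
        """mask: HxW boolean array marking the object's pixels;
        hsv_image: HxWx3 array in HSV color space (OpenCV hue range 0..180)."""
        ys, xs = np.nonzero(mask)
        cy, cx = ys.mean(), xs.mean()                 # object centroid

        # Distance histogram: pixel distances from the centroid,
        # normalized by the maximum for scale invariance.
        dist = np.hypot(ys - cy, xs - cx)
        dist_hist = normalized_hist(dist / (dist.max() + 1e-9), n_bins, (0.0, 1.0))

        # Angle histogram: orientation of each object pixel around the centroid.
        ang = np.arctan2(ys - cy, xs - cx)
        ang_hist = normalized_hist(ang, n_bins, (-np.pi, np.pi))

        # Color histogram: hue distribution over the object's pixels.
        hue = hsv_image[..., 0][mask]
        col_hist = normalized_hist(hue, n_bins, (0, 180))

        return dist_hist, ang_hist, col_hist

    def histogram_intersection(h1, h2):
        """Common similarity measure for two normalized histograms (1 = identical)."""
        return np.minimum(h1, h2).sum()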

    Color Spatial Arrangement for Image Retrieval by Visual Similarity

    An Exploration of MPEG-7 Shape Descriptors

    The Multimedia Content Description Interface (ISO/IEC 15938), commonly known as MPEG-7, became a standard in September 2001. Unlike its predecessors, MPEG-7 standardizes multimedia metadata description. By providing robust descriptors and an effective system for storing them, MPEG-7 is designed to provide a means of navigation through audio-visual content. In particular, MPEG-7 provides two two-dimensional shape descriptors, the Angular Radial Transform (ART) and Curvature Scale Space (CSS), for use in image and video annotation and retrieval. Field Programmable Gate Arrays (FPGAs) have a very general structure and are made up of programmable switches that allow the end user, rather than the manufacturer, to configure these switches for whatever design their application needs. This flexibility has led to the use of FPGAs for prototyping and implementing circuit designs, and their use has also been suggested as part of reconfigurable computing. For this work, an FPGA-based ART extractor was designed and simulated for a Xilinx Virtex-E XCV300e in order to provide a speedup over software-based extraction. The design is capable of processing over 69,4400 pixels a minute; it utilizes 99% of the FPGA's logical resources and operates at a clock rate of 25 MHz. Along with the proposed design, the MPEG-7 shape descriptors were explored as to how well they retrieve similar objects and how these retrievals match up to what a human would expect. Results showed that the majority of the retrievals made using the MPEG-7 shape descriptors returned visually acceptable results, although even the human results had a high amount of variance. Finally, this thesis briefly explored the potential of using the ART descriptor for optical character recognition (OCR) in the context of image retrieval from databases. It was demonstrated that the ART has potential for use in OCR; however, there is still research to be performed in this area.
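
    For reference, a rough software sketch of ART coefficient extraction is given below, following the standard MPEG-7 basis definitions; the hardware extractor in the thesis computes essentially the same sums, but the pixel-to-unit-disk mapping, the coefficient counts, and the normalization here are my own assumptions.

    # Minimal sketch: Angular Radial Transform (ART) coefficients of a 2-D shape image.
    import numpy as np

    def art_coefficients(image, n_radial=3, n_angular=12):
        """image: 2-D array (e.g. a binary shape mask); pixels outside the
        unit disk are ignored. Returns |F_nm| normalized by |F_00|."""
        h, w = image.shape
        ys, xs = np.mgrid[0:h, 0:w]
        # Map pixel coordinates onto the unit disk centered on the image.
        x = (xs - w / 2.0) / (w / 2.0)
        y = (ys - h / 2.0) / (h / 2.0)
        rho = np.hypot(x, y)
        theta = np.arctan2(y, x)
        inside = rho <= 1.0

        coeffs = np.zeros((n_radial, n_angular))
        for n in range(n_radial):
            # Radial basis: R_0 = 1, R_n = 2 cos(pi * n * rho) for n > 0.
            radial = np.ones_like(rho) if n == 0 else 2.0 * np.cos(np.pi * n * rho)
            for m in range(n_angular):
                # Conjugated angular-radial basis function V*_nm.
                basis = radial * np.exp(-1j * m * theta) / (2.0 * np.pi)
                coeffs[n, m] = np.abs(np.sum(basis[inside] * image[inside]))
        return coeffs / (coeffs[0, 0] + 1e-9)   # magnitude normalization as in MPEG-7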

    Object detection and activity recognition in digital image and video libraries

    This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition purposes. The thesis focuses on the problem of connecting low-level features to high-level semantics by developing relational object and activity representations. With the rapid growth of multimedia information in the form of digital image and video libraries, there is an increasing need for intelligent database management tools. Traditional text-based query systems, which rely on a manual annotation process, are impractical for today's large libraries, which require an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed in which shape, color, and motion characteristics of objects of interest are captured in the compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels, ranging from low-complexity to low-false-rate processing. The thesis first examines the extraction of low-level features from images and videos using the intensity, color, and motion of pixels and blocks. Local consistency based on these features and the geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object-based knowledge to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses feedback from the relational representation of the object. The selected unary and binary attributes are further extended for application-specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm are that it improves object extraction by reducing the dependence on the low-level segmentation process and that it combines boundary and region properties. The thesis then addresses the problem of object detection and activity recognition in the compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain to connect to high-level semantics. Since our aim is to retrieve information from images and videos compressed with standard algorithms such as JPEG and MPEG, our approach differs from previous compressed-domain object detection techniques in which the compression algorithms are governed by the characteristics of the object of interest to be retrieved. An algorithm is developed using principal component analysis of MPEG motion vectors to detect human activities, namely walking, running, and kicking. Object detection in JPEG-compressed still images and MPEG I-frames is achieved by using DC-DCT coefficients of the luminance and chrominance values in the graph-based object detection algorithm. The thesis finally addresses the problem of object detection in lower-resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients.
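
    As a rough illustration of the compressed-domain activity recognition step mentioned above (PCA of MPEG motion vectors for walking, running, and kicking), the sketch below flattens each clip's motion-vector field, projects it onto principal components, and classifies it; the feature layout, the nearest-neighbor classifier, and the helper names are assumptions, not the thesis' exact pipeline.

    # Minimal sketch: PCA of MPEG motion vectors for activity recognition.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    ACTIVITIES = ["walking", "running", "kicking"]   # classes named in the abstract

    def clip_feature(motion_vectors):
        """motion_vectors: (frames, mb_rows, mb_cols, 2) array of (dx, dy)
        macroblock motion vectors parsed from an MPEG stream. All clips are
        assumed to share the same number of frames and macroblocks."""
        return motion_vectors.reshape(-1).astype(np.float32)

    def train_activity_model(train_clips, train_labels, n_components=16):
        """Assumes at least n_components training clips are available."""
        X = np.stack([clip_feature(c) for c in train_clips])
        pca = PCA(n_components=n_components).fit(X)
        clf = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(X), train_labels)
        return pca, clf

    def recognize_activity(pca, clf, clip):
        """Project one clip onto the learned components and predict its activity."""
        z = pca.transform(clip_feature(clip)[None, :])
        return clf.predict(z)[0]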

    Implementation and evaluation of a QbE-STD (Query-by-Example Spoken Term Detection) system

    QbE-STD (Query-by-Example Spoken Term Detection) systems arise from the need to extract information and recognize keywords in the audio files found in broadcast media and on the Internet. QbE-STD systems aim, on the one hand, to search for an example of an object or part of it in another object (QbE), and on the other, to find words or sequences of words in audio files (STD). In this Master's Thesis, a language-independent QbE-STD system has been developed whose input, or query, consists of spoken terms, allowing a user to search an audio repository by speaking the term to be found. Speech is represented by phonetic posteriorgrams obtained from the phonetic decoders developed by the Brno University of Technology (BUT). The Subsequence Dynamic Time Warping (S-DTW) algorithm is used to detect the search terms in the audio repositories. In addition to developing a QbE-STD system that serves as a starting point for future work by the AUDIAS group, several techniques and contributions have been included in order to improve the results, among them the selection of phonetic units and the fusion of languages. For development and testing, the audio belonging to the Albayzin 2016 and 2018 Search on Speech evaluations has been used. The results obtained could be compared with other published systems, since accuracy was computed with an official, widely used evaluation procedure proposed by NIST. The precision values achieved show that the basic system obtains competitive results, similar to those of other implementations of this type.
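
    As a rough illustration of the S-DTW search over phonetic posteriorgrams described above, the sketch below finds the best-matching subsequence of an utterance for a spoken query; the frame-level distance (negative log of the dot product, a common choice for posteriorgrams) and the unit step pattern are assumptions and may differ from the system's exact configuration.

    # Minimal sketch: Subsequence DTW between query and utterance posteriorgrams.
    import numpy as np

    def subsequence_dtw(query, utterance, eps=1e-10):
        """query: (Nq, P) and utterance: (Nu, P) posterior probability matrices.
        Returns the best length-normalized matching cost and its end frame."""
        nq, nu = len(query), len(utterance)
        # Frame-wise distance matrix between query and utterance frames.
        dot = np.clip(query @ utterance.T, eps, None)
        dist = -np.log(dot)

        # Accumulated cost. The first row is not accumulated along the utterance,
        # so the match may start at any utterance frame (the "subsequence" part).
        acc = np.full((nq, nu), np.inf)
        acc[0, :] = dist[0, :]
        for i in range(1, nq):
            for j in range(nu):
                best_prev = acc[i - 1, j]
                if j > 0:
                    best_prev = min(best_prev, acc[i, j - 1], acc[i - 1, j - 1])
                acc[i, j] = dist[i, j] + best_prev

        end = int(np.argmin(acc[-1]))
        return acc[-1, end] / nq, end   # normalized cost and match end frame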

    Non-Linear and Sparse Representations for Multi-Modal Recognition

    In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM with a Radial Basis Function (RBF) kernel so that the interior shape points are given higher values. This gives the support vector shape (SVS) several advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves the discriminative power against noise, fragmentation, and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale-, rotation-, and translation-invariant features and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points, which are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like Kinect necessitates the design of new representations for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called the Concentric Ring Signature (CORS), possesses computational advantages similar to those of point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images, which motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how well-known dictionary learning approaches such as the method of optimal directions and K-SVD can be made non-linear. We analyze their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination than their linear counterparts and kernel PCA, especially when the data is corrupted by different types of degradation. Visual descriptors are often high dimensional, which results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of the data. We propose an efficient optimization algorithm and present its non-linear extension based on kernel methods. One of the key features of our method is that it is computationally efficient, as the learning is done in the lower-dimensional space, and it discards the irrelevant part of the signal that derails the dictionary learning process. Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks. In many practical applications, we are often confronted with situations where the data used to train our models differ from the data presented during testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation; it employs a dimensionality reduction step that prevents the data dimension from growing too quickly when traversing deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state of the art by a significant margin. Moreover, we found that a multi-layer DASH-N has an edge over the single-layer DASH-N.
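
    As a rough illustration of the support vector shape (SVS) idea described at the start of the abstract above, the sketch below trains an RBF-kernel SVM on sampled interior and exterior points so that the decision function is positive inside the shape; the sampling strategy and the hyperparameters are assumptions, not the dissertation's exact formulation.

    # Minimal sketch: implicit shape representation via an RBF-kernel SVM.
    import numpy as np
    from sklearn.svm import SVC

    def fit_svs(interior_points, exterior_points, gamma=0.5, C=10.0):
        """interior_points / exterior_points: (N, 2) arrays of 2-D samples,
        e.g. drawn inside and outside a binary shape mask."""
        X = np.vstack([interior_points, exterior_points])
        y = np.hstack([np.ones(len(interior_points)),
                       -np.ones(len(exterior_points))])
        # RBF kernel gives a smooth analytic decision function over the plane.
        return SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)

    def svs_value(svm, points):
        """Evaluate the implicit shape function at (M, 2) query points:
        positive inside the shape, negative outside."""
        return svm.decision_function(points)

    # svm.support_vectors_ is the sparse subset of points that defines the shape,
    # which is the sparsity property the abstract highlights.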