
    Hybrid image representation methods for automatic image annotation: a survey

    In most automatic image annotation systems, images are represented with low-level features using either global or local methods. In global methods, the entire image is used as a unit. Local methods divide images either into blocks, adopting fixed-size sub-image blocks as sub-units, or into regions, using segmented regions as sub-units. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods incorporate both kinds of information, on the premise that combining the two levels of features is beneficial for annotating images. In this paper, we survey automatic image annotation techniques from one aspect, feature extraction, and, to complement existing surveys in the literature, we focus on emerging hybrid methods that combine global and local features for image representation.
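
As a rough illustration of the hybrid idea, the sketch below computes a global intensity histogram over the whole image, computes block histograms over a fixed grid of sub-images, and concatenates the two. The grayscale list-of-lists image format, the histogram features, the grid size, and plain concatenation as the fusion step are all assumptions for illustration, not a method taken from the survey.

```python
from typing import List

Image = List[List[int]]  # assumed format: grayscale pixels in [0, 255]

def histogram(pixels: List[int], bins: int) -> List[float]:
    """Normalized intensity histogram with `bins` equal-width bins."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]

def global_features(img: Image, bins: int = 4) -> List[float]:
    # Global method: the entire image is treated as one unit.
    return histogram([p for row in img for p in row], bins)

def block_features(img: Image, grid: int = 2, bins: int = 4) -> List[float]:
    # Local method: split the image into grid x grid fixed-size blocks and
    # describe each block separately (leftover rows/columns are ignored).
    h, w = len(img), len(img[0])
    bh, bw = h // grid, w // grid
    feats: List[float] = []
    for by in range(grid):
        for bx in range(grid):
            block = [img[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            feats.extend(histogram(block, bins))
    return feats

def hybrid_features(img: Image, grid: int = 2, bins: int = 4) -> List[float]:
    # Hybrid representation (hypothetical fusion): global features followed
    # by the concatenated local block features.
    return global_features(img, bins) + block_features(img, grid, bins)
```

Concatenation is the simplest fusion choice; weighting the two parts or learning a combination are common alternatives.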

    Automating the construction of scene classifiers for content-based video retrieval

    This paper introduces a real-time automatic scene classifier for content-based video retrieval. In our envisioned approach, end users such as documentalists, rather than image processing experts, build classifiers interactively by simply indicating positive examples of a scene. Classification is a two-stage procedure. First, small image fragments called patches are classified. Second, frequency vectors of these patch classifications are fed into a second classifier for global scene classification (e.g., city, portraits, or countryside). The first-stage classifiers can be seen as a set of highly specialized, learned feature detectors, an alternative to having an image processing expert determine features a priori. We present results for experiments on a variety of patch and image classes. The scene classifier has been used successfully in television archives and for Internet porn filtering.
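
The two-stage procedure can be sketched as follows, assuming patches arrive as flat numeric sequences and the first-stage patch classifier is any caller-supplied function. The 1-nearest-neighbour second stage is a stand-in chosen for brevity; the paper's actual classifiers are not specified here, and all names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[float]  # hypothetical: a flattened image fragment

def patch_frequency_vector(patches: List[Patch],
                           patch_classifier: Callable[[Patch], str],
                           patch_classes: List[str]) -> List[float]:
    """Stage 1: label every patch, then summarize the image as the
    relative frequency of each patch class."""
    counts = Counter(patch_classifier(p) for p in patches)
    n = len(patches) or 1
    return [counts[c] / n for c in patch_classes]

def classify_scene(freq: List[float],
                   examples: List[Tuple[List[float], str]]) -> str:
    """Stage 2 (stand-in): 1-nearest-neighbour over the frequency vectors
    of labelled example images, using squared Euclidean distance."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(examples, key=lambda ex: dist2(freq, ex[0]))[1]
```

For example, a toy patch classifier that labels a patch "bright" when its mean exceeds 0.5 turns an image into a vector such as [0.75, 0.25] over the classes ["bright", "dark"], which the second stage then compares with the vectors of user-indicated example scenes.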

    Scene Classification with a Biologically Inspired Method

    We present a biologically motivated method for scene image classification. The core of the method is a shape-based image property provided by a hierarchical feedforward model of the visual cortex [18]. Edge-based and color-based image properties are additionally used to improve accuracy. The method consists of two stages of image analysis. In the first stage, each of three classification paths uses one image property (i.e., shape-, edge-, or color-based features) independently. In the second stage, a single classifier assigns the category of an image based on the probability distributions of the first-stage classifier outputs. Experiments show that the method improves classification accuracy over the shape-based model alone. We demonstrate that the method achieves high accuracy, comparable to other reported methods, on a publicly available color image dataset.
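
One way to realize the second stage is stacking: concatenate the class-probability vectors from the three first-stage paths and train a single classifier on the result. The sketch below assumes that fusion and uses a nearest-centroid classifier as a stand-in; the paper does not specify these choices here, and all names are hypothetical.

```python
from typing import Dict, List, Sequence

def meta_feature(path_probs: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the class-probability vectors produced by the first-stage
    paths (e.g. shape, edge, color) into one second-stage input."""
    return [p for probs in path_probs for p in probs]

def fit_centroids(xs: List[List[float]], ys: List[str]) -> Dict[str, List[float]]:
    """Train a stand-in second-stage classifier: the mean meta-feature per category."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        sums[y] = [a + v for a, v in zip(acc, x)]
        counts[y] = counts.get(y, 0) + 1
    return {y: [a / counts[y] for a in sums[y]] for y in sums}

def predict(x: List[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the category whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
```

Because the second stage sees all three distributions at once, a confident path can outweigh a path that disagrees, which is the kind of gain a fused classifier can offer over the shape-based path alone.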

    Global Depth Perception from Familiar Scene Structure

    In the absence of cues for absolute depth measurement, such as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges, and junctions may provide a 3D model of the scene, but it does not reveal the actual "size" of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, this is computationally complex because object recognition is difficult. Here we propose a source of information for absolute depth estimation that does not rely on specific objects: we introduce a procedure for absolute depth estimation based on the recognition of the whole scene. The shape of the space of the scene and the structures present in the scene are strongly related to the scale of observation. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene, and therefore its absolute mean depth. We illustrate the value of computing the mean depth of the scene with applications to scene recognition and object detection.

    Human-Centered Content-Based Image Retrieval

    Retrieval of images that lack (suitable) annotations cannot be achieved through (traditional) Information Retrieval (IR) techniques. Access to such collections can be achieved by applying computer vision techniques to the IR problem, an approach known as Content-Based Image Retrieval (CBIR). In contrast with most purely technological approaches, the thesis Human-Centered Content-Based Image Retrieval approaches the problem from a human/user-centered perspective. Psychophysical experiments were conducted in which people were asked to categorize colors. The data gathered from these experiments were fed to a Fast Exact Euclidean Distance (FEED) transform (Schouten & Van den Broek, 2004), which enabled the segmentation of color space based on human perception (Van den Broek et al., 2008). This unique color space segmentation was exploited for texture analysis and image segmentation, and subsequently for full-featured CBIR. In addition, a unique CBIR benchmark was developed (Van den Broek et al., 2004, 2005). This benchmark was used to explore which parameters of the CBIR process (e.g., color and distance measures) influence retrieval results, and how. In contrast with other research, users' judgements were used as the metric. The online IR and CBIR system Multimedia for Art Retrieval (M4ART) (URL: http://www.m4art.org) is (partly) founded on the techniques discussed in this thesis.

    References:
    - Broek, E.L. van den, Kisters, P.M.F., and Vuurpijl, L.G. (2004). The utilization of human color categorization for content-based image retrieval. Proceedings of SPIE (Human Vision and Electronic Imaging), 5292, 351-362. [see also Chapter 7]
    - Broek, E.L. van den, Kisters, P.M.F., and Vuurpijl, L.G. (2005). Content-Based Image Retrieval Benchmarking: Utilizing Color Categories and Color Distributions. Journal of Imaging Science and Technology, 49(3), 293-301. [see also Chapter 8]
    - Broek, E.L. van den, Schouten, Th.E., and Kisters, P.M.F. (2008). Modeling Human Color Categorization. Pattern Recognition Letters, 29(8), 1136-1144. [see also Chapter 5]
    - Schouten, Th.E. and Broek, E.L. van den (2004). Fast Exact Euclidean Distance (FEED) transformation. In J. Kittler, M. Petrou, and M. Nixon (Eds.), Proceedings of the 17th IEEE International Conference on Pattern Recognition (ICPR 2004), Vol. 3, pp. 594-597. August 23-26, Cambridge, United Kingdom. [see also Appendix C]
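
The FEED algorithm itself is not described in this abstract. As a hedged illustration of the kind of result an exact Euclidean distance transform supports, the brute-force sketch below labels every cell of a small 2-D grid (standing in for a slice of color space) with the category of the nearest human-labelled sample, using exact Euclidean distances. This is not FEED, which is designed to be far faster; the grid, the seed format, and the function name are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # hypothetical (x, y) cell in a 2-D slice of color space

def nearest_category_map(width: int, height: int,
                         seeds: Dict[Point, str]) -> List[List[str]]:
    """Label each grid cell with the category of its nearest seed, using exact
    Euclidean distance. Brute force: O(width * height * len(seeds))."""
    grid: List[List[str]] = []
    for y in range(height):
        row: List[str] = []
        for x in range(width):
            nearest = min(seeds, key=lambda s: math.hypot(s[0] - x, s[1] - y))
            row.append(seeds[nearest])
        grid.append(row)
    return grid
```

The resulting map partitions the grid into regions, one per category, which is the sense in which a distance transform over categorized samples yields a segmentation of the space.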

    Recognizing Indoor Scenes

    We propose a scheme for indoor place identification based on the recognition of global scene views. Scene views are encoded using a holistic representation that provides low-resolution spatial and spectral information. The holistic nature of the representation removes the need to rely on specific objects or local landmarks and also makes it robust to variations in object configurations. We demonstrate the scheme on the problem of recognizing scenes in video sequences captured while walking through an office environment. We develop a method for distinguishing between 'diagnostic' and 'generic' views, and we also evaluate how system performance changes as a function of the amount of training data available and the complexity of the representation.

    Scene Determination based on Video and Audio Features

    Determining scenes in a video is a challenging task. When humans are asked to do it, their results are inconsistent because the term scene is not precisely defined: each person decides which shared attributes group shots into scenes. However, consistent results can be obtained for certain basic attributes, such as dialogs, same settings, and continuing sounds. We have therefore developed a scene determination scheme that clusters shots based on detected dialogs, same settings, and similar audio. Our experimental results show that these types of scenes can be determined automatically and reliably.
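
A minimal sketch of shot clustering, assuming each shot has already been reduced to a hypothetical setting label and a one-dimensional audio descriptor (the scheme's actual dialog, setting, and audio detectors are not described here). Shots are linked when they share a setting or sound alike, and a look-ahead window lets alternating dialog shots (A, B, A, B) bridge each other. The window size, the tolerance, and the class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Shot:
    setting: str   # hypothetical setting label, e.g. from a background model
    audio: float   # hypothetical 1-D audio descriptor

def linked(a: Shot, b: Shot, audio_tol: float) -> bool:
    """Two shots belong together if they share a setting or sound alike."""
    return a.setting == b.setting or abs(a.audio - b.audio) <= audio_tol

def group_shots(shots: List[Shot], window: int = 3,
                audio_tol: float = 0.05) -> List[List[int]]:
    """Group consecutive shots into scenes. A boundary is placed before shot k
    only if no link from an earlier shot reaches shot k or beyond."""
    if not shots:
        return []
    n = len(shots)
    reach = 0                     # furthest shot index linked from any shot seen so far
    scenes: List[List[int]] = [[0]]
    for k in range(1, n):
        i = k - 1                 # add the links of the previous shot
        for j in range(i + 1, min(n, i + window + 1)):
            if linked(shots[i], shots[j], audio_tol):
                reach = max(reach, j)
        if reach >= k:
            scenes[-1].append(k)  # some link spans the cut: same scene
        else:
            scenes.append([k])    # no link spans the cut: new scene
    return scenes
```

With shots in settings office, hall, office, street, park (and dissimilar audio), the office link spans the hall shot, giving scenes [[0, 1, 2], [3], [4]].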

    IDeixis : image-based deixis for recognizing locations

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. By Pei-Hsiu Yeh. Includes bibliographical references (p. 31-32).

    In this thesis, we describe an approach to recognizing location from camera-equipped mobile devices using image-based web search. This image-based deixis can point at a location distant from the user's current position. We demonstrate our approach on an application that allows users to browse web pages matching the image of a nearby location. Common image search metrics can match images captured with a camera-equipped mobile device to images found on the World Wide Web. Users can recognize the location if those pages contain information about it (e.g., its name, facts, or stories). Since the amount of information displayable on the device is limited, automatic keyword extraction methods can be applied to help efficiently identify relevant pieces of location information. Searching the entire web can be computationally overwhelming, so we devise a hybrid image-and-keyword searching technique. First, image search is performed over images and links to their source web pages in a database that indexes only a small fraction of the web. Then, relevant keywords on these web pages are automatically identified and submitted to an existing text-based search engine (e.g., Google) that indexes a much larger portion of the web. Finally, the resulting image set is filtered to retain images close to the original query in terms of visual similarity. It is thus possible to efficiently search hundreds of millions of images that are not only textually related but also visually relevant.
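
The three-step hybrid search can be sketched as below, assuming images are already reduced to numeric descriptors compared by Euclidean distance and that the large text-based engine is a caller-supplied function (no real search-engine API is used or implied). Frequency-based keyword extraction with a tiny stopword list is a stand-in for the thesis's keyword extraction; all names and thresholds are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]  # hypothetical global image descriptor

STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "is", "on"})

def l2(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_keywords(pages: List[str], k: int = 3) -> List[str]:
    """Most frequent non-stopword terms across the given page texts."""
    words = [w for page in pages
             for w in re.findall(r"[a-z]+", page.lower())
             if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]

def hybrid_search(query: Vector,
                  local_index: List[Tuple[Vector, str]],
                  text_search: Callable[[List[str]], List[Vector]],
                  n_seed: int = 2, k_words: int = 3,
                  max_dist: float = 1.0) -> List[Vector]:
    # 1. Image search over a small local index of (descriptor, page text) pairs.
    seeds = sorted(local_index, key=lambda entry: l2(query, entry[0]))[:n_seed]
    # 2. Extract keywords from the seed images' source pages.
    keywords = top_keywords([page for _, page in seeds], k_words)
    # 3. Submit the keywords to a larger text-based engine (caller-supplied stand-in).
    candidates = text_search(keywords)
    # 4. Keep only candidates that are visually close to the query.
    return [c for c in candidates if l2(query, c) <= max_dist]
```

The design keeps the expensive visual comparison on two small sets (the local index and the returned candidates) while the text engine covers the large collection.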

    A multiparametric texture feature extraction method oriented toward image segmentation

    As will be seen in the following background chapter, there are many different ways of approaching texture analysis, but none of them is oriented toward real-time (video rate) computation. Given the lack of methods that place such emphasis on processing time, the aim of this thesis is to define and develop a new texture feature extraction method that works in real time. To achieve this high operating speed, a further aim is to present the design of a dedicated architecture that implements the algorithm computing the defined texture parameters, as well as the algorithm for classifying the parameters and segmenting the image into regions of similar texture.

    Chapter 2 explains the most relevant methods for texture characterization, covering the most important statistical and structural approaches. The same chapter situates the new method presented in this thesis among the main existing approaches, and gives a brief review of texture synthesis, a way of quantitatively evaluating the texture characterization of an image. Chapter 3 focuses mainly on the method presented in this work: the proposed texture parameters are introduced, together with why they are needed and their definitions. Because the parameters are highly perceptual and do not follow any mathematical model, this chapter uses a statistical technique called discriminant analysis to show that all the parameters contribute enough information to separate texture regions and that all of them are necessary for discriminating textures.

    Chapter 4 describes how the information supplied by the feature extraction system is processed to classify the data and segment the image according to its textures. The pattern recognition stage is carried out in two phases: learning and working. The chapter also presents a comparative study of several texture classification methods and the method presented in this thesis, which shows that the method performs well with a very low computation time. The chapter ends with an analysis of the method's robustness using images with different levels of random noise. Chapter 5 presents the results obtained by extracting texture features in several real applications. The method is applied to aerial images and agricultural settings, and to situations that require real-time processing, such as road image segmentation and an industrial inspection and quality-control application in fabric printing. At the end of the chapter, two effects that can affect whether correct results are obtained are considered: zoom and perspective changes in texture images.

    Chapter 6 presents the architecture designed specifically for computing the texture parameters in real time. The chapter presents the algorithm for assigning texture groups and demonstrates that it operates at video rate.

    Finally, Chapter 7 presents the conclusions and the future lines of work arising from this thesis, as well as the articles published in connection with this work and with texture analysis. The bibliographic references and appendices conclude the work.
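
The thesis's own texture parameters are not specified in this abstract, so the sketch below does not reproduce them. It illustrates one widely used way to make windowed texture statistics cheap enough for video-rate work: summed-area (integral) tables, which give the sum over any rectangular window in constant time. Local variance is used here as a stand-in texture feature; the function names and the window radius `r` are hypothetical.

```python
from typing import List

Grid = List[List[float]]

def integral(img: Grid) -> Grid:
    """Summed-area table S with S[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S

def box_sum(S: Grid, y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum over rows y0..y1-1 and columns x0..x1-1 in O(1)."""
    return S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]

def local_variance(img: Grid, r: int) -> Grid:
    """Per-pixel variance over a (2r+1) x (2r+1) window, clipped at borders."""
    h, w = len(img), len(img[0])
    S = integral(img)
    S2 = integral([[v * v for v in row] for row in img])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = box_sum(S, y0, x0, y1, x1) / n
            out[y][x] = box_sum(S2, y0, x0, y1, x1) / n - mean * mean
    return out
```

Thresholding or clustering such per-pixel statistics is one simple route to segmenting an image into regions of similar texture; the cost per pixel stays constant regardless of window size.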