
    A Review of Codebook Models in Patch-Based Visual Object Recognition

    The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to map low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. The construction of a codebook is thus an important step, usually done by cluster analysis. However, clustering retains regions of high density in a distribution, so the resulting codebook need not have discriminant properties; clustering is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook that builds a discriminant codebook in a one-pass design procedure, slightly outperforming more traditional approaches at drastically reduced computing times. In this review we survey several approaches proposed over the last decade, examining their feature detectors, descriptors, codebook construction schemes, choice of classifiers for recognising objects, and the datasets used to evaluate the proposed methods.
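
    A minimal sketch of the standard clustering-based codebook construction surveyed here (not the resource-allocating variant), assuming local descriptors have already been extracted and scikit-learn is available; the function names are illustrative.

        # Clustering-based visual codebook (bag of visual words): cluster pooled
        # local descriptors, then encode each image as a fixed-length histogram.
        import numpy as np
        from sklearn.cluster import KMeans

        def build_codebook(all_descriptors, codebook_size=256):
            # all_descriptors: (n_patches, dim) array pooled over the training images
            return KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(all_descriptors)

        def encode_image(image_descriptors, codebook):
            # Assign each descriptor to its nearest visual word and histogram the counts.
            words = codebook.predict(image_descriptors)
            hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
            return hist / max(hist.sum(), 1.0)   # L1-normalised, fixed-length vector

    The resulting histograms can then be fed directly to a standard classifier such as an SVM, as the abstract describes.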

    Visual and geographical data fusion to classify landmarks in geo-tagged images

    High-level semantic image recognition and classification is a challenging task and currently a very active research domain. Computers struggle to accurately identify objects and scenes within digital images in unconstrained environments. In this paper, we present experiments that aim to overcome the limitations of computer vision algorithms by combining them with novel context-based features to describe geo-tagged imagery. We adopt a machine-learning-based algorithm with the aim of classifying classes of geographical landmarks within digital images. We use community-contributed image sets downloaded from Flickr and provide a thorough investigation, the results of which are presented in an evaluation section.
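
    A hedged sketch of one way such visual and geographical fusion can be set up (simple early fusion with an off-the-shelf classifier); the feature arrays and their contents are hypothetical placeholders, not the paper's exact pipeline.

        # Early fusion: concatenate visual descriptors with geographical context
        # features (e.g. latitude, longitude, distance to candidate landmarks)
        # and train a standard classifier on the combined representation.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def fuse(visual_feats, geo_feats):
            # visual_feats: (n, d_v), geo_feats: (n, d_g) -> (n, d_v + d_g)
            return np.hstack([visual_feats, geo_feats])

        def train_landmark_classifier(visual_feats, geo_feats, labels):
            clf = RandomForestClassifier(n_estimators=200, random_state=0)
            return clf.fit(fuse(visual_feats, geo_feats), labels)

    At prediction time the same fusion would be applied to the test images before calling the classifier's predict method.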

    Content-Based Image Retrieval using Deep Learning

    A content-based image retrieval (CBIR) system works on the low-level visual features of a user's query image, which makes it difficult for users to formulate queries and often yields unsatisfactory retrieval results. In the past, image annotation was proposed as the best possible approach for CBIR: keywords are automatically assigned to images so that users can retrieve images by querying on these keywords. Image annotation is often regarded as an image classification problem, where images are represented by low-level features and the mapping between low-level features and high-level concepts (class labels) is learned by supervised learning algorithms. In a CBIR system, learning effective feature representations and similarity measures is crucial for retrieval performance. The semantic gap has long been the key challenge for this problem: a gap exists between the low-level image pixels captured by machines and the high-level semantics perceived by humans. Machine learning has been exploited to bridge this gap in the long term. The recent successes of deep learning techniques, especially Convolutional Neural Networks (CNNs), in computer vision applications have inspired me to work on this thesis, which addresses the problem of CBIR using a dataset of annotated images.
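
    A minimal sketch of the deep-feature retrieval idea described above, assuming PyTorch/torchvision and a pretrained ResNet-50 as the feature extractor; the exact network and training regime used in the thesis are not specified here.

        # CNN-based CBIR: embed images with a pretrained network and rank the
        # database by cosine similarity to the query embedding.
        import torch
        import torch.nn.functional as F
        from torchvision import models

        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = torch.nn.Identity()          # keep the 2048-D pooled features
        backbone.eval()

        @torch.no_grad()
        def embed(batch):                          # batch: (n, 3, 224, 224), normalised
            return F.normalize(backbone(batch), dim=1)

        def retrieve(query_embedding, database_embeddings, k=5):
            scores = database_embeddings @ query_embedding   # cosine similarity of unit vectors
            return torch.topk(scores, k).indices             # indices of the top-k matches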

    Automatic Concept Discovery from Parallel Text and Visual Corpora

    Humans connect language and vision to perceive the world. How can a similar connection be built for computers? One possible way is via visual concepts, which are text terms that relate to visually discriminative entities. We propose an automatic visual concept discovery algorithm using parallel text and visual corpora; it filters text terms based on the visual discriminative power of the associated images, and groups them into concepts using visual and semantic similarities. We illustrate the applications of the discovered concepts on a bidirectional image and sentence retrieval task and an image tagging task, and show that the discovered concepts not only significantly outperform several large sets of manually selected concepts, but also achieve state-of-the-art performance in the retrieval task. Comment: To appear in ICCV 201
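
    An illustrative sketch of the two stages described above, under the assumption that image features for each term and a combined visual/semantic similarity matrix are already available; this is not the authors' exact algorithm.

        # Stage 1: keep only terms whose associated images can be told apart from
        # background images (a proxy for visual discriminative power).
        # Stage 2: group the surviving terms into concepts by clustering a
        # combined visual + semantic similarity matrix.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.cluster import AgglomerativeClustering

        def visual_discriminability(term_feats, background_feats):
            X = np.vstack([term_feats, background_feats])
            y = np.r_[np.ones(len(term_feats)), np.zeros(len(background_feats))]
            return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3).mean()

        def group_terms(similarity_matrix, n_concepts):
            distances = 1.0 - similarity_matrix    # turn similarities into distances
            return AgglomerativeClustering(n_clusters=n_concepts,
                                           metric="precomputed",
                                           linkage="average").fit_predict(distances)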

    Suchbasierte automatische Bildannotation anhand geokodierter Community-Fotos (Search-Based Automatic Image Annotation Using Geocoded Community Photos)

    In the Web 2.0 era, platforms for sharing and collaboratively annotating images with keywords, called tags, have become very popular. Tags are a powerful means for organizing and retrieving photos, but manual tagging is time consuming. Recently, the sheer amount of user-tagged photos available on the Web has encouraged researchers to explore new techniques for automatic image annotation. The idea is to annotate an unlabeled image by propagating the labels of community photos that are visually similar to it. Most recently, an ever increasing amount of community photos is also associated with location information, i.e., geotagged. In this thesis, we exploit the location context and propose an approach for automatically annotating geotagged photos. Our objective is to address the main limitations of state-of-the-art approaches in terms of the quality of the produced tags and the speed of the complete annotation process. To achieve these goals, we first deal with the problem of collecting images with the associated metadata from online repositories. Accordingly, we introduce a strategy for data crawling that takes advantage of location information and the social relationships among the contributors of the photos. To improve the quality of the collected user tags, we present a method for resolving their ambiguity based on tag relatedness information. In this respect, we propose an approach for representing tags as probability distributions based on the Laplacian score feature selection algorithm. Furthermore, we propose a new metric for calculating the distance between tag probability distributions by extending the Jensen-Shannon Divergence to account for statistical fluctuations. To efficiently identify the visual neighbors, the thesis introduces two extensions to the state-of-the-art image matching algorithm known as Speeded Up Robust Features (SURF). To speed up the matching, we present a solution for reducing the number of compared SURF descriptors based on classification techniques, while the accuracy of SURF is improved through an efficient method for iterative image matching. Furthermore, we propose a statistical model for ranking the mined annotations according to their relevance to the target image; this is achieved by combining multi-modal information in a statistical framework based on Bayes' rule. Finally, the effectiveness of each of the mentioned contributions, as well as of the complete automatic annotation process, is evaluated experimentally.
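
    As a concrete illustration of one building block, here is a minimal sketch of the standard Jensen-Shannon Divergence between two tag probability distributions over a shared vocabulary; the thesis's extension for statistical fluctuations is not reproduced here.

        # Standard Jensen-Shannon Divergence between two tag distributions p and q.
        import numpy as np

        def kl_divergence(p, q, eps=1e-12):
            p = np.asarray(p, dtype=float) + eps
            q = np.asarray(q, dtype=float) + eps
            p, q = p / p.sum(), q / q.sum()
            return float(np.sum(p * np.log2(p / q)))

        def jensen_shannon_divergence(p, q):
            m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
            return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

    With base-2 logarithms the value lies in [0, 1]; 0 means the two tag distributions are identical.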
