
    Hybrid image representation methods for automatic image annotation: a survey

    In most automatic image annotation systems, images are represented with low-level features using either global or local methods. Global methods use the entire image as a single unit. Local methods divide images either into blocks, adopting fixed-size sub-image blocks as sub-units, or into regions, using segmented regions as sub-units. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods incorporate both kinds of information, on the premise that combining the two levels of features is beneficial for annotating images. In this paper, we survey automatic image annotation techniques from the perspective of feature extraction and, to complement existing surveys in the literature, focus on an emerging class of methods: hybrid methods that combine global and local features for image representation.
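
    The survey does not prescribe an implementation, but the global-versus-local distinction is easy to make concrete. Below is a minimal sketch, assuming colour histograms as the low-level feature and a fixed 4x4 block grid; both choices are illustrative stand-ins, not taken from the paper.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Quantize each RGB channel into `bins` levels and count joint occurrences."""
    # img: H x W x 3 uint8 array
    q = (img.astype(np.uint32) * bins) // 256                 # per-channel codes in [0, bins)
    codes = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()                                  # L1-normalize

def hybrid_representation(img, grid=4, bins=8):
    """Concatenate one global histogram with histograms of fixed-size blocks."""
    h, w = img.shape[:2]
    feats = [color_histogram(img, bins)]                      # global part: whole image
    bh, bw = h // grid, w // grid
    for i in range(grid):                                     # local part: grid x grid blocks
        for j in range(grid):
            block = img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            feats.append(color_histogram(block, bins))
    return np.concatenate(feats)                              # (1 + grid^2) * bins^3 dims

# Example: a random 256x256 RGB image yields a (1 + 16) * 512 = 8704-dim vector.
img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(hybrid_representation(img).shape)
```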

    Shape annotation for intelligent image retrieval

    Annotation of shapes is an important process for semantic image retrieval. In this paper, we present a shape annotation framework that enables intelligent image retrieval by exploiting, in a unified manner, domain knowledge and perceptual descriptions of shapes. A semi-supervised fuzzy clustering process is used to derive domain knowledge in terms of linguistic concepts referring to the semantic categories of shapes. For each category, we derive a prototype that serves as a visual template for the category. A novel visual ontology is proposed to describe prototypes and their salient parts. To describe parts of prototypes, the visual ontology includes perceptual attributes defined by mimicking the analogy mechanism humans adopt to describe the appearance of objects. The effectiveness of the developed framework as a facility for intelligent image retrieval is shown through results on a case study in the domain of fish shapes.
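
    As a rough illustration of the clustering step, here is a plain (unsupervised) fuzzy c-means in NumPy. The paper's process is semi-supervised and operates on shape descriptors, so the toy 2-D data and the fuzzifier m = 2 below are stand-in assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: returns cluster centers and soft memberships U (n x c)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)            # each row is a membership distribution
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # fuzzy-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))           # standard membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Toy shape descriptors: two clusters in a 2-D feature space.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, U = fuzzy_c_means(X, c=2)
prototypes = centers                              # one visual template per category
print(U[:3].round(2))                             # soft memberships of the first shapes
```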

    Who is the director of this movie? Automatic style recognition based on shot features

    We show how low-level formal features, such as shot duration (the length of a camera take) and shot scale (the distance between the camera and the subject), are distinctive of a director's style in art movies. Until now, such features were thought to lack sufficient variety to be distinctive of an author. However, our investigation of the full filmographies of six directors (Scorsese, Godard, Tarr, Fellini, Antonioni, and Bergman), a total of 120 movies analysed second by second, confirms that these shot-related features do not appear as random patterns in movies from the same director. For feature extraction we adopt methods based on both conventional and deep learning techniques. Our findings suggest that sequential patterns of features, i.e. how features evolve over time, are at least as important as the corresponding feature distributions. To the best of our knowledge, this is the first study dealing with automatic attribution of movie authorship, and it opens up interesting lines of cross-disciplinary research on the impact of style on the aesthetic and emotional effects movies have on viewers.
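
    The abstract does not specify a shot-boundary detector, but shot-duration sequences of the kind it analyses can be sketched with a simple frame-difference cut detector; the threshold value and the toy clip below are illustrative assumptions.

```python
import numpy as np

def shot_durations(frames, fps=24.0, threshold=0.25):
    """Detect cuts by thresholded frame differences; return shot lengths in seconds.

    frames: iterable of grayscale frames as 2-D float arrays in [0, 1].
    threshold: mean absolute difference above which a cut is declared (tuned per corpus).
    """
    cuts = [0]
    prev = None
    for idx, frame in enumerate(frames):
        if prev is not None and np.abs(frame - prev).mean() > threshold:
            cuts.append(idx)                     # a new shot starts at this frame
        prev = frame
    cuts.append(idx + 1)                         # close the final shot
    return np.diff(cuts) / fps                   # per-shot durations in seconds

# Toy clip: three "shots" of flat frames separated by two hard cuts.
clip = [np.zeros((32, 32))] * 48 + [np.ones((32, 32))] * 24 + [np.full((32, 32), 0.5)] * 72
print(shot_durations(clip))                      # -> [2. 1. 3.]
```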

    Multifeature analysis and semantic context learning for image classification

    This article introduces an image classification approach in which the semantic context of images and multiple low-level visual features are jointly exploited. The context consists of a set of semantic terms defining the classes to be associated with unclassified images. Initially, a multiobjective optimization technique is used to define a multifeature fusion model for each semantic class. Then, a Bayesian learning procedure is applied to derive a context model representing relationships among semantic classes. Finally, this …
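
    A hedged sketch of the fusion idea: per-class weights combine scores from several features, followed by a prior-weighted combination as a crude stand-in for the Bayesian context model. The feature names, weights, and prior below are invented for illustration; the paper learns the fusion weights with a multiobjective optimizer.

```python
import numpy as np

# Hypothetical per-feature scores: feature_scores[f][c] = evidence for class c from feature f.
feature_scores = {
    "color":   np.array([0.70, 0.20, 0.10]),
    "texture": np.array([0.30, 0.50, 0.20]),
    "edges":   np.array([0.25, 0.25, 0.50]),
}
classes = ["beach", "forest", "city"]

# Class-specific fusion weights (hand-set here; learned in the paper).
# Rows: classes, columns: features.
W = np.array([[0.6, 0.2, 0.2],     # "beach" trusts color most
              [0.3, 0.5, 0.2],
              [0.2, 0.2, 0.6]])

S = np.stack(list(feature_scores.values()), axis=0)   # features x classes
fused = np.einsum("cf,fc->c", W, S)                   # one weighted sum per class

# A simple context prior over classes (stand-in for the learned context model).
prior = np.array([0.5, 0.3, 0.2])
posterior = fused * prior / (fused * prior).sum()
print(classes[int(posterior.argmax())], posterior.round(3))
```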

    Webly Supervised Learning of Convolutional Networks

    We present an approach to utilizing large amounts of web data for learning CNNs. Specifically, inspired by curriculum learning, we present a two-step approach to CNN training. First, we use easy images to train an initial visual representation. We then adapt this initial CNN to harder, more realistic images by leveraging the structure of the data and categories. We demonstrate that our two-stage CNN outperforms a fine-tuned CNN trained on ImageNet on PASCAL VOC 2012. We also demonstrate the strength of webly supervised learning by localizing objects in web images and training an R-CNN-style detector, which achieves the best performance on VOC 2007 when no VOC training data is used. Finally, we show that our approach is quite robust to noise and performs comparably even when we use image search results from March 2013 (the pre-CNN image-search era).
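
    A minimal PyTorch sketch of the two-stage curriculum, using a toy network and synthetic tensors in place of web images. The architecture, learning rates, and epoch counts are placeholder assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny CNN stand-in; the paper trains full-scale CNNs on web images.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)
loss_fn = nn.CrossEntropyLoss()

def train(model, loader, lr, epochs):
    """One training stage: plain SGD over the given loader."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Synthetic stand-ins for the two pools of web images.
easy = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
hard = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

# Stage 1: learn an initial representation from easy images.
train(model, DataLoader(easy, batch_size=16), lr=1e-2, epochs=2)
# Stage 2: adapt to harder, more realistic images at a lower learning rate.
train(model, DataLoader(hard, batch_size=16), lr=1e-3, epochs=2)
```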

    Algorithmic Analysis of Complex Audio Scenes

    In this thesis, we examine the problem of algorithmic analysis of complex audio scenes, with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often arises in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher, and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well-annotated recordings from this archive was available in digital form, little infrastructure for searching and sharing this data existed. We describe a distributed infrastructure, developed in this context, for collaboratively searching, sharing, and annotating animal sound collections. Although searching animal sound databases by metadata gives good results for many applications, annotating all occurrences of a specific sound is beyond the scope of human annotators. Moreover, finding vocalisations similar to a given example is not feasible using metadata alone. We therefore propose an algorithm for content-based similarity search in animal sound databases. Based on principles of image processing, we develop suitable features for the description of animal sounds. We enhance a concept for content-based multimedia retrieval with a ranking scheme that makes it an efficient tool for similarity search. One of the main sources of complexity in natural audio scenes, and the most difficult problem for pattern recognition, is the large number of sound sources that are active at the same time. We therefore examine methods for source separation based on microphone arrays. In particular, we propose an algorithm for extracting simpler components from complex audio scenes based on a sound complexity measure. Finally, we introduce pattern recognition algorithms for the vocalisations of a number of bird species. Some of these species are of interest for nature conservation, while one serves as a prototype for songbirds with strongly structured songs.
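
    As a rough sketch of the content-based similarity search, the snippet below treats a magnitude spectrogram as an image, pools it on a coarse frequency-time grid, and ranks database entries by cosine similarity. The window size, grid resolution, and toy tone signals are illustrative assumptions, not the thesis's actual features.

```python
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """Magnitude STFT with a Hann window; the result is treated as an image."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T       # freq bins x time frames

def descriptor(signal):
    """Coarse image-like feature: mean energy on an 8 x 4 frequency-time grid."""
    S = spectrogram(signal)
    cells = [cell.mean()
             for band in np.array_split(S, 8, axis=0)      # frequency bands
             for cell in np.array_split(band, 4, axis=1)]  # time segments
    v = np.array(cells)
    return v / (np.linalg.norm(v) + 1e-12)             # unit-normalize for cosine

def rank(query, database):
    """Indices of database sounds, most similar (by cosine) first."""
    q = descriptor(query)
    return np.argsort([q @ descriptor(s) for s in database])[::-1]

# Toy 'vocalisations': one-second pure tones at 8 kHz sampling.
t = np.linspace(0, 1, 8000)
db = [np.sin(2 * np.pi * f * t) for f in (500, 1000, 2000)]
print(rank(np.sin(2 * np.pi * 950 * t), db))           # the 1000 Hz tone ranks first
```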