Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with low-level features using either global or local methods. In global methods, the entire image is used as a unit. Local methods divide images either into blocks, where fixed-size sub-image blocks serve as sub-units, or into regions, where segmented regions serve as sub-units. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods incorporate both kinds of information, on the premise that combining the two levels of features is beneficial for annotating images. In this paper, we survey automatic image annotation techniques from the perspective of feature extraction and, to complement existing surveys in the literature, focus on an emerging class of image annotation methods: hybrid methods that combine global and local features for image representation.
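The global-plus-local combination the survey describes can be illustrated with a minimal sketch: a global intensity histogram over the whole image concatenated with histograms over fixed-size blocks. All names, the block size, and the bin count are illustrative assumptions, not from any surveyed system.

```python
# Hypothetical sketch of a hybrid image representation: one global
# intensity histogram plus per-block local histograms, concatenated.

def intensity_histogram(pixels, bins=4, max_val=256):
    """Histogram of grayscale values, normalised to sum to 1."""
    hist = [0] * bins
    for v in pixels:
        hist[v * bins // max_val] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]

def hybrid_features(image, block_size=2, bins=4):
    """Global histogram over the whole image followed by local
    histograms over fixed-size blocks, in one feature vector."""
    h, w = len(image), len(image[0])
    flat = [v for row in image for v in row]
    feats = intensity_histogram(flat, bins)          # global part
    for by in range(0, h, block_size):               # local part
        for bx in range(0, w, block_size):
            block = [image[y][x]
                     for y in range(by, min(by + block_size, h))
                     for x in range(bx, min(bx + block_size, w))]
            feats.extend(intensity_histogram(block, bins))
    return feats

image = [[0, 64, 128, 192],
         [0, 64, 128, 192],
         [255, 255, 0, 0],
         [255, 255, 0, 0]]
vec = hybrid_features(image)
# 1 global histogram + 4 block histograms, 4 bins each -> 20 values
```

A real system would use richer descriptors (colour, texture, SIFT-like local features), but the concatenation pattern is the same.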
Shape annotation for intelligent image retrieval
Annotation of shapes is an important process for semantic image retrieval. In this paper, we present a shape annotation framework that enables intelligent image retrieval by exploiting domain knowledge and perceptual descriptions of shapes in a unified manner. A semi-supervised fuzzy clustering process is used to derive domain knowledge in terms of linguistic concepts referring to the semantic categories of shapes. For each category we derive a prototype that serves as a visual template for the category. A novel visual ontology is proposed to describe prototypes and their salient parts. To describe parts of prototypes, the visual ontology includes perceptual attributes defined by mimicking the analogy mechanism humans adopt to describe the appearance of objects. The effectiveness of the developed framework as a facility for intelligent image retrieval is shown through results on a case study in the domain of fish shapes.
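The fuzzy clustering step can be sketched with the standard fuzzy c-means membership computation, in which each shape belongs to every category with a degree between 0 and 1. This is an assumed baseline formulation; the paper's semi-supervised variant additionally constrains memberships with labelled examples.

```python
# A minimal sketch of fuzzy c-means style membership degrees
# (assumed formulation; the semi-supervised variant adds label
# constraints on top of this update).

def fuzzy_memberships(point, centers, m=2.0, eps=1e-12):
    """Degree of membership of `point` in each cluster centre,
    using the standard u_ik = 1 / sum_j (d_i / d_j)^(2/(m-1))."""
    dists = [max(eps, sum((p - c) ** 2
                          for p, c in zip(point, centre)) ** 0.5)
             for centre in centers]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exp for d_j in dists)
            for d_i in dists]

# A point at distance 1 from one centre and 2 from the other:
u = fuzzy_memberships([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]])
# memberships sum to 1; the nearer centre gets the larger degree
```

The category prototype would then be a membership-weighted average of the shapes assigned to that category.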
Who is the director of this movie? Automatic style recognition based on shot features
We show how low-level formal features, such as shot duration, i.e. the length of a camera take, and shot scale, i.e. the distance between the camera and the subject, are distinctive of a director's style in art movies. So far such features were thought to lack sufficient variety to be distinctive of an author. However, our investigation of the full filmographies of six different authors (Scorsese, Godard, Tarr, Fellini, Antonioni, and Bergman), a total of 120 movies analysed second by second, confirms that these shot-related features do not appear as random patterns in movies from the same director. For feature extraction we adopt methods based on both conventional and deep learning techniques. Our findings suggest that feature sequential patterns, i.e. how features evolve in time, are at least as important as the related feature distributions. To the best of our knowledge this is the first study dealing with automatic attribution of movie authorship, which opens up interesting lines of cross-disciplinary research on the impact of style on the aesthetic and emotional effects on the viewers.
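The two kinds of evidence the abstract contrasts, the distribution of shot durations and their sequential pattern, can be sketched from a list of detected cut times. The cut times, threshold, and helper names below are illustrative assumptions, not the paper's pipeline.

```python
# Illustrative sketch: from shot boundaries to (a) duration
# distribution statistics and (b) a simple sequential pattern.
from statistics import mean, median

def shot_durations(boundaries):
    """Durations between consecutive shot boundaries (seconds)."""
    return [b - a for a, b in zip(boundaries, boundaries[1:])]

def duration_stats(durs):
    """Distributional summary of shot lengths."""
    return {"mean": mean(durs), "median": median(durs)}

def transitions(durs, threshold):
    """Sequential pattern: short(0)/long(1) coding of successive
    shots, as pairs of adjacent labels."""
    labels = [int(d >= threshold) for d in durs]
    return list(zip(labels, labels[1:]))

cuts = [0.0, 2.5, 3.5, 9.5, 10.5]   # hypothetical cut times
durs = shot_durations(cuts)          # [2.5, 1.0, 6.0, 1.0]
```

The finding that sequential patterns matter at least as much as distributions corresponds to `transitions` carrying discriminative information beyond `duration_stats`.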
Multifeature analysis and semantic context learning for image classification
This article introduces an image classification approach in which the semantic context of images and multiple low-level visual features are jointly exploited. The context consists of a set of semantic terms defining the classes to be associated with unclassified images. Initially, a multiobjective optimization technique is used to define a multifeature fusion model for each semantic class. Then, a Bayesian learning procedure is applied to derive a context model representing relationships among semantic classes. Finally, this …
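The per-class multifeature fusion step can be illustrated with a simple late-fusion combiner: each class gets its own weights over several low-level feature channels. The classes, scores, and weights below are invented for illustration; the article derives its weights by multiobjective optimization.

```python
# Illustrative only: a per-class weighted combination of scores
# from several low-level feature channels (colour, texture, ...).

def fused_score(channel_scores, weights):
    """Weighted sum of per-channel classifier scores for one class."""
    return sum(w * s for w, s in zip(weights, channel_scores))

def classify(scores_by_class, weights_by_class):
    """Fuse channel scores per class and pick the best class."""
    fused = {c: fused_score(s, weights_by_class[c])
             for c, s in scores_by_class.items()}
    return max(fused, key=fused.get), fused

scores = {"beach": [0.9, 0.2], "city": [0.3, 0.8]}   # hypothetical
weights = {"beach": [0.7, 0.3], "city": [0.5, 0.5]}  # hypothetical
label, fused = classify(scores, weights)
```

The Bayesian context model would then refine `fused` using co-occurrence relationships among classes, a step omitted here.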
Webly Supervised Learning of Convolutional Networks
We present an approach to utilize large amounts of web data for learning CNNs. Specifically, inspired by curriculum learning, we present a two-step approach for CNN training. First, we use easy images to train an initial visual representation. We then use this initial CNN and adapt it to harder, more realistic images by leveraging the structure of data and categories. We demonstrate that our two-stage CNN outperforms a fine-tuned CNN trained on ImageNet on Pascal VOC 2012. We also demonstrate the strength of webly supervised learning by localizing objects in web images and training an R-CNN style detector. It achieves the best performance on VOC 2007 where no VOC training data is used. Finally, we show our approach is quite robust to noise and performs comparably even when we use image search results from March 2013 (pre-CNN image search era).
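The two-stage curriculum can be sketched independently of any deep learning framework: order examples from easy to hard, train first on the easy subset, then continue on the full noisy set. The difficulty function and the 50% split are placeholder assumptions, not the paper's criteria.

```python
# Schematic sketch of a curriculum-style two-stage schedule:
# stage 1 sees only the easiest examples, stage 2 the full set.

def curriculum_order(examples, difficulty):
    """Sort training examples from easy to hard."""
    return sorted(examples, key=difficulty)

def two_stage_schedule(examples, difficulty, easy_fraction=0.5):
    """Return (stage-1 subset, stage-2 full ordered set)."""
    ordered = curriculum_order(examples, difficulty)
    cut = int(len(ordered) * easy_fraction)
    return ordered[:cut], ordered

# Toy data standing in for images; length stands in for difficulty.
pool = ["a", "bb", "cccc", "ddd"]
easy, full = two_stage_schedule(pool, difficulty=len)
```

In the paper, "easy" roughly corresponds to clean search-engine images and "hard" to realistic photo-sharing images; here any scoring function can be plugged in.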
Algorithmic Analysis of Complex Audio Scenes
In this thesis, we examine the problem of algorithmic analysis of complex audio scenes with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well-annotated recordings from this archive has been available in digital form, little infrastructure for searching and sharing this data has been available. We describe a distributed infrastructure, developed in this context, for collaboratively searching, sharing and annotating animal sound collections. Although searching animal sound databases by metadata gives good results for many applications, annotating all occurrences of a specific sound is beyond the scope of human annotators. Moreover, finding vocalisations similar to a given example is not feasible using metadata alone. We therefore propose an algorithm for content-based similarity search in animal sound databases. Based on principles of image processing, we develop suitable features for the description of animal sounds. We enhance a concept for content-based multimedia retrieval with a ranking scheme that makes it an efficient tool for similarity search. One of the main sources of complexity in natural audio scenes, and the most difficult problem for pattern recognition, is the large number of sound sources that are active at the same time. We therefore examine methods for source separation based on microphone arrays.
In particular, we propose an algorithm for the extraction of simpler components from complex audio scenes based on a sound complexity measure. Finally, we introduce pattern recognition algorithms for the vocalisations of a number of bird species. Some of these species are interesting for reasons of nature conservation, while one of the species serves as a prototype for song birds with strongly structured songs.
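The content-based similarity search the thesis proposes boils down to ranking database items by distance between feature vectors (the thesis derives its features from spectrogram images using image-processing techniques). The species names and 2-D vectors below are invented for illustration.

```python
# Hedged sketch of content-based similarity search: given a query
# feature vector, rank database recordings nearest-first.
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query, database):
    """Return (name, distance) pairs, nearest first."""
    scored = [(name, euclidean(query, vec)) for name, vec in database]
    return sorted(scored, key=lambda t: t[1])

db = [("chaffinch", [0.9, 0.1]),
      ("blackbird", [0.2, 0.8]),
      ("great_tit", [0.7, 0.3])]
ranking = rank_by_similarity([1.0, 0.0], db)
```

A realistic system would replace the toy vectors with spectrogram-derived descriptors and the linear scan with an indexed search, which is where the thesis's ranking scheme comes in.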