8 research outputs found

    Apprentissage de distance pour l'annotation d'images par plus proches voisins

    Get PDF
    National audience
    Automatic image annotation is an important open problem in computer vision. For this task we propose TagProp, a weighted nearest-neighbour model. It is trained discriminatively and exploits annotated training images to predict the labels of test images. The weights are derived either from the rank of each neighbour or from its distance to the test image. TagProp allows the distance that defines the neighbourhoods to be optimised by maximising the log-likelihood of the predictions on the training set. In this way we can optimally tune the combination of several visual similarities, ranging from global colour histograms to local shape descriptors. We also propose a word-specific modulation to increase the recall of rare words. We compare the performance of the different variants of our model with the state of the art on three image datasets. On the five measures considered, TagProp significantly improves on the state of the art.
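    As a concrete illustration of the weighted nearest-neighbour prediction described above, here is a minimal Python sketch. It assumes fixed geometric rank-based weights rather than the discriminatively learned weights and metric of the actual model; the function and parameter names are hypothetical.

```python
import numpy as np

def tagprop_predict(dists, train_labels, K=50, gamma=0.9):
    """Rank-weighted nearest-neighbour label prediction (sketch).

    dists        : (n_test, n_train) visual distances to the training images
    train_labels : (n_train, n_words) binary annotation matrix
    """
    preds = np.zeros((dists.shape[0], train_labels.shape[1]))
    # Geometric weights over the K nearest neighbours, normalised to sum to 1;
    # the model described above instead learns the weighting (and the metric)
    # by maximising the log-likelihood on the training set.
    w = gamma ** np.arange(K)
    w /= w.sum()
    for i in range(dists.shape[0]):
        nn = np.argsort(dists[i])[:K]    # the K visually closest training images
        preds[i] = w @ train_labels[nn]  # weighted vote for every annotation word
    return preds                         # per-word relevance scores in [0, 1]
```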

    Image Annotation with TagProp on the MIRFLICKR set

    Get PDF
    International audience
    Image annotation is an important computer vision problem where the goal is to determine the relevance of annotation terms for images. Image annotation has two main applications: (i) proposing a list of relevant terms to users who want to assign indexing terms to images, and (ii) supporting keyword-based search for images without indexing terms, using the relevance estimates to rank images. In this paper we present TagProp, a weighted nearest-neighbour model that predicts the term relevance of images by taking a weighted sum of the annotations of the visually most similar images in an annotated training set. TagProp can use a collection of distance measures capturing different aspects of image content, such as local shape descriptors and global colour histograms. It automatically finds the optimal combination of distances to define the visual neighbours of images that are most useful for annotation prediction. TagProp compensates for the varying frequencies of annotation terms using a term-specific sigmoid to scale the weighted nearest-neighbour tag predictions. We evaluate different variants of TagProp with experiments on the MIR Flickr set, and compare with an approach that learns a separate SVM classifier for each annotation term. We also consider using Flickr tags to train our models, both as additional features and as training labels. We find the SVMs to work better when learning from the manual annotations, but TagProp to work better when learning from the Flickr tags. We also find that using the Flickr tags as a feature can significantly improve the performance of SVMs learned from manual annotations.
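    The term-specific sigmoid mentioned above can be sketched as follows. In the actual model the per-word parameters are learned by maximum likelihood; here they are simply assumed to be given, and the names are hypothetical.

```python
import numpy as np

def sigmoid_modulate(nn_scores, alpha, beta):
    """Scale weighted nearest-neighbour predictions with a per-word sigmoid,
    compensating for varying term frequencies and boosting rare terms.

    nn_scores : (n_images, n_words) weighted nearest-neighbour predictions
    alpha     : (n_words,) per-word slope
    beta      : (n_words,) per-word offset
    """
    return 1.0 / (1.0 + np.exp(-(alpha * nn_scores + beta)))
```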

    Learning Transferable Representations for Visual Recognition

    Get PDF
    In the last half-decade, a new renaissance of machine learning has originated from the application of convolutional neural networks to visual recognition tasks. It is believed that a combination of big curated data and novel deep learning techniques can lead to unprecedented results. However, even increasingly large training data is still a drop in the ocean compared with scenarios in the wild. In this thesis, we focus on learning transferable representations in neural networks to ensure that models stay robust even under different data distributions. We present three exemplar topics in three chapters, respectively: zero-shot learning, domain adaptation, and generalizable adversarial attack. By zero-shot learning, we enable models to predict labels not seen in the training phase. By domain adaptation, we improve a model's performance on the target domain by mitigating its discrepancy from a labeled source model, without any target annotation. Finally, the generalizable adversarial attack focuses on learning an adversarial camouflage that would ideally work in every possible scenario. Despite sharing the same transfer-learning philosophy, each of the proposed topics poses a unique challenge requiring a unique solution. In each chapter, we introduce the problem and present our solution to it. We also discuss other researchers' approaches and compare our solution to theirs in the experiments.
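    For readers unfamiliar with zero-shot learning, one common formulation (not necessarily the one developed in this thesis) predicts unseen labels by comparing an image embedding against semantic class embeddings; the sketch below is purely illustrative and all names are assumptions.

```python
import numpy as np

def zero_shot_predict(image_emb, class_embs):
    """Nearest-class-embedding zero-shot prediction (illustrative sketch).

    image_emb  : (d,) image feature projected into a semantic space
    class_embs : dict mapping class name -> (d,) semantic embedding
                 (e.g. attribute vectors or word embeddings)
    Classes unseen during training can still be predicted, as long as
    a semantic embedding exists for them.
    """
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(class_embs, key=lambda c: cosine(image_emb, class_embs[c]))
```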

    Agente de visão semântica para robótica

    Get PDF
    Master's degree in Computer and Telematics Engineering
    Semantic vision is an important line of research in computer vision. The keyword "semantic" implies the extraction of features that are not only visual (color, shape, texture) but also include any kind of "higher-level" information. In particular, semantic vision seeks to understand or interpret images of scenes in terms of the objects present and the possible relations between them. One of its main current application areas is robotics. As the world around us is extremely visual, interaction between a non-specialized human user and a robot requires the robot to be able to detect, recognize and understand any kind of visual cues provided in the communication between user and robot. To make this possible, a learning phase is needed, in which various categories of objects are learned by the robot. After this process, the robot is able to recognize new instances of the previously learned categories. We developed a new semantic vision agent that uses image search web services to learn a set of general categories from their names alone. The work took as its starting point the UA@SRVC agent, previously developed at the University of Aveiro for participation in the Semantic Robot Vision Challenge. The work began with the development of a new technique for segmenting objects based on their edges and on color diversity. Next, the semantic search and training image selection technique of the UA@SRVC agent was revised and reimplemented using, among other components, the new segmentation module. Finally, new classifiers were developed for object recognition. We learned that, even with little prior information about an object, it is possible to segment it correctly using a simple heuristic that combines color disparity and the distance between segments. Drawing on a conceptual clustering technique, we can create a voting system that enables a good selection of instances for training the categories. We also conclude that different classifiers are most effective depending on whether the learning phase is supervised or automated.
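    A minimal sketch of the kind of heuristic described above, combining color disparity with the spatial distance between segments. The exact formulation used by the agent is not given in the abstract, so the score and its weights are assumptions.

```python
import numpy as np

def merge_score(seg_a, seg_b, w_color=1.0, w_dist=1.0):
    """Score a candidate merge of two image segments (hypothetical heuristic).

    seg_a, seg_b : dicts with 'mean_rgb' (3,) and 'centroid' (2,) arrays
    Segments that are similar in color and close in the image get a low
    score, making them the best merge candidates.
    """
    color_disparity = np.linalg.norm(seg_a['mean_rgb'] - seg_b['mean_rgb'])
    spatial_dist = np.linalg.norm(seg_a['centroid'] - seg_b['centroid'])
    return w_color * color_disparity + w_dist * spatial_dist
```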

    Evaluation Methodologies for Visual Information Retrieval and Annotation

    Get PDF
    Performance assessment plays a major role in research on Information Retrieval (IR) systems. Starting with the Cranfield experiments in the early 1960s, methodologies for system-based performance assessment emerged and established themselves, resulting in an active research field with a number of successful benchmarking activities. With the rise of the digital age, procedures of text retrieval evaluation were often transferred to multimedia retrieval evaluation without questioning their direct applicability. This thesis investigates the problem of system-based performance assessment of annotation approaches in generic image collections. It addresses three important parts of annotation evaluation, namely user requirements for the retrieval of annotated visual media, performance measures for multi-label evaluation, and visual test collections. Using the example of multi-label image annotation evaluation, I discuss which concepts to employ for indexing, how to obtain a reliable ground truth at moderate cost, and which evaluation measures are appropriate. This is accompanied by a thorough analysis of related work on system-based performance assessment in Visual Information Retrieval (VIR). Traditional performance measures are classified into four dimensions and investigated according to their appropriateness for visual annotation evaluation. One of the main ideas in this thesis challenges the common assumption of a binary score prediction dimension in annotation evaluation: evaluation measures usually assign binary costs to correct and incorrect annotations, yet the predicted concepts and the set of true indexed concepts are semantically interrelated. This work shows how such semantic relationships between visual concepts can be estimated automatically and utilised for a fine-grained evaluation scenario. Outcomes of this thesis include a user model for concept-based image retrieval, a fully assessed image annotation test collection, and a number of novel performance measures for image annotation evaluation.
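    To illustrate the idea of moving beyond binary costs, here is a sketch of a semantically smoothed precision, in which each predicted concept earns partial credit equal to its best similarity to any ground-truth concept. This is an illustrative example, not one of the specific measures proposed in the thesis.

```python
import numpy as np

def semantic_precision(predicted, truth, sim):
    """Precision with graded, similarity-based credit (illustrative sketch).

    predicted : list of predicted concept indices
    truth     : list of ground-truth concept indices
    sim       : (n_concepts, n_concepts) similarity matrix in [0, 1],
                e.g. estimated from concept co-occurrence statistics
    Instead of a binary 0/1 cost, each predicted concept scores its best
    similarity to any true concept, rewarding semantically close errors.
    """
    if not predicted:
        return 0.0
    credits = [max(sim[p, t] for t in truth) for p in predicted]
    return float(np.mean(credits))
```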

    Suchbasierte automatische Bildannotation anhand geokodierter Community-Fotos

    Get PDF
    In the Web 2.0 era, platforms for sharing and collaboratively annotating images with keywords, called tags, became very popular. Tags are a powerful means for organizing and retrieving photos, but manual tagging is time-consuming. Recently, the sheer amount of user-tagged photos available on the Web has encouraged researchers to explore new techniques for automatic image annotation. The idea is to annotate an unlabeled image by propagating the labels of community photos that are visually similar to it. An ever-increasing share of community photos is also associated with location information, i.e., geotagged. In this thesis, we aim at exploiting this location context and propose an approach for automatically annotating geotagged photos. Our objective is to address the main limitations of state-of-the-art approaches in terms of the quality of the produced tags and the speed of the complete annotation process. To achieve these goals, we first deal with the problem of collecting images and their associated metadata from online repositories. Accordingly, we introduce a crawling strategy that takes advantage of location information and of the social relationships among the contributors of the photos. To improve the quality of the collected user tags, we present a method for resolving their ambiguity based on tag relatedness information. In this respect, we propose an approach for representing tags as probability distributions based on the Laplacian score feature selection algorithm. Furthermore, we propose a new metric for calculating the distance between tag probability distributions by extending the Jensen-Shannon divergence to account for statistical fluctuations. To efficiently identify the visual neighbors, the thesis introduces two extensions to the state-of-the-art image matching algorithm known as Speeded Up Robust Features (SURF). To speed up the matching, we present a solution that reduces the number of compared SURF descriptors using classification techniques, while the accuracy of SURF is improved through an efficient method for iterative image matching. Furthermore, we propose a statistical model for ranking the mined annotations according to their relevance to the target image, achieved by combining multi-modal information in a statistical framework based on Bayes' rule. Finally, the effectiveness of each of these contributions, as well as of the complete automatic annotation process, is evaluated experimentally.
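    The baseline that the extended distance builds on is the standard Jensen-Shannon divergence between tag probability distributions, sketched below; the fluctuation-aware extension proposed in the thesis is not reproduced here.

```python
import numpy as np

def jensen_shannon(p, q, eps=1e-12):
    """Standard Jensen-Shannon divergence between two tag distributions.

    p, q : (n,) probability vectors summing to 1
    Returns a symmetric divergence in [0, 1] (using log base 2). A small
    epsilon avoids log(0) for tags absent from one distribution.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)                              # midpoint distribution
    kl = lambda a, b: np.sum(a * np.log2(a / b))   # Kullback-Leibler divergence
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```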

    Coherent image annotation by learning semantic distance

    No full text
    Conventional approaches to automatic image annotation usually suffer from two problems: (1) they cannot guarantee good semantic coherence of the annotated words for each image, as they treat each word independently without considering the inherent semantic coherence among words; (2) they rely heavily on visual similarity for judging semantic similarity. To address these issues, we propose a novel approach to image annotation that simultaneously learns a semantic distance by capturing prior annotation knowledge and propagates the annotation of an image as a whole entity. Specifically, a semantic distance function (SDF) is learned for each semantic cluster to measure semantic similarity based on relative comparison relations of prior annotations. To annotate a new image, the training images in each cluster are ranked according to their SDF values with respect to this image, and their corresponding annotations are then propagated to this image as a whole entity to ensure semantic coherence. We evaluate the SDF-based approach on Corel images against a Support Vector Machine (SVM)-based approach. The experiments show that the SDF-based approach outperforms the SVM baseline in terms of semantic coherence, especially when each training image is associated with multiple words.
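    The propagation step can be sketched as follows, assuming the semantic distance function has already been learned; propagating each neighbour's annotation set as a whole entity is what preserves semantic coherence. The function and parameter names are hypothetical.

```python
def propagate_annotations(sdf_values, cluster_annotations, top_n=5):
    """Whole-entity annotation propagation within one semantic cluster (sketch).

    sdf_values          : list of learned semantic distances from the new
                          image to each training image in the cluster
    cluster_annotations : list of annotation sets, aligned with sdf_values
    Returns the union of the annotation sets of the top_n semantically
    closest training images; each set is propagated as a whole, rather
    than word by word, to keep the predicted words coherent.
    """
    ranked = sorted(range(len(sdf_values)), key=lambda i: sdf_values[i])
    result = set()
    for i in ranked[:top_n]:
        result |= set(cluster_annotations[i])  # propagate the full annotation set
    return result
```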