274 research outputs found

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, definition, and the data set released. For each task, a reference algorithm used within MediaEval 2010 is presented, and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
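
    The abstract leaves the reference algorithms unspecified; as a rough illustration of what a Placing Task baseline can look like (this is not the MediaEval reference algorithm, and all records, field names and coordinates below are made up), the sketch propagates the geo-coordinates of the training video whose user-generated metadata is most similar to the test video's.

```python
# Illustrative Placing Task baseline: nearest-neighbour geotag propagation
# over bag-of-words metadata. Hypothetical data throughout.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def place_video(test_metadata: str, training_set: list[dict]) -> tuple[float, float]:
    """Assign the (lat, lon) of the most textually similar training video."""
    query = Counter(test_metadata.lower().split())
    best = max(training_set,
               key=lambda v: cosine(query, Counter(v["metadata"].lower().split())))
    return best["lat"], best["lon"]

# Toy example with made-up records:
train = [
    {"metadata": "eiffel tower paris night", "lat": 48.858, "lon": 2.294},
    {"metadata": "golden gate bridge fog", "lat": 37.820, "lon": -122.478},
]
print(place_video("paris eiffel tower timelapse", train))  # -> (48.858, 2.294)
```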

    Mining social media to create personalized recommendations for tourist visits

    Users of photo-sharing platforms often annotate their trip photos with landmark names. These annotations can be aggregated in order to recommend lists of popular visitor attractions similar to those found in classical tourist guides. However, individual tourist preferences can vary significantly, so good recommendations should be tailored to individual tastes. Here we pose this visit personalization as a collaborative filtering problem. We mine the record of visited landmarks exposed in online user data to build a user-user similarity matrix. When a user wants to visit a new destination, a list of potentially interesting visitor attractions is produced based on the experience of like-minded users who have already visited that destination. We compare our recommender to a baseline that simulates classical tourist guides on a large sample of Flickr users.
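
    A minimal sketch of the user-user collaborative-filtering idea described above. The Jaccard similarity, the example users and the landmark names are assumptions chosen for illustration; the paper derives its own user-user similarity matrix from Flickr data.

```python
# Rank candidate attractions at a new destination by the similarity-weighted
# votes of like-minded users. All data below is hypothetical.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(target_user: set, visits: dict[str, set],
              destination_landmarks: set, k: int = 5) -> list[str]:
    scores: dict[str, float] = {}
    for other, seen in visits.items():
        sim = jaccard(target_user, seen)
        # Only landmarks at the destination that the target user has not visited yet.
        for landmark in seen & (destination_landmarks - target_user):
            scores[landmark] = scores.get(landmark, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical data: the target user has visited Paris landmarks and plans a trip to Rome.
visits = {
    "u1": {"Louvre", "Eiffel Tower", "Colosseum", "Pantheon"},
    "u2": {"Louvre", "Trevi Fountain", "Colosseum"},
    "u3": {"Statue of Liberty", "Trevi Fountain"},
}
me = {"Louvre", "Eiffel Tower"}
rome = {"Colosseum", "Pantheon", "Trevi Fountain"}
print(recommend(me, visits, rome))  # -> ['Colosseum', 'Pantheon', 'Trevi Fountain']
```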

    Evaluation Methodologies for Visual Information Retrieval and Annotation

    Performance assessment plays a major role in research on Information Retrieval (IR) systems. Starting with the Cranfield experiments in the early 1960s, methodologies for system-based performance assessment emerged and established themselves, resulting in an active research field with a number of successful benchmarking activities. With the rise of the digital age, procedures of text retrieval evaluation were often transferred to multimedia retrieval evaluation without questioning their direct applicability. This thesis investigates the problem of system-based performance assessment of annotation approaches in generic image collections. It addresses three important parts of annotation evaluation, namely user requirements for the retrieval of annotated visual media, performance measures for multi-label evaluation, and visual test collections. Using the example of multi-label image annotation evaluation, I discuss which concepts to employ for indexing, how to obtain a reliable ground truth at moderate cost, and which evaluation measures are appropriate. This is accompanied by a thorough analysis of related work on system-based performance assessment in Visual Information Retrieval (VIR). Traditional performance measures are classified into four dimensions and investigated according to their appropriateness for visual annotation evaluation. One of the main ideas in this thesis challenges the common assumption of a binary score prediction dimension in annotation evaluation: evaluation measures usually assign binary costs to correct and incorrect annotations, yet the predicted concepts and the set of true indexed concepts interrelate with each other. This work shows how semantic similarities of visual concepts can be estimated automatically and utilised for a fine-grained evaluation scenario. Outcomes of this thesis include a user model for concept-based image retrieval, a fully assessed image annotation test collection, and a number of novel performance measures for image annotation evaluation.
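
    As an illustration of scoring annotations with graded rather than strictly binary costs, here is a minimal sketch; the concept names, the similarity table and the soft_precision measure are hypothetical stand-ins, since the thesis defines its own measures and estimates concept similarities automatically (e.g. from co-occurrence statistics).

```python
# Graded (non-binary) annotation scoring: a predicted concept that is only
# semantically close to a true concept earns partial credit instead of zero.
def soft_precision(predicted: list[str], truth: set[str],
                   sim: dict[tuple[str, str], float]) -> float:
    """Each predicted concept earns its best similarity to any ground-truth
    concept (1.0 for an exact match) instead of a hard 0/1 credit."""
    def credit(p: str) -> float:
        if p in truth:
            return 1.0
        return max((sim.get((p, t), sim.get((t, p), 0.0)) for t in truth), default=0.0)
    return sum(credit(p) for p in predicted) / len(predicted) if predicted else 0.0

# Hypothetical similarities: "sea" is semantically close to the true concept "beach".
sim = {("sea", "beach"): 0.8, ("car", "beach"): 0.05}
print(soft_precision(["beach", "sea", "car"], {"beach", "sunset"}, sim))
# -> (1.0 + 0.8 + 0.05) / 3, instead of 1/3 under binary scoring
```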

    Search-based automatic image annotation using geocoded community photos (Suchbasierte automatische Bildannotation anhand geokodierter Community-Fotos)

    In the Web 2.0 era, platforms for sharing and collaboratively annotating images with keywords, called tags, have become very popular. Tags are a powerful means for organizing and retrieving photos. However, manual tagging is time consuming. Recently, the sheer amount of user-tagged photos available on the Web has encouraged researchers to explore new techniques for automatic image annotation. The idea is to annotate an unlabeled image by propagating the labels of community photos that are visually similar to it. Most recently, an ever-increasing number of community photos is also associated with location information, i.e., geotagged. In this thesis, we aim at exploiting the location context and propose an approach for automatically annotating geotagged photos. Our objective is to address the main limitations of state-of-the-art approaches in terms of the quality of the produced tags and the speed of the complete annotation process. To achieve these goals, we first deal with the problem of collecting images with the associated metadata from online repositories. Accordingly, we introduce a strategy for data crawling that takes advantage of location information and the social relationships among the contributors of the photos. To improve the quality of the collected user tags, we present a method for resolving their ambiguity based on tag relatedness information. In this respect, we propose an approach for representing tags as probability distributions based on the algorithm of Laplacian score feature selection. Furthermore, we propose a new metric for calculating the distance between tag probability distributions by extending the Jensen-Shannon Divergence to account for statistical fluctuations. To efficiently identify the visual neighbors, the thesis introduces two extensions to the state-of-the-art image matching algorithm known as Speeded Up Robust Features (SURF). To speed up the matching, we present a solution for reducing the number of compared SURF descriptors based on classification techniques, while the accuracy of SURF is improved through an efficient method for iterative image matching. Furthermore, we propose a statistical model for ranking the mined annotations according to their relevance to the target image. This is achieved by combining multi-modal information in a statistical framework based on Bayes' rule. Finally, the effectiveness of each of the mentioned contributions, as well as of the complete automatic annotation process, is evaluated experimentally.
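
    For orientation, the standard Jensen-Shannon divergence between two tag probability distributions can be sketched as below. The thesis's extension for statistical fluctuations and the Laplacian-score-based tag distributions are not reproduced here, and the example distributions are invented.

```python
# Standard Jensen-Shannon divergence between two discrete tag distributions,
# represented as {context_tag: probability} dictionaries. Base-2 logs keep
# the value in [0, 1].
from math import log2

def kl(p: dict[str, float], q: dict[str, float]) -> float:
    """Kullback-Leibler divergence D(p || q), skipping zero-probability terms of p."""
    return sum(pi * log2(pi / q[k]) for k, pi in p.items() if pi > 0)

def jsd(p: dict[str, float], q: dict[str, float]) -> float:
    keys = set(p) | set(q)
    p = {k: p.get(k, 0.0) for k in keys}
    q = {k: q.get(k, 0.0) for k in keys}
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two hypothetical context distributions for an ambiguous tag:
jaguar_animal = {"zoo": 0.5, "wildlife": 0.4, "car": 0.1}
jaguar_car    = {"car": 0.6, "speed": 0.3, "zoo": 0.1}
print(jsd(jaguar_animal, jaguar_car))
```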

    Multimedia Annotation Interoperability Framework

    Multimedia systems typically contain digital documents of mixed media types, which are indexed on the basis of strongly divergent metadata standards. This severely hampers the interoperation of such systems. Therefore, machine understanding of metadata coming from different applications is a basic requirement for the interoperation of distributed multimedia systems. In this document, we present how interoperability among metadata, vocabularies/ontologies and services is enhanced using Semantic Web technologies. In addition, we provide guidelines for semantic interoperability, illustrated by use cases. Finally, we present an overview of the most commonly used metadata standards and tools, and provide the general research direction for semantic interoperability using Semantic Web technologies.
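
    A small illustration, not taken from the document, of the underlying idea: records that follow different metadata schemas can be expressed as RDF triples over a shared vocabulary (Dublin Core here) so that distributed systems can exchange and query them uniformly. The example uses the rdflib package; the resources and values are hypothetical.

```python
# Map records from two systems with different internal schemas onto a shared
# RDF vocabulary (Dublin Core). Requires: pip install rdflib
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
g.bind("dc", DC)

# Record from system A (one internal field layout), mapped to dc:title / dc:creator.
video = URIRef("http://example.org/video/42")
g.add((video, DC.title, Literal("Harbour at dawn")))
g.add((video, DC.creator, Literal("A. Example")))

# Record from system B (a different internal layout), mapped to the same vocabulary.
photo = URIRef("http://example.org/photo/7")
g.add((photo, DC.title, Literal("Harbour at dawn (still)")))

# Both records are now queryable and exchangeable through one shared schema.
print(g.serialize(format="turtle"))
```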

    Metadata-based access to cultural heritage collections: the RHCe use case

    More and more cultural heritage organizations see a great opportunity in opening up their collections via the Web to expand their user base. In this paper we look at our current work in a specific use case, a cultural heritage organization called RHCe that wanted to open up its photo and video archives to the public. We demonstrate how we can utilize metadata to offer a homogeneous multi-faceted view over their heterogeneous archives. We also discuss what to do if metadata is not available for resources and how we can use a simple mechanism like tagging to still obtain high-quality annotations. We do this by relating the user tags to concepts in an ontology, and we discuss mechanisms to do this (semi-)automatically. We also show how these techniques can be used to build a user model and how we can identify the most probable annotations, which domain experts can use to improve their annotation-time efficiency.
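
    As a purely illustrative sketch (not the paper's actual mechanism) of relating free-text user tags to ontology concepts, the snippet below matches tags against concept labels and synonyms with a string-similarity threshold; the ontology URIs and label lists are made up.

```python
# Link a free-text tag to the ontology concept whose label (or synonym) it
# resembles most, if the match is strong enough. Hypothetical ontology.
from difflib import SequenceMatcher

ONTOLOGY = {
    "http://example.org/concept/Windmill": ["windmill", "mill"],
    "http://example.org/concept/Church":   ["church", "chapel", "cathedral"],
}

def link_tag(tag: str, threshold: float = 0.8) -> str | None:
    """Return the URI of the best-matching concept, or None below the threshold."""
    tag = tag.strip().lower()
    best_uri, best_score = None, 0.0
    for uri, labels in ONTOLOGY.items():
        for label in labels:
            score = SequenceMatcher(None, tag, label).ratio()
            if score > best_score:
                best_uri, best_score = uri, score
    return best_uri if best_score >= threshold else None

print(link_tag("cathedrale"))  # close to "cathedral" -> Church concept
print(link_tag("tulips"))      # no label is similar enough -> None
```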

    The role of context in image annotation and recommendation

    With the rise of smart phones, lifelogging devices (e.g. Google Glass) and the popularity of image sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their life online, producing a wealth of visual content. Of these uploaded images, the majority are poorly annotated or exist in complete semantic isolation, making the process of building retrieval systems difficult, as one must first understand the meaning of an image in order to retrieve it. To alleviate this problem, many image sharing websites offer manual annotation tools which allow the user to "tag" their photos; however, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with fewer than four tags. Due to this, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a), where one attempts to bridge the semantic gap between an image's appearance and meaning, e.g., the objects present. Despite two decades of research, the semantic gap still largely exists, and as a result automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the "bigger picture" surrounding an image (e.g., its location, the event, the people present, etc.). Much work has therefore focused on building photo tag recommendation (PTR) methods which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images, e.g., that NY and timessquare co-occur in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined, often resulting in poor PTR accuracy, e.g., does NY refer to New York or New Year? This thesis proposes the exploitation of an image's context, which, unlike textual evidence, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically, we exploit the "what, who, where, when and how" of the image capture process in order to complement textual evidence in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g., the number of faces present) and contextual (e.g., the day of the week taken) signals for tag recommendation purposes, achieving up to a 75% improvement to precision@5 in comparison to a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia and Twitter) as an alternative to the (slower-moving) Flickr data on which to build recommendation models, showing that similar accuracy can be achieved on these faster-moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely "visual event summarisation", "image popularity prediction" and "lifelog summarisation". In the first scenario, we attempt to produce a ranking of relevant and diverse images for various news events by (i) removing irrelevant images such as memes and visual duplicates, before (ii) semantically clustering images based on the tweets in which they were originally posted. Using this approach, we were able to achieve over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we are able to predict whether an image will become "popular" (or not) with 74% accuracy, using an SVM classifier. Finally, in chapter 9 we employ blur detection and perceptual-hash clustering in order to remove noisy images from lifelogs, before combining visual and geo-temporal signals in order to capture a user's "key moments" within their day. We believe that the results of this thesis mark an important step towards building effective image retrieval models when sufficient textual content is lacking (i.e. a cold start).
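
    A minimal sketch of the co-occurrence-based tag recommendation baseline that the thesis builds on and extends with contextual and content-based signals. The historical tag sets and the query tags are invented for illustration.

```python
# Recommend tags that frequently co-occur with the tags already assigned to a photo.
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(histories: list[set[str]]) -> dict[str, dict[str, int]]:
    """Count how often each pair of tags appears together in historical photos."""
    co = defaultdict(lambda: defaultdict(int))
    for tags in histories:
        for a, b in combinations(sorted(tags), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(existing: set[str], co: dict[str, dict[str, int]], k: int = 5) -> list[str]:
    """Score candidate tags by total co-occurrence with the tags already present."""
    scores = defaultdict(int)
    for tag in existing:
        for other, count in co.get(tag, {}).items():
            if other not in existing:
                scores[other] += count
    return sorted(scores, key=scores.get, reverse=True)[:k]

histories = [
    {"ny", "timessquare", "night"},
    {"ny", "timessquare", "broadway"},
    {"newyear", "fireworks", "night"},
]
co = build_cooccurrence(histories)
print(recommend({"ny"}, co))  # -> ['timessquare', 'night', 'broadway']
```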

    Data Mining Algorithms for Internet Data: from Transport to Application Layer

    Nowadays we live in a data-driven world. Advances in data generation, collection and storage technology have enabled organizations to gather data sets of massive size. Data mining is a discipline that blends traditional data analysis methods with sophisticated algorithms to handle the challenges posed by these new types of data sets. The Internet is a complex and dynamic system in which new protocols and applications arise at a constant pace. All these characteristics make the Internet a valuable and challenging data source and application domain for research, both at the transport layer, analyzing network traffic flows, and up at the application layer, focusing on the ever-growing next-generation web services: blogs, micro-blogs, on-line social networks, photo sharing services and many other applications (e.g., Twitter, Facebook, Flickr, etc.). In this thesis we focus on the study, design and development of novel algorithms and frameworks to support large-scale data mining activities over huge and heterogeneous data volumes, with a particular focus on Internet data as the data source, targeting network traffic classification, on-line social network analysis, recommendation systems, cloud services and Big Data.