63 research outputs found

    Suchbasierte automatische Bildannotation anhand geokodierter Community-Fotos

    Get PDF
    In the Web 2.0 era, platforms for sharing and collaboratively annotating images with keywords, called tags, became very popular. Tags are a powerful means for organizing and retrieving photos. However, manual tagging is time consuming. Recently, the sheer amount of user-tagged photos available on the Web encouraged researchers to explore new techniques for automatic image annotation. The idea is to annotate an unlabeled image by propagating the labels of community photos that are visually similar to it. Most recently, an ever increasing amount of community photos is also associated with location information, i.e., geotagged. In this thesis, we aim at exploiting the location context and propose an approach for automatically annotating geotagged photos. Our objective is to address the main limitations of state-of-the-art approaches in terms of the quality of the produced tags and the speed of the complete annotation process. To achieve these goals, we, first, deal with the problem of collecting images with the associated metadata from online repositories. Accordingly, we introduce a strategy for data crawling that takes advantage of location information and the social relationships among the contributors of the photos. To improve the quality of the collected user-tags, we present a method for resolving their ambiguity based on tag relatedness information. In this respect, we propose an approach for representing tags as probability distributions based on the algorithm of Laplacian score feature selection. Furthermore, we propose a new metric for calculating the distance between tag probability distributions by extending Jensen-Shannon Divergence to account for statistical fluctuations. To efficiently identify the visual neighbors, the thesis introduces two extensions to the state-of-the-art image matching algorithm, known as Speeded Up Robust Features (SURF). To speed up the matching, we present a solution for reducing the number of compared SURF descriptors based on classification techniques, while the accuracy of SURF is improved through an efficient method for iterative image matching. Furthermore, we propose a statistical model for ranking the mined annotations according to their relevance to the target image. This is achieved by combining multi-modal information in a statistical framework based on Bayes' rule. Finally, the effectiveness of each of mentioned contributions as well as the complete automatic annotation process are evaluated experimentally.Seit der EinfĂŒhrung von Web 2.0 steigt die PopularitĂ€t von Plattformen, auf denen Bilder geteilt und durch die Gemeinschaft mit Schlagwörtern, sogenannten Tags, annotiert werden. Mit Tags lassen sich Fotos leichter organisieren und auffinden. Manuelles Taggen ist allerdings sehr zeitintensiv. Animiert von der schieren Menge an im Web zugĂ€nglichen, von Usern getaggten Fotos, erforschen Wissenschaftler derzeit neue Techniken der automatischen Bildannotation. Dahinter steht die Idee, ein noch nicht beschriftetes Bild auf der Grundlage visuell Ă€hnlicher, bereits beschrifteter Community-Fotos zu annotieren. UnlĂ€ngst wurde eine immer grĂ¶ĂŸere Menge an Community-Fotos mit geographischen Koordinaten versehen (geottagged). Die Arbeit macht sich diesen geographischen Kontext zunutze und prĂ€sentiert einen Ansatz zur automatischen Annotation geogetaggter Fotos. Ziel ist es, die wesentlichen Grenzen der bisher bekannten AnsĂ€tze in Hinsicht auf die QualitĂ€t der produzierten Tags und die Geschwindigkeit des gesamten Annotationsprozesses aufzuzeigen. Um dieses Ziel zu erreichen, wurden zunĂ€chst Bilder mit entsprechenden Metadaten aus den Online-Quellen gesammelt. Darauf basierend, wird eine Strategie zur Datensammlung eingefĂŒhrt, die sich sowohl der geographischen Informationen als auch der sozialen Verbindungen zwischen denjenigen, die die Fotos zur VerfĂŒgung stellen, bedient. Um die QualitĂ€t der gesammelten User-Tags zu verbessern, wird eine Methode zur Auflösung ihrer AmbiguitĂ€t vorgestellt, die auf der Information der Tag-Ähnlichkeiten basiert. In diesem Zusammenhang wird ein Ansatz zur Darstellung von Tags als Wahrscheinlichkeitsverteilungen vorgeschlagen, der auf den Algorithmus der sogenannten Laplacian Score (LS) aufbaut. Des Weiteren wird eine Erweiterung der Jensen-Shannon-Divergence (JSD) vorgestellt, die statistische Fluktuationen berĂŒcksichtigt. Zur effizienten Identifikation der visuellen Nachbarn werden in der Arbeit zwei Erweiterungen des Speeded Up Robust Features (SURF)-Algorithmus vorgestellt. Zur Beschleunigung des Abgleichs wird eine Lösung auf der Basis von Klassifikationstechniken prĂ€sentiert, die die Anzahl der miteinander verglichenen SURF-Deskriptoren minimiert, wĂ€hrend die SURF-Genauigkeit durch eine effiziente Methode des schrittweisen Bildabgleichs verbessert wird. Des Weiteren wird ein statistisches Modell basierend auf der Baye'schen Regel vorgeschlagen, um die erlangten Annotationen entsprechend ihrer Relevanz in Bezug auf das Zielbild zu ranken. Schließlich wird die Effizienz jedes einzelnen, erwĂ€hnten Beitrags experimentell evaluiert. DarĂŒber hinaus wird die Performanz des vorgeschlagenen automatischen Annotationsansatzes durch umfassende experimentelle Studien als Ganzes demonstriert

    Semantic Interaction in Web-based Retrieval Systems : Adopting Semantic Web Technologies and Social Networking Paradigms for Interacting with Semi-structured Web Data

    Get PDF
    Existing web retrieval models for exploration and interaction with web data do not take into account semantic information, nor do they allow for new forms of interaction by employing meaningful interaction and navigation metaphors in 2D/3D. This thesis researches means for introducing a semantic dimension into the search and exploration process of web content to enable a significantly positive user experience. Therefore, an inherently dynamic view beyond single concepts and models from semantic information processing, information extraction and human-machine interaction is adopted. Essential tasks for semantic interaction such as semantic annotation, semantic mediation and semantic human-computer interaction were identified and elaborated for two general application scenarios in web retrieval: Web-based Question Answering in a knowledge-based dialogue system and semantic exploration of information spaces in 2D/3D

    Semantic Annotation of Digital Objects by Multiagent Computing: Applications in Digital Heritage

    Get PDF
    Heritage organisations around the world are participating in broad scale digitisation projects, where traditional forms of heritage materials are being transcribed into digital representations in order to assist with their long-term preservation, facilitate cataloguing, and increase their accessibility to researchers and the general public. These digital formats open up a new world of opportunities for applying computational information retrieval techniques to heritage collections, making it easier than ever before to explore and document these materials. One of the key benefits of being able to easily share digital heritage collections is the strengthening and support of community memory, where members of a community contribute their perceptions and recollections of historical and cultural events so that this knowledge is not forgotten and lost over time. With the ever-growing popularity of digitally-native media and the high level of computer literacy in modern society, this is set to become a critical area for preservation in the immediate future

    Generation of Classificatory Metadata for Web Resources using Social Tags

    Get PDF
    With the increasing popularity of social tagging systems, the potential for using social tags as a source of metadata is being explored. Social tagging systems can simplify the involvement of a large number of users and improve the metadata generation process, especially for semantic metadata. This research aims to find a method to categorize web resources using social tags as metadata. In this research, social tagging systems are a mechanism to allow non-professional catalogers to participate in metadata generation. Because social tags are not from a controlled vocabulary, there are issues that have to be addressed in finding quality terms to represent the content of a resource. This research examines ways to deal with those issues to obtain a set of tags representing the resource from the tags provided by users.Two measurements that measure the importance of a tag are introduced. Annotation Dominance (AD) is a measurement of how much a tag term is agreed to by users. Another is Cross Resources Annotation Discrimination (CRAD), a measurement to discriminate tags in the collection. It is designed to remove tags that are used broadly or narrowly in the collection. Further, the study suggests a process to identify and to manage compound tags. The research aims to select important annotations (meta-terms) and remove meaningless ones (noise) from the tag set. This study, therefore, suggests two main measurements for getting a subset of tags with classification potential. To evaluate the proposed approach to find classificatory metadata candidates, we rely on users' relevance judgments comparing suggested tag terms and expert metadata terms. Human judges rate how relevant each term is on an n-point scale based on the relevance of each of the terms for the given resource

    Text-based Sentiment Analysis and Music Emotion Recognition

    Get PDF
    Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market predictions, business intelligence and more. We also witness deep learning techniques becoming top performers on those types of tasks. There are however several problems that need to be solved for efficient use of deep neural networks on text mining and text polarity analysis. First of all, deep neural networks are data hungry. They need to be fed with datasets that are big in size, cleaned and preprocessed as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model or it is better to source them from big and popular collections that work as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks on sentiment analysis of texts and achieving state of the art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored and two medium-sized and emotion-labeled song datasets are created utilizing social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains were conducted. Word embeddings of different parameters were exercised and results revealed that their quality is influenced (mostly but not only) by the size of texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers were conducted. Various patterns relating text properties and network parameters with optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business and product reviews. Given that labeled data are becoming the bottleneck of the current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types do usually represent complementary characteristics of data. Combining word embedding and traditional text features or utilizing recurrent networks on document splits and then aggregating the predictions could further increase prediction accuracy of such models

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Crowdsourced intuitive visual design feedback

    Get PDF
    For many people images are a medium preferable to text and yet, with the exception of star ratings, most formats for conventional computer mediated feedback focus on text. This thesis develops a new method of crowd feedback for designers based on images. Visual summaries are generated from a crowd’s feedback images chosen in response to a design. The summaries provide the designer with impressionistic and inspiring visual feedback. The thesis sets out the motivation for this new method, describes the development of perceptually organised image sets and a summarisation algorithm to implement it. Evaluation studies are reported which, through a mixed methods approach, provide evidence of the validity and potential of the new image-based feedback method. It is concluded that the visual feedback method would be more appealing than text for that section of the population who may be of a visual cognitive style. Indeed the evaluation studies are evidence that such users believe images are as good as text when communicating their emotional reaction about a design. Designer participants reported being inspired by the visual feedback where, comparably, they were not inspired by text. They also reported that the feedback can represent the perceived mood in their designs, and that they would be enthusiastic users of a service offering this new form of visual design feedback

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    Get PDF
    This deliverable presents a state-of-art and requirements analysis report for hypervideo authored as part of the WP1 of the LinkedTV project. Initially, we present some use-case (viewers) scenarios in the LinkedTV project and through the analysis of the distinctive needs and demands of each scenario we point out the technical requirements from a user-side perspective. Subsequently we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises of different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each one of the different classes of techniques being discussed in the deliverable we present the evaluation results from the application of one such method of the literature to a dataset well-suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary
    • 

    corecore