
    The MIR Flickr Retrieval Evaluation

    In most well-known image retrieval test sets, the imagery typically cannot be freely distributed or is not representative of a large community of users. In this paper we present a collection for the MIR community comprising 69,000 images from the Flickr website which are redistributable for research purposes and represent a real community of users both in the image content and image tags. We have extracted the tags and EXIF image metadata, and also make all of these publicly available. In addition we discuss several challenges for benchmarking retrieval and classification methods and applications.

    K-Space Interactive Search

    In this paper we will present the K-Space Interactive Search system for content-based video information retrieval to be demonstrated in the VideOlympics. This system is an extension of the system we developed as part of our participation in TRECVID 2007 [1]. In TRECVID 2007 we created two interfaces, known as the ‘Shot’ based and ‘Broadcast’ based interfaces. Our VideOlympics submission takes these two interfaces and the lessons learned from our user experiments, to create a single user interface which attempts to leverage the best aspects of both.

    Semantic spaces revisited: investigating the performance of auto-annotation and semantic retrieval using semantic spaces

    Semantic spaces encode similarity relationships between objects as a function of position in a mathematical space. This paper discusses three different formulations for building semantic spaces which allow the automatic annotation and semantic retrieval of images. The models discussed in this paper require that the image content be described in the form of a series of visual terms, rather than as a continuous feature vector. The paper also discusses how these term-based models compare to the latest state-of-the-art continuous feature models for auto-annotation and retrieval.
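
    As an illustration of the term-based formulation described above, here is a minimal Python sketch of one way to build such a space, assuming an LSA-style truncated SVD over an image-by-visual-term count matrix; the synthetic data, dimensionality k and cosine scoring are illustrative assumptions, not the paper's exact models.

    import numpy as np

    # Toy image-by-visual-term count matrix (assumption: real counts would
    # come from quantised image features, e.g. a visual vocabulary).
    rng = np.random.default_rng(0)
    X = rng.poisson(0.3, size=(200, 500)).astype(float)

    # Truncated SVD places images and terms in a shared k-dimensional space.
    k = 32
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    image_vecs = U[:, :k] * S[:k]   # one position per image
    term_vecs = Vt[:k].T * S[:k]    # one position per visual term

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    # Auto-annotation: rank terms by proximity to a given image's position.
    scores = [cosine(image_vecs[0], t) for t in term_vecs]
    print(np.argsort(scores)[-5:])  # indices of the 5 nearest terms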

    New trends and ideas in visual concept detection

    The MIR Flickr collection consists of 25000 high-quality photographic images from thousands of Flickr users, made available under the Creative Commons license. The database includes all the original user tags and EXIF metadata. Additionally, detailed and accurate annotations are provided for topics corresponding to the most prominent visual concepts in the user tag data. The rich metadata allow for a wide variety of image retrieval benchmarking scenarios. In this paper, we provide an overview of the various strategies that were devised for automatic visual concept detection using the MIR Flickr collection. In particular we discuss results from various experiments in combining social data and low-level content-based descriptors to improve the accuracy of visual concept classifiers. Additionally, we present retrieval results.
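
    One of the combination strategies mentioned above can be sketched as simple early fusion: concatenating a bag-of-tags (social) vector with a low-level visual descriptor before training a concept classifier. This is a hedged illustration with synthetic data; the feature choices, shapes and logistic-regression classifier are assumptions rather than the paper's specific pipelines.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 400
    visual = rng.normal(size=(n, 64))         # e.g. a colour/texture descriptor
    tags = rng.integers(0, 2, size=(n, 100))  # binary bag-of-tags vector
    y = rng.integers(0, 2, size=n)            # concept present / absent

    X = np.hstack([visual, tags])             # early fusion by concatenation
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict_proba(X[:3])[:, 1])     # concept probabilities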

    Empirical study of multi-label classification methods for image annotation and retrieval

    This paper presents an empirical study of multi-label classification methods, and gives suggestions for multi-label classification that are effective for automatic image annotation applications. The study shows that the triple random ensemble multi-label classification algorithm (TREMLC) outperforms its counterparts, especially on the scene image dataset. Multi-label k-nearest neighbor (ML-kNN) and binary relevance (BR) learning algorithms perform well on the Corel image dataset. Based on the overall evaluation results, examples are given to show label prediction performance for the algorithms using selected image examples. This provides an indication of the suitability of different multi-label classification methods for automatic image annotation under different problem settings.
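
    The binary relevance (BR) baseline named above decomposes multi-label learning into one independent classifier per label; scikit-learn's OneVsRestClassifier implements this decomposition directly. The sketch below uses synthetic features and labels purely for illustration.

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 40))         # image feature vectors
    Y = rng.integers(0, 2, size=(300, 6))  # 6 binary labels per image

    # Binary relevance: one logistic-regression classifier per label.
    br = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
    print(br.predict(X[:2]))               # predicted label sets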

    VRLE: Lifelog Interaction Prototype in Virtual Reality: Lifelog Search Challenge at ACM ICMR 2020

    The Lifelog Search Challenge (LSC) invites researchers to share their prototypes for interactive lifelog retrieval and encourages competition to develop and evaluate effective methodologies to achieve this. With this paper we present a novel approach to visual lifelog exploration based on our research to date utilising virtual reality as a medium for interactive information retrieval. The VRLE prototype presented is an iteration on a previous system which won the first LSC competition at ACM ICMR 2018.

    Ten Research Questions for Scalable Multimedia Analytics

    The scale and complexity of multimedia collections is ever increasing, as is the desire to harvest useful insight from the collections. To optimally support the complex quest for insight, multimedia analytics has emerged as a new research area that combines concepts and techniques from multimedia analysis and visual analytics into a single framework. State of the art multimedia analytics solutions are highly interactive and give users freedom in how they perform their analytics task, but they do not scale well. State of the art scalable database management solutions, on the other hand, are not yet designed for multimedia analytics workloads. In this position paper we therefore argue the need for research on scalable multimedia analytics, a new research area built on the three pillars of visual analytics, multimedia analysis and database management. We propose a specific goal for scalable multimedia analytics and present several important research questions that we believe must be addressed in order to achieve that goal.

    Everyday concept detection in visual lifelogs: validation, relationships and trends

    The Microsoft SenseCam is a small lightweight wearable camera used to passively capture photos and other sensor readings from a user's day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer's life. However the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, on the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept's presence. We apply detection of 27 everyday semantic concepts on a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision for detection of each semantic concept. We conduct further analysis on the temporal consistency, co-occurrence and trends within the detected concepts to more extensively investigate the robustness of the detectors within this novel domain. We additionally present future applications of concept detection within the domain of lifelogging.
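
    The detector design described above can be sketched as one supervised model per concept, each mapping a low-level feature vector to a probability of that concept's presence. The concept list, feature dimensionality and logistic-regression choice below are illustrative assumptions, not the paper's exact detectors.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    concepts = ["indoors", "outdoors", "people", "buildings"]
    X = rng.normal(size=(500, 128))  # low-level visual features per image

    # One supervised classifier per semantic concept.
    detectors = {}
    for name in concepts:
        y = rng.integers(0, 2, size=500)  # stand-in ground-truth annotations
        detectors[name] = LogisticRegression(max_iter=1000).fit(X, y)

    # Probability of each concept's presence for one lifelog image.
    image = X[0]
    print({c: d.predict_proba(image[None])[0, 1] for c, d in detectors.items()})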

    Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations

    This paper explores a new natural language processing task, review-driven multi-label music style classification. This task requires the system to identify multiple styles of music based on its reviews on websites. The biggest challenge lies in the complicated relations between music styles, which have caused many multi-label classification methods to fail. To tackle this problem, we propose a novel deep learning approach to automatically learn and exploit style correlations. The proposed method consists of two parts: a label-graph based neural network, and a soft training mechanism with correlation-based continuous label representation. Experimental results show that our approach achieves large improvements over the baselines on the proposed dataset. In particular, the micro F1 is improved from 53.9 to 64.5, and the one-error is reduced from 30.5 to 22.6. Furthermore, the visualized analysis shows that our approach performs well in capturing style correlations.
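
    The soft training idea described above can be approximated as follows: replace hard 0/1 style targets with continuous values that give partial credit to styles correlated with the annotated ones. The co-occurrence-based correlation estimate and the smoothing weight alpha below are assumptions for illustration, not the paper's exact recipe.

    import numpy as np

    rng = np.random.default_rng(4)
    Y = rng.integers(0, 2, size=(1000, 8)).astype(float)  # hard style labels

    # Estimate P(style j | style i) from label co-occurrence counts.
    co = Y.T @ Y
    corr = co / np.maximum(co.diagonal()[:, None], 1.0)

    # Continuous label representation: boost styles correlated with the
    # annotated ones, then clip back into [0, 1] for soft training targets.
    alpha = 0.2
    soft_Y = np.clip(Y + alpha * (Y @ corr), 0.0, 1.0)
    print(Y[0], soft_Y[0])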