65 research outputs found
The MIR Flickr Retrieval Evaluation
In most well-known image retrieval test sets, the imagery typically cannot be freely distributed or is not representative of a large community of users. In this paper we present a collection for the MIR community comprising 69,000 images from the Flickr website which are redistributable for research purposes and represent a real community of users both in the image content and image tags. We have extracted the tags and EXIF image metadata, and also make all of these publicly available. In addition we discuss several challenges for benchmarking retrieval and classification methods and applications.
K-Space Interactive Search
In this paper we will present the K-Space Interactive Search system for content-based video information retrieval to be demonstrated in the VideOlympics. This system is an extension of the system we developed as part of our participation in TRECVID 2007 [1]. In TRECVID 2007 we created two interfaces, known as the 'Shot' based and 'Broadcast' based interfaces. Our VideOlympics submission takes these two interfaces and the lessons learned from our user experiments, to create a single user interface which attempts to leverage the best aspects of both.
Semantic spaces revisited: investigating the performance of auto-annotation and semantic retrieval using semantic spaces
Semantic spaces encode similarity relationships between objects as a function of position in a mathematical space. This paper discusses three different formulations for building semantic spaces which allow the automatic annotation and semantic retrieval of images. The models discussed in this paper require that the image content be described in the form of a series of visual-terms, rather than as a continuous feature-vector. The paper also discusses how these term-based models compare to the latest state-of-the-art continuous feature models for auto-annotation and retrieval.
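To make the idea concrete, a minimal sketch of one common semantic-space formulation (LSA-style SVD over a term-by-image count matrix); the paper's three formulations differ in detail, and the vocabulary and matrix here are illustrative assumptions:

```python
import numpy as np

def build_semantic_space(term_doc: np.ndarray, k: int):
    """Project a (terms x images) count matrix into a k-dim latent space via SVD."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]   # one row per term (visual or annotation)
    image_vecs = Vt[:k].T          # one row per image
    return term_vecs, image_vecs

def annotate(image_vec: np.ndarray, term_vecs: np.ndarray, vocab: list[str], top: int = 3):
    """Rank annotation terms by cosine similarity to an image's latent vector."""
    sims = term_vecs @ image_vec
    sims = sims / (np.linalg.norm(term_vecs, axis=1) * np.linalg.norm(image_vec) + 1e-12)
    order = np.argsort(-sims)
    return [vocab[i] for i in order[:top]]
```

Annotation then reduces to nearest-term lookup in the shared latent space, which is what allows the same space to serve both auto-annotation and retrieval.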
New trends and ideas in visual concept detection
The MIR Flickr collection consists of 25000 high-quality photographic images from thousands of Flickr users, made available under the Creative Commons license. The database includes all the original user tags and EXIF metadata. Additionally, detailed and accurate annotations are provided for topics corresponding to the most prominent visual concepts in the user tag data. The rich metadata allow for a wide variety of image retrieval benchmarking scenarios. In this paper, we provide an overview of the various strategies that were devised for automatic visual concept detection using the MIR Flickr collection. In particular we discuss results from various experiments in combining social data and low-level content-based descriptors to improve the accuracy of visual concept classifiers. Additionally, we present retrieval results.
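One simple instance of combining social data with content-based descriptors is early fusion: concatenating a binary tag-indicator vector with a visual feature vector before training a per-concept classifier. The surveyed systems also use late fusion and other schemes; the vocabulary and feature values below are illustrative assumptions:

```python
import numpy as np

def tag_vector(tags: set[str], vocab: list[str]) -> np.ndarray:
    """Binary indicator over a fixed tag vocabulary."""
    return np.array([1.0 if t in tags else 0.0 for t in vocab])

def fuse(visual: np.ndarray, tags: set[str], vocab: list[str]) -> np.ndarray:
    """Early fusion: concatenate the visual and social modalities
    into one feature vector for a downstream concept classifier."""
    return np.concatenate([visual, tag_vector(tags, vocab)])
```

The fused vector can be fed to any standard classifier, letting the tag signal compensate where the visual descriptor is ambiguous.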
Empirical study of multi-label classification methods for image annotation and retrieval
This paper presents an empirical study of multi-label classification methods, and gives suggestions for multi-label classification that are effective for automatic image annotation applications. The study shows that the triple random ensemble multi-label classification algorithm (TREMLC) outperforms its counterparts, especially on the scene image dataset. Multi-label k-nearest neighbor (ML-kNN) and binary relevance (BR) learning algorithms perform well on the Corel image dataset. Based on the overall evaluation results, examples are given to show label prediction performance for the algorithms using selected image examples. This provides an indication of the suitability of different multi-label classification methods for automatic image annotation under different problem settings.
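Of the methods compared, binary relevance is the simplest to sketch: one independent binary classifier per label. A nearest-centroid base learner stands in here for whatever base classifier the study actually used, so this is a minimal illustration rather than the evaluated configuration:

```python
import numpy as np

class BinaryRelevance:
    def fit(self, X: np.ndarray, Y: np.ndarray):
        """X: (n, d) feature matrix; Y: (n, L) binary label matrix.
        Store a positive and negative centroid per label."""
        self.pos = [X[Y[:, j] == 1].mean(axis=0) for j in range(Y.shape[1])]
        self.neg = [X[Y[:, j] == 0].mean(axis=0) for j in range(Y.shape[1])]
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        out = np.zeros((len(X), len(self.pos)), dtype=int)
        for j, (p, n) in enumerate(zip(self.pos, self.neg)):
            # assign label j when a sample is closer to its positive centroid
            out[:, j] = (np.linalg.norm(X - p, axis=1) <
                         np.linalg.norm(X - n, axis=1)).astype(int)
        return out
```

Because each label is modelled independently, BR ignores label correlations, which is exactly the weakness that ensemble methods such as TREMLC aim to address.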
VRLE: Lifelog Interaction Prototype in Virtual Reality: Lifelog Search Challenge at ACM ICMR 2020
The Lifelog Search Challenge (LSC) invites researchers to share their prototypes for interactive lifelog retrieval and encourages competition to develop and evaluate effective methodologies to achieve this. With this paper we present a novel approach to visual lifelog exploration, based on our research to date, utilising virtual reality as a medium for interactive information retrieval. The VRLE prototype presented is an iteration on a previous system which won the first LSC competition at ACM ICMR 2018.
Ten Research Questions for Scalable Multimedia Analytics
The scale and complexity of multimedia collections is ever increasing, as is the desire to harvest useful insight from the collections. To optimally support the complex quest for insight, multimedia analytics has emerged as a new research area that combines concepts and techniques from multimedia analysis and visual analytics into a single framework. State-of-the-art multimedia analytics solutions are highly interactive and give users freedom in how they perform their analytics task, but they do not scale well. State-of-the-art scalable database management solutions, on the other hand, are not yet designed for multimedia analytics workloads. In this position paper we therefore argue the need for research on scalable multimedia analytics, a new research area built on the three pillars of visual analytics, multimedia analysis and database management. We propose a specific goal for scalable multimedia analytics and present several important research questions that we believe must be addressed in order to achieve that goal.
Everyday concept detection in visual lifelogs: validation, relationships and trends
The Microsoft SenseCam is a small lightweight wearable camera used to passively capture photos and other sensor readings from a user's day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer's life. However the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, on the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept's presence. We apply detection of 27 everyday semantic concepts on a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision for detection of each semantic concept. We conduct further analysis on the temporal consistency, co-occurrence and trends within the detected concepts to more extensively investigate the robustness of the detectors within this novel domain. We additionally present future applications of concept detection within the domain of lifelogging.
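The concept-detector idea above (a supervised model mapping low-level features to the probability that a concept such as "indoors" is present) can be sketched minimally. Plain-numpy logistic regression stands in for the SVM-style detectors typically used in this line of work, so treat it as illustrative only:

```python
import numpy as np

def train_concept_detector(X: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 2000):
    """One detector per concept: logistic regression fit by gradient descent.
    X: (n, d) low-level feature vectors; y: (n,) binary concept presence."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted presence probability
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def concept_probability(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """P(concept present | features x)."""
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))
```

Training one such detector per concept and thresholding the probabilities is what yields the per-concept precision figures evaluated above.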
Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations
This paper explores a new natural language processing task: review-driven multi-label music style classification. This task requires a system to identify multiple styles of a piece of music from its reviews on websites. The biggest challenge lies in the complicated relations among music styles, which have caused many multi-label classification methods to fail. To tackle this problem, we propose a novel deep learning approach that automatically learns and exploits style correlations. The proposed method consists of two parts: a label-graph-based neural network, and a soft training mechanism with a correlation-based continuous label representation. Experimental results show that our approach achieves large improvements over the baselines on the proposed dataset: micro F1 improves from 53.9 to 64.5, and one-error is reduced from 30.5 to 22.6. Furthermore, visual analysis shows that our approach performs well in capturing style correlations.
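The "correlation-based continuous label representation" idea can be illustrated as softening a hard 0/1 style vector by propagating mass to correlated styles through a row-normalized label co-occurrence matrix. The paper's actual mechanism is learned jointly with a label-graph neural network; this standalone version, with assumed co-occurrence counts, only shows the soft-target construction:

```python
import numpy as np

def soft_labels(hard: np.ndarray, cooc: np.ndarray, alpha: float = 0.8) -> np.ndarray:
    """Mix hard targets with correlation-propagated mass (alpha = hard weight)."""
    row_norm = cooc / cooc.sum(axis=1, keepdims=True)  # P(style j | style i)
    propagated = hard @ row_norm                       # mass flows to correlated styles
    soft = alpha * hard + (1 - alpha) * propagated
    return soft / soft.sum()                           # normalized target distribution
```

Training against such soft targets penalizes a model less for predicting a style that frequently co-occurs with the true one, which is one way to encode style correlations in the loss.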