
    Personal Photo Indexing

    Sorting one’s own private photo collection is a time-consuming and tedious task. We demonstrate our event-centered approach to performing this task fully automatically. In the course of the demonstration, we either use our own photo collections or invite conference visitors to bring their own cameras and photos. We will sort the photos into a semantically meaningful hierarchy for the users within a couple of minutes. Events as a media aggregator allow a user to manage and annotate a photo collection in a way that is more convenient and natural for a human being. Based on the recognized user behavior, the application is able to reveal the nature of an event and build its hierarchy with an event/sub-event relationship. One important prerequisite of our approach is precise GPS-based spatial annotation of the photos. To accommodate devices without GPS chips or with temporarily poor GPS reception, we propose an approach that enriches the collection with automatically estimated GPS data by semantically interpolating possible routes of the user. We are confident that we can provide a well-received service for conference visitors, especially since the conference venue will trigger a lot of memorable photos. Large-scale experimental validation showed that the approach is able to recreate a user’s desired hierarchy with an F-measure of about 0.8.
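
    As an illustration of the GPS-enrichment step, here is a minimal sketch in Python, assuming each photo is represented as a dict with a 'time' field (a datetime) and an optional 'gps' field (a (lat, lon) pair), sorted by time. The paper interpolates semantically plausible routes; the plain time-weighted linear interpolation between the nearest geotagged photos below is a simplification of that idea.

```python
def interpolate_gps(photos):
    """Fill missing 'gps' fields in a time-sorted photo list, in place.

    Each photo is a dict with 'time' (datetime) and optionally 'gps'
    (a (lat, lon) tuple). Photos between two geotagged photos receive a
    position on the straight line between them, weighted by elapsed time.
    """
    # indices of photos that already carry a GPS fix
    fixed = [i for i, p in enumerate(photos) if p.get("gps") is not None]
    for a, b in zip(fixed, fixed[1:]):
        t0, t1 = photos[a]["time"], photos[b]["time"]
        if t1 == t0:
            continue  # identical timestamps: no basis for interpolation
        (lat0, lon0), (lat1, lon1) = photos[a]["gps"], photos[b]["gps"]
        for i in range(a + 1, b):
            # fraction of the time interval elapsed at photo i
            w = (photos[i]["time"] - t0).total_seconds() / (t1 - t0).total_seconds()
            photos[i]["gps"] = (lat0 + w * (lat1 - lat0),
                                lon0 + w * (lon1 - lon0))
    return photos
```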

    TagBook: A Semantic Video Representation without Supervision for Event Detection

    We consider the problem of event detection in video for scenarios where only a few, or even zero, examples are available for training. For this challenging setting, the prevailing solutions in the literature rely on a semantic video representation obtained from thousands of pre-trained concept detectors. Different from existing work, we propose a new semantic video representation that is based on freely available socially tagged videos only, without the need to train any intermediate concept detectors. We introduce a simple algorithm that propagates tags from a video's nearest neighbors, similar in spirit to the ones used for image retrieval, but redesign it for video event detection by including video source set refinement and varying the video tag assignment. We call our approach TagBook and study its construction, descriptiveness and detection performance on the TRECVID 2013 and 2014 multimedia event detection datasets and the Columbia Consumer Video dataset. Despite its simple nature, the proposed TagBook video representation is remarkably effective for few-example and zero-example event detection, even outperforming very recent state-of-the-art alternatives built on supervised representations. Comment: accepted for publication as a regular paper in the IEEE Transactions on Multimedia.
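
    The core of a TagBook-style representation can be sketched as similarity-weighted tag propagation from a query video's nearest neighbors. The sketch below assumes precomputed video feature vectors and a fixed tag vocabulary; the paper's source-set refinement and tag-assignment variants are omitted, and the parameter names are illustrative.

```python
import numpy as np

def tagbook_vector(query_feat, source_feats, source_tags, vocab, k=50):
    """Build a |vocab|-dim semantic representation of a query video.

    query_feat:   (d,) feature vector of the query video
    source_feats: (n, d) features of socially tagged source videos
    source_tags:  list of n tag sets, one per source video
    vocab:        ordered list of tags defining the output dimensions
    """
    # cosine similarity of the query to every source video
    sims = source_feats @ query_feat / (
        np.linalg.norm(source_feats, axis=1) * np.linalg.norm(query_feat) + 1e-10)
    top = np.argsort(-sims)[:k]  # k nearest neighbours
    tag_index = {t: j for j, t in enumerate(vocab)}
    rep = np.zeros(len(vocab))
    for i in top:
        for t in source_tags[i]:
            if t in tag_index:
                rep[tag_index[t]] += sims[i]  # similarity-weighted tag votes
    return rep / (rep.sum() + 1e-10)  # normalize to a tag distribution
```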

    ICMR 2014: 4th ACM International Conference on Multimedia Retrieval

    ICMR was initially started as a workshop on challenges in image retrieval (in Newcastle in 1998) and later transformed into the Conference on Image and Video Retrieval (CIVR) series. In 2011 the CIVR and the ACM Workshop on Multimedia Information Retrieval were combined into a single conference that now forms the ICMR series. The 4th ACM International Conference on Multimedia Retrieval took place in Glasgow, Scotland, from 1–4 April 2014. This was the largest edition of ICMR to date, with approximately 170 attendees from 25 different countries. ICMR is one of the premier scientific conferences for multimedia retrieval held worldwide, with the stated mission “to illuminate the state of the art in multimedia retrieval by bringing together researchers and practitioners in the field of multimedia retrieval.” According to the Chinese Computing Federation Conference Ranking (2013), ACM ICMR is the number one multimedia retrieval conference worldwide and the number four conference in the category of multimedia and graphics. Although ICMR is about multimedia retrieval, in a wider sense it is also about automated multimedia understanding. Much of the work in that area involves the analysis of media at the pixel, voxel, and wavelet level, but it also involves innovative retrieval, visualisation and interaction paradigms utilising the nature of the multimedia, be it video, images, speech, or more abstract (sensor) data. The conference aims to promote intellectual exchange and interaction among scientists, engineers, students, and multimedia researchers in academia as well as industry through various events, including a keynote talk, oral, special and poster sessions focused on research challenges and solutions, technical and industrial demonstrations of prototypes, tutorials, and an industrial panel. In the remainder of this report we summarise the events that took place at the 4th ACM ICMR conference.

    Living Knowledge

    Diversity, especially as manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity an asset rather than a problem. Within the project, foundational ideas emerging from the synergic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed into concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser, providing users with better-structured information while coping with Web-scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and of the way they interact. In particular, the conceptual architecture has been implemented in the Media Content Analyser application. The scientific and technological results obtained are described in the following.

    Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval

    In this paper we address the problem of learning robust cross-domain representations for sketch-based image retrieval (SBIR). While most SBIR approaches focus on extracting low- and mid-level descriptors for direct feature matching, recent works have shown the benefit of learning coupled feature representations to describe data from two related sources. However, cross-domain representation learning methods are typically cast as non-convex minimization problems that are difficult to optimize, leading to unsatisfactory performance. Inspired by self-paced learning, a learning methodology designed to overcome convergence issues related to local optima by exploiting the samples in a meaningful order (i.e. easy to hard), we introduce the cross-paced partial curriculum learning (CPPCL) framework. Compared with existing self-paced learning methods, which only consider a single modality and cannot deal with prior knowledge, CPPCL is specifically designed to assess the learning pace by jointly handling data from dual sources and modality-specific prior information provided in the form of partial curricula. Additionally, thanks to the learned dictionaries, we demonstrate that the proposed CPPCL embeds robust coupled representations for SBIR. Our approach is extensively evaluated on four publicly available datasets (i.e. the CUFS, Flickr15K, QueenMary SBIR and TU-Berlin Extension datasets), showing superior performance over competing SBIR methods.
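
    The self-paced principle behind CPPCL can be illustrated with a minimal single-modality sketch: each round trains only on samples whose current loss falls below a growing threshold, so harder samples are admitted over time. The fit_step and losses_fn callables and the threshold schedule below are placeholders; the cross-modal coupling and the partial curricula of the actual framework are not modeled here.

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced regularizer: weight 1 for 'easy' samples
    (loss below the threshold lam), weight 0 for the rest."""
    return (losses < lam).astype(float)

def train_self_paced(fit_step, losses_fn, data, lam=0.1, growth=1.3, rounds=10):
    """Alternate between weighting samples and updating the model.

    fit_step(data, weights): user-supplied weighted model update
    losses_fn(data):         user-supplied per-sample loss evaluation
    """
    for _ in range(rounds):
        w = self_paced_weights(losses_fn(data), lam)
        fit_step(data, w)  # update using easy samples only
        lam *= growth      # relax the threshold: admit harder samples
```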

    A Data-Driven Approach for Tag Refinement and Localization in Web Videos

    Tagging of visual content is becoming more and more widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of media collections, e.g. using tag clouds, or to retrieve multimedia content. However, not all media are equally tagged by users. With current systems it is easy to tag a single photo, and even tagging a part of a photo, like a face, has become common on sites like Flickr and Facebook. On the other hand, tagging a video sequence is more complicated and time-consuming, so users tend to tag only the overall content of a video. In this paper we present a method for automatic video annotation that increases the number of tags originally provided by users and localizes them temporally, associating tags with keyframes. Our approach exploits collective knowledge embedded in user-generated tags and web sources, and visual similarity of keyframes and images uploaded to social sites like YouTube and Flickr, as well as web sources like Google and Bing. Given a keyframe, our method selects on the fly from these visual sources the training exemplars that should be most relevant for the test sample, and proceeds to transfer labels across similar images. Compared to existing video tagging approaches that require training classifiers for each tag, our system has few parameters, is easy to implement and can deal with an open-vocabulary scenario. We demonstrate the approach on tag refinement and localization on DUT-WEBV, a large dataset of web videos, and show state-of-the-art results. Comment: Preprint submitted to Computer Vision and Image Understanding (CVIU).
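
    The label-transfer step can be sketched as on-the-fly nearest-neighbor tag voting over a pool of tagged web images. The retrieval backend, the visual features, and the thresholds below are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def transfer_tags(keyframe_feat, pool_feats, pool_tags, k=25, min_votes=3):
    """Propose tags for a keyframe by voting across its nearest images.

    keyframe_feat: (d,) visual feature of the keyframe
    pool_feats:    (n, d) features of tagged web images retrieved on the fly
    pool_tags:     list of n tag lists, one per pool image
    Returns the tags supported by at least min_votes of the k neighbours.
    """
    # Euclidean distance from the keyframe to every pool image
    dists = np.linalg.norm(pool_feats - keyframe_feat, axis=1)
    neighbours = np.argsort(dists)[:k]
    votes = {}
    for i in neighbours:
        for t in pool_tags[i]:
            votes[t] = votes.get(t, 0) + 1  # one vote per neighbour per tag
    return sorted(t for t, v in votes.items() if v >= min_votes)
```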

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010, and comments are included on lessons learned. The Tagging Task (Professional) involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task (Wild Wild Web) involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.

    Instance search with weak geometric correlation consistency

    Finding object instances in large image collections is a challenging problem with many practical applications. Recent methods inspired by text retrieval have achieved good results; however, a re-ranking stage based on spatial verification is still required to boost performance. To improve the effectiveness of such instance retrieval systems while avoiding the computational complexity of a re-ranking stage, we explore the geometric correlations among local features and incorporate these correlations with each individual match to form a transformation consistency in rotation and scale space. This weak geometric correlation consistency can effectively eliminate inconsistent feature matches and can be applied to all candidate images at low computational cost. Experimental results on three standard evaluation benchmarks show that the proposed approach yields a substantial performance improvement compared with recently proposed methods.
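
    A rough sketch of the idea: every local-feature match implies a scale ratio and a rotation offset between the query and a candidate image, and matches that disagree with the dominant transformation can be discarded cheaply, without a full spatial-verification re-ranking. The match representation and the binning granularity below are assumptions for illustration.

```python
import numpy as np

def filter_matches(matches, n_rot_bins=8, n_scale_bins=8):
    """Keep only matches voting for the dominant rotation/scale offset.

    matches: list of ((scale_q, angle_q), (scale_c, angle_c)) pairs, giving
    the local scale and orientation of the matched feature in the query and
    candidate image. Returns the indices of the consistent matches.
    """
    # per-match rotation offset in [0, 2*pi) and log scale ratio
    rot = np.array([(aq - ac) % (2 * np.pi) for (_, aq), (_, ac) in matches])
    scl = np.array([np.log2(sq / sc) for (sq, _), (sc, _) in matches])
    # quantize both quantities into a coarse 2-D histogram
    rbin = (rot / (2 * np.pi) * n_rot_bins).astype(int) % n_rot_bins
    sbin = np.clip(((scl + 4) / 8 * n_scale_bins).astype(int), 0, n_scale_bins - 1)
    hist = np.zeros((n_rot_bins, n_scale_bins))
    for r, s in zip(rbin, sbin):
        hist[r, s] += 1
    best_r, best_s = np.unravel_index(hist.argmax(), hist.shape)
    # matches falling in the dominant bin are considered consistent
    return [i for i, (r, s) in enumerate(zip(rbin, sbin))
            if r == best_r and s == best_s]
```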