
    Exquisitor at the Lifelog Search Challenge 2020

    We present an enhanced version of Exquisitor, our interactive and scalable media exploration system. At its core, Exquisitor is an interactive learning system that uses relevance feedback on media items to build a model of the user's information need. Relying on efficient media representation and indexing, it facilitates real-time user interaction. The new features for the Lifelog Search Challenge 2020 include support for timeline browsing, search functionality for finding positive examples, and significant interface improvements. Participation in the Lifelog Search Challenge allows us to compare our paradigm, which relies predominantly on interactive learning, with more traditional search-based multimedia retrieval systems.
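
    As a rough illustration of the relevance-feedback loop described above, here is a minimal Python sketch: a linear classifier is refit on the user's judged items after each round and used to rank the remaining collection. The random feature vectors, the LinearSVC model, and the top-k suggestion step are illustrative assumptions, not Exquisitor's actual implementation, whose efficient representation and indexing are elided here.

```python
# Minimal sketch of an interactive-learning round with relevance feedback.
# Everything below is an illustrative assumption, not the authors' code.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
collection = rng.normal(size=(10_000, 128))  # stand-in media feature vectors

def suggest(positives, negatives, k=25):
    """Fit a linear model on the user's judged items and return the k
    unseen items the model currently considers most relevant."""
    X = np.vstack([collection[positives], collection[negatives]])
    y = np.array([1] * len(positives) + [0] * len(negatives))
    model = LinearSVC(C=1.0).fit(X, y)
    scores = model.decision_function(collection)
    seen = set(positives) | set(negatives)
    ranked = [i for i in np.argsort(-scores) if i not in seen]
    return ranked[:k]

# One feedback round: the user has marked two positives and three negatives.
print(suggest(positives=[3, 17], negatives=[5, 42, 99]))
```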

    A benchmark of visual storytelling in social media

    Media editors in the newsroom are constantly pressed to provide "like being there" coverage of live events. Social media provides a disorganised collection of images and videos that media professionals need to grasp before publishing their latest news update. Automated editing of news visual storylines from social media content can be very challenging, as it entails not only finding the right content but also making sure that the news content evolves coherently over time. To tackle these issues, this paper proposes a benchmark for assessing social media visual storylines. The SocialStories benchmark, comprising a total of 40 curated stories covering sports and cultural events, provides the experimental setup and introduces novel quantitative metrics to perform a rigorous evaluation of visual storytelling with social media data.
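
    The abstract does not spell out the benchmark's metrics, so the following is a purely hypothetical sketch of the kind of measure such an evaluation needs: it scores a candidate storyline both for relevance (how many selected items belong to the ground-truth story) and for temporal coherence (whether adjacent items appear in chronological order). All names and the scoring formula are assumptions, not the SocialStories metrics.

```python
# Hypothetical storyline score: relevance precision times temporal coherence.
from datetime import datetime

def storyline_score(candidate, relevant_ids, id_to_time):
    """Fraction of selected items that are relevant, multiplied by the
    fraction of adjacent pairs that appear in chronological order."""
    if len(candidate) < 2:
        return 0.0
    precision = sum(i in relevant_ids for i in candidate) / len(candidate)
    times = [id_to_time[i] for i in candidate]
    in_order = sum(a <= b for a, b in zip(times, times[1:]))
    coherence = in_order / (len(candidate) - 1)
    return precision * coherence

# Toy data: ten items, one minute apart.
id_to_time = {i: datetime(2024, 6, 1, 12, i) for i in range(10)}
print(storyline_score([0, 2, 1, 7], relevant_ids={0, 1, 2, 3},
                      id_to_time=id_to_time))  # 0.75 * 2/3 = 0.5
```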

    Context-Aware Embeddings for Automatic Art Analysis

    Automatic art analysis aims to classify and retrieve artistic representations from a collection of images using computer vision and machine learning techniques. In this work, we propose to enhance visual representations from neural networks with contextual artistic information. Whereas visual representations capture information about the content and the style of an artwork, our proposed context-aware embeddings additionally encode relationships between different artistic attributes, such as author, school, or historical period. We design two different approaches for using context in automatic art analysis. In the first, contextual data is obtained through a multi-task learning model, in which several attributes are trained together to find visual relationships between elements. In the second, context is obtained through an art-specific knowledge graph that encodes relationships between artistic attributes. An exhaustive evaluation of both models on several art analysis problems, such as author identification, type classification, and cross-modal retrieval, shows that performance improves by up to 7.3% in art classification and 37.24% in retrieval when context-aware embeddings are used.
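
    A minimal sketch of the first approach (multi-task learning over several attributes), assuming a PyTorch setup: a shared encoder feeds one classification head per artistic attribute, and the joint loss pushes contextual relationships into the shared embedding. The backbone, layer sizes, and attribute counts below are illustrative, not the paper's configuration.

```python
# Multi-task sketch: shared encoder, one head per artistic attribute.
import torch
import torch.nn as nn

class MultiTaskArtModel(nn.Module):
    def __init__(self, feat_dim=512, n_authors=350, n_types=10, n_schools=25):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in for a CNN backbone
            nn.Linear(2048, feat_dim), nn.ReLU())
        self.heads = nn.ModuleDict({
            "author": nn.Linear(feat_dim, n_authors),
            "type":   nn.Linear(feat_dim, n_types),
            "school": nn.Linear(feat_dim, n_schools),
        })

    def forward(self, x):
        z = self.encoder(x)                    # the context-aware embedding
        return {k: head(z) for k, head in self.heads.items()}

model = MultiTaskArtModel()
x = torch.randn(4, 2048)                       # e.g. pooled CNN features
labels = {"author": torch.randint(0, 350, (4,)),
          "type": torch.randint(0, 10, (4,)),
          "school": torch.randint(0, 25, (4,))}
# Joint loss over all attribute heads: one training step.
loss = sum(nn.functional.cross_entropy(out, labels[k])
           for k, out in model(x).items())
loss.backward()
```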

    Relationship detection based on object semantic inference and attention mechanisms

    Detecting relations among objects is a crucial task for image understanding. However, each relationship involves a different combination of object pairs, and different object pairs express diverse interactions. This makes detecting relationships from visual features alone a challenging task. In this paper, we propose a simple yet effective relationship detection model based on object semantic inference and attention mechanisms. Our model is trained to detect relation triples of the form ⟨subject, predicate, object⟩. To overcome the high diversity of visual appearances, the semantic inference module and the visual features are combined to complement each other. We also introduce two different attention mechanisms, for object feature refinement and phrase feature refinement. In order to derive a more detailed and comprehensive representation for each object, the object feature refinement module refines the representation of each object by querying over all the other objects in the image. The phrase feature refinement module is proposed to make the phrase features more effective and to automatically focus on the relevant parts, improving the visual relationship detection task. We validate our model on the Visual Genome Relationship dataset. Our proposed model achieves competitive results compared to the state-of-the-art method MOTIFNET.
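
    The object feature refinement module can be pictured as self-attention over the objects in an image: each object queries all the others and folds the attended context back into its own representation. The sketch below uses generic scaled dot-product attention; the paper's exact formulation may differ.

```python
# Sketch of object feature refinement via attention over the other objects.
import torch
import torch.nn.functional as F

def refine_objects(obj_feats):
    """obj_feats: (n_objects, d). Returns features enriched with context
    from the other objects via self-attention (self-matches masked out)."""
    n, d = obj_feats.shape
    scores = obj_feats @ obj_feats.T / d ** 0.5   # pairwise affinities
    scores.fill_diagonal_(float("-inf"))          # query *other* objects only
    attn = F.softmax(scores, dim=-1)
    context = attn @ obj_feats
    return obj_feats + context                    # residual refinement

refined = refine_objects(torch.randn(5, 256))
print(refined.shape)  # torch.Size([5, 256])
```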

    Exploring Intuitive Lifelog Retrieval and Interaction Modes in Virtual Reality with vitrivr-VR

    The multimodal nature of lifelog data collections poses unique challenges for multimedia management and retrieval systems. The Lifelog Search Challenge (LSC) offers an annual evaluation platform on which such interactive retrieval systems compete against one another in finding items of interest within a set time frame. In this paper, we present the multimedia retrieval system vitrivr-VR, the latest addition to the vitrivr stack, which has participated in the LSC in recent years. vitrivr-VR leverages 3D space in virtual reality (VR) to offer novel retrieval and user interaction models, which we describe with a special focus on the design decisions taken for participation in the LSC.

    Exquisitor: Interactive Learning for Multimedia

    FIRST - Flexible interactive retrieval SysTem for visual lifelog exploration at LSC 2020

    Lifelogs can provide useful insights into our daily activities. It is essential to offer users a flexible way to retrieve events or moments of interest corresponding to a wide variety of query types. This motivates us to develop FIRST, a Flexible Interactive Retrieval SysTem, which helps users combine or integrate various query components in a flexible manner to handle different query scenarios, such as clustering visual data based on color histograms, visual similarity, GPS location, or scene attributes. We also employ personalized concept detection and image captioning to enhance image understanding of visual lifelog data, and develop an autoencoder-like approach for mapping between query text and image features. Furthermore, we refine the user interface of the retrieval system to better assist users in expanding queries and in verifying sequential events at a flexible temporal resolution that controls the navigation speed through sequences of images.
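
    As a hedged sketch of the autoencoder-like feature mapping mentioned above (the paper's architecture is not specified in the abstract), assume an encoder maps a text-query embedding into the image feature space while a decoder reconstructs the original text embedding, keeping the mapping semantically faithful. All dimensions and the reconstruction objective below are assumptions.

```python
# Assumed autoencoder-like mapper from text-query to image feature space.
import torch
import torch.nn as nn

class TextToImageMapper(nn.Module):
    def __init__(self, text_dim=300, img_dim=512, hidden=400):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, img_dim))
        self.decode = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, text_dim))

    def forward(self, text_emb):
        mapped = self.encode(text_emb)   # lives in image-feature space
        recon = self.decode(mapped)      # reconstruction preserves semantics
        return mapped, recon

mapper = TextToImageMapper()
q = torch.randn(8, 300)                  # batch of text-query embeddings
mapped, recon = mapper(q)
loss = nn.functional.mse_loss(recon, q)  # autoencoder-style objective
loss.backward()
```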

    Spatially Localised Immersive Contemporary and Historic Photo Presentation on Mobile Devices in Augmented Reality

    These days, taking a photo is the most common way of capturing a moment. Some photos captured in the moment are never seen again, while others are almost immediately shared with the world. Yet the context of the captured moment can only be shared to a limited extent. The continuous improvement of mobile devices has not only led to higher-resolution cameras and thus visually more appealing pictures, but also to a broader and more precise range of accompanying sensor metadata. Positional and bearing information can provide context for photos and is thus an integral aspect of the captured moment; however, it is commonly only used to sort photos by time and possibly group them by place. This more precise sensor metadata, combined with the increased computing power of mobile devices, enables ever more powerful Augmented Reality (AR) capabilities, especially for communicating the context of a captured photo. Users can thereby witness the captured moment in its real location and experience its spatial contextualization. With the help of suitable data augmentation, such a context-preserving presentation can be extended even to non-digitally-born content, including historical images. This offers new immersive ways to experience the cultural history of one's current location. In this paper, we present an approach for location-based image presentation in AR on mobile devices, with which users can experience captured moments in their physical context. We demonstrate the power of this approach with a prototype implementation and evaluate it in a user study.
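
    The placement of a photo in AR ultimately reduces to standard great-circle geometry over the stored positional metadata: given the viewer's GPS fix and the photo's capture coordinates, compute the distance and compass bearing at which to anchor the image in the scene. The sketch below shows that computation; the function name and the sample coordinates are illustrative.

```python
# Distance and initial bearing from the viewer to a photo's capture point,
# using standard great-circle (haversine) geometry.
from math import radians, degrees, sin, cos, atan2, sqrt

EARTH_RADIUS_M = 6_371_000

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (metres) and initial compass bearing (degrees)
    from point 1 (viewer) to point 2 (photo capture location)."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * atan2(sqrt(a), sqrt(1 - a))
    y = sin(dlon) * cos(p2)
    x = cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dlon)
    bearing = (degrees(atan2(y, x)) + 360) % 360
    return dist, bearing

# Illustrative: viewer in Basel old town, photo taken roughly 100 m away.
print(distance_and_bearing(47.5596, 7.5886, 47.5604, 7.5895))
```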