A words-of-interest model of sketch representation for image retrieval
In this paper we propose a method for sketch-based image retrieval. A sketch is an expressive medium capable of conveying semantic messages from the user, and retrieving images by sketch accords with the user's cognitive psychology. To narrow the semantic gap between the user and the images in the database, we preprocess all images into sketches using the coherent line drawing algorithm. During sketch extraction, saliency maps are used to filter out redundant background information while preserving the important semantic information. We then use a variant of the Words-of-Interest (WoI) model to retrieve images relevant to the user's query. The WoI model builds on the Bag-of-Visual-Words (BoW) model, which has proven successful for information retrieval; however, BoW ignores the spatial relationships among visual words, which are important for sketch representation. Our method exploits the spatial information of the query to select words of interest. Experimental results demonstrate that our sketch-based retrieval method achieves a good trade-off between retrieval accuracy and semantic representation of the user's query.
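The retrieval step described above (bag-of-visual-words histograms compared between query and database, with query words optionally weighted by interest) can be sketched minimally as follows. This is an illustrative NumPy sketch, not the authors' implementation; the function names (`bow_histogram`, `retrieve`) and the use of cosine similarity are assumptions, and the word-of-interest weighting is reduced to a per-word weight vector.

```python
import numpy as np

def bow_histogram(word_ids, vocab_size, weights=None):
    """Build an L2-normalised bag-of-visual-words histogram.

    word_ids : array of visual-word indices assigned to local descriptors.
    weights  : optional per-word weights, e.g. emphasising "words of
               interest" near the sketched strokes (assumption).
    """
    hist = np.bincount(word_ids, weights=weights,
                       minlength=vocab_size).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def retrieve(query_hist, db_hists, top_k=3):
    """Rank database images by cosine similarity to the query histogram."""
    sims = db_hists @ query_hist  # histograms are already L2-normalised
    order = np.argsort(-sims)
    return order[:top_k], sims[order[:top_k]]
```

In this sketch the spatial information enters only through the `weights` argument; the paper's actual WoI selection is richer than a simple reweighting.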
Fine-Grained Video Retrieval With Scene Sketches
Benefiting from the intuitiveness and naturalness of sketch interaction, sketch-based video retrieval (SBVR) has received considerable attention in the video retrieval research area. However, most existing SBVR research still lacks the capability of accurate video retrieval with fine-grained scene content. To address this problem, in this paper we investigate a new task, which focuses on retrieving the target video by utilizing a fine-grained storyboard sketch depicting the scene layout and the major foreground instances' visual characteristics (e.g., appearance, size, pose) of the video; we call such a task "fine-grained scene-level SBVR". The most challenging issue in this task is how to perform scene-level cross-modal alignment between sketch and video. Our solution consists of two parts. First, we construct a scene-level sketch-video dataset called SketchVideo, in which sketch-video pairs are provided and each pair contains a clip-level storyboard sketch and several keyframe sketches (corresponding to video frames). Second, we propose a novel deep learning architecture called Sketch Query Graph Convolutional Network (SQ-GCN). In SQ-GCN, we first adaptively sample the video frames to improve video encoding efficiency, and then construct appearance and category graphs to jointly model visual and semantic alignment between sketch and video. Experiments show that our fine-grained scene-level SBVR framework with the SQ-GCN architecture outperforms the state-of-the-art fine-grained retrieval methods. The SketchVideo dataset and SQ-GCN code are available on the project webpage https://iscas-mmsketch.github.io/FG-SL-SBVR/
JuxtaLearn D3.2 Performance Framework
This deliverable, D3.2, for Work Package 3 incorporates the pedagogy from WP2 and the orchestration factors mapped in D3.1, and reviews aspects of performance in the context of participative video making. It reviews the literature on curiosity and on the engagement characteristics of interaction mechanisms for public displays, and anticipates requirements for the social network analysis of relevant public videos from WP6 task 6.3. To support JuxtaLearn performance, it therefore proposes a reflective performance framework that encompasses the material environment and objects required, the participants, and the knowledge needed.
From Personalization to Adaptivity: Creating Immersive Visits through Interactive Digital Storytelling at the Acropolis Museum
Storytelling has recently become a popular way to guide museum visitors, replacing traditional exhibit-centric descriptions with story-centric, cohesive narrations that reference the exhibits and multimedia content. This work presents the fundamental elements of the CHESS project approach, the goal of which is to provide adaptive, personalized, interactive storytelling for museum visits. We briefly present the CHESS project and its background, detail the proposed storytelling and user models, describe the functionality provided, and outline the main tools and mechanisms employed. Finally, we present the preliminary results of a recent evaluation study, which are informing several directions for future work.
How can I produce a digital video artefact to facilitate greater understanding among youth workers of their own learning-to-learn competence?
In Ireland, youth work is delivered largely in marginalised communities and through non-formal and informal learning methods. Youth workers operate in small, isolated organisations without many of the resources and structures for improving practice that are afforded to larger formal educational establishments. Fundamental to youth work practice is the ability to identify and construct learning experiences for young people in non-traditional learning environments. It is therefore necessary for youth workers to develop a clear understanding of their own learning capacity in order to facilitate learning experiences for young people.
In the course of this research, I attempted to use technology to enhance and support youth workers' awareness of their own learning capacity by creating a digital video artefact that explores the concept of learning-to-learn. This study presents my understanding of the learning-to-learn competence as I sought to improve my practice as a youth service manager and youth work trainer.
This study was conducted using an action research approach. I designed and evaluated the digital media artefact "Lenny's Quest" in collaboration with staff and trainer colleagues over the course of two cycles of action research, and my research was critiqued and validated throughout this process.
Compositional Sketch Search
We present an algorithm for searching image collections using free-hand sketches that describe the appearance and relative positions of multiple objects. Sketch-based image retrieval (SBIR) methods predominantly match queries containing a single, dominant object, invariant to its position within an image. Our work exploits drawings as a concise and intuitive representation for specifying entire scene compositions. We train a convolutional neural network (CNN) to encode masked visual features from sketched objects, pooling these into a spatial descriptor encoding the spatial relationships and appearances of objects in the composition. Training the CNN backbone as a Siamese network under triplet loss yields a metric search embedding for measuring compositional similarity, which may be efficiently leveraged for visual search by applying product quantization. Comment: ICIP 2021 camera-ready version.
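The triplet-loss objective mentioned in this abstract can be illustrated with a minimal NumPy sketch. This is not the paper's training code: the margin value, the use of squared Euclidean distance on L2-normalised embeddings, and the function names are illustrative assumptions.

```python
import numpy as np

def l2_normalise(x, axis=-1, eps=1e-12):
    """Project embeddings onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on L2-normalised embeddings.

    Pulls the sketch embedding (anchor) toward the matching image
    (positive) and pushes it at least `margin` farther from a
    non-matching image (negative), in squared Euclidean distance.
    """
    a, p, n = (l2_normalise(v) for v in (anchor, positive, negative))
    d_ap = np.sum((a - p) ** 2, axis=-1)  # anchor-positive distance
    d_an = np.sum((a - n) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, d_ap - d_an + margin)
```

Once the backbone is trained under this objective, nearest-neighbour search in the resulting embedding space ranks images by compositional similarity; product quantization then compresses the embeddings for efficient large-scale search.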