
    A Data-Driven Approach for Tag Refinement and Localization in Web Videos

    Tagging of visual content is becoming more and more widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of media collections, e.g. using tag clouds, or to retrieve multimedia content. However, not all media are equally tagged by users. With current systems it is easy to tag a single photo, and even tagging a part of a photo, such as a face, has become common on sites like Flickr and Facebook. On the other hand, tagging a video sequence is more complicated and time consuming, so users typically tag only the overall content of a video. In this paper we present a method for automatic video annotation that increases the number of tags originally provided by users and localizes them temporally, associating tags with keyframes. Our approach exploits collective knowledge embedded in user-generated tags and web sources, together with the visual similarity of keyframes to images uploaded to social sites like YouTube and Flickr, as well as to web sources like Google and Bing. Given a keyframe, our method selects on the fly, from these visual sources, the training exemplars that should be most relevant for this test sample, and then transfers labels across similar images. Compared to existing video tagging approaches that require training a classifier for each tag, our system has few parameters, is easy to implement, and can deal with an open-vocabulary scenario. We demonstrate the approach on tag refinement and localization on DUT-WEBV, a large dataset of web videos, and show state-of-the-art results.
    Comment: Preprint submitted to Computer Vision and Image Understanding (CVIU).
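
    The label-transfer step described in this abstract lends itself to a compact sketch. The following Python snippet is only an illustration of similarity-weighted tag voting under assumed inputs (precomputed visual descriptors for the keyframe and for a pool of tagged web images); it is not the authors' implementation, and the neighbourhood size and vote threshold are made-up parameters.

    from collections import defaultdict
    import numpy as np

    def transfer_tags(keyframe_feat, pool_feats, pool_tags, k=20, min_vote=0.2):
        """Tag a keyframe by voting over its k most visually similar images.

        keyframe_feat : (d,) descriptor of the test keyframe
        pool_feats    : (n, d) descriptors of tagged images gathered on the fly
        pool_tags     : list of n tag sets, one per pooled image (assumed given)
        """
        # Cosine similarity between the keyframe and every candidate exemplar.
        sims = pool_feats @ keyframe_feat / (
            np.linalg.norm(pool_feats, axis=1) * np.linalg.norm(keyframe_feat) + 1e-8)
        neighbours = np.argsort(-sims)[:k]

        # Similarity-weighted vote for every tag carried by the neighbours.
        votes = defaultdict(float)
        for i in neighbours:
            for tag in pool_tags[i]:
                votes[tag] += sims[i]

        total = sum(sims[i] for i in neighbours) or 1.0
        return {t for t, v in votes.items() if v / total >= min_vote}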

    Video Content Understanding Using Text

    The rise of the social media and video streaming industries has provided us with a plethora of videos and their corresponding descriptive information in the form of concepts (words) and textual video captions. Given the massive amount of available videos and textual data, there has never been a better time to study Computer Vision and Machine Learning problems related to videos and text. In this dissertation, we tackle multiple problems associated with the joint understanding of videos and text. We first address the task of multi-concept video retrieval, where the input is a set of words as concepts and the output is a ranked list of full-length videos. This approach deals with multi-concept input and the prolonged length of videos by incorporating multiple latent variables to tie the information within each shot (a short clip of a full video) and across shots. Secondly, we address the problem of video question answering, in which the task is to answer a question, posed in Fill-In-the-Blank (FIB) form, given a video. Answering a question is a task of retrieving a word from a dictionary (all possible words suitable for an answer) based on the input question and video. Following the FIB problem, we introduce a new problem, called Visual Text Correction (VTC), i.e., detecting and replacing an inaccurate word in the textual description of a video. We propose a deep network that can simultaneously detect an inaccuracy in a sentence, benefiting from 1D-CNNs/LSTMs to encode short- and long-term dependencies, and fix it by replacing the inaccurate word(s). Finally, in the last part of the dissertation, we tackle the problem of video generation from user-provided natural language sentences. Our proposed video generation method constructs two distributions from the input text, corresponding to the latent representations of the first and last frames, and generates high-fidelity videos by interpolating the latent representations and decoding them with a sequence of CNN-based up-pooling blocks.
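
    The text-to-video part of the abstract is concrete enough for a brief sketch. The snippet below is a hypothetical illustration of generating a clip by interpolating between first- and last-frame latent codes; the text encoder, the decoder built from up-pooling blocks, and the Gaussian parameterization are assumptions, not the dissertation's actual architecture.

    import torch

    def generate_clip(text_embedding, encoder, decoder, num_frames=16):
        """Generate a clip by interpolating first- and last-frame latents."""
        # The encoder is assumed to predict two latent distributions
        # (mean and log-variance) for the first and last frames from the text.
        mu_a, logvar_a, mu_b, logvar_b = encoder(text_embedding)
        z_a = mu_a + torch.randn_like(mu_a) * torch.exp(0.5 * logvar_a)
        z_b = mu_b + torch.randn_like(mu_b) * torch.exp(0.5 * logvar_b)

        frames = []
        for t in torch.linspace(0.0, 1.0, num_frames):
            z_t = (1.0 - t) * z_a + t * z_b      # linear walk through latent space
            frames.append(decoder(z_t))          # CNN up-pooling blocks -> pixels
        return torch.stack(frames, dim=1)        # (batch, time, channels, H, W)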

    Efficient Genre-Specific Semantic Video Indexing

    Abstract—Large video collections such as YouTube contain many different video genres, while in many applications the user might be interested in only one or two specific video genres. Thus, when users query the system with a specific semantic concept like AnchorPerson or MovieStars, they are likely aiming at a genre-specific instantiation of this concept. Existing methods treat this problem as a classical learning problem, leading to unnecessarily complex models. We propose a framework to detect visual-based genre-specific concepts in a more efficient and accurate way. We do so by using a two-step framework distinguishing two different levels. Genre-specific concept models are trained on a training set with data labeled at video level for genres and at shot level for semantic concepts. In the classification stage, video genre classification is applied first to reduce the entire data set to a relatively small subset. Then, the genre-specific concept models are applied to this subset only. Experiments have been conducted on a small 28-h data set for genre-specific concept detection and a 4168-h (80,031 videos) benchmark data set for genre-specific topic search. Experimental results show that our proposed two-step method is more efficient and effective, for both the indexing and the search tasks, than existing methods which do not consider the different semantic levels between video genres and semantic concepts. When filtering out 80% of the data set, the average performance loss is about 11.3% for genre-specific concept detection and 31.5% for genre-specific topic search, while the processing speed increases hundreds of times for different video genres.
    Index Terms—Efficiency, genre classification, genre-specific concept detection, semantic indexing.
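
    To make the two-step scheme concrete, here is a small illustrative sketch (not the paper's code): genre classification first prunes the collection, and only the surviving videos' shots are scored by the genre-specific concept model. The classifier interfaces and attribute names are assumptions.

    def genre_specific_search(videos, genre_clf, concept_models, genre, concept):
        """Rank shots for `concept`, restricted to videos classified as `genre`."""
        # Step 1: genre filtering reduces the whole collection to a small subset.
        candidates = [v for v in videos if genre_clf.predict(v.features) == genre]

        # Step 2: apply the genre-specific concept model to that subset only.
        model = concept_models[(genre, concept)]
        scored = [(shot, model.score(shot.features))
                  for v in candidates for shot in v.shots]
        return sorted(scored, key=lambda item: item[1], reverse=True)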
