A Video Library System Using Scene Detection and Automatic Tagging

Author: Baraldi Lorenzo
Cucchiara Rita
Grana Costantino
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

We present a novel video browsing and retrieval system for edited videos, in which videos are automatically decomposed into meaningful and storytelling parts (i.e. scenes) and tagged according to their transcript. The system relies on a Triplet Deep Neural Network which exploits multimodal features, and has been implemented as a set of extensions to the eXo Platform Enterprise Content Management System (ECMS). This set of extensions enables the interactive visualization of a video, its automatic and semi-automatic annotation, as well as a keyword-based search inside the video collection. The platform also allows a natural integration with third-party add-ons, so that automatic annotations can be exploited outside the proposed platform.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

CONTENT BASED RETRIEVAL OF LECTURE VIDEO REPOSITORY: LITERATURE REVIEW

Author: Adrakatti Arun
Mulla K R, Dr.
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 02/07/2022

Multimedia has a significant role in communicating information, and a large number of multimedia repositories support the browsing, retrieval and delivery of video content. For higher education, using video as a tool for learning and teaching through multimedia applications holds considerable promise. Many universities adopt educational systems where the teacher's lecture is video recorded and the video lecture is made available to students with minimal post-processing effort. Since each video may cover many subjects, it is critical for an e-Learning environment to have content-based video searching capabilities to meet diverse individual learning needs. The present paper reviews 120+ core research articles on the content-based retrieval of lecture video repositories hosted on the cloud by government academic and research organizations of India.

DigitalCommons@University of Nebraska

Video Augmentation in Education: in-context support for learners through prerequisite graphs

Author: GALLUCCIO ILENIA
Publication venue: Università degli studi di Genova
Publication date: 29/05/2023

The field of education is experiencing a massive digitisation process that has been ongoing for the past decade. The role played by distance learning and Video-Based Learning, which is even more reinforced by the pandemic crisis, has become an established reality. However, the typical features of video consumption, such as sequential viewing and viewing time proportional to duration, often lead to sub-optimal conditions for the use of video lessons in the process of acquisition, retrieval and consolidation of learning contents. Video augmentation can prove to be an effective support to learners, allowing a more flexible exploration of contents, a better understanding of concepts and relationships between concepts and an optimization of time required for video consumption at different stages of the learning process. This thesis focuses therefore on the study of methods for: 1) enhancing video capabilities through video augmentation features; 2) extracting concept and relationships from video materials; 3) developing intelligent user interfaces based on the knowledge extracted. The main research goal is to understand to what extent video augmentation can improve the learning experience. This research goal inspired the design of EDURELL Framework, within which two applications were developed to enable the testing of augmented methods and their provision. The novelty of this work lies in using the knowledge within the video, without exploiting external materials, to exploit its educational potential. The enhancement of the user interface takes place through various support features among which in particular a map that progressively highlights the prerequisite relationships between the concepts as they are explained, i.e., following the advancement of the video. 
The proposed approach has been designed following a user-centered iterative process, and the results in terms of effect and impact on video comprehension and learning experience make a contribution to the research in this field.

Archivio istituzionale della ricerca - Università di Genova

Annotation of multimedia learning materials for semantic search

Author: Rajgure Sheetal
Publication venue: Digital Commons @ NJIT
Publication date: 01/10/2017

Multimedia is the main source for online learning materials, such as videos, slides and textbooks, and its size is growing with the popularity of online programs offered by universities and Massive Open Online Courses (MOOCs). The increasing amount of multimedia learning resources available online makes it very challenging to browse through the materials or find where a specific concept of interest is covered. To enable semantic search on the lecture materials, their content must be annotated and indexed. Manual annotation of learning materials such as videos is tedious and cannot be envisioned for the growing quantity of online materials. One of the most commonly used methods for learning video annotation is to index the video based on the transcript obtained from translating the audio track of the video into text. Existing speech-to-text translators require extensive training, especially for non-native English speakers, and are known to have low accuracy. This dissertation proposes to index the slides based on keywords. The keywords extracted from the textbook index and the presentation slides are the basis of the indexing scheme. Two types of lecture videos are generally used (i.e., classroom recordings made with a regular camera or slide presentation screen captures made with specific software) and their quality varies widely. Screen capture videos generally have good quality and sometimes come with metadata. But often the metadata is not reliable, and hence image processing techniques are used to segment the videos. Since learning videos have a static slide background, it is challenging to detect the shot boundaries. A comparative analysis of state-of-the-art techniques to determine the best feature descriptors for detecting transitions in a learning video is presented in this dissertation.
The videos are indexed with keywords obtained from slides, and a correspondence is established by segmenting the video temporally using feature descriptors to match and align the video segments with the presentation slides converted into images. Classroom recordings made with regular video cameras often have poor illumination, with objects partially or totally occluded. For such videos, slide localization techniques based on segmentation and heuristics are presented to improve the accuracy of the transition detection. A region-prioritized ranking mechanism is proposed that integrates the keyword location in the presentation into the ranking of the slides when searching for a slide that covers a given keyword. This helps in getting the most relevant results first. With the increasing size of course materials gathered online, a user looking to understand a given concept can get overwhelmed. The standard way of learning and the concept of “one size fits all” is no longer the best way to learn for millennials. Personalized concept recommendation is presented according to the user’s background knowledge. Finally, the contributions of this dissertation have been integrated into the Ultimate Course Search (UCS), a tool for an effective search of course materials. UCS integrates presentations, lecture videos and textbook content into a single platform with topic-based search capabilities and easy navigation of lecture materials.

Digital Commons @ New Jersey Institute of Technology (NJIT)

CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

Author: Boujemaa Nozha
Compañó Ramón
Dosch Christoph
Geurts Joost
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Sebe Nicu
Publication venue: Chorus Project Consortium
Publication date: 01/01/2007

Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts that participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art related to multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Roadmap for Music Information ReSearch

Author: Benetos E.
Chudy M.
Dixon S.
Flexer A.
Gomez E.
Gouyon F.
Herrera P.
Jorda S.
Magas M.
Paytuvi O.
Peeters G.
Schlüter J.
Serra X.
Vinet H.
Widmer G.
Publication venue: MIRES Consortium
Publication date: 01/01/2013

City Research Online

UPF Digital Repository

Digital tools in media studies: analysis and research. An overview

Publication venue: 'Transcript Verlag'
Publication date: 01/01/2015

Digital tools are increasingly used in media studies, opening up new perspectives for research and analysis, while creating new problems at the same time. In this volume, international media scholars and computer scientists present their projects, varying from powerful film-historical databases to automatic video analysis software, discussing their application of digital tools and reporting on their results. This book is the first publication of its kind and a helpful guide to both media scholars and computer scientists who intend to use digital tools in their research, providing information on applications, standards, and problems.

SSOAR - Social Science Open Access Repository

Digital Tools in Media Studies

Publication venue: 'Transcript Verlag'
Publication date: 10/02/2021

Directory of Open Access Books (DOAB)

Correlating Visual Speaker Gestures with Measures of Audience Engagement to Aid Video Browsing

Author: Zhang John
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

In this thesis, we argue that in the domains of educational lectures and political debates, speaker gestures can be a source of semantic cues for video browsing. We hypothesize that certain human gestures, which can be automatically identified through techniques of computer vision, can convey significant information that is correlated to audience engagement. We present a joint-angle descriptor derived from an automatic upper body pose estimation framework to train an SVM which identifies point and spread poses in extracted video frames of an instructor giving a lecture. Ground-truth is collected in the form of 2500 manually annotated frames covering 20 minutes of a video lecture. Cross validation on the ground-truth data showed classifier F-scores of 0.54 and 0.39 for point and spread poses, respectively. We also derive an attribute for gestures which measures the angular variance of the arm movements from this system (analogous to arm waving). We present a method for tracking hands which succeeds even when left and right hands are clasping and occluding each other. We evaluate on a ground-truth dataset of 698 images with 1301 annotated left and right hands, mostly clasped. Our method performs better than baseline on recall (0.66 vs. 0.53) without sacrificing precision (0.65 for both) toward the goal of recognizing clasped hands. For tracking, it results in an improvement over a baseline method with an F-score of 0.59 vs. 0.48. From this, we are able to derive hand motion-based gesture attributes such as velocity, direction change and extremal pose. In ground-truth studies, we manually annotate and analyze the gestures of two instructors, each in a 75-minute computer science lecture, using a 14-bit pose vector. We observe "pedagogical" gestures of punctuation and encouragement in addition to traditional classes of gestures such as deictic and metaphoric.
We also introduce a tool to facilitate the manual annotations of gestures in video and present results on their frequencies and co-occurrences. In particular, we find that 5 poses represent 80% of the variation in the annotated ground truth. We demonstrate a correlation between the angular variance of arm movements and the presence of those conjunctions that are used to contrast connected clauses ("but", "neither", etc.) in the accompanying speech. We do this by training an AdaBoost-based binary classifier using decision trees as weak learners. On a ground-truth database of 4243 video clips totaling 3.83 hours, each with subtitles, training on sets of conjunctions indicating contrast produces classifiers capable of achieving 55% accuracy on a balanced test set. We study two different presentation methods: an attribute graph which shows a normalized measure of the visual attributes across an entire video, as well as emphasized subtitles, where individual words are emphasized (resized) based on their accompanying gestures. Results from 12 subjects show supportive ratings given for the browsing aids in the task of providing keywords for video under time constraints. Subjects' keywords are also compared to independent ground-truth, resulting in precisions from 0.50-0.55, even when given less than half real time to view the video. We demonstrate a correlation between gesture attributes and a rigorous method of measuring audience engagement: electroencephalography (EEG). Our 20 subjects watch 61 minutes of video of the 2012 U.S. Presidential Debates while under observation through EEG. After discarding corrupted recordings, we retain 47 minutes worth of EEG data for each subject. The subjects are examined in aggregate and in subgroups according to gender and political affiliation. We find statistically significant correlations between gesture attributes (particularly extremal pose) and our feature of engagement derived from EEG. 
For all subjects watching all videos, we see a statistically significant correlation between gesture and engagement with a Spearman rank correlation of rho = 0.098 with p < 0.05, Bonferroni corrected. For some stratifications, correlations reach as high as rho = 0.297. From these results, we conclude that gestures can be used to measure engagement.

Columbia University Academic Commons

Context-based multimedia semantics modelling and representation

Author: Eze Emmanuel Uchechukwu
Publication date: 01/05/2013

The evolution of the World Wide Web, increase in processing power, and more network bandwidth have contributed to the proliferation of digital multimedia data. Since multimedia data has become a critical resource in many organisations, there is an increasing need to gain efficient access to data, in order to share, extract knowledge, and ultimately use the knowledge to inform business decisions. Existing methods for multimedia semantic understanding are limited to the computable low-level features, which raises the question of how to identify and represent the high-level semantic knowledge in multimedia resources. In order to bridge the semantic gap between multimedia low-level features and high-level human perception, this thesis seeks to identify the possible contextual dimensions in multimedia resources to help in semantic understanding and organisation. This thesis investigates the use of contextual knowledge to organise and represent the semantics of multimedia data aimed at efficient and effective multimedia content-based semantic retrieval. A mixed methods research approach incorporating both Design Science Research and Formal Methods for investigation and evaluation was adopted. A critical review of current approaches for multimedia semantic retrieval was undertaken and various shortcomings identified. The objectives for a solution were defined which led to the design, development, and formalisation of a context-based model for multimedia semantic understanding and organisation. The model relies on the identification of different contextual dimensions in multimedia resources to aggregate meaning and facilitate semantic representation, knowledge sharing and reuse.
A prototype system for multimedia annotation, CONMAN, was built to demonstrate aspects of the model and validate the research hypothesis, H₁. Towards providing richer and clearer semantic representation of multimedia content, the original contributions of this thesis to Information Science include: (a) a novel framework and formalised model for organising and representing the semantics of heterogeneous visual data; and (b) a novel S-Space model that is aimed at visual information semantic organisation and discovery, and forms the foundations for automatic video semantic understanding.

Repository@Hull - Worktribe