11 research outputs found
Augmenting conversations through context-aware multimedia retrieval based on speech recognition
Future environments will be sensitive and responsive to the presence of people, supporting them in carrying out their everyday activities, tasks and rituals in an easy and natural way. Such interactive spaces will use information and communication technologies to bring computation into the physical world in order to enhance the ordinary activities of their users. This paper describes a speech-based multimedia retrieval system that can present relevant video-podcast (vodcast) footage in response to spontaneous speech and conversations during daily life activities. The proposed system allows users to search the spoken content of multimedia files rather than their associated meta-information, and lets them navigate to the portion where the queried words are spoken by facilitating within-medium searches of multimedia content through a bag-of-words approach. Finally, we have studied the proposed system in different scenarios, using English vodcasts from various categories as the target multimedia, and discussed how it would enhance people’s everyday life activities in settings including education, entertainment, marketing, news and the workplace.
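As a rough illustration of the bag-of-words, within-medium search described in this abstract, the following Python sketch indexes time-aligned transcripts and ranks fixed-length segments by how many distinct query words they contain. The transcript structure, the 10-second window, and the function names `build_index` and `search` are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each spoken word to the (file, start_second) positions where it occurs.

    `transcripts` is a hypothetical structure {file_name: [(start_second, word), ...]}
    standing in for time-aligned speech recognition output.
    """
    index = defaultdict(list)
    for file_name, words in transcripts.items():
        for start, word in words:
            index[word.lower()].append((file_name, start))
    return index

def search(index, query, window=10.0):
    """Rank (file, window_start) segments by the number of distinct query words they contain."""
    terms = set(query.lower().split())
    hits = defaultdict(set)
    for term in terms:
        for file_name, start in index.get(term, []):
            bucket = int(start // window) * window  # start of the segment holding this word
            hits[(file_name, bucket)].add(term)
    # Most matched terms first; ties are broken by file name and time for determinism.
    return sorted(hits, key=lambda k: (-len(hits[k]), k))

# Illustrative data only.
transcripts = {
    "news.mp4": [(3.0, "election"), (4.1, "results"), (61.5, "weather")],
    "lecture.mp4": [(12.0, "weather"), (12.8, "forecast"), (95.0, "results")],
}
index = build_index(transcripts)
print(search(index, "weather forecast"))
```

Word order is ignored, which is what makes this a bag-of-words match; each returned pair gives a file and the offset a player could seek to.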
VSCAN: An Enhanced Video Summarization using Density-based Spatial Clustering
In this paper, we present VSCAN, a novel approach for generating static video
summaries. This approach is based on a modified DBSCAN clustering algorithm to
summarize the video content utilizing both color and texture features of the
video frames. The paper also introduces an enhanced evaluation method that
depends on color and texture features. Video summaries generated by VSCAN are
compared with summaries generated by other approaches found in the literature
and those created by users. Experimental results indicate that the video
summaries generated by VSCAN have a higher quality than those generated by
other approaches.
Comment: arXiv admin note: substantial text overlap with arXiv:1401.3590 by
other authors without attribution
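To make the clustering idea concrete, here is a minimal Python sketch of standard DBSCAN applied to per-frame feature vectors, with one representative frame picked per cluster. It is not the modified algorithm from the paper: the abstract does not specify the modification, so plain DBSCAN is used instead. The feature vectors, `eps`, `min_pts`, and the centroid-based choice of representative are all assumptions for illustration.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Standard DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(j_neighbors)
    return labels

def key_frames(features, eps, min_pts):
    """Pick, for each cluster, the frame closest to the cluster centroid (an assumed rule)."""
    labels = dbscan(features, eps, min_pts)
    chosen = []
    for c in sorted(set(labels) - {-1}):
        members = [i for i, label in enumerate(labels) if label == c]
        centroid = [sum(col) / len(members) for col in zip(*(features[i] for i in members))]
        chosen.append(min(members, key=lambda i: math.dist(features[i], centroid)))
    return sorted(chosen)

# Hypothetical 2-D features (e.g. one color value and one texture value per frame).
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
            [9.0, 0.0]]
print(dbscan(features, eps=0.5, min_pts=3))
print(key_frames(features, eps=0.5, min_pts=3))
```

Frames labelled as noise are left out of the summary, which is one reason a density-based method suits static summaries: isolated transition frames do not form clusters of their own.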
User-based key frame detection in social web video
Video search results and suggested videos on web sites are represented with a
video thumbnail, which is manually selected by the video uploader among three
randomly generated ones (e.g., on YouTube). In contrast, we present a grounded
user-based approach for automatically detecting interesting key-frames within a
video through aggregated users' replay interactions with the video player.
Previous research has focused on content-based systems that have the benefit of
analyzing a video without user interactions, but they are monolithic, because
the resulting video thumbnails are the same regardless of the user preferences.
We constructed a user interest function, which is based on aggregate video
replays, and analyzed hundreds of user interactions. We found that the local
maxima of the replaying activity reflect the semantics of information-rich
videos, such as lectures and how-to videos. The concept of user-based key-frame
detection could be applied to any video on the web, in order to generate a
user-based and dynamic video thumbnail in search results.
Comment: 4 pages, 4 figures
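A user interest function built from aggregated replays can be sketched in Python as follows. The sketch assumes each replay interaction is logged as a `(start_second, end_second)` interval of re-watched video, counts how many intervals cover each second, smooths the curve, and reports its local maxima as candidate key-frame positions. The log format, the moving-average smoothing, and all function names are assumptions; the abstract does not give the exact definition of the function.

```python
def interest_curve(replays, duration):
    """Count, for each second of the video, how many replay intervals cover it."""
    counts = [0] * duration
    for start, end in replays:
        for t in range(max(0, start), min(duration, end)):
            counts[t] += 1
    return counts

def smooth(values, radius=1):
    """Centered moving average; the window is clipped at the edges."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def local_maxima(values):
    """Indices strictly greater than the left neighbour and not less than the right one."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]

# Hypothetical replay log for a 60-second video, aggregated over several users.
replays = [(10, 15), (12, 16), (11, 14), (40, 43), (41, 44)]
curve = smooth(interest_curve(replays, duration=60))
print(local_maxima(curve))
```

Each reported index is a second of the video around which replays concentrate, so the frame at that position is a natural candidate for a user-based thumbnail.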
A Hierarchical Keyframe User Interface for Browsing Video over the Internet
We present an interactive content-based video browser allowing fast, non-linear and hierarchical navigation of video over the Internet through multiple levels of key-frames that provide a visual summary of video content. Our method is based on an XML framework, dynamically generated parameterized XSL style sheets, and SMIL. The architecture is designed to incorporate additional recognized features (e.g., from audio) in future versions. The last part of this paper describes a user study which indicates that this browsing interface is more comfortable to use and approximately three times faster for locating remembered still images within videos than the simple VCR controls built into RealPlayer.
In Search of a Good BET
This document introduces an evaluation test for multi-media meeting browsers that eliminates much of the subjectivity in typical multi-media browser tests.
Effective video summarization approach based on visual attention
Video summarization is applied to reduce redundancy and develop a concise representation of a video through its key frames. More recently, video summaries have been generated through visual attention modeling. In these schemes, the frames that stand out visually are extracted as key frames based on human attention modeling theories. Schemes for modeling visual attention have proven effective for video summarization. Nevertheless, the high computational cost of such techniques restricts their usability in everyday situations. In this context, we propose a key frame extraction (KFE) method based on an efficient and accurate visual attention model. The computational effort is minimized by computing dynamic visual salience from the temporal gradient instead of traditional optical flow techniques. In addition, an efficient technique using the discrete cosine transform is used for static visual salience. The dynamic and static visual attention metrics are merged by means of a non-linear weighted fusion technique. The results of the system are compared with some existing state-of-the-art techniques in terms of accuracy. The experimental results indicate that the proposed model extracts key frames efficiently and with high quality.
Qatar University - No. IRCC-2021-010
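The attention pipeline in this abstract can be illustrated with a small Python sketch. The dynamic cue is computed as the mean absolute temporal gradient between consecutive grayscale frames, which avoids optical flow. The static cue is taken as a given per-frame score rather than computed with a discrete cosine transform. The fusion rule, in which each cue is weighted by its own strength, is only an assumed example of a non-linear weighted fusion, since the abstract does not state the formula; the function names and the top-k selection are likewise hypothetical.

```python
def temporal_gradient(prev, curr):
    """Mean absolute intensity change between two equally sized grayscale frames."""
    total = sum(abs(c - p)
                for row_p, row_c in zip(prev, curr)
                for p, c in zip(row_p, row_c))
    return total / (len(curr) * len(curr[0]))

def normalize(values):
    """Scale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def fuse(dynamic, static):
    """Assumed non-linear fusion: each cue is weighted by its own strength."""
    total = dynamic + static
    return 0.0 if total == 0 else (dynamic ** 2 + static ** 2) / total

def attention_curve(frames, static_scores):
    """Fused attention value per frame from dynamic and static cues."""
    # The first frame has no predecessor, so its dynamic score is set to 0.
    dynamic = [0.0] + [temporal_gradient(a, b) for a, b in zip(frames, frames[1:])]
    d, s = normalize(dynamic), normalize(static_scores)
    return [fuse(x, y) for x, y in zip(d, s)]

def top_key_frames(curve, k):
    """Indices of the k frames with the highest fused attention, in temporal order."""
    ranked = sorted(range(len(curve)), key=lambda i: (-curve[i], i))
    return sorted(ranked[:k])

# Four tiny 2x2 frames: a brightness jump occurs at frame 2.
frames = [[[0, 0], [0, 0]],
          [[0, 0], [0, 0]],
          [[100, 100], [100, 100]],
          [[100, 100], [100, 100]]]
static_scores = [0.2, 0.2, 0.2, 0.8]  # hypothetical static salience per frame
curve = attention_curve(frames, static_scores)
print(top_key_frames(curve, k=2))
```

With this fusion rule, a frame that is strong in either cue keeps a high fused value, whereas a plain average would halve it.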