Shot boundary detection in MPEG videos using local and global indicators
Shot boundary detection (SBD) plays an important role in many video applications. In this letter, we describe a novel SBD method operating directly in the compressed domain. First, several local indicators are extracted from MPEG macroblocks, and AdaBoost is employed for feature selection and fusion. The selected features are then used to classify candidate cuts into five sub-spaces via pre-filtering and rule-based decision making. Following that, global indicators of frame similarity between the boundary frames of cut candidates are examined using phase correlation of DC images. Gradual transitions such as fades, dissolves, and combined shot cuts are also identified. Experimental results on the TRECVID'07 test data demonstrate the effectiveness and robustness of the proposed methodology.
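The global check described above relies on phase correlation: if two candidate boundary frames come from the same shot, the normalized cross-power spectrum shows a single sharp peak (at the motion offset), while a flat response suggests a genuine cut. Below is a minimal 1-D sketch using a naive DFT; this is illustrative only (real systems run 2-D FFTs over the DC images, and the function names here are our own):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def phase_correlation(a, b):
    """Normalized cross-power spectrum of two signals; a circular shift
    between a and b shows up as a single sharp peak at the shift offset."""
    fa, fb = dft(a), dft(b)
    spectrum = []
    for x, y in zip(fa, fb):
        c = x.conjugate() * y
        m = abs(c)
        spectrum.append(c / m if m > 1e-12 else 0j)
    return idft(spectrum)

# A row of DC coefficients and the same row circularly shifted by 3 samples:
row = [0.0, 0.0, 1.0, 5.0, 9.0, 5.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
shifted = row[-3:] + row[:-3]
corr = phase_correlation(row, shifted)
peak = max(range(len(corr)), key=lambda i: corr[i])  # peak index = shift = 3
```

The height of the peak relative to the rest of the response serves as the similarity score: identical or merely translated frames give a near-unit peak, while unrelated frames across a cut give a diffuse response.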
Modelling of content-aware indicators for effective determination of shot boundaries in compressed MPEG videos
In this paper, a content-aware approach is proposed to design multiple test conditions for shot cut detection, organized into a multi-phase decision tree for abrupt cut detection and a finite state machine for dissolve detection. In comparison with existing approaches, our algorithm is characterized by two categories of content difference indicators and testing. While the first category indicates the content changes that are directly used for shot cut detection, the second indicates the context in which the content change occurs. As a result, indications of frame differences are tested with context awareness, making the detection of shot cuts adaptive to both content and context changes. Evaluations announced by TRECVID 2007 indicate that our algorithm achieves performance comparable to approaches based on machine learning, while using a simpler feature set and straightforward design strategies. This validates the effectiveness of modelling content-aware indicators for decision making, which also provides a good alternative to conventional approaches in this area.
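A finite state machine for dissolve detection can be driven by a per-frame indicator; a common low-level cue in the SBD literature (the abstract does not spell out its exact indicator, so treat this as an illustrative stand-in) is that frame variance traces a downward-then-upward curve across a dissolve. A minimal sketch:

```python
def detect_dissolves(variance, eps=1e-6, min_len=4):
    """Toy FSM over a per-frame variance curve: IDLE -> FALL -> RISE.
    A completed fall-then-rise spanning at least min_len frames is
    reported as a dissolve (start_frame, end_frame)."""
    state, start, found = "IDLE", 0, []
    for i in range(1, len(variance)):
        falling = variance[i] < variance[i - 1] - eps
        rising = variance[i] > variance[i - 1] + eps
        if state == "IDLE":
            if falling:
                state, start = "FALL", i - 1
        elif state == "FALL":
            if rising:
                state = "RISE"
        elif state == "RISE":
            if not rising:                      # curve flattened or turned down
                if (i - 1) - start >= min_len:
                    found.append((start, i - 1))
                if falling:                     # a new dip starts immediately
                    state, start = "FALL", i - 1
                else:
                    state = "IDLE"
    return found

# Steady shot, a dissolve dip, then a steady shot again:
var = [10, 10, 10, 8, 5, 3, 2, 3, 5, 8, 10, 10, 10]
# detect_dissolves(var) -> [(2, 10)]
```

The context-awareness idea in the abstract would enter as extra guards on the transitions, e.g. suppressing the FALL transition when global motion or flashes explain the variance drop.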
Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation
The vast majority of people who suffer unexpected cardiac arrest receive
cardiopulmonary resuscitation (CPR) from passersby in a desperate attempt to
restore life, but these endeavors often prove fruitless because the rescuers
are untrained. Fortunately, much research shows that disciplined training
raises the success rate of resuscitation, which calls for a seamless
combination of novel techniques to yield further advancement. To this end, we
collect a custom CPR video dataset in which trainees independently perform
resuscitation on mannequins in adherence to approved guidelines, and we devise
an auxiliary toolbox that assists the supervision and correction of potential
issues via modern deep learning methodologies. Our research treats this
problem as a temporal action segmentation (TAS) task in computer vision, which
aims to segment an untrimmed video at the frame level. We propose a
Prompt-enhanced hierarchical Transformer (PhiTrans) that integrates three
indispensable modules: a textual prompt-based Video Features Extractor (VFE),
a transformer-based Action Segmentation Executor (ASE), and a regression-based
Prediction Refinement Calibrator (PRC). The backbone of the model derives from
applications on three established public TAS datasets (GTEA, 50Salads, and
Breakfast), which supports the transfer of the segmentation pipeline to the
CPR dataset. Overall, we probe into a feasible pipeline that genuinely
improves CPR instruction quality via action segmentation in conjunction with
cutting-edge deep learning techniques. Our experiments support the
implementation, with multiple metrics surpassing 91.0%.
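TAS models like the one described above are conventionally scored with frame-wise accuracy and segmental F1@k, where a predicted segment counts as a true positive when its IoU with a same-label ground-truth segment reaches the threshold k. A self-contained sketch of that standard metric (simplified from the TAS literature; not the paper's own code):

```python
def segments(labels):
    """Collapse a frame-wise label sequence into (label, start, end) runs,
    with end exclusive."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[i - 1]:
            runs.append((labels[start], start, i))
            start = i
    return runs

def f1_at_k(pred, gt, k=0.5):
    """Segmental F1@k: greedy one-to-one matching of predicted to
    ground-truth segments by IoU, with threshold k."""
    p_runs, g_runs = segments(pred), segments(gt)
    tp, matched = 0, set()
    for lab, s, e in p_runs:
        best_iou, best_j = 0.0, None
        for j, (gl, gs, ge) in enumerate(g_runs):
            if gl != lab or j in matched:
                continue
            inter = max(0, min(e, ge) - max(s, gs))
            union = max(e, ge) - min(s, gs)
            iou = inter / union if union else 0.0
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= k:
            tp += 1
            matched.add(best_j)
    precision = tp / len(p_runs) if p_runs else 0.0
    recall = tp / len(g_runs) if g_runs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = [0] * 5 + [1] * 5   # predicted boundary one frame off
gt = [0] * 4 + [1] * 6
# f1_at_k(pred, gt, k=0.5) -> 1.0: both segments overlap their match enough
```

This is why F1@k is preferred over raw frame accuracy for instruction feedback: a slightly misplaced boundary is forgiven, while over-segmented, flickering predictions are penalized.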
Symmetric Dilated Convolution for Surgical Gesture Recognition
Automatic surgical gesture recognition is a prerequisite for intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or have limitations in capturing temporal information from long, untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture that automatically detects and segments surgical gestures, with corresponding boundaries, using only RGB videos. We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode long-term temporal patterns and establish frame-to-frame relationships accordingly. We validate the effectiveness of our approach on a fundamental robotic suturing task from the JIGSAWS dataset. The experimental results demonstrate our method's ability to capture long-term frame dependencies, largely outperforming state-of-the-art methods by up to ∼6 points in frame-wise accuracy and ∼6 points in F1@50 score.
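The dilation structure above rests on a simple building block: a 1-D convolution whose taps are spaced d frames apart, so stacked layers with dilations 1, 2, 4, ... cover exponentially long temporal windows at linear cost. A dependency-free sketch of that indexing (the real model operates on learned feature channels; this shows only the mechanics):

```python
def dilated_conv1d(x, kernel, d):
    """'Same'-padded 1-D convolution with dilation d: tap j of the kernel
    reads x[t + (j - len(kernel)//2) * d], treated as zero outside x."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t + (j - k // 2) * d
            if 0 <= idx < len(x):
                acc += kernel[j] * x[idx]
        out.append(acc)
    return out

# An impulse input shows which frames each output position can see:
print(dilated_conv1d([0, 0, 1, 0, 0], [1, 1, 1], 1))  # [0.0, 1.0, 1.0, 1.0, 0.0]
print(dilated_conv1d([0, 0, 1, 0, 0], [1, 1, 1], 2))  # [1.0, 0.0, 1.0, 0.0, 1.0]

# Receptive field of stacked kernel-3 layers with dilations 1, 2, 4, 8:
rf = 1
for d in (1, 2, 4, 8):
    rf += 2 * d   # each layer adds (k - 1) * d frames
print(rf)  # 31
```

The exponential growth is the point: four such layers already relate frames 31 steps apart, which is how the architecture captures long-range gesture context without recurrence or extra sensors.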
Movie Editing and Cognitive Event Segmentation in Virtual Reality Video
Traditional cinematography has relied for over a century on a
well-established set of editing rules, called continuity editing, to create a
sense of situational continuity. Despite massive changes in visual content
across cuts, viewers in general experience no trouble perceiving the
discontinuous flow of information as a coherent set of events. However, Virtual
Reality (VR) movies are intrinsically different from traditional movies in that
the viewer controls the camera orientation at all times. As a consequence,
common editing techniques that rely on camera orientations, zooms, etc., cannot
be used. In this paper we investigate key relevant questions to understand how
well traditional movie editing carries over to VR. To do so, we rely on recent
cognition studies and the event segmentation theory, which states that our
brains segment continuous actions into a series of discrete, meaningful events.
We first replicate one of these studies to assess whether the predictions of
such theory can be applied to VR. We next gather gaze data from viewers
watching VR videos containing different edits with varying parameters, and
provide the first systematic analysis of viewers' behavior and the perception
of continuity in VR. From this analysis we make a series of relevant findings;
for instance, our data suggests that predictions from the cognitive event
segmentation theory are useful guides for VR editing; that different types of
edits are equally well understood in terms of continuity; and that spatial
misalignments between regions of interest at the edit boundaries favor a more
exploratory behavior even after viewers have fixated on a new region of
interest. In addition, we propose a number of metrics to describe viewers'
attentional behavior in VR. We believe the insights derived from our work can
be useful as guidelines for VR content creation.
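One quantity such an analysis needs is the spatial misalignment between regions of interest across an edit, which on a 360° sphere is a great-circle angle between viewing directions. A small sketch, assuming a yaw/pitch parameterization of gaze (our assumption; the paper's exact metrics may differ):

```python
import math

def angular_misalignment(yaw1, pitch1, yaw2, pitch2):
    """Great-circle angle in degrees between two viewing directions
    given as (yaw, pitch) in degrees."""
    def unit(yaw, pitch):
        y, p = math.radians(yaw), math.radians(pitch)
        return (math.cos(p) * math.cos(y),
                math.cos(p) * math.sin(y),
                math.sin(p))
    a, b = unit(yaw1, pitch1), unit(yaw2, pitch2)
    # Clamp the dot product to acos's domain to absorb rounding error.
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))

# ROI straight ahead before the cut, 90 degrees to the right after:
# angular_misalignment(0, 0, 90, 0) -> ~90
```

The finding quoted above then amounts to: the larger this angle at the edit boundary, the longer viewers explore before settling on the new region of interest.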
PredNet and Predictive Coding: A Critical Review
PredNet, a deep predictive coding network developed by Lotter et al.,
combines a biologically inspired architecture based on the propagation of
prediction error with self-supervised representation learning in video. While
the architecture has drawn a lot of attention and various extensions of the
model exist, there is a lack of a critical analysis. We fill in the gap by
evaluating PredNet both as an implementation of the predictive coding theory
and as a self-supervised video prediction model using a challenging video
action classification dataset. We design an extended model to test if
conditioning future frame predictions on the action class of the video improves
the model performance. We show that PredNet does not yet completely follow the
principles of predictive coding. The proposed top-down conditioning leads to a
performance gain on synthetic data, but does not scale up to the more complex
real-world action classification dataset. Our analysis is aimed at guiding
future research on similar architectures based on the predictive coding theory
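The error-propagation core the review examines is simple to state: each PredNet layer emits rectified positive and negative prediction errors, concatenated, and passes them upward. A flattened sketch of that error unit (actual PredNet applies this per pixel and channel inside a stack of convolutional LSTMs):

```python
def prediction_error(actual, predicted):
    """PredNet-style error unit: ReLU(actual - predicted) and
    ReLU(predicted - actual), concatenated into one error vector."""
    pos = [max(a - p, 0.0) for a, p in zip(actual, predicted)]
    neg = [max(p - a, 0.0) for a, p in zip(actual, predicted)]
    return pos + neg

# A perfect prediction yields an all-zero error signal:
# prediction_error([1.0, 2.0], [1.0, 2.0]) -> [0.0, 0.0, 0.0, 0.0]
```

Under predictive coding theory, only this error signal should propagate up the hierarchy; part of the review's critique is assessing how faithfully PredNet's actual information flow matches that principle.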