5,638 research outputs found

    Shot boundary detection in MPEG videos using local and global indicators

    Get PDF
    Shot boundary detection (SBD) plays an important role in many video applications. In this letter, we describe a novel SBD method that operates directly in the compressed domain. First, several local indicators are extracted from MPEG macroblocks, and AdaBoost is employed for feature selection and fusion. The selected features are then used to classify candidate cuts into five sub-spaces via pre-filtering and rule-based decision making. Following that, global indicators of frame similarity between the boundary frames of cut candidates are examined using phase correlation of dc images. Gradual transitions such as fades, dissolves, and combined shot cuts are also identified. Experimental results on the test data from TRECVID'07 demonstrate the effectiveness and robustness of the proposed methodology.
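    The global indicator stage relies on phase correlation of dc images. As a rough, illustrative sketch (not the letter's exact procedure: the dc-image extraction step, the normalization constant, and the threshold below are assumptions), the peak of the phase-correlation surface between two candidate boundary frames can be computed as follows:

```python
import numpy as np

def phase_correlation_peak(dc_a: np.ndarray, dc_b: np.ndarray) -> float:
    """Peak of the phase-correlation surface between two grayscale dc
    images: a high, sharp peak suggests the frames belong to the same
    shot, while a flat surface suggests a shot boundary."""
    fa = np.fft.fft2(dc_a)
    fb = np.fft.fft2(dc_b)
    cross = fa * np.conj(fb)
    cross /= np.abs(cross) + 1e-12           # keep phase, discard magnitude
    surface = np.real(np.fft.ifft2(cross))   # phase-correlation surface
    return float(surface.max())

# Hypothetical usage on the boundary frames of a cut candidate:
# a, b = dc_image(frame_i), dc_image(frame_j)   # 8x-downsampled dc images
# is_same_shot = phase_correlation_peak(a, b) > threshold
```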

    Modelling of content-aware indicators for effective determination of shot boundaries in compressed MPEG videos

    Get PDF
    In this paper, a content-aware approach is proposed to design multiple test conditions for shot cut detection, organized into a multiple-phase decision tree for abrupt cut detection and a finite state machine for dissolve detection. In comparison with existing approaches, our algorithm is characterized by two categories of content difference indicators and tests. While the first category indicates the content changes that are directly used for shot cut detection, the second indicates the context under which the content change occurs. As a result, indications of frame differences are tested with context awareness, making the detection of shot cuts adaptive to both content and context changes. Evaluations announced by TRECVID 2007 indicate that our proposed algorithm achieves performance comparable to approaches based on machine learning, yet uses a simpler feature set and straightforward design strategies. This validates the effectiveness of modelling content-aware indicators for decision making, which also provides a good alternative to conventional approaches in this area.
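    As an illustration of how a finite state machine can drive dissolve detection, the sketch below tracks a sustained monotonic change in a single content indicator; the indicator choice, state names, and thresholds are illustrative assumptions, not the paper's actual design:

```python
from enum import Enum, auto

class State(Enum):
    STABLE = auto()    # inside a shot
    SUSPECT = auto()   # a monotonic indicator change has started
    DISSOLVE = auto()  # change sustained long enough to report

def step(state, slope, run_length, min_run=5, eps=1e-3):
    """One FSM transition per frame. `slope` is the framewise change of
    a content indicator (e.g. luminance variance); a dissolve typically
    shows a sustained monotonic dip-and-rise of such an indicator."""
    if state is State.STABLE:
        return (State.SUSPECT, 1) if abs(slope) > eps else (State.STABLE, 0)
    if state is State.SUSPECT:
        if abs(slope) <= eps:
            return State.STABLE, 0            # change not sustained: reset
        run_length += 1
        if run_length >= min_run:
            return State.DISSOLVE, run_length # report a dissolve candidate
        return State.SUSPECT, run_length
    return State.DISSOLVE, run_length
```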

    Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation

    Full text link
    Most people who suffer unexpected cardiac arrest receive cardiopulmonary resuscitation (CPR) from passersby in a desperate attempt to restore life, but these endeavors often prove fruitless because the rescuers lack proper training. Fortunately, many studies show that disciplined training raises the success rate of resuscitation, and such training continually calls for novel techniques to yield further advancement. To this end, we collect a custom CPR video dataset in which trainees independently perform resuscitation on mannequins in adherence to approved guidelines, and we devise an auxiliary toolbox that assists the supervision and rectification of potential intermediate issues via modern deep learning methodologies. Our research casts this problem as a temporal action segmentation (TAS) task in computer vision, which aims to segment an untrimmed video at a frame-wise level. We propose a Prompt-enhanced hierarchical Transformer (PhiTrans) that integrates three indispensable modules: a textual prompt-based Video Features Extractor (VFE), a transformer-based Action Segmentation Executor (ASE), and a regression-based Prediction Refinement Calibrator (PRC). The backbone of the model derives from applications on three established public TAS datasets (GTEA, 50Salads, and Breakfast), which guides the adaptation of the segmentation pipeline to the CPR dataset. Overall, we probe into a feasible pipeline that genuinely elevates the quality of CPR instruction via action segmentation in conjunction with cutting-edge deep learning techniques. The associated experiments support our implementation, with multiple metrics surpassing 91.0%.
    Comment: Transformer for Cardiopulmonary Resuscitation
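    To make the three-module decomposition concrete, here is a minimal PyTorch skeleton of such a pipeline; the internals (a linear VFE stand-in, a two-layer transformer encoder for the ASE, and a linear PRC head) are placeholder assumptions, not the paper's PhiTrans architecture:

```python
import torch.nn as nn

class PhiTransSketch(nn.Module):
    """Three-stage skeleton matching the module names in the abstract.
    All layer choices here are stand-ins for illustration only."""
    def __init__(self, feat_dim=512, n_classes=10):
        super().__init__()
        self.vfe = nn.Linear(2048, feat_dim)                 # Video Features Extractor (stand-in)
        enc = nn.TransformerEncoderLayer(feat_dim, nhead=8, batch_first=True)
        self.ase = nn.TransformerEncoder(enc, num_layers=2)  # Action Segmentation Executor
        self.cls = nn.Linear(feat_dim, n_classes)
        self.prc = nn.Linear(n_classes, n_classes)           # Prediction Refinement Calibrator

    def forward(self, frame_feats):                          # (batch, time, 2048) frame features
        x = self.ase(self.vfe(frame_feats))
        logits = self.cls(x)                                 # frame-wise class scores
        return self.prc(logits)                              # refined per-frame predictions
```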

    Symmetric Dilated Convolution for Surgical Gesture Recognition

    Get PDF
    Automatic surgical gesture recognition is a prerequisite for intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or have limitations in capturing temporal information from long, untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture that automatically detects and segments surgical gestures, with their corresponding boundaries, using only RGB video. We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns and establish frame-to-frame relationships accordingly. We validate the effectiveness of our approach on a fundamental robotic suturing task from the JIGSAWS dataset. The experimental results demonstrate the ability of our method to capture long-term frame dependencies, largely outperforming state-of-the-art methods by up to ∼6 points in frame-wise accuracy and ∼6 points in F1@50 score.
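    A minimal sketch of a symmetric dilation structure follows, assuming mirrored dilation rates across an encoder-decoder stack of residual 1-D convolutions; the paper's self-attention bridge and exact layer configuration are omitted here:

```python
import torch.nn as nn

class SymmetricDilatedTCN(nn.Module):
    """Encoder-decoder temporal conv stack with mirrored dilation rates
    (1, 2, 4, ..., 4, 2, 1), one plausible reading of a 'symmetric
    dilation structure'. Dilation-matched padding preserves length."""
    def __init__(self, channels=64, depth=4):
        super().__init__()
        rates = [2 ** i for i in range(depth)]  # encoder: growing dilation
        rates += rates[::-1]                    # decoder: mirrored rates
        self.blocks = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.act = nn.ReLU()

    def forward(self, x):                       # (batch, channels, time)
        for conv in self.blocks:
            x = x + self.act(conv(x))           # residual dilated block
        return x
```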

    Movie Editing and Cognitive Event Segmentation in Virtual Reality Video

    Get PDF
    Traditional cinematography has relied for over a century on a well-established set of editing rules, called continuity editing, to create a sense of situational continuity. Despite massive changes in visual content across cuts, viewers generally have no trouble perceiving the discontinuous flow of information as a coherent set of events. However, Virtual Reality (VR) movies are intrinsically different from traditional movies in that the viewer controls the camera orientation at all times. As a consequence, common editing techniques that rely on camera orientations, zooms, etc., cannot be used. In this paper, we investigate key questions to understand how well traditional movie editing carries over to VR. To do so, we rely on recent cognition studies and event segmentation theory, which states that our brains segment continuous actions into a series of discrete, meaningful events. We first replicate one of these studies to assess whether the predictions of this theory apply to VR. We then gather gaze data from viewers watching VR videos containing different edits with varying parameters, and provide the first systematic analysis of viewers' behavior and the perception of continuity in VR. From this analysis we make a series of relevant findings; for instance, our data suggest that predictions from cognitive event segmentation theory are useful guides for VR editing; that different types of edits are equally well understood in terms of continuity; and that spatial misalignments between regions of interest at the edit boundaries favor more exploratory behavior even after viewers have fixated on a new region of interest. In addition, we propose a number of metrics to describe viewers' attentional behavior in VR. We believe the insights derived from our work can serve as guidelines for VR content creation.
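    As one plausible example of such an attentional metric, the sketch below counts how many frames a viewer needs after an edit before their gaze first lands near the new region of interest; the function name, yaw-only gaze representation, and angular tolerance are illustrative assumptions, not the paper's actual metrics:

```python
import numpy as np

def frames_to_reach_roi(gaze_yaw, roi_yaw, tolerance_deg=15.0):
    """Frames after an edit until gaze yaw (degrees) first falls within
    `tolerance_deg` of the new region of interest; returns -1 if the
    viewer never converges. Angular error is wrapped to [-180, 180)."""
    err = np.abs((np.asarray(gaze_yaw) - roi_yaw + 180.0) % 360.0 - 180.0)
    hits = np.nonzero(err < tolerance_deg)[0]
    return int(hits[0]) if hits.size else -1
```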

    PredNet and Predictive Coding: A Critical Review

    Full text link
    PredNet, a deep predictive coding network developed by Lotter et al., combines a biologically inspired architecture based on the propagation of prediction error with self-supervised representation learning in video. While the architecture has drawn a lot of attention and various extensions of the model exist, a critical analysis has been lacking. We fill this gap by evaluating PredNet both as an implementation of predictive coding theory and as a self-supervised video prediction model, using a challenging video action classification dataset. We design an extended model to test whether conditioning future frame predictions on the action class of the video improves performance. We show that PredNet does not yet completely follow the principles of predictive coding. The proposed top-down conditioning leads to a performance gain on synthetic data, but does not scale up to the more complex real-world action classification dataset. Our analysis is aimed at guiding future research on similar architectures based on predictive coding theory.
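    To make the prediction-error principle concrete, here is a single-layer toy sketch in PyTorch: the cell predicts its input, propagates only the split positive/negative prediction error, and updates its state from that error. It uses a GRU cell rather than PredNet's convolutional LSTMs, so it is an assumption-laden illustration, not Lotter et al.'s model:

```python
import torch
import torch.nn as nn

class PredictiveCodingCell(nn.Module):
    """Toy predictive-coding unit: only the prediction error, split into
    positive and negative rectified parts (as in PredNet's error units),
    drives the recurrent state update."""
    def __init__(self, dim=64):
        super().__init__()
        self.rnn = nn.GRUCell(2 * dim, dim)   # state update driven by error
        self.pred = nn.Linear(dim, dim)       # top-down prediction of input

    def forward(self, x, h):
        x_hat = self.pred(h)                             # predict current input
        err = torch.cat([torch.relu(x - x_hat),
                         torch.relu(x_hat - x)], dim=-1) # split +/- error
        h = self.rnn(err, h)                             # update from error only
        return x_hat, h
```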