    Dialogue scene detection in movies using low and mid-level visual features

    This paper describes an approach for detecting dialogue scenes in movies. The approach uses automatically extracted low- and mid-level visual features that characterise the visual content of individual shots. These features are then combined using a state transition machine that models the shot-level temporal characteristics of the scene under investigation. The choice of visual features is motivated by a consideration of formal film syntax. The system is designed so that the analysis can be applied to detect different types of scenes, although in this paper we focus on dialogue sequences, as these are the most prevalent scenes in the movies considered to date.

    Cloud Chaser: Real Time Deep Learning Computer Vision on Low Computing Power Devices

    Internet of Things (IoT) devices, mobile phones, and robotic systems are often denied the power of deep learning algorithms due to their limited computing power. However, to provide time-critical services such as emergency response, home assistance, and surveillance, these devices often need real-time analysis of their camera data. This paper strives to offer a viable approach to integrating high-performance deep learning-based computer vision algorithms with low-resource and low-power devices by leveraging the computing power of the cloud. By offloading the computation to the cloud, no dedicated hardware is needed to enable deep neural networks on existing low computing power devices. A Raspberry Pi based robot, Cloud Chaser, is built to demonstrate the power of using cloud computing to perform real-time vision tasks. Furthermore, to reduce latency and improve real-time performance, compression algorithms are proposed and evaluated for streaming real-time video frames to the cloud.
    Comment: Accepted to The 11th International Conference on Machine Vision (ICMV 2018). Project site: https://zhengyiluo.github.io/projects/cloudchaser
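
The abstract does not say which compression algorithms were evaluated, so the following is only a minimal sketch of the general idea of shrinking frames before they are streamed. It assumes frames arrive as equal-length raw byte strings and uses delta encoding against the previous frame plus `zlib`, which is a stand-in technique and not the paper's method. The names `encode_frame` and `decode_frame` are hypothetical.

```python
import zlib
from typing import Optional


def encode_frame(frame: bytes, previous: Optional[bytes]) -> bytes:
    """Compress a raw frame, sending only the change from `previous` when possible.

    Assumption: frames are raw byte strings of equal length. This is an
    illustrative stand-in, not the compression scheme from the paper.
    """
    if previous is None or len(previous) != len(frame):
        # Key frame: no usable reference, so compress the full frame.
        return b"K" + zlib.compress(frame)
    # Delta frame: byte-wise difference modulo 256, mostly zeros for static scenes.
    delta = bytes((a - b) % 256 for a, b in zip(frame, previous))
    return b"D" + zlib.compress(delta)


def decode_frame(payload: bytes, previous: Optional[bytes]) -> bytes:
    """Invert encode_frame on the receiving (cloud) side."""
    kind, body = payload[:1], zlib.decompress(payload[1:])
    if kind == b"K":
        return body
    if previous is None:
        raise ValueError("delta frame received without a reference frame")
    return bytes((d + p) % 256 for d, p in zip(body, previous))


# Two synthetic 1024-byte "frames" that differ in a single byte.
frame_a = bytes(range(256)) * 4
frame_b = frame_a[:100] + b"\xff" + frame_a[101:]

payload = encode_frame(frame_b, frame_a)
restored = decode_frame(payload, frame_a)
```

Delta encoding pays off only when consecutive frames are similar. A real pipeline would more likely use an image or video codec, which trades exact reconstruction for much smaller payloads.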

    Face detection and clustering for video indexing applications

    This paper describes a method for automatically detecting human faces in generic video sequences. We employ an iterative algorithm to give a confidence measure for the presence or absence of faces within video shots. Skin colour filtering is carried out on a selected number of frames per video shot, followed by the application of shape and size heuristics. Finally, the remaining candidate regions are normalized and projected into an eigenspace, with the reconstruction error serving as the measure of confidence for the presence or absence of a face. Following this, the confidence score for the entire video shot is calculated. To cluster extracted faces into a set of face classes, we employ an incremental procedure using a PCA-based dissimilarity measure in conjunction with spatio-temporal correlation. Experiments were carried out on a representative broadcast news test corpus.
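
The eigenspace step described above can be illustrated with a small sketch. It assumes an orthonormal basis and a mean vector are already available (in practice they would come from PCA over training faces). The function name `reconstruction_error` and the toy 3-D data are hypothetical, chosen only to show that a candidate lying close to the subspace yields a small error.

```python
import math


def reconstruction_error(x, mean, basis):
    """Distance between a vector and its projection onto an eigenspace.

    `basis` is assumed to hold orthonormal vectors (one per row); a small
    return value means the candidate is well explained by the subspace.
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    # Coefficients of the centred vector along each basis vector.
    coeffs = [sum(c * b for c, b in zip(centered, vec)) for vec in basis]
    # Rebuild the vector from those coefficients alone.
    recon = [
        sum(k * vec[i] for k, vec in zip(coeffs, basis))
        for i in range(len(centered))
    ]
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(centered, recon)))


# Toy "eigenspace": the x-y plane inside 3-D space (illustrative data only).
MEAN = [0.0, 0.0, 0.0]
BASIS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

near_error = reconstruction_error([2.0, -1.0, 0.1], MEAN, BASIS)
far_error = reconstruction_error([0.0, 0.0, 3.0], MEAN, BASIS)
```

Here `near_error` is much smaller than `far_error`, so thresholding the error (or mapping it to a score) gives a per-region confidence. How the paper turns region errors into a shot-level score is not stated in the abstract.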

    Seeing, Sensing, and Scrutinizing

    Large changes in a scene often become difficult to notice if made during an eye movement, image flicker, movie cut, or other such disturbance. It is argued here that this change blindness can serve as a useful tool to explore various aspects of vision. This argument centers around the proposal that focused attention is needed for the explicit perception of change. Given this, the study of change perception can provide a useful way to determine the nature of visual attention, and to cast new light on the way that it is, and is not, involved in visual perception. To illustrate the power of this approach, this paper surveys its use in exploring three different aspects of vision. The first concerns the general nature of seeing. To explain why change blindness can be easily induced in experiments but apparently not in everyday life, it is proposed that perception involves a virtual representation, where object representations do not accumulate, but are formed as needed. An architecture containing both attentional and nonattentional streams is proposed as a way to implement this scheme. The second aspect concerns the ability of observers to detect change even when they have no visual experience of it. This sensing is found to take on at least two forms: detection without visual experience (but still with conscious awareness), and detection without any awareness at all. It is proposed that these are both due to the operation of a nonattentional visual stream. The final aspect considered is the nature of visual attention itself: the mechanisms involved when scrutinizing items. Experiments using controlled stimuli show the existence of various limits on visual search for change. It is shown that these limits provide a powerful means to map out the attentional mechanisms involved.

    Indexing of fictional video content for event detection and summarisation

    This paper presents an approach to movie video indexing that utilises audiovisual analysis to detect important and meaningful temporal video segments, which we term events. We consider three event classes, corresponding to dialogues, action sequences, and montages, where the latter also includes musical sequences. These three event classes are intuitive for a viewer to understand and recognise, and together account for over 90% of the content of most movies. To detect events we leverage traditional filmmaking principles and map these to a set of computable low-level audiovisual features. Finite state machines (FSMs) are used to detect when temporal sequences of specific features occur. A set of heuristics, again inspired by filmmaking conventions, is then applied to the output of multiple FSMs to detect the required events. A movie search system, named MovieBrowser, built upon this approach is also described. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres and consistently obtains high precision and recall for all event classes. A user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives is also described, and the results indicate the usefulness of the proposed approach.
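
As a rough illustration of how a finite state machine can pick out temporal runs of a feature, the sketch below scans a per-shot boolean flag and reports sustained runs. The two-state design, the `min_len` and `max_gap` parameters, and the name `detect_segments` are assumptions made for this example. The abstract does not specify the states or thresholds the authors use.

```python
def detect_segments(flags, min_len=3, max_gap=1):
    """Return inclusive (start, end) shot indices of sustained feature runs.

    Hypothetical two-state machine:
      IDLE   -> ACTIVE when a shot shows the feature;
      ACTIVE -> IDLE   once more than `max_gap` consecutive shots lack it.
    A run is reported only if it spans at least `min_len` shots.
    """
    segments = []
    state = "IDLE"
    start = last_hit = None
    for i, hit in enumerate(flags):
        if state == "IDLE":
            if hit:
                state, start, last_hit = "ACTIVE", i, i
        elif hit:
            last_hit = i  # still inside the run; remember the latest match
        elif i - last_hit > max_gap:
            # Gap too long: close the run and keep it if it is long enough.
            if last_hit - start + 1 >= min_len:
                segments.append((start, last_hit))
            state = "IDLE"
    # Flush a run that is still open when the input ends.
    if state == "ACTIVE" and last_hit - start + 1 >= min_len:
        segments.append((start, last_hit))
    return segments


# One flag per shot, e.g. "speech present and camera static" (illustrative data).
shot_flags = [False, True, True, False, True, True, False, False, True, False]
segments = detect_segments(shot_flags)
```

Tolerating short gaps keeps a single cutaway shot from splitting a scene in two. Several such machines, one per feature, could then feed the heuristics stage the abstract mentions.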

    Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model
