Skip to main content
Article thumbnail
Location of Repository

Determining Computable Scenes in Films and their Structures Using Audio-Visual Memory Models

By Hari Sundaram and Shih-fu Chang


In this paper we present novel algorithms for computing scenes and within-scene structures in films. We begin by mapping insights from film-making rules and experimental results from the psychology of audition into a computational scene model. We define a computable scene to be a chunk of audio-visual data that exhibits long-term consistency with regard to three properties: (a) chromaticity (b) lighting (c) ambient sound. Central to the computational model is the notion of a causal, finite-memory model. We segment the audio and video data separately. In each case we determine the degree of correlation of the most recent data in the memory with the past. The respective scene boundaries are determined using local minima and aligned using a nearest neighbor algorithm. We introduce the idea of a discrete object series to automatically determine the structure within a scene. We then use statistical tests on the series to determine the presence of dialogue. The algorithms were tested on a difficult data set: five commercial films. We take the first hour of data from each of the five films. The best results: scene detection: 88% recall and 72% precision, dialogue detection: 91% recall and 100% precision

Topics: scene detection, shot-level structure, films
Year: 2000
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • (external link)
  • (external link)
  • Suggested articles

    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.