    Dialogue scene detection in movies using low- and mid-level visual features

    This paper describes an approach for detecting dialogue scenes in movies. The approach uses automatically extracted low- and mid-level visual features that characterise the visual content of individual shots, which are then combined using a state transition machine that models the shot-level temporal characteristics of the scene under investigation. The choice of visual features is motivated by a consideration of formal film syntax. The system is designed so that the analysis can be applied to detect different types of scenes, although in this paper we focus on dialogue sequences, as these are the most prevalent scenes in the movies considered to date.
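
    The abstract gives no implementation details, but the shot-level state machine it describes can be illustrated with a minimal sketch. Everything below is hypothetical: the per-shot labels (FACE_A, FACE_B, OTHER) stand in for the paper's low- and mid-level visual features, and the alternation threshold is an invented parameter. The sketch flags runs of alternating close-ups, the shot/reverse-shot pattern that film syntax associates with dialogue.

```python
from enum import Enum, auto

class Shot(Enum):
    """Hypothetical per-shot labels; in the paper these would come
    from the low- and mid-level visual feature extraction."""
    FACE_A = auto()  # close-up of speaker A
    FACE_B = auto()  # close-up of speaker B
    OTHER = auto()   # any other shot type (wide shot, action, ...)

def find_dialogue_scenes(shots, min_alternations=3):
    """Flag runs of alternating close-ups (the shot/reverse-shot
    pattern) as dialogue scenes; returns (start, end) index pairs."""
    scenes, start, prev, alternations = [], None, None, 0
    for i, s in enumerate(shots):
        if s in (Shot.FACE_A, Shot.FACE_B):
            if start is None:
                start, alternations = i, 0
            elif s != prev:
                alternations += 1
            prev = s
        else:
            if start is not None and alternations >= min_alternations:
                scenes.append((start, i - 1))
            start, prev = None, None
    if start is not None and alternations >= min_alternations:
        scenes.append((start, len(shots) - 1))
    return scenes

shots = [Shot.OTHER, Shot.FACE_A, Shot.FACE_B, Shot.FACE_A,
         Shot.FACE_B, Shot.OTHER]
print(find_dialogue_scenes(shots))  # -> [(1, 4)]
```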

    Condensing Computable Scenes Using Visual Complexity and Film Syntax Analysis

    In this paper, we present a novel algorithm to condense computable scenes. A computable scene is a chunk of data that exhibits consistency with respect to chromaticity, lighting and sound. We attempt to condense such scenes in two ways. First, we define the visual complexity of a shot to be its Kolmogorov complexity, and conduct experiments that help us map the complexity of a shot to the minimum time required for its comprehension. Second, we analyze the grammar of the film language, since it is what makes the shot sequence meaningful; these grammatical rules are used to condense scenes in parallel with the shot-level condensation. We have implemented a system that generates a skim given a time budget. Our user studies show good results on skims with compression rates between 60% and 80%.
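
    Kolmogorov complexity is uncomputable, so any implementation must substitute a computable proxy; a common choice is the compressed size of the data. The sketch below is not the authors' system: it estimates shot complexity from the zlib compression ratio of a key frame, maps it to a minimum display time (the base and scale coefficients are invented stand-ins for the mapping the paper derives from user experiments), and greedily packs shots into a time budget.

```python
import zlib

def shot_complexity(keyframe_bytes):
    # Computable proxy for Kolmogorov complexity: the ratio of
    # compressed size to raw size of a key frame (roughly in [0, 1]).
    return len(zlib.compress(keyframe_bytes)) / len(keyframe_bytes)

def min_comprehension_time(complexity, base=0.5, scale=2.0):
    # Hypothetical linear mapping from complexity to the minimum
    # time (seconds) a viewer needs to comprehend the shot; the
    # paper instead fits this mapping from user experiments.
    return base + scale * complexity

def make_skim(shots, budget_seconds):
    # Greedily keep shots, in story order, whose minimum
    # comprehension times still fit within the time budget.
    skim, used = [], 0.0
    for shot_id, keyframe_bytes in shots:
        t = min_comprehension_time(shot_complexity(keyframe_bytes))
        if used + t <= budget_seconds:
            skim.append((shot_id, t))
            used += t
    return skim
```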

    Audio-coupled video content understanding of unconstrained video sequences

    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information has not been applied together to solve such a complex task, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework to study the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first is based on image sequence analysis alone, and uses a range of colour, shape, texture and statistical features from image regions with a trained classifier to recognise the identity of the objects, activities and environment present. The second module uses audio information only, and recognises activities and environment. Both approaches are preceded by detailed pre-processing to ensure that video segments containing both audio and video content are correctly selected, and that the developed system is robust to changes in camera movement, illumination, random object behaviour, etc. For both audio and video analysis, we use a hierarchical, multi-stage classification approach so that difficult classification tasks can be decomposed into simpler and smaller ones. When combining the two modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines the advantages of both feature- and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. Finally, we propose a decision correction algorithm which shows that further steps towards effectively combining multi-modal classification information with semantic knowledge generate the best possible results.
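
    The abstract does not reproduce the thesis's fusion algorithm, but decision-level fusion, one of the integration levels it compares, is straightforward to sketch. The weighted averaging below is a generic baseline with an invented weight, not the thesis's novel hybrid algorithm or its PCFS feature selection.

```python
import numpy as np

def fuse_decisions(p_video, p_audio, w_video=0.6):
    """Decision-level fusion: weighted average of the class-posterior
    vectors produced by independent video and audio classifiers.
    w_video is a hypothetical tuning parameter."""
    p = w_video * np.asarray(p_video) + (1.0 - w_video) * np.asarray(p_audio)
    return int(np.argmax(p)), p

# Example with three hypothetical classes, e.g. indoor/outdoor/vehicle:
label, posterior = fuse_decisions([0.2, 0.7, 0.1], [0.5, 0.3, 0.2])
print(label, posterior)  # -> 1 [0.32 0.54 0.14]
```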

    ComputergestĂŒtzte Inhaltsanalyse von digitalen Videoarchiven (Computer-assisted content analysis of digital video archives)

    The transition from analogue to digital video has led to major changes within film archives in recent years. Digitisation of the films in particular opens up new possibilities for the archives. Wear and ageing of the film reels are eliminated, so the quality is preserved unchanged. In addition, network-based, and thus much simpler, access to the videos in the archives becomes possible. Additional services are available to archivists and users, providing extended search capabilities and easing navigation during playback. Searching within video archives relies on metadata that provides further information about the videos. A large part of this metadata is entered manually by archivists, which involves considerable time and cost. Computer-assisted analysis of digital video makes it possible to reduce the effort of generating metadata for video archives. The first part of this dissertation presents new methods for recognising important semantic content in videos, in particular newly developed algorithms for cut detection, camera motion analysis, object segmentation and classification, text recognition and face recognition. The automatically derived semantic information is very valuable because it eases work with digital video archives: it not only supports searching the archives but also leads to new applications, which are presented in the second part of the dissertation. For example, computer-generated video summaries can be produced, or videos can be automatically adapted to the characteristics of a playback device. A further focus of this dissertation is the analysis of historical films. Four European film archives have provided a large number of historical video documentaries, shot in the early to mid twentieth century and digitised in recent years. Owing to decades of storage and wear of the film reels, many of these videos are heavily noisy and contain clearly visible image defects. The image quality of the historical black-and-white films differs significantly from that of current videos, so reliable analysis with existing methods is often impossible. This dissertation presents new algorithms that enable reliable recognition of semantic content in historical videos as well.
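
    The dissertation's algorithms are not reproduced here, but the cut detection it mentions is a standard building block that can be sketched. The detector below uses the classic histogram-difference criterion on grayscale frames (NumPy arrays); the threshold and bin count are hypothetical tuning parameters, and a real system for noisy historical footage would need considerably more robustness than this.

```python
import numpy as np

def detect_cuts(frames, threshold=0.4, bins=64):
    """Classic histogram-difference cut detector.
    `frames` is an iterable of 2-D uint8 grayscale arrays; a cut is
    reported at index i when the histogram distance between frames
    i-1 and i exceeds the (hypothetical) threshold."""
    cuts, prev_hist = [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()  # normalise to a distribution
        if prev_hist is not None:
            d = 0.5 * np.abs(hist - prev_hist).sum()  # distance in [0, 1]
            if d > threshold:
                cuts.append(i)
        prev_hist = hist
    return cuts
```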