Dialogue scene detection in movies using low and mid-level visual features

Author: Lehane Bart
Murphy Noel
O'Connor Noel E.
Publication date: 01/10/2004

This paper describes an approach for detecting dialogue scenes in movies. The approach uses automatically extracted low- and mid-level visual features that characterise the visual content of individual shots; these are then combined using a state transition machine that models the shot-level temporal characteristics of the scene under investigation. The choice of visual features is motivated by a consideration of formal film syntax. The system is designed so that the analysis can be applied to detect different types of scenes, although in this paper we focus on dialogue sequences, as these are the most prevalent scenes in the movies considered to date.

Irish Universities

DCU Online Research Access Service

Multimedia information technology and the annotation of video

Author: Jong F.M.G. de
Smeulders A.
Worring M.
Publication venue: Stichting Archiefpublicaties
Publication date: 01/01/2006

The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.

University of Twente Research Information

MUSCLE movie-database: a multimodal corpus with rich annotation for dialogue and saliency detection

Author: Antonopoulos P.
Benetos E.
Kotropoulos C.
Kotti M.
Maragos P.
Moschou V.
Nikolaidis N.
Pitas I.
Spachos D.
Tzimouli K.
Zlatintsi A.
Publication date: 01/01/2008

City Research Online

Spiral - Imperial College Digital Repository

Audio-visual foreground extraction for event characterization

Author: Bicego Manuele
Cristani Marco
Murino Vittorio
Publication venue: IEEE Computer Society
Publication date: 01/01/2006

This paper presents a new method that integrates audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage, and coupled with an audio BG/FG modelling scheme. The audiovisual association is performed on-line by exploiting the concept of synchrony. Experimental tests on the classification and clustering of events show the potential of the proposed approach, also in comparison with the results obtained using the single modalities.

Catalogo dei prodotti della ricerca

UnissResearch

Speaker-following Video Subtitles

Author: Hu Yongtao
Kautz Jan
Wang Wenping
Yu Yizhou
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

We propose a new method for improving the presentation of subtitles in video (e.g., TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and another previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain.

arXiv.org e-Print Archive

HKU Scholars Hub

A framework for dialogue detection in movies

Author: Kotropoulos C
Kotti M
Moschou V
Pitas I
Ziółko B
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

In this paper, we investigate a novel framework for dialogue detection that is based on indicator functions. An indicator function indicates whether a particular actor is present at each time instant. Two dialogue detection rules are developed and assessed. The first rule compares the value of the cross-correlation function at zero time lag to a threshold. The second rule compares the cross-power in a particular frequency band to a threshold. Experiments are carried out to validate the feasibility of these dialogue detection rules using ground-truth indicator functions determined by human observers from six different movies. A total of 25 dialogue scenes and another 8 non-dialogue scenes are employed. The probabilities of false alarm and detection are estimated by cross-validation, where 70% of the available scenes are used to learn the thresholds employed in the dialogue detection rules and the remaining 30% of the scenes are used for testing. Almost perfect dialogue detection is reported for every distinct threshold. © Springer-Verlag Berlin Heidelberg 2006

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository
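
The first detection rule described in the abstract of "A framework for dialogue detection in movies" can be illustrated with a minimal sketch. The abstract gives neither the normalisation of the cross-correlation nor the direction of the threshold comparison, so both are assumptions here: the zero-lag value is taken as the mean of the element-wise product of two binary indicator sequences, and the rule is said to "fire" when that value is at least the threshold. The function name `cross_correlation_rule` and the sample sequences are hypothetical.

```python
def cross_correlation_rule(actor_a, actor_b, threshold):
    """Compare the zero-lag cross-correlation of two indicator sequences to a threshold.

    actor_a, actor_b: equal-length sequences of 0/1 values, one per time instant
    (1 = actor present). Returns (value, fired).
    ASSUMPTIONS: normalisation by sequence length and the ">=" comparison are
    illustrative choices, not taken from the paper.
    """
    if len(actor_a) != len(actor_b) or len(actor_a) == 0:
        raise ValueError("indicator sequences must be non-empty and of equal length")
    # Zero-lag cross-correlation: average overlap of the two indicator functions.
    value = sum(a * b for a, b in zip(actor_a, actor_b)) / len(actor_a)
    return value, value >= threshold


# Hypothetical indicator functions for two actors over four time instants.
value, fired = cross_correlation_rule([1, 1, 0, 0], [0, 1, 1, 0], threshold=0.2)
print(value, fired)
```

In the setting the abstract describes, the threshold would be learned from the training portion of the scenes rather than fixed by hand as it is in this sketch.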

A neural network approach to audio-assisted movie dialogue detection

Author: Alatan
Birge
Constantine Kotropoulos
Emmanouil Benetos
Freund
Hosmer
Ioannis Pitas
Jelinek
Kotti
Král
Lehane
Margarita Kotti
Papoulis
Platt
Reiss
Stoica
Trelea
Webb
Zhai
Publication venue: Elsevier BV
Publication date: 01/01/2007

A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function indicates whether an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks, including multilayer perceptrons, voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based multilayer perceptrons, are tested. Experiments are carried out to validate the feasibility of this approach using ground-truth indicator functions determined by human observers on 6 different movies. A total of 41 dialogue instances and another 20 non-dialogue instances are employed. The average detection accuracy achieved is high, ranging between 84.78%±5.499% and 91.43%±4.239%.

CiteSeerX

City Research Online

Crossref

Spiral - Imperial College Digital Repository

Indexing of fictional video content for event detection and summarisation

Author: Lee Hyowon
Lehane Bart
O'Connor Noel E.
Smeaton Alan F.
Publication venue: Hindawi Limited
Publication date: 01/01/2007

This paper presents an approach to movie video indexing that utilises audiovisual analysis to detect important and meaningful temporal video segments, which we term events. We consider three event classes, corresponding to dialogues, action sequences, and montages, where the latter also includes musical sequences. These three event classes are intuitive for a viewer to understand and recognise whilst accounting for over 90% of the content of most movies. To detect events, we leverage traditional filmmaking principles and map these to a set of computable low-level audiovisual features. Finite state machines (FSMs) are used to detect when temporal sequences of specific features occur. A set of heuristics, again inspired by filmmaking conventions, is then applied to the output of multiple FSMs to detect the required events. A movie search system, named MovieBrowser, built upon this approach is also described. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres and consistently obtains high precision and recall for all event classes. A user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives is also described, and the results indicate the usefulness of the proposed approach.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Irish Universities

DCU Online Research Access Service
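
The idea in the abstract of "Indexing of fictional video content for event detection and summarisation", using a finite state machine to detect when a temporal sequence of a specific feature occurs, can be sketched roughly as follows. This is not the authors' FSM: the two states, the per-shot boolean feature, and the `min_run` and `max_gap` parameters are all assumptions made for illustration, and the function name `detect_sequences` is hypothetical.

```python
def detect_sequences(flags, min_run=3, max_gap=1):
    """Find runs of shots in which a feature is mostly present.

    flags: one truthy/falsy value per shot (feature present or not).
    Returns a list of (start, end) shot indices, end inclusive.
    ASSUMPTIONS: a sequence needs at least `min_run` positive shots and
    tolerates up to `max_gap` consecutive negative shots; neither value
    comes from the paper.
    """
    sequences = []
    state = "IDLE"          # two-state machine: IDLE or ACTIVE
    start = last_hit = 0
    hits = 0
    for i, flag in enumerate(flags):
        if state == "IDLE":
            if flag:
                # First positive shot opens a candidate sequence.
                state, start, last_hit, hits = "ACTIVE", i, i, 1
        elif flag:
            last_hit = i
            hits += 1
        elif i - last_hit > max_gap:
            # Too many negative shots in a row: close the candidate.
            if hits >= min_run:
                sequences.append((start, last_hit))
            state = "IDLE"
    if state == "ACTIVE" and hits >= min_run:
        sequences.append((start, last_hit))
    return sequences


# Hypothetical per-shot feature flags for a 15-shot clip.
shots = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1]
print(detect_sequences(shots))  # [(1, 5), (12, 14)]
```

In the approach the abstract describes, the outputs of several such machines, one per feature, would then be combined by heuristics to decide on events; that combination step is not sketched here.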