Search CORE

62,901 research outputs found

Dublin City University at the TRECVid 2008 BBC rushes summarisation task

Author: Bredin Hervé
Byrne Daragh
Jones Gareth J.F.
Lee Hyowon
O'Connor Noel E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

We describe the video summarisation systems submitted by Dublin City University to the TRECVid 2008 BBC Rushes Summarisation task. We introduce a new approach to re- dundant video summarisation based on principal component analysis and linear discriminant analysis. The resulting low dimensional representation of each shot offers a simple way to compare and select representative shots of the original video. The final summary is constructed as a dynamic sto- ryboard. Both types of summaries were evaluated and the results are discussed

Crossref

Irish Universities

DCU Online Research Access Service

Video summarisation: A conceptual framework and survey of the state of the art

Author: Arthur G. Money
Babaguchi
Boyatzis
Cernekova
Chang
Chang
Crockford
Dey
Dimitrova
Ekin
Ferman
Gianluigi
Hanjalic
Hanjalic
Harry Agius
Joffe
Kim
Lee
Lew
Li
Li
Lienhart
Ma
Moriyama
Ngo
Otsuka
Shih
Silverman
Taylor
Tjondronegoro
Tseng
Wang
Zhu
Publication venue: 'Elsevier BV'
Publication date: 01/02/2008
Field of study

This is the post-print (final draft post-refereeing) version of the article. Copyright @ 2007 Elsevier Inc.Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and used as a means for surveying the research literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analyse information sourced directly from the video stream), external (analyse information not sourced directly from the video stream) and hybrid (analyse a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly user based information that is unobtrusively sourced, in order to overcome longstanding challenges such as the semantic gap and providing video summaries that have greater relevance to individual users

Crossref

Brunel University Research Archive

Scene extraction in motion pictures

Author: Dorai Chitra
Truong Ba Tu
Venkatesh Svetha
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003
Field of study

This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of media descriptions that can be computed in today\u27s content management systems. To facilitate high-level semantics-based content annotation and interpretation, we tackle the problem of automatic decomposition of motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from fill production to determine when a scene change occurs. We then investigate different rules and conventions followed as part of Fill Grammar that would guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. In addition, we present different refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, which offer useful insights into the limitations of our method

Deakin Research Online

Using film cutting in interface design

Author: Barnard P.
Dean M.
May J.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2003
Field of study

It has been suggested that computer interfaces could be made more usable if their designers utilized cinematography techniques, which have evolved to guide the viewer through a narrative despite frequent discontinuities in the presented scene (i.e., cuts between shots). Because of differences between the domains of film and interface design, it is not straightforward to understand how such techniques can be transferred. May and Barnard (1995) argued that a psychological model of watching film could support such a transference. This article presents an extended account of this model, which allows identification of the practice of collocation of objects of interest in the same screen position before and after a cut. To verify that filmmakers do, in fact, use such techniques successfully, eye movements were measured while participants watched the entirety of a commerciall

White Rose Research Online

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

Author: Barbu Andrei
Berzak Yevgeni
Harari Daniel
Katz Boris
Ullman Shimon
Publication venue
Publication date: 04/12/2015
Field of study

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing to disambiguate sentences in a unified fashion across the different ambiguity types.Comment: EMNLP 201

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT