A framework for creating natural language descriptions of video streams

Author: Baiget
Bolle
Everingham
Kim
Kuchi
Maglogiannis
Metze
Muhammad Usman Ghani Khan
Nouf Al Harbi
Reiter
Ryoo
Salton
Schirra
Yao
Yoshihiko Gotoh
Publication venue: 'Elsevier BV'
Publication date: 01/05/2015

This contribution addresses the generation of natural language descriptions for important visual content present in video streams. The work starts with the implementation of conventional image processing techniques to extract high-level visual features such as humans and their activities. These features are converted into natural language descriptions using a template-based approach built on a context-free grammar, incorporating spatial and temporal information. The task is challenging, particularly because feature extraction processes are error-prone at various levels. In this paper we explore approaches to accommodating potentially missing information, thus creating a coherent description. Sample automatic annotations are created for video clips showing human close-ups and actions, and a qualitative analysis of the approach is made from various aspects. Additionally, a task-based scheme is introduced that provides quantitative evaluation of the relevance of generated descriptions. Further, to show the framework’s potential for extension, a scalability study is conducted using video categories that were not targeted during development.

Crossref

White Rose Research Online

Generating natural language tags for video information management

Author: BZ Yao
J Pustejovsky
JF Allen
MP Marcus
MUG Khan
Muhammad Usman Ghani Khan
P Baiget
RR Vallacher
W Kim
WC Hu
Y Yang
Yoshihiko Gotoh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/02/2017

This exploratory work is concerned with the generation of natural language descriptions that can be used for video retrieval applications. It is a step ahead of keyword-based tagging as it captures relations between keywords associated with videos. Firstly, we prepare hand annotations consisting of descriptions for video segments crafted from a TREC Video dataset. Analysis of this data offers insights into human interest in video content. Secondly, we develop a framework for creating a smooth and coherent description of video streams. It builds on conventional image processing techniques that extract high-level features from individual video frames. A natural language description is then produced from these high-level features. Although feature extraction processes are error-prone at various levels, we explore approaches to putting them together to produce a coherent, smooth and well-phrased description by incorporating spatial and temporal information. Evaluation is made by calculating ROUGE scores between human-annotated and machine-generated descriptions. Further, we introduce a task-based evaluation by human subjects which provides qualitative evaluation of generated descriptions.

Crossref

White Rose Research Online

An MPEG-7 scheme for semantic content modelling and filtering of digital video

Author: A. Vakali
A. Vetro
B.L. Tseng
C. Okoli
C.S. Goldfarb
F. Golshani
F. Kretz
G. Rowe
H. Kosch
H.W. Agius
Harry Agius
J. Hunter
J. Magalhães
J.F. Allen
L. Al-Safadi
L. Wenyin
M. Davis
M. Echiffre
M. Eirinaki
M.C. Angelides
M.R. Naphande
Marios C. Angelides
N. Adami
P. Correia
P. Salembier
P.M. Fonseca
R. Zhao
S. Adali
S.R. Newcomb
S.W. Ambler
T. Meyer-Boudnik
U. Westermann
Y.F. Day
É Germain
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/02/2006

Part 5 of the MPEG-7 standard specifies Multimedia Description Schemes (MDS); that is, the format multimedia content models should conform to in order to ensure interoperability across multiple platforms and applications. However, the standard does not specify how the content or the associated model may be filtered. This paper proposes an MPEG-7 scheme which can be deployed for digital video content modelling and filtering. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the preferred content requirements of the user. We present details of the scheme, the front-end systems used for content modelling and filtering, and experiences with a number of users.

Crossref

Brunel University Research Archive

Movie Description

Author: Courville Aaron
Larochelle Hugo
Pal Christopher
Rohrbach Anna
Rohrbach Marcus
Schiele Bernt
Tandon Niket
Torabi Atousa
Publication date: 12/05/2016

Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs that are temporally aligned to full-length movies. In addition, we collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total, the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)" at ICCV 2015.

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Crossref

Springer - Publisher Connector

PolyPublie

MPG.PuRe

Indexing, browsing and searching of digital video

Author: Abe
Avaro
Brown
Chang
Chang
Choi
Goodrum
Hauptmann
Hirschman
Jarina
Kavanagh
Kazman
Koegel Buford
Kravtchenko
Le Gall
Lee
Lienhart
Marchionini
Maybury
McTear
Myers
Myllymaki
Poynton
Puri
Rasmussen
Rorvig
Rowley
Smyth
Sparck Jones
Stein
Wactlar
Wallace
Witbrock
Publication venue: 'Wiley'
Publication date: 01/01/2003

Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Everything You Wanted to Know About MPEG-7: Part 1

Author: Lindsay A.T.
Nack F.-M. (Frank)
Publication venue: I.E.E.E. Computer Society Press
Publication date: 01/01/1999

Part I of this article provides an overview of the development, functionality, and applicability of MPEG-7. We’ll first present the role of MPEG-7 within the context of past MPEG standards. We then outline ideas of what should be possible using MPEG-7 technology. In Part II, we’ll discuss the description of MPEG-7’s concepts, terminology, and requirements. We’ll then compare MPEG-7 to other approaches on multimedia content description.

CWI's Institutional Repository

Geoscience after IT: Part J. Human requirements that shape the evolving geoscience information system

Author: Baker
Coad
Gilbert
Goodchild
Kent
Kuhn
Laszlo
Leatherdale
MacEachren
McCrone
Popper
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000

The geoscience record is constrained by the limitations of human thought and of the technology for handling information. IT can lead us away from the tyranny of older technology, but to find the right path, we need to understand our own limitations. Language, images, data and mathematical models are tools for expressing and recording our ideas. Backed by intuition, they enable us to think in various modes, to build knowledge from information and create models as artificial views of a real world. Markup languages may accommodate more flexible and better connected records, and the object-oriented approach may help to match IT more closely to our thought processes.

Crossref

NERC Open Research Archive