TREC video retrieval evaluation: a case study and status report
The TREC Video Retrieval Evaluation is a multiyear, international effort, funded by the US Advanced Research and Development Activity (ARDA) and the National Institute of Standards and Technology (NIST), to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. Now beginning its fourth year, it aims over time to develop a better understanding of both how systems can effectively accomplish such retrieval and how one can reliably benchmark their performance. This paper can be seen as a case study in the development of video retrieval systems and their evaluation, as well as a report on their status to date. After an introduction to the evolution of the evaluation over the past three years, the paper reports on the most recent evaluation, TRECVID 2003: the evaluation framework (the 4 tasks: shot boundary determination, high-level feature extraction, story segmentation and typing, and search; 133 hours of US television news data; and the measures), the results, and the approaches taken by the 24 participating groups.
A computation method/framework for high level video content analysis and segmentation using affective level information
Video segmentation facilitates efficient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions performed segmentation by utilizing information about the "facts" of the video. These "facts" come in the form of labels that describe the objects captured by the camera. This type of solution was able to achieve good and consistent results for some video genres, such as news programs and informational presentations. The content format of these videos is generally quite standard, and automated solutions were designed to follow these format rules. For example, in [1], the presence of news anchor persons was used as a cue to determine the start and end of a meaningful news segment.
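The anchor-person cue can be illustrated with a minimal sketch. It assumes each shot already carries a boolean "anchor person present" flag (produced by some upstream detector that is not shown) and that a new segment starts wherever an anchor shot follows a non-anchor shot. The function name `segment_by_anchor` and this boundary rule are illustrative assumptions, not the method of [1].

```python
def segment_by_anchor(is_anchor):
    """Split a shot sequence into segments using anchor shots as cues.

    is_anchor: list of booleans, one per shot (hypothetical detector output).
    Returns a list of (first_shot, last_shot) index pairs, both inclusive.
    """
    n = len(is_anchor)
    if n == 0:
        return []

    # A segment starts at shot 0 and at every anchor shot that directly
    # follows a non-anchor shot (assumed rule, for illustration only).
    starts = [0]
    for i in range(1, n):
        if is_anchor[i] and not is_anchor[i - 1]:
            starts.append(i)

    # Each segment runs until the shot before the next start.
    ends = [s - 1 for s in starts[1:]] + [n - 1]
    return list(zip(starts, ends))


if __name__ == "__main__":
    flags = [True, False, False, True, True, False, True]
    print(segment_by_anchor(flags))
```

A rule this simple only works when the programme format is as regular as the text describes, which is why it does not carry over to movies and feature films.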
The same cannot be said for video genres such as movies and feature films. This is because the makers of these videos utilized different filming techniques to design their videos in order to elicit certain affective responses from their target audience. Humans usually perform manual video segmentation by trying to relate changes in time and locale to discontinuities in meaning [2]. As a result, viewers usually have doubts about the boundary locations of a meaningful video segment due to their different affective responses.
This thesis presents an entirely new view of the problem of high level video segmentation. We developed a novel probabilistic method for affective level video content analysis and segmentation. Our method had two stages. In the first stage, affective content labels were assigned to video shots by means of a dynamic Bayesian network (DBN). A novel hierarchical-coupled dynamic Bayesian network (HCDBN) topology was proposed for this stage. The topology was based on the pleasure-arousal-dominance (P-A-D) model of affect representation [3]. In principle, this model can represent a large number of emotions. In the second stage, the visual, audio and affective information of the video was used to compute a statistical feature vector to represent the content of each shot. Affective level video segmentation was achieved by applying spectral clustering to the feature vectors.
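As a rough illustration of the second stage, the sketch below applies a two-way spectral partition to per-shot feature vectors. It is a simplified stand-in under stated assumptions, not the thesis's implementation: the Gaussian affinity, the `sigma` value, the unnormalized graph Laplacian, the restriction to two clusters, and the names `gaussian_affinity`, `fiedler_vector` and `spectral_bipartition` are all hypothetical choices made for this example.

```python
import math


def gaussian_affinity(features, sigma=1.0):
    """Symmetric affinity matrix W with W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def fiedler_vector(W, iters=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L = D - W.

    Uses power iteration on (c*I - L) while projecting out the constant
    vector, which is the eigenvector of L for eigenvalue 0.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg) + 1e-9  # upper bound on the eigenvalues of L
    v = [i - (n - 1) / 2.0 for i in range(n)]  # non-constant, zero-mean start
    for _ in range(iters):
        w = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # remove the constant component
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def spectral_bipartition(features, sigma=1.0):
    """Assign each shot to cluster 0 or 1 by the sign of the Fiedler vector."""
    v = fiedler_vector(gaussian_affinity(features, sigma))
    labels = [0 if x < 0 else 1 for x in v]
    if labels and labels[0] == 1:  # fix the sign ambiguity: first shot gets label 0
        labels = [1 - lab for lab in labels]
    return labels


if __name__ == "__main__":
    # Toy 2-D feature vectors for six shots (made-up values).
    shots = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
             (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
    print(spectral_bipartition(shots))
```

A practical system would more likely use a library eigensolver, more than two clusters, and some temporal constraint so that segments stay contiguous. None of those details are given in the abstract.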
We evaluated the first stage of our proposal by comparing its emotion detection ability with that of all existing works related to the field of affective video content analysis. To evaluate the second stage, we used the time adaptive clustering (TAC) algorithm as our performance benchmark. The TAC algorithm was the best high level video segmentation method [2]. However, it is a very computationally intensive algorithm. To accelerate its computation, we developed a modified TAC (modTAC) algorithm, which was designed to be mapped easily onto a field programmable gate array (FPGA) device. Both the TAC and modTAC algorithms were used as performance benchmarks for our proposed method.
Since affective video content is a perceptual concept, segmentation performance and human agreement rates were used as our evaluation criteria. To obtain our ground truth data and viewer agreement rates, a pilot panel study based on the work of Gross et al. [4] was conducted. Experimental results show the feasibility of our proposed method. For the first stage of our proposal, the results show that an average improvement of as high as 38% was achieved over previous works. As for the second stage, an improvement of as high as 37% was achieved over the TAC algorithm.
InfoLink: analysis of Dutch broadcast news and cross-media browsing
In this paper, a cross-media browsing demonstrator named InfoLink is described. InfoLink automatically links the content of Dutch broadcast news videos to related information sources in parallel collections containing text and/or video. Automatic segmentation, speech recognition and available meta-data are used to index and link items. The concept is visualised using SMIL-scripts for presenting the streaming broadcast news video and the information links
Video browsing interfaces and applications: a review
We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data—which, if presented in its raw format, is rather unwieldy and costly—have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other
TV News Story Segmentation Based on Semantic Coherence and Content Similarity
In this paper, we introduce and evaluate two novel approaches for segmenting TV news into stories, one using the video stream and the other using the closed-caption text stream. The video stream is segmented into stories by detecting anchor person shots, and the text stream is segmented into stories using a Latent Dirichlet Allocation (LDA) based approach. The benefit of the proposed LDA-based approach is that, along with the story segmentation, it also provides the topic distribution associated with each segment. We evaluated our techniques on the TRECVid 2003 benchmark database and found that, although the individual systems give comparable results, combining the outputs of the two systems gives a significant improvement over the performance of either system alone.
Indexing, browsing and searching of digital video
Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver
- …