1,770 research outputs found

Mind the Gap: Another look at the problem of the semantic gap in image retrieval

Author: Enser Peter G. B.
Hare Jonathon S.
Lewis Paul H.
Sandom Christine J.
Publication date: 01/01/2006

This paper attempts to review and characterise the problem of the semantic gap in image retrieval and the attempts being made to bridge it. In particular, we draw from our own experience in user queries, automatic annotation and ontological techniques. The first section of the paper describes a characterisation of the semantic gap as a hierarchy between the raw media and full semantic understanding of the media's content. The second section discusses real users' queries with respect to the semantic gap. The final sections of the paper describe our own experience in attempting to bridge the semantic gap. In particular we discuss our work on auto-annotation and semantic-space models of image retrieval in order to bridge the gap from the bottom up, and the use of ontologies, which capture more semantics than keyword object labels alone, as a technique for bridging the gap from the top down

Southampton (e-Prints Soton)

Ambient Multi-Camera Personal Documentary

Author: Addis M. J.
Beales R. M.
Middleton Stuart
Publication date: 01/01/2006

Polymnia is an automated solution for the creation of ambient multi-camera personal documentary films. This short paper introduces the system, emphasising the rule-based documentary generation engine that we have created to assemble an edited narrative from source footage. We describe how such automatically generated media can be integrated with and augment personally-authored images and videos as a contribution to an individual’s personal digital memory

Southampton (e-Prints Soton)

Symbiosis between the TRECVid benchmark and video libraries at the Netherlands Institute for Sound and Vision

Author: AF Smeaton
Alan F. Smeaton
B Huurnink
CGM Snoek
CV Thornley
D. Tjondronegoro
H.-T. Pu
Johan Oomen
L. Hollink
M Hertzum
Paul Over
S Shatford
Wessel Kraaij
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Audiovisual archives are investing in large-scale digitisation efforts of their analogue holdings and, in parallel, ingesting an ever-increasing amount of born-digital files in their digital storage facilities. Digitisation opens up new access paradigms and boosts re-use of audiovisual content. Query-log analyses show the shortcomings of manual annotation; archives are therefore complementing these annotations by developing novel search engines that automatically extract information from both the audio and the visual tracks. Over the past few years, the TRECVid benchmark has developed a novel relationship with the Netherlands Institute for Sound and Vision (NISV) which goes beyond the NISV just providing data and use cases to TRECVid. Prototype and demonstrator systems developed as part of TRECVid are set to become a key driver in improving the quality of search engines at the NISV and will ultimately help other audiovisual archives to offer more efficient and more fine-grained access to their collections. This paper reports the experiences of NISV in leveraging the activities of the TRECVid benchmark

Sound and Vision Publications

Overview of VideoCLEF 2009: New perspectives on speech-based multimedia content enrichment

Author: A. Hanjalic
A.F. Smeaton
J. Kekäläinen
J. Kürsten
J.J.M. Kierkels
J.M. Perea-Ortega
M. Larson
P. Pecina
S. Raaijmakers
T.-A. Dobrilǎ
Á. Gyarmati
Publication date: 01/01/2009

VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided. The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the “Beeldenstorm” collection containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity and radical visual changes. The Linking Task, also called “Finding Related Resources Across Languages,” involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language “Beeldenstorm” collection and were expected to return target pages drawn from English-language Wikipedia. The best-performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation and methods that targeted proper names

CiteSeerX

Access to recorded interviews: A research agenda

Author: Heeren W.F.L.
Jong F.M.G. de
Oard D.W.
Ordelman R.J.F.
Publication venue: ACM
Publication date: 01/01/2008

Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

University of Twente Research Information

High-level feature detection from video in TRECVid: a 5-year retrospective of achievements

Author: A. F. Smeaton
A. Loui
A. P. Natsev
A. Smeulders
C. G. M. Snoek
C. G. Snoek
E. Yilmaz
M. G. Christel
M. Naphade
M. R. Naphade
P. Joly
P. Over
T. Volkmer
W. Kraaij
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature, within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity where dozens of research groups measure the effectiveness of their techniques on common data and using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task, not just for one research group or for one approach, but across the spectrum. We then use this past and on-going work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video

CiteSeerX

K-Space at TRECVid 2007

Author: Adamek Tomasz
Byrne Daragh
Jones Gareth J.F.
Keenan Gordon
Lee Hyowon
McGuinness Kevin
O'Connor Noel E.
Smeaton Alan F.
Wilkins Peter
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2007

In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as Face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a ‘shot’ based interface, where the results from a query were presented as a ranked list of shots. The second interface was ‘broadcast’ based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features

System Design for Structured Hypermedia Generation

Author: Berg C.A. van den
Hardman L. (Lynda)
Tam A.
Worring M. (Marcel)
Publication venue: Springer Berlin / Heidelberg
Publication date: 01/01/1997

Designing annotation before it's needed

Author: Frank Nack
Wolfgang Putz
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004