268 research outputs found

Language-based multimedia information retrieval

Author: Gauvain J.L.
Hiemstra D.
Jong F.M.G. de
Netter K.
Publication date: 01/01/2000

This paper describes various methods and approaches for language-based multimedia information retrieval which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image- or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE built on subtitles or captions as the prime language key for disclosing video fragments, OLIVE makes use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality

CiteSeerX

Radboud Repository

University of Twente Research Information

Cross likelihood ratio based speaker clustering using eigenvoice models

Author: Dean David
Sridharan Sridha
Vogt Robert
Wang David
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2011

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to their ability to adequately represent a speaker based on sparse training data, as well as their improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system

CiteSeerX

Queensland University of Technology ePrints Archive

TRECVID: evaluating the effectiveness of information retrieval tasks on digital video

Author: Kraaij Wessel
Over Paul
Smeaton Alan F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004

TRECVID is an annual exercise which encourages research in information retrieval from digital video by providing a large video test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies including shot boundary detection, extraction of some semantic features, and the automatic segmentation of TV news broadcasts into non-overlapping news stories. TRECVID has a broad range of over 40 participating groups from across the world, and as it is now (2004) in its 4th annual cycle it is opportune to stand back and look at the lessons we have learned from the cumulative activity. In this paper we present a brief, high-level overview of the TRECVID activity covering the data, the benchmarked tasks, the overall results obtained by groups to date, and the approaches taken by selected groups in some tasks. While progress from one year to the next cannot be measured directly because of the changing nature of the video data we have been using, we present a summary of the lessons we have learned from TRECVID and include some pointers on what we feel are the most important of these lessons

Crossref

Irish Universities

DCU Online Research Access Service

TRECVID 2003 - an overview

Author: Kraaij Wessel
Over Paul
Smeaton Alan F.
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2003

Irish Universities

DCU Online Research Access Service

OLIVE: Speech-Based Video Retrieval

Author: de Jong Franciska M.G.
den Hartog Jeremy
den Hartog Jurgen
Gauvain Jean-Luc
Netter Klaus
Publication venue: IRIT
Publication date: 01/01/1998

This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the basis for text-based retrieval functionality. The retrieval demonstrator builds on and extends the architecture from the Pop-Eye project, a system applying human language technology on subtitles for the disclosure of video fragments

CiteSeerX

University of Twente Research Information

Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both?

Author: Besacier Laurent
Le Viet Bac
Poignant Johann
Quénot Georges
Rosset Sophie
Publication venue: HAL CCSD
Publication date: 01/01/2013

Person identification in TV broadcast video is a valuable tool for indexing these videos. However, the use of biometric models is not a very sustainable option without a priori knowledge of the people present in the videos. Names pronounced in the speech (PN) or written on the screen (WN) can provide hypothesized names for speakers. We propose an experimental comparison of the potential of these two modalities (pronounced or written names) to extract the true names of the speakers. Pronounced names offer many instances of citation, but transcription and named-entity detection errors halve the potential of this modality. In contrast, written-name detection benefits from improved video quality and is now rather robust and efficient for naming speakers. Oracle experiments presented for the mapping between written names and speakers also show the complementarity of the PN and WN modalities

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

The MGB Challenge: Evaluating Multi-genre Broadcast Media Recognition

Author: Bell P.
Gales M.
Hain T.
Kilgour J.
Lanchantin P.
Liu A.
McParland A.
Renals S.
Saz O.
Wester M.
Woodland P.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015, an evaluation focused on speech recognition, speaker diarization, and "lightly supervised" alignment of BBC TV recordings. The challenge training data covered the whole range of seven weeks of BBC TV output across four channels, resulting in about 1,600 hours of broadcast audio. In addition, several hundred million words of BBC subtitle text were provided for language modelling. A novel aspect of the evaluation was the exploration of speech recognition and speaker diarization in a longitudinal setting, i.e. recognition of several episodes of the same show, and speaker diarization across these episodes, linking speakers. The longitudinal tasks also offered the opportunity for systems to make use of supplied metadata including show title, genre tag, and date/time of transmission. This paper describes the task data and evaluation process used in the MGB challenge, and summarises the results obtained

Crossref

Edinburgh Research Explorer

White Rose Research Online

Dublin City University video track experiments for TREC 2002

Author: Browne Paul
Czirjék Csaba
Gurrin Cathal
Jarina Roman
Lee Hyowon
Marlow Seán
McDonald Kieran
Murphy Noel
O'Connor Noel E.
Smeaton Alan F.
Ye Jiamin
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2002

Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video Track. In the Feature Extraction task, we submitted 3 features: Face, Speech, and Music. In the Search task, we developed an interactive video retrieval system, which incorporated the 40 hours of the video search test collection and supported user searching using our own feature extraction data along with the donated feature data and ASR transcript from other Video Track groups. This video retrieval system allows a user to specify a query based on the 10 features and ASR transcript, and the query result is a ranked list of videos that can be further browsed at the shot level. To evaluate the usefulness of the feature-based query, we have developed a second system interface that provides only ASR transcript-based querying, and we conducted an experiment with 12 test users to compare these 2 systems. Results were submitted to NIST and we are currently conducting further analysis of user performance with these 2 systems

DCU Online Research Access Service

The INRIA-LIM-VocR and AXES submissions to Trecvid 2014 Multimedia Event Detection

Author: Alahari Karteek
Chesneau Nicolas
Douze Matthijs
Gauvain Jean-Luc
Harchaoui Zaid
Lamel Lori
Leray Clément
Oneata Dan
Paulin Mattis
Potapov Danila
Schmid Cordelia
Schmidt Christoph Andreas
Verbeek Jakob
Publication venue: HAL CCSD
Publication date: 01/01/2014

This paper describes our participation in the 2014 edition of the TrecVid Multimedia Event Detection task. Our system is based on a collection of local visual and audio descriptors, which are aggregated to global descriptors, one for each type of low-level descriptor, using Fisher vectors. Besides these features, we use two features based on convolutional networks: one for the visual channel, and one for the audio channel. Additional high-level features are extracted using ASR and OCR. Finally, we use mid-level attribute features based on object and action detectors trained on external datasets. Our two submissions (INRIA-LIM-VocR and AXES) are identical in terms of all the components, except for the ASR system that is used. We present an overview of the features and the classification techniques, and experimentally evaluate our system on TrecVid MED 2011 data

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server