Development of a speech recognition system for Spanish broadcast news
This paper reports on the development process of a speech recognition system for Spanish broadcast news within the MESH FP6 project. The system uses the SONIC recognizer developed at the Center for Spoken Language Research (CSLR), University of Colorado. Acoustic and language models were trained using Hub4 broadcast news data. Experiments and evaluation results are reported.
Large scale evaluations of multimedia information retrieval: the TRECVid experience
Information Retrieval is a supporting technique which underpins a broad range of content-based applications including retrieval, filtering, summarisation, browsing, classification, clustering, automatic linking, and others. Multimedia information retrieval (MMIR) represents those applications when applied to multimedia information such as image, video, music, etc. In this presentation and extended abstract we are primarily concerned with MMIR as applied to information in digital video format. We begin with a brief overview of large scale evaluations of IR tasks in areas such as text, image and music, to illustrate that this phenomenon is not restricted to MMIR on video. The main contribution, however, is a set of pointers and a summarisation of the work done as part of TRECVid, the annual benchmarking exercise for video retrieval tasks.
A Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation
One important class of online videos is that of news broadcasts. Most news organisations provide near-immediate access to topical news broadcasts over the Internet, through RSS streams or podcasts. Until recently, technology has not made it possible for a user to automatically go to the smaller parts, within a longer broadcast, that might interest them. Recent advances in both speech recognition systems and natural language processing have led to a number of robust tools that allow us to provide users with quicker, more focussed access to relevant segments of one or more news broadcast videos. Here we present our new interface for browsing or searching news broadcasts (video/audio) that exploits these new language processing tools to (i) provide immediate access to topical passages within news broadcasts, (ii) browse news broadcasts by events as well as by people, places and organisations, (iii) perform cross-lingual search of news broadcasts, (iv) search for news through a map interface, (v) browse news by trending topics, and (vi) see automatically generated textual clues for news segments before listening. Our publicly searchable demonstrator currently indexes daily broadcast news content from 50 sources in English, French, Chinese, Arabic, Spanish, Dutch and Russian.
Comment: NEM Summit, Torino, Italy (2011)
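The browsing-by-entities feature described in (ii) can be sketched as a simple inverted index from entities to broadcast segments. This is a minimal illustration, assuming named entities (people, places, organisations) have already been extracted from ASR transcripts by an upstream NER tool; the segment identifiers and data model here are hypothetical, not the demonstrator's actual index.

```python
# Hypothetical sketch: faceted browsing of news segments by named entity.
# Assumes entities were already extracted by an upstream NER component.
from collections import defaultdict

def build_entity_index(segments):
    """segments: list of dicts with 'id' and 'entities' keys.
    Returns an inverted index mapping entity -> list of segment ids."""
    index = defaultdict(list)
    for seg in segments:
        for entity in seg["entities"]:
            index[entity].append(seg["id"])
    return index

# Toy example with invented segment ids and entities.
segments = [
    {"id": "bbc-2011-06-01#3", "entities": ["Barack Obama", "Washington"]},
    {"id": "rfi-2011-06-01#1", "entities": ["Washington", "IMF"]},
]
index = build_entity_index(segments)
# Browsing by "Washington" surfaces both segments.
```

Browsing by people, places or events then reduces to a lookup in this index, with each facet (person, place, organisation) maintained as its own index in practice.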
Using term clouds to represent segment-level semantic content of podcasts
Spoken audio, like any time-continuous medium, is notoriously difficult to browse or skim without support of an interface providing semantically annotated jump points to signal the user where to listen in. Creation of time-aligned metadata by human annotators is prohibitively expensive, motivating the investigation of representations of segment-level semantic content based on transcripts generated by automatic speech recognition (ASR). This paper examines the feasibility of using term clouds to provide users with a structured representation of the semantic content of podcast episodes. Podcast episodes are visualized as a series of sub-episode segments, each represented by a term cloud derived from a transcript generated by automatic speech recognition (ASR). Quality of segment-level term clouds is measured quantitatively and their utility is investigated using a small-scale user study based on human labeled segment boundaries. Since the segment-level clouds generated from ASR-transcripts prove useful, we examine an adaptation of text tiling techniques to speech in order to be able to generate segments as part of a completely automated indexing and structuring system for browsing of spoken audio. Results demonstrate that the segments generated are comparable with human selected segment boundaries.
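Deriving a term cloud from a segment's ASR transcript amounts to ranking the segment's terms by a weighting scheme. The sketch below uses a plain tf-idf computed over the episode's segments; the paper's actual weighting and tokenisation may differ, and the example segments are invented.

```python
# Minimal sketch of segment-level term-cloud extraction from ASR
# transcripts, assuming one plain-text transcript string per segment.
# Weighting: simple tf-idf over segments (not necessarily the paper's scheme).
import math
from collections import Counter

def segment_term_clouds(segments, top_k=10):
    """segments: list of transcript strings, one per segment.
    Returns, per segment, its top_k (term, weight) pairs by tf-idf."""
    tokenized = [s.lower().split() for s in segments]
    n = len(tokenized)
    # Document frequency: in how many segments each term occurs.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    clouds = []
    for toks in tokenized:
        tf = Counter(toks)
        weights = {t: tf[t] * math.log(n / df[t]) for t in tf}
        clouds.append(sorted(weights.items(), key=lambda kv: -kv[1])[:top_k])
    return clouds

# Toy episode with three invented segments.
segments = [
    "the election results were announced late last night",
    "heavy rain caused flooding in the northern region",
    "the election campaign continues ahead of the vote",
]
clouds = segment_term_clouds(segments, top_k=3)
```

Terms that occur in every segment receive zero weight and drop out of the cloud, which is the desired behaviour: the cloud should surface what distinguishes a segment, not what the whole episode shares.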