46 research outputs found

    Language-based multimedia information retrieval

    This paper describes methods and approaches for language-based multimedia information retrieval that were developed in the POP-EYE and OLIVE projects and will be developed further in the MUMIS project. All of these projects aim to support automated indexing of video material using human language technologies. In contrast to image- or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods exploit advanced text retrieval technologies to retrieve non-textual material. While POP-EYE relied on subtitles or captions as the prime language key for disclosing video fragments, OLIVE uses speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements that then serve as the basis for text-based retrieval functionality.

    Audio Indexing on the Web: a Preliminary Study of Some Audio Descriptors

    International peer-reviewed conference with proceedings. The "Invisible Web" is composed of documents that Web search engines currently cannot access, either because they have a dynamic URL or because they are not textual, such as video or audio documents. For audio documents, one solution is automatic indexing: finding good descriptors of audio documents that can be used as indexes for archiving and search. This paper presents an overview and recent results of the RAIVES project, a French research project on audio indexing. We present speech/music segmentation, speaker tracking, and keyword detection. We also give a few perspectives for the RAIVES project.

    OLIVE: Speech-Based Video Retrieval

    This paper describes the Olive project, which aims to support automated indexing of video material using human language technologies. Olive uses speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements that serve as the basis for text-based retrieval functionality. The retrieval demonstrator builds on and extends the architecture of the Pop-Eye project, a system that applies human language technology to subtitles for the disclosure of video fragments.
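
The idea of time-coded linguistic elements serving text-based retrieval can be illustrated with a minimal sketch. It assumes a recognizer has already produced `(start_time_in_seconds, word)` pairs; the function names `build_index` and `search` and the sample transcript are hypothetical and are not taken from the Olive system.

```python
from collections import defaultdict


def build_index(words):
    """Map each lower-cased word to the sorted start times (seconds) where it occurs."""
    index = defaultdict(list)
    for start, word in words:
        index[word.lower()].append(start)
    for times in index.values():
        times.sort()
    return dict(index)


def search(index, query):
    """Return the time codes at which the query word was recognized."""
    return index.get(query.lower(), [])


# Hypothetical recognizer output: (start time in seconds, recognized word).
transcript = [
    (0.0, "the"),
    (0.4, "minister"),
    (1.1, "visited"),
    (1.7, "Brussels"),
    (12.3, "minister"),
]

index = build_index(transcript)
print(search(index, "Minister"))  # time codes a player could jump to
```

Each hit is a time offset into the video, so a text query leads directly to the matching fragment rather than to the whole recording.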

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art in key component technologies. A large number of important research issues are identified, and from that set of issues a coherent research agenda is proposed.

    RAIVES Project (Recherche Automatique d'Informations Verbales Et Sonores): towards the extraction and structuring of radio broadcast data on the Internet

    Contract report. The Internet has become an important channel of communication. It enables the dissemination and exchange of a growing volume of data. The task is therefore no longer only to collect large masses of "electronic information", but above all to catalogue and classify it so as to ease access to useful information. A piece of information, however important, on a site that is not catalogued goes unnoticed. The share of the "invisible Web" should therefore not be neglected. The invisible Web can be defined as the set of non-indexed information, either because it is not catalogued, or because the pages containing it are dynamic, or because its nature makes it impossible or difficult to index. Indeed, most search engines rely on textual analysis of page content and cannot take into account the content of audio or visual documents. A set of content descriptors must therefore be provided to structure documents so that the information is accessible to search engines. For audio documents, the goal of our project is thus, on the one hand, to extract this information and, on the other hand, to provide a structuring of the documents in order to ease access to their content. Content-based indexing of audio documents draws on techniques used in automatic speech processing, but it must be distinguished from automatic alignment of a text with an audio stream and from automatic speech recognition, which would reduce the content of an audio document to its verbal component alone. Yet the non-verbal component of an audio document is important and often corresponds to a particular structuring of the document. For example, in radio broadcasts, speech alternates with music, and in particular with jingles, to announce the news.
We can thus consider a set of content descriptors for a radio document: speech/music segments, "key sounds", language, speaker changes together with possible identification of those speakers, keywords, and topics. This set can of course be extended. Extracting the full set of descriptors is probably sufficient to reference a document on the Internet, but it is worthwhile to go further and give access to specific parts of the document. Each descriptor must be associated with a time marker that gives direct access to the information. However, since the descriptors belong to different levels of description, their organisation is not linear in time: the same speaker may speak two languages within a single speech segment, or several speakers may take part within a speech segment in a given language. It must therefore also be possible to structure the information across different levels of representation.
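
The multi-level, time-marked descriptors described above can be sketched as a small data structure. This is an illustrative assumption, not the RAIVES representation: the `Descriptor` class, the level names (`"segment"`, `"speaker"`, `"language"`), and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    level: str    # hypothetical level name, e.g. "segment", "speaker", "language"
    label: str    # value of the descriptor at that level
    start: float  # start time in seconds (inclusive)
    end: float    # end time in seconds (exclusive)


def active_at(descriptors, t):
    """Return the (level, label) pairs whose time span contains t."""
    return {(d.level, d.label) for d in descriptors if d.start <= t < d.end}


# One speech segment in which a single speaker switches language,
# followed by a music segment.
descriptors = [
    Descriptor("segment", "speech", 0.0, 20.0),
    Descriptor("speaker", "speaker_1", 0.0, 20.0),
    Descriptor("language", "fr", 0.0, 10.0),
    Descriptor("language", "en", 10.0, 20.0),
    Descriptor("segment", "music", 20.0, 25.0),
]

print(active_at(descriptors, 12.0))
```

Keeping each level as its own set of time spans, rather than forcing a single linear segmentation, is what lets one speech segment carry two languages or several speakers.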

    Dynamic language modeling for European Portuguese

    Doctoral thesis in Computer Engineering (Doutoramento em Engenharia Informática). Most of today's methods for transcribing and indexing broadcast audio data are manual. Broadcasters process thousands of hours of audio and video data on a daily basis in order to transcribe that data, extract semantic information, and interpret, summarize, index, and selectively disseminate the content of those documents. Developing automatic and efficient support for these manual tasks has been a great challenge, and over the last decade there has been growing interest in using automatic speech recognition to provide automatic transcription and indexing of broadcast news, as well as random and relevant access to large broadcast news databases. However, because topics in this kind of task change constantly over time, the appearance of new events leads to high out-of-vocabulary (OOV) word rates and consequently to degraded recognition performance. This is especially true for highly inflected languages such as European Portuguese. Several techniques can be exploited to reduce those errors. Information specific to each news show, such as topic-based lexicons and the working scripts prepared by the anchor (pivot) and other journalists, together with other sources such as the written news made available daily on the Internet, can be added to the information sources employed by the automatic speech recognizer.
In this thesis we explore the use of additional sources of information for vocabulary optimization and language model adaptation of a European Portuguese broadcast news transcription system. The thesis makes three main contributions: a novel approach to vocabulary selection that uses part-of-speech (POS) tags to compensate for word usage differences across the various training corpora; frameworks for daily, unsupervised language model adaptation for single-stage and multistage recognition approaches; and a new method for including new words in the system vocabulary without the need for additional adaptation data or global re-estimation of the language model.
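
The out-of-vocabulary (OOV) rate mentioned above is simply the fraction of running words in a text that the recognizer's vocabulary does not contain. The following sketch shows the computation on toy data; the function name `oov_rate`, the vocabulary, and the sentence are hypothetical and only illustrate why adding newly appearing words lowers the rate. It is not the vocabulary selection algorithm proposed in the thesis.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of tokens that are not in the (lower-cased) vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok.lower() not in vocabulary)
    return missing / len(tokens)


# Hypothetical fixed recognizer vocabulary and a sentence from a new broadcast.
vocabulary = {"the", "government", "announced", "new", "measures", "after"}
tokens = "The government announced new measures after the tsunami".split()

before = oov_rate(tokens, vocabulary)

# Adding the newly appearing word, e.g. taken from the day's written news.
adapted = vocabulary | {"tsunami"}
after = oov_rate(tokens, adapted)

print(before, after)
```

Here one of eight tokens is unknown before adaptation and none afterwards, which is the effect that daily vocabulary updates aim for on real broadcasts.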