The relationship of word error rate to document ranking
This paper describes two experiments that examine the Word Error Rate (WER) of the
spoken documents returned by a spoken document retrieval system. Previous work has demonstrated that
recognition errors do not significantly affect retrieval effectiveness but whether they will adversely affect
relevance judgement remains unclear. A user-based experiment measuring ability to judge relevance from
the recognised text presented in a retrieved result list was conducted. The results indicated that users were
capable of judging relevance accurately despite transcription errors. This led to an examination of the
relationship of WER in retrieved audio documents to their rank position when retrieved for a particular
query. Here it was shown that WER was somewhat lower for top ranked documents than it was for
documents retrieved further down the ranking, thereby indicating a possible explanation for the success of
the user experiment
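For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the paper's own scoring tools are not described in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative sketch: tokenization is plain whitespace splitting, which is
    an assumption and not necessarily what the paper's scoring used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.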
Speech and hand transcribed retrieval
This paper describes the issues and preliminary work involved
in the creation of an information retrieval system that will
manage the retrieval from collections composed of both speech
recognised and ordinary text documents. In previous work, it
has been shown that because of recognition errors, ordinary
documents are generally retrieved in preference to recognised
ones. Means of correcting or eliminating the observed bias are
the subject of this paper. Initial ideas and some preliminary
results are presented
Search of spoken documents retrieves well recognized transcripts
This paper presents a series of analyses and experiments on spoken
document retrieval systems: search engines that retrieve transcripts produced by
speech recognizers. Results show that transcripts that match queries well tend to
be recognized more accurately than transcripts that match a query less well.
This result was described in past literature; however, no study or explanation of
the effect has been provided until now. This paper provides such an analysis
showing a relationship between word error rate and query length. The paper
expands on past research by increasing the number of recognition systems that
are tested as well as showing the effect in an operational speech retrieval
system. Potential future lines of enquiry are also described
Spoken content metadata and MPEG-7
The words spoken in an audio stream form an obvious descriptor essential to most audio-visual metadata standards. When derived using automatic speech recognition systems, the spoken content fits into neither low-level (representative) nor high-level (semantic) metadata categories. This results in difficulties in creating a representation that can support interoperability between different extraction and application utilities while retaining robustness to the limitations of the extraction process. In this paper, we discuss the issues encountered in the design of the MPEG-7 spoken content descriptor and their applicability to other metadata standards
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
Diversity and recommender systems: application to a high-traffic blog platform (CIFRE agreement no. 20091274)
Recommender systems aim to automatically suggest items related to users' interests. These tools are increasingly used on content platforms to help users access information. In this context, users' interests can be modeled from the content they visit and/or from their actions (clicks, comments, etc.). However, these interests cannot be modeled in a cold-start situation, that is, for a user unknown to the system or for a new document. Modeling is therefore complex and sometimes incomplete, and recommendations are often far from users' real interests. In addition, existing approaches generally cannot guarantee satisfactory performance on high-traffic platforms that host a large volume of data. To obtain more relevant recommendations, we propose a recommender system model that builds a list of recommendations covering a wide range of potential interests, even when little information about the user is available. The originality of our model is that it is based on the notion of diversity, which is obtained by aggregating the results of different selection measures to build the final list of recommendations. We demonstrate the value of our approach using reference collections and through a study with real users. Finally, we evaluate our model on the OverBlog blog platform, validating our proposal in a large-scale industrial context
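As an illustration of aggregating several selection measures into one diverse list, here is a minimal sketch. The round-robin strategy and the name `aggregate_round_robin` are assumptions chosen for simplicity; the abstract does not specify which aggregation function the thesis actually uses.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking exhausted lists


def aggregate_round_robin(ranked_lists, k):
    """Interleave several ranked lists into one list of at most k unique items.

    Each input list is assumed to come from a different selection measure
    (hypothetical examples: content similarity, popularity, recency).
    Taking one item from each list in turn keeps every measure represented
    near the top of the final list.
    """
    if k <= 0:
        return []
    seen = set()
    result = []
    # zip_longest yields the rank-1 items of every list, then the rank-2 items, ...
    for tier in zip_longest(*ranked_lists, fillvalue=_MISSING):
        for item in tier:
            if item is _MISSING or item in seen:
                continue
            seen.add(item)
            result.append(item)
            if len(result) == k:
                return result
    return result
```

Round-robin interleaving needs no comparable scores across measures, which is why it is used here; a score-based fusion would be an equally plausible choice.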
Diversity and recommender systems: application to a high-traffic blog platform
Recommender systems aim to automatically suggest items related to users' interests. These tools are increasingly used on content platforms to help users access information. In this context, users' interests can be modeled from the content they visit and/or from their actions (clicks, comments, etc.). However, these interests cannot be modeled in a cold-start situation, that is, for a user unknown to the system or for a new document. Modeling is therefore complex and sometimes incomplete, and recommendations are often far from users' real interests. In addition, existing approaches generally cannot guarantee satisfactory performance on high-traffic platforms that host a large volume of data. To obtain more relevant recommendations, we propose a recommender system model that builds a list of recommendations covering a wide range of potential interests, even when little information about the user is available. The originality of our model is that it is based on the notion of diversity, which is obtained by aggregating the results of different selection measures to build the final list of recommendations. We demonstrate the value of our approach using reference collections and through a study with real users. Finally, we evaluate our model on the OverBlog blog platform, validating our proposal in a large-scale industrial context
Automatic processing of computer-transcribed spoken documents from multimedia archives
This thesis addresses the complex problem of how to structure the output of an automatic speech recognition system (i.e. appropriately segment it, analyze it textually and phonetically, and subsequently modify it) so that it is as readable as possible for humans while also being ready for efficient machine processing and search. The motivation was a research project supported by the Czech Ministry of Culture that aimed to transcribe spoken documents from the archive of Czech and Czechoslovak Radio and make them searchable. Given the size of the archive (213,000 documents from the years 1923 to 2014), it was necessary to design and implement procedures and technologies able to handle not only the vast amount of data but also specific issues associated with varying recording quality, the presence of both Czech and Slovak in the documents, speaker changes, speech interleaved with jingles, musical interludes and songs, and background noise
Experiments in Spoken Document Retrieval at CMU
We describe our submission to the TREC-6 Spoken Document Retrieval (SDR) track and the speech recognition and information retrieval engines. We present SDR evaluation results and a brief analysis. A few developments and experiments are also described in detail, including vocabulary size experiments, which assess the effect of words missing from the speech recognition vocabulary. For our 51,000-word vocabulary the effect was minimal