Concurrent collaborative captioning
Captioned text transcriptions of the spoken word can benefit hearing-impaired people, non-native speakers, anyone when no audio is available (e.g. watching TV at an airport), and anyone who needs to review recordings of what has been said (e.g. at lectures, presentations, or meetings). This paper describes a tool that facilitates concurrent collaborative captioning through correction of speech recognition errors, providing a sustainable method of making videos accessible to people who find it difficult to understand speech through hearing alone. The tool stores every user's edits and uses a matching algorithm to compare users' edits and check whether they are in agreement.
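The abstract does not specify the matching algorithm, but the core idea of checking whether users' corrections agree can be sketched as a simple vote over the stored edits. The function below is a hypothetical illustration, not the tool's actual method; the threshold parameter is an assumption.

```python
from collections import Counter

def agreed_correction(edits, min_agreement=2):
    """Return the correction proposed by at least `min_agreement`
    users for one caption segment, or None if users disagree.

    `edits` is a list of corrected strings, one per user. Matching
    here is simple normalized string equality; a real matcher would
    likely be more tolerant (e.g. of punctuation differences).
    """
    counts = Counter(edit.strip().lower() for edit in edits)
    text, votes = counts.most_common(1)[0]
    return text if votes >= min_agreement else None

# Three users correct the same ASR segment; two agree, so their
# version is accepted as the consensus caption.
print(agreed_correction(
    ["speech recognition", "speech recognition", "speech cognition"]))
```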
The relationship of word error rate to document ranking
This paper describes two experiments that examine the relationship between the Word Error Rate (WER) of spoken documents returned by a spoken document retrieval system and their ranking. Previous work has demonstrated that recognition errors do not significantly affect retrieval effectiveness, but whether they adversely affect relevance judgement remains unclear. A user-based experiment measuring the ability to judge relevance from the recognised text presented in a retrieved result list was conducted. The results indicated that users were capable of judging relevance accurately despite transcription errors. This led to an examination of the relationship of WER in retrieved audio documents to their rank position when retrieved for a particular query. Here it was shown that WER was somewhat lower for top-ranked documents than for documents retrieved further down the ranking, indicating a possible explanation for the success of the user experiment.
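For reference, WER is the word-level edit distance between a reference transcript and the recognised text, divided by the number of reference words. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + insertions + deletions)
    divided by the number of reference words, computed via the
    standard dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

# Two substitutions out of six reference words: WER = 2/6
print(wer("the cat sat on the mat", "the cat sat in the hat"))
```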
Automatic measurement of propositional idea density from part-of-speech tagging
The original publication is available at www.springerlink.com. The Computerized Propositional Idea Density Rater (CPIDR, pronounced “spider”) is a computer program that determines the propositional idea density (P-density) of an English text automatically on the basis of part-of-speech tags. The key idea is that propositions correspond roughly to verbs, adjectives, adverbs, prepositions, and conjunctions. After tagging the parts of speech using MontyLingua (Liu, 2004), CPIDR applies numerous rules to adjust the count, such as combining auxiliary verbs with the main verb. A “speech mode” is provided in which CPIDR rejects repetitions and a wider range of fillers. CPIDR is a user-friendly Windows .NET application distributed as open-source freeware under the GPL. Tested against human raters, it agrees with the consensus of two human raters better than a team of five raters agree with each other [r(80) = .97 vs. r(10) = .82, respectively].
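The key idea above (propositions ≈ verbs, adjectives, adverbs, prepositions, and conjunctions) can be sketched as a tag-counting function. This is a bare illustration of the counting principle only; it omits CPIDR's adjustment rules (auxiliary combining, speech mode, etc.) and takes pre-tagged input rather than calling MontyLingua.

```python
# Penn Treebank tag prefixes roughly marking propositions:
# verbs (VB*), adjectives (JJ*), adverbs (RB*), prepositions (IN),
# coordinating conjunctions (CC).
PROP_TAGS = ("VB", "JJ", "RB", "IN", "CC")

def p_density(tagged_words):
    """Propositional idea density: propositions per word, given
    (word, Penn-Treebank-tag) pairs from any POS tagger."""
    if not tagged_words:
        return 0.0
    props = sum(1 for _, tag in tagged_words if tag.startswith(PROP_TAGS))
    return props / len(tagged_words)

tagged = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("dog", "NN")]
# 3 propositions (quick, jumps, over) over 7 words
print(p_density(tagged))
```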
Automated Speech Recognition for Captioned Telephone Conversations
Internet Protocol Captioned Telephone Service is a service for people with hearing loss that allows them to communicate effectively: a human Communications Assistant transcribes the call, and equipment displays the transcription in near real time. The current state of the art in ASR is considered with regard to automating such a service. Recent results on standard tests are examined, and appropriate metrics for ASR performance in captioning are discussed. Possible paths for developing fully automated telephone captioning services are examined and the effort involved is evaluated.
Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing
The accuracy of Automated Speech Recognition (ASR) technology has improved,
but it is still imperfect in many settings. Researchers who evaluate ASR
performance often focus on improving the Word Error Rate (WER) metric, but WER
has been found to have little correlation with human-subject performance on
many applications. We propose a new captioning-focused evaluation metric that
better predicts the impact of ASR recognition errors on the usability of
automatically generated captions for people who are Deaf or Hard of Hearing
(DHH). Through a user study with 30 DHH users, we compared our new metric with
the traditional WER metric on a caption usability evaluation task. In a
side-by-side comparison of pairs of ASR text output (with identical WER), the
texts preferred by our new metric were preferred by DHH participants. Further,
our metric had significantly higher correlation with DHH participants'
subjective scores on the usability of a caption, as compared to the correlation
between WER metric and participant subjective scores. This new metric could be
used to select ASR systems for captioning applications, and it may be a better
metric for ASR researchers to consider when optimizing ASR systems.
Comment: 10 pages, 8 figures, published in ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '17)
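The abstract does not give the metric's formula, but its premise (identical WER, different usability) implies that errors should not all count equally. The sketch below illustrates that general idea with an importance-weighted error rate; the pre-aligned input and the filler-based importance function are both hypothetical simplifications, not the published metric.

```python
def weighted_error_rate(ref_words, hyp_words, importance):
    """Sketch of an importance-weighted caption error metric: an
    error on a high-importance word costs more than one on a filler.
    Assumes the two word sequences are already aligned position by
    position (a simplification) and that the caller supplies an
    `importance` function mapping a word to a weight.
    """
    total = sum(importance(w) for w in ref_words)
    errors = sum(importance(r)
                 for r, h in zip(ref_words, hyp_words) if r != h)
    return errors / total if total else 0.0

# Hypothetical importance: content words weigh 1.0, fillers 0.1.
FILLERS = {"the", "a", "an", "um", "uh"}
imp = lambda w: 0.1 if w.lower() in FILLERS else 1.0

ref = ["the", "doctor", "said", "take", "the", "medicine"]
hyp = ["a",   "doctor", "said", "make", "the", "medicine"]
# Plain WER would be 2/6; the weighted score penalizes the
# "take"/"make" error far more than the "the"/"a" error.
print(weighted_error_rate(ref, hyp, imp))
```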
TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection
Punctuation and Segmentation are key to readability in Automatic Speech
Recognition (ASR), often evaluated using F1 scores that require high-quality
human transcripts and do not reflect readability well. Human evaluation is
expensive, time-consuming, and suffers from large inter-observer variability,
especially in conversational speech devoid of strict grammatical structures.
Large pre-trained models capture a notion of grammatical structure. We present
TRScore, a novel readability measure using the GPT model to evaluate different
segmentation and punctuation systems. We validate our approach with human
experts. Additionally, our approach enables quantitative assessment of text
post-processing techniques such as capitalization, inverse text normalization
(ITN), and disfluency on overall readability, which traditional word error rate
(WER) and slot error rate (SER) metrics fail to capture. TRScore is strongly
correlated to traditional F1 and human readability scores, with Pearson's
correlation coefficients of 0.67 and 0.98, respectively. It also eliminates the
need for human transcriptions for model selection.
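The validation reported above rests on Pearson's correlation between metric scores and human judgments. As a reference point for how such a check is computed, here is a minimal sketch; the score lists are made-up illustrative data, not values from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: a metric's scores vs. human readability ratings
# for five transcripts. A value near 1.0 indicates the metric tracks
# human judgment closely.
metric = [0.2, 0.4, 0.5, 0.7, 0.9]
human  = [1.0, 2.0, 2.5, 4.0, 4.5]
print(round(pearson(metric, human), 3))
```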