24 research outputs found

    Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing

    Full text link
    The accuracy of Automated Speech Recognition (ASR) technology has improved, but it is still imperfect in many settings. Researchers who evaluate ASR performance often focus on improving the Word Error Rate (WER) metric, but WER has been found to correlate poorly with human-subject performance in many applications. We propose a new captioning-focused evaluation metric that better predicts the impact of ASR recognition errors on the usability of automatically generated captions for people who are Deaf or Hard of Hearing (DHH). Through a user study with 30 DHH users, we compared our new metric with the traditional WER metric on a caption usability evaluation task. In side-by-side comparisons of pairs of ASR text outputs with identical WER, the texts preferred by our new metric were also the texts preferred by DHH participants. Further, our metric correlated significantly more strongly with DHH participants' subjective scores on caption usability than the WER metric did. This new metric could be used to select ASR systems for captioning applications, and it may be a better metric for ASR researchers to consider when optimizing ASR systems.
    Comment: 10 pages, 8 figures, published in the ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '17).
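    The contrast between the two metrics can be illustrated with a small sketch. This is not the metric from the paper, only the underlying idea: given a word-level alignment of a reference transcript against ASR output, plain WER counts every error equally, while an importance-weighted variant penalizes errors on semantically important words more heavily. The alignment and the importance scores below are hypothetical placeholders.

```python
# Illustrative sketch only, not the metric proposed in the paper.
# Given a word-level alignment of reference vs. ASR output, plain WER
# counts every error equally; an importance-weighted variant penalizes
# errors on important words more. Alignment and scores are placeholders.

aligned = [
    # (reference word, hypothesis word); None marks an insertion/deletion
    ("the",     "the"),
    ("meeting", "meeting"),
    ("moved",   "moves"),    # substitution on a high-importance word
    ("to",      None),       # deletion of a low-importance word
    ("tuesday", "tuesday"),
]

# Hypothetical per-word importance scores in [0, 1]
importance = {"the": 0.1, "meeting": 0.9, "moved": 0.8, "to": 0.1, "tuesday": 0.9}

ref_words = [r for r, _ in aligned if r is not None]
errors = [(r, h) for r, h in aligned if r != h]

wer = len(errors) / len(ref_words)  # every error weighs the same
weighted = (sum(importance.get(r or h, 0.5) for r, h in errors)
            / sum(importance[r] for r in ref_words))  # errors weighed by importance

print(f"WER = {wer:.2f}, importance-weighted error rate = {weighted:.2f}")
```

    Under a weighting like this, two hypothesis texts with identical WER can receive very different weighted scores, which is what allows such a metric to separate the caption pairs compared in the study.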

    Word Importance Modeling to Enhance Captions Generated by Automatic Speech Recognition for Deaf and Hard of Hearing Users

    Get PDF
    People who are deaf or hard of hearing (DHH) benefit from sign-language interpreting or live captioning (with a human transcriptionist) to access spoken information. However, such services are not legally required, affordable, or available in many settings, e.g., impromptu small-group meetings in the workplace or online video content that has not been professionally captioned. As Automatic Speech Recognition (ASR) systems improve in accuracy and speed, it is natural to investigate using these systems to assist DHH users in a variety of tasks. But ASR systems are still not perfect, especially in realistic conversational settings, which raises issues of trust and acceptance of these systems within the DHH community. To address these challenges, our work focuses on: (1) building metrics for accurately evaluating the quality of automatic captioning systems, and (2) designing interventions for improving the usability of captions for DHH users.

    The first part of this dissertation describes our research on methods for identifying words that are important for understanding the meaning of a conversational turn within transcripts of spoken dialogue. Such knowledge about the relative importance of words in spoken messages can be used in evaluating ASR systems (in part 2 of this dissertation) or in creating new applications for DHH users of captioned video (in part 3 of this dissertation). We found that models which consider both the acoustic properties of spoken words and text-based features (e.g., pre-trained word embeddings) are more effective at predicting the semantic importance of a word than models that use only one of these feature types.

    The second part of this dissertation describes studies of DHH users' perception of the quality of ASR-generated captions; the goal of this work was to validate the design of automatic metrics for evaluating captions in real-time applications for these users. Such a metric could facilitate comparison of various ASR systems and help determine the suitability of specific ASR systems for supporting communication for DHH users. We designed experimental studies to elicit feedback on the quality of captions from DHH users, and we developed and evaluated automatic metrics for predicting the usability of automatically generated captions for these users. We found that metrics that consider the importance of each word in a text are more effective at predicting the usability of imperfect text captions than the traditional Word Error Rate (WER) metric.

    The final part of this dissertation describes research on importance-based highlighting of words in captions as a way to enhance the usability of captions for DHH users. As with highlighting in static texts (e.g., textbooks or electronic documents), highlighting in captions involves changing the appearance of some text in the caption to enable readers to attend to the most important pieces of information quickly. Despite the known benefits of highlighting in static texts, the usefulness of highlighting in captions for DHH users is largely unexplored. For this reason, we conducted experimental studies with DHH participants to understand the benefits of importance-based highlighting in captions and their preferences among different design configurations for such highlighting. We found that DHH users subjectively preferred highlighting in captions, and they reported higher readability and understandability scores and lower task-load scores when viewing videos with captions containing highlighting than when viewing videos without highlighting. Further, in partial contrast to recommendations in prior research on highlighting in static texts (which had not been based on experimental studies with DHH users), we found that DHH participants preferred boldface, word-level, non-repeating highlighting in captions.
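    A minimal sketch of the preferred design reported above, assuming per-word importance scores are already available: words whose score exceeds a threshold are set in boldface (shown here with HTML tags), and each word is highlighted only at its first occurrence to satisfy the non-repeating preference. The scores, threshold, and rendering are hypothetical, not the dissertation's implementation.

```python
# Toy illustration of boldface, word-level, non-repeating highlighting.
# Words whose (hypothetical) importance score exceeds a threshold are
# wrapped in <b> tags; a word is highlighted only at its first occurrence.

def highlight_caption(words, importance, threshold=0.7):
    seen = set()
    out = []
    for w in words:
        key = w.lower()
        if importance.get(key, 0.0) >= threshold and key not in seen:
            out.append(f"<b>{w}</b>")
            seen.add(key)  # non-repeating: first occurrence only
        else:
            out.append(w)
    return " ".join(out)

caption = "The fire drill starts at noon so the fire exits must stay clear".split()
scores = {"fire": 0.9, "drill": 0.8, "noon": 0.85, "exits": 0.75}
print(highlight_caption(caption, scores))
# The <b>fire</b> <b>drill</b> starts at <b>noon</b> so the fire <b>exits</b> must stay clear
```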

    Effect of Speech Recognition Errors on Text Understandability for People who are Deaf or Hard of Hearing

    Get PDF
    Recent advancements in the accuracy of Automated Speech Recognition (ASR) technologies have made them a potential candidate for the task of captioning. However, the presence of errors in the output may present challenges to their use in a fully automatic system. In this research, we look more closely at the impact of different inaccurate transcriptions from an ASR system on the understandability of captions for Deaf or Hard-of-Hearing (DHH) individuals. Through a user study with 30 DHH users, we studied the effect of the presence of an error in a text on its understandability for DHH users. We also investigated different prediction models to capture this relation accurately. Among the models tested, our random forest-based model provided the best mean accuracy, 62.04%, on the task. We plan to improve this model with more data and use it to advance our investigation of ASR technologies for improving ASR-based captioning for DHH users.
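    A hedged sketch of the kind of prediction model the abstract describes: a random forest classifier that predicts whether a DHH reader will rate an error-containing text as understandable. The feature set and labels below are invented placeholders, not the study's actual data or features.

```python
# Sketch only: a random forest classifier for predicting whether a text
# containing an ASR error is rated understandable. Features and labels
# here are random placeholders standing in for the study's dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300

# Hypothetical per-text features: [number of errors, importance of the
# mistranscribed word (0-1), sentence length, acoustic-confusability score]
X = np.column_stack([
    rng.integers(0, 4, n),
    rng.random(n),
    rng.integers(5, 30, n),
    rng.random(n),
]).astype(float)
y = rng.integers(0, 2, n)  # 1 = rated understandable, 0 = not

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```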

    A Reliability Study of Automatic Real-Time Speech Transcription (Captioning) in Distance Teaching

    Get PDF
    The use of automatic speech recognition (ASR) methods for transcribing a lecturer's speech into captions in real time allows deaf and hard-of-hearing students to attend distance teaching online. The goal of this thesis is to conduct a systematic study of the reliability of a web-based transcription/captioning tool for the Greek language, specifically Web Captioner, in the context of distance teaching in higher education, using established metrics for objectively evaluating the tool's performance. The experimental evaluation involved 26 speakers and a corpus of texts sampled from a representative selection of five different university departments/faculties. We made a thorough analysis of the errors generated in the live captioning process by comparing the captions with the original text. We reached the following main conclusions: the Word Error Rate (WER) was below roughly 2.5% for 50% of the users, and rose to 5-6% for 90% of the users. There was also a statistically significant association between the speaker and the error level across the different texts used. Examining the words most often rendered incorrectly by Web Captioner, we observed that many are single-letter articles, others are quite rare words for which acoustically similar but far more common Greek words exist, and others are frequently used as prefixes but are not recognized as conjunctions (e.g., "εκ").
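    The WER figures above follow the standard definition WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum edit-distance alignment and N is the number of words in the reference transcript. A straightforward implementation of that standard formula (not the thesis's own code):

```python
# Standard Word Error Rate: WER = (S + D + I) / N, computed via the
# minimum word-level edit distance between reference and hypothesis.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    m, n = len(ref), len(hyp)
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # i deletions
    for j in range(n + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[m][n] / m

print(wer("the lecture starts at nine", "the lecture start at nine"))  # 0.2
```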

    Languages, Literacies, and Translations: Examining Deaf Students' Language Ideologies through English-to-ASL Translations of Literature.

    Full text link
    Educators have long grappled with how print literacy might best be taught to deaf students and which language might best serve this purpose: spoken English, American Sign Language (ASL), or another communication mode. Over the decades, pedagogical approaches have been introduced and critiqued according to the various ideologies of different stakeholders. We know very little, however, about the ideologies that deaf students themselves carry about language and the complex ways these ideologies may be contributing to or interfering with their acquisition of print literacy. This dissertation therefore explores deaf high school students' attitudes and beliefs about language and interrogates how their ideologies are confirmed, contradicted, or complicated through their encounters with English and ASL via ASL translations of literature in their English classroom. This qualitative study collected data on how deaf students' ideologies played out when their teacher integrated a unit consisting of ASL translations of English literary works into their English class. The findings highlight that the students' language ideologies are neither predictable nor consistent, and that many students carry conflicting and even mistaken ideologies about each language that lead them to believe that ASL has no grammar rules and to disparage English for being too strict. Moreover, the students' ideologies profoundly affect the degree of alienation or ownership that they feel towards each language, and especially towards print literacy, which nearly all of the students identify as a "hearing" practice. The students' complex relationship with each language is illuminated especially clearly in their reactions to ASL translations of English texts, an experience that many of them found enriching and deeply validating because, for the first time, they could bring their literacy practices and linguistic strengths from ASL to the experience of reading in the English classroom, and thus achieve a more meaningful and evocative reading of the stories. The ways these students interacted with the ASL translations challenge us to broaden our understanding of literacy and reading so that it is inclusive of the literacy practices that they brought to the table while working with the translations.
    PhD, English and Education, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133217/1/raspoon_1.pd

    Supporting Voice-Based Natural Language Interactions for Information Seeking Tasks of Various Complexity

    Get PDF
    Natural language interfaces have seen a steady increase in popularity over the past decade, leading to the ubiquity of digital assistants. Such digital assistants include voice-activated assistants, such as Amazon's Alexa, as well as text-based chat bots that can substitute for a human assistant in business settings (e.g., call centers, retail/banking websites) and at home. The main advantages of such systems are their ease of use and, in the case of voice-activated systems, hands-free interaction. The majority of tasks undertaken by users of commercially available voice-based digital assistants are simple in nature, and the responses of the agent are often determined using a rules-based approach. However, such systems have the potential to support users in completing more complex and involved tasks. In this dissertation, I describe experiments investigating user behaviours when interacting with natural language systems and how improvements in the design of such systems can benefit the user experience.

    Currently available commercial systems tend to be designed to mimic superficial characteristics of a human-to-human conversation. However, interaction with a digital assistant differs significantly from interaction between two people, partly due to limitations of the underlying technology, such as automatic speech recognition and natural language understanding. As computing technology evolves, interactions with digital assistants may come to resemble those between humans. The first part of this thesis explores how users will perceive systems capable of human-level interaction, how users will behave while communicating with such systems, and the new opportunities that such behaviour may open. Even in the absence of technology that allows digital assistants to perform at a human level, the digital assistants widely adopted around the world are beneficial for a number of use cases. The second part of this thesis describes user studies aimed at enhancing the functionality of digital assistants at the existing level of technology. In particular, chapter 6 focuses on expanding the amount of information a digital assistant can deliver over a voice-only channel, and chapter 7 explores how expanded capabilities of voice-based digital assistants would benefit people with visual impairments.

    The experiments presented throughout this dissertation produce a set of design guidelines for existing as well as potential future digital assistants. The experiments described in chapters 4, 6, and 7 focus on supporting the task of finding information online, while chapter 5 considers the case of guiding a user through a culinary recipe. The design recommendations provided by this thesis can be generalised into four categories: how naturally a user can communicate their thoughts to the system, how understandable the system's responses are to the user, how flexible the system's parameters are, and how diverse the information delivered by the system is.

    Assessment and revision of a paediatric diagnostic audiology report

    Get PDF
    Optimising outcomes for children with hearing impairment (HI) requires a family-centred approach that prioritises parent involvement. Families must be provided with information that encourages participation and meets their need for emotional support and knowledge. Diagnostic audiology reports can help provide this information, but their delivery alone is insufficient. If these reports are not readable and comprehensible, they cannot meet national and international legal standards, nor can they support the health literacy of parents. The majority of New Zealand adults have insufficient health literacy skills, a concerning fact given the strong association between poor health literacy and negative health outcomes. The aim of this study was to evaluate a paediatric diagnostic audiology report, revise it, and verify the revision. A mock audiology report was evaluated via a readability analysis and semi-structured interviews with parent participants. The results confirmed that the report was difficult to read and understand. Next, the report was revised using best-practice guidelines and parental recommendations. Verification of the revision process with 32 participants revealed that parents who read the revised report had significantly greater comprehension, self-efficacy, and perception ratings than parents who read the unrevised report. Additionally, the report's readability was markedly improved. These results may have critical implications for parents and their children with HI. Incomprehensible audiology reports fail to support parental health literacy, promote understanding, encourage participation, or offer emotional support. Because knowledge is power for these families, it is hoped that the findings of this study will be recognised and implemented in clinical practice.
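    The study does not specify which readability formula was applied; as an illustration only, the widely used Flesch Reading Ease score is computed as 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word), with higher scores indicating easier text. A rough sketch with a heuristic syllable counter:

```python
# Illustration of one common readability measure (Flesch Reading Ease);
# the study's actual analysis method is not specified. The syllable
# counter is a rough heuristic, not a dictionary-based count.
import re

def count_syllables(word: str) -> int:
    # Heuristic: count groups of consecutive vowels; at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

report = "Your child has a mild hearing loss in both ears. Hearing aids can help."
print(f"Flesch Reading Ease: {flesch_reading_ease(report):.1f}")
```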