17,772 research outputs found

    Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing

    Full text link
    The accuracy of Automated Speech Recognition (ASR) technology has improved, but it is still imperfect in many settings. Researchers who evaluate ASR performance often focus on improving the Word Error Rate (WER) metric, but WER has been found to have little correlation with human-subject performance on many applications. We propose a new captioning-focused evaluation metric that better predicts the impact of ASR recognition errors on the usability of automatically generated captions for people who are Deaf or Hard of Hearing (DHH). Through a user study with 30 DHH users, we compared our new metric with the traditional WER metric on a caption usability evaluation task. In a side-by-side comparison of pairs of ASR text output (with identical WER), the texts preferred by our new metric were preferred by DHH participants. Further, our metric had significantly higher correlation with DHH participants' subjective scores on the usability of a caption, as compared to the correlation between WER metric and participant subjective scores. This new metric could be used to select ASR systems for captioning applications, and it may be a better metric for ASR researchers to consider when optimizing ASR systems.Comment: 10 pages, 8 figures, published in ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '17

    ELICA: An Automated Tool for Dynamic Extraction of Requirements Relevant Information

    Full text link
    Requirements elicitation requires extensive knowledge and deep understanding of the problem domain where the final system will be situated. However, in many software development projects, analysts are required to elicit the requirements from an unfamiliar domain, which often causes communication barriers between analysts and stakeholders. In this paper, we propose a requirements ELICitation Aid tool (ELICA) to help analysts better understand the target application domain by dynamic extraction and labeling of requirements-relevant knowledge. To extract the relevant terms, we leverage the flexibility and power of Weighted Finite State Transducers (WFSTs) in dynamic modeling of natural language processing tasks. In addition to the information conveyed through text, ELICA captures and processes non-linguistic information about the intention of speakers such as their confidence level, analytical tone, and emotions. The extracted information is made available to the analysts as a set of labeled snippets with highlighted relevant terms which can also be exported as an artifact of the Requirements Engineering (RE) process. The application and usefulness of ELICA are demonstrated through a case study. This study shows how pre-existing relevant information about the application domain and the information captured during an elicitation meeting, such as the conversation and stakeholders' intentions, can be captured and used to support analysts achieving their tasks.Comment: 2018 IEEE 26th International Requirements Engineering Conference Workshop

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding

    Full text link
    Abstractive community detection is an important spoken language understanding task, whose goal is to group utterances in a conversation according to whether they can be jointly summarized by a common abstractive sentence. This paper provides a novel approach to this task. We first introduce a neural contextual utterance encoder featuring three types of self-attention mechanisms. We then train it using the siamese and triplet energy-based meta-architectures. Experiments on the AMI corpus show that our system outperforms multiple energy-based and non-energy based baselines from the state-of-the-art. Code and data are publicly available.Comment: Update baseline

    A Survey of Available Corpora For Building Data-Driven Dialogue Systems: The Journal Version

    Get PDF
    During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choice of evaluation metrics for the learning objective
    corecore