
    Exploring Transfer Learning For End-to-End Spoken Language Understanding

    Voice assistants such as Alexa, Siri, and Google Assistant typically use a two-stage Spoken Language Understanding (SLU) pipeline: first, an Automatic Speech Recognition (ASR) component processes customer speech and generates text transcriptions; then a Natural Language Understanding (NLU) component maps the transcriptions to an actionable hypothesis. An end-to-end (E2E) system that goes directly from speech to a hypothesis is a more attractive option. Such systems have been shown to be smaller, faster, and better optimized. However, they require massive amounts of end-to-end training data and, in addition, do not take advantage of the ASR and NLU training data that is already available. In this work, we propose an E2E system that is designed to train jointly on multiple speech-to-text tasks, such as ASR (speech-transcription) and SLU (speech-hypothesis), and text-to-text tasks, such as NLU (text-hypothesis). We call this the Audio-Text All-Task (AT-AT) Model, and we show that it beats the performance of E2E models trained on individual tasks, especially ones trained on limited data. We show this result on an internal music dataset and two public datasets, FluentSpeech and SNIPS Audio, where we achieve state-of-the-art results. Since our model can process both speech and text input sequences and learn to predict a target sequence, it also allows us to do zero-shot E2E SLU by training on only text-hypothesis data (without any speech) from a new domain. We evaluate this ability of our model on the Facebook TOP dataset and set a new benchmark for zero-shot E2E performance. We will soon release the audio data collected for the TOP dataset for future research. Comment: AAAI 202

    Smooth inverse frequency based text data selection for medical dictation

    The under-resourced domain problem is significant in automatic speech recognition, especially in small languages such as Hungarian or in fields where data is often confidential, such as finance and medicine. We introduce a method that uses word embeddings and a smooth inverse frequency (SIF) based distance measure to filter public-domain web corpora. The selection of documents matching the (medical) domain can be scaled. The resulting text is used to train an augmented language model for a medical dictation system. We show that an appropriately scaled selection yields the best ASR performance, outperforming the baselines in which either no augmentation data or all of the augmentation data was added.
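
As a rough illustration of the kind of SIF-based filtering this abstract describes, the sketch below weights each word vector by a / (a + p(w)), averages the weighted vectors into a document embedding, and keeps candidate documents whose cosine distance to a domain embedding falls under a threshold. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy vectors, the value of `a`, and the use of a distance threshold as the scaling knob are all hypothetical, and the common-component removal step of the full SIF method is omitted for brevity.

```python
import math
from collections import Counter


def sif_weights(corpus_tokens, a=1e-3):
    """Map each word to the SIF weight a / (a + p(w)), p(w) = unigram probability."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: a / (a + count / total) for word, count in counts.items()}


def sif_embedding(tokens, vectors, weights):
    """SIF-weighted average of the word vectors of the known tokens.

    Simplification: the principal-component removal of the full SIF
    method is not performed here.
    """
    known = [t for t in tokens if t in vectors and t in weights]
    if not known:
        return None
    dim = len(vectors[known[0]])
    emb = [0.0] * dim
    for token in known:
        for i, value in enumerate(vectors[token]):
            emb[i] += weights[token] * value
    return [value / len(known) for value in emb]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_documents(candidates, domain_emb, vectors, weights, threshold):
    """Keep tokenized documents within `threshold` cosine distance of the domain.

    `threshold` is a hypothetical knob: raising it selects more data.
    """
    selected = []
    for doc in candidates:
        emb = sif_embedding(doc, vectors, weights)
        if emb is not None and cosine_distance(emb, domain_emb) <= threshold:
            selected.append(doc)
    return selected


# Toy 2-dimensional word vectors (made up for illustration only).
vectors = {
    "patient": [1.0, 0.0],
    "dose": [0.9, 0.1],
    "football": [0.0, 1.0],
    "goal": [0.1, 0.9],
}
weights = sif_weights(["patient", "dose", "football", "goal"])
domain_emb = sif_embedding(["patient", "dose"], vectors, weights)
candidates = [["dose", "patient"], ["football", "goal"]]
selected = select_documents(candidates, domain_emb, vectors, weights, threshold=0.5)
```

With these toy inputs, only the medical-looking candidate passes the threshold. In practice the word vectors would come from a trained embedding model, and the threshold would be tuned against ASR performance on held-out data.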

    Computer interfaces for the visually impaired

    Information access via computer terminals extends to blind and low-vision persons employed in many technical and nontechnical disciplines. Two aspects of providing computer technology for persons with a vision-related handicap are detailed. The first is research into the most effective means of integrating existing adaptive technologies into information systems, conducted in order to combine off-the-shelf products with adaptive equipment into cohesive, integrated information processing systems. Details are included that describe the type of functionality required in software to facilitate its incorporation into a speech and/or braille system. The second aspect is research into providing audible and tactile interfaces to graphics-based interfaces. Parameters are included for the design and development of the Mercator Project. The project will develop a prototype system for audible access to graphics-based interfaces. The system is being built within the public domain architecture of X windows to show that it is possible to provide access to text-based applications within a graphical environment. This information will be valuable to suppliers of ADP equipment, since new legislation requires manufacturers to provide electronic access to the visually impaired.

    Exploration of audiovisual heritage using audio indexing technology

    This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings, and the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the paper reviews the requirements for successfully tuning and improving the available tools for indexing the heterogeneous A/V collections from the cultural heritage domain. Finally, the paper argues that research is needed to cope with the varying information needs of different types of users.

    Golan v. Holder: Copyright in the Image of the First Amendment

    Does copyright violate the First Amendment? Professor Melville Nimmer asked this question forty years ago, and then answered it by concluding that copyright itself is affirmatively speech protective. Despite ample reason to doubt Nimmer’s response, the Supreme Court has avoided an independent, thoughtful, plenary review of the question. Copyright has come to enjoy an all-but-categorical immunity to First Amendment constraints. Now, however, the Court faces a new challenge to its back-of-the-hand treatment of this vital conflict. In Golan v. Holder, the Tenth Circuit considered legislation (enacted pursuant to the Berne Convention and TRIPS) “restoring” copyright protection to millions of foreign works previously thought to belong to the public domain. The Tenth Circuit upheld the legislation, but not without noting that it appeared to raise important First Amendment concerns. The Supreme Court granted certiorari. This article addresses the issues in the Golan case, literally on the eve of oral argument before the Court. It first considers the Copyright and Treaty Clauses, and then addresses the relationship between copyright and the First Amendment. The discussion endorses an understanding of that relationship in which the Amendment is newly seen as paramount, and copyright is newly seen in the image of the Amendment.