Exploring Transfer Learning For End-to-End Spoken Language Understanding
Voice Assistants such as Alexa, Siri, and Google Assistant typically use a
two-stage Spoken Language Understanding pipeline; first, an Automatic Speech
Recognition (ASR) component to process customer speech and generate text
transcriptions, followed by a Natural Language Understanding (NLU) component to
map transcriptions to an actionable hypothesis. An end-to-end (E2E) system that
goes directly from speech to a hypothesis is a more attractive option. These
systems have been shown to be smaller, faster, and better optimized. However,
they require massive amounts of end-to-end training data and, moreover, do not
take advantage of the already available ASR and NLU training data.
In this work, we propose an E2E system that is designed to jointly train on
multiple speech-to-text tasks, such as ASR (speech-transcription) and SLU
(speech-hypothesis), and text-to-text tasks, such as NLU (text-hypothesis). We
call this the Audio-Text All-Task (AT-AT) Model and we show that it beats the
performance of E2E models trained on individual tasks, especially ones trained
on limited data. We show this result on an internal music dataset and two
public datasets, FluentSpeech and SNIPS Audio, where we achieve
state-of-the-art results. Since our model can process both speech and text
input sequences and learn to predict a target sequence, it also allows us to do
zero-shot E2E SLU by training on only text-hypothesis data (without any speech)
from a new domain. We evaluate this ability of our model on the Facebook TOP
dataset and set a new benchmark for zero-shot E2E performance. We will soon
release the audio data collected for the TOP dataset for future research.
Comment: AAAI 202
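The abstract does not describe how the joint training is implemented; as an illustration only, one common way to organize such multi-task training is to interleave examples from all tasks in each batch so a shared sequence-to-sequence model sees every objective. The task names match the abstract, but the example data, field names, and batching scheme below are assumptions, not details from the paper.

```python
import random

# Task inventory from the abstract: each task maps an input sequence to a
# target sequence, so a single shared seq2seq model can consume all of them.
TASKS = {
    "ASR": ("speech", "transcription"),  # speech-to-text
    "SLU": ("speech", "hypothesis"),     # speech-to-hypothesis
    "NLU": ("text", "hypothesis"),       # text-to-hypothesis
}

def make_mixed_batch(examples_by_task, batch_size, rng):
    """Sample a batch that interleaves examples from all available tasks,
    so each gradient update trains the shared model on several objectives."""
    tasks = [t for t, ex in examples_by_task.items() if ex]
    batch = []
    for _ in range(batch_size):
        task = rng.choice(tasks)
        src, tgt = rng.choice(examples_by_task[task])
        batch.append({"task": task, "modality": TASKS[task][0],
                      "source": src, "target": tgt})
    return batch

rng = random.Random(0)
# Toy examples; "<audio:...>" stands in for an actual audio feature sequence.
examples_by_task = {
    "ASR": [("<audio:play_song>", "play a song")],
    "SLU": [("<audio:play_song>", "PlayMusic(song)")],
    # Text-only NLU data: this is what enables zero-shot E2E SLU for a new
    # domain, since no speech is required to add a new text-to-hypothesis task.
    "NLU": [("play a song", "PlayMusic(song)")],
}
batch = make_mixed_batch(examples_by_task, batch_size=8, rng=rng)
```

In this sketch the model itself is elided; the point is only that speech-input and text-input examples share one target-sequence format, which is what lets text-only (NLU) data contribute to a speech-to-hypothesis capability.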
Smooth inverse frequency based text data selection for medical dictation
The under-resourced domain problem is significant in automatic speech recognition, especially for smaller languages such as Hungarian, and in fields where data is often confidential, such as finance and medicine. We introduce a method that uses word embeddings and smooth inverse frequency (SIF) based distance measurement to filter public-domain web corpora. The selection of (medical) domain-matching documents can be scaled. The resulting text is used to train an augmented language model for a medical dictation system. We show that an appropriately scaled selection leads to optimal performance of the ASR system over baselines where no data augmentation was applied or where all of the augmentation data was added.
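The abstract does not give implementation details; as a rough illustration, SIF-weighted sentence embeddings (in the sense of Arora et al.'s "A Simple but Tough-to-Beat Baseline") down-weight frequent words by a/(a + p(w)), and a distance to a domain centroid can then rank documents for selection. The toy vocabulary, vectors, frequencies, and smoothing parameter `a` below are all hypothetical, and the common-component-removal step of full SIF is omitted for brevity.

```python
import numpy as np

def sif_embedding(tokens, word_vectors, word_freq, a=1e-3):
    """Weighted average of word vectors using SIF weights a / (a + p(w)),
    where p(w) is the word's relative frequency in the corpus."""
    total = sum(word_freq.values())
    vecs, weights = [], []
    for w in tokens:
        if w in word_vectors:
            p = word_freq.get(w, 0) / total
            vecs.append(word_vectors[w])
            weights.append(a / (a + p))
    if not vecs:
        return np.zeros_like(next(iter(word_vectors.values())))
    return np.average(np.array(vecs), axis=0, weights=weights)

def cosine_distance(u, v):
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy vocabulary: 2-d embeddings and corpus frequencies.
word_vectors = {
    "patient": np.array([1.0, 0.1]),
    "diagnosis": np.array([0.9, 0.2]),
    "football": np.array([0.0, 1.0]),
    "the": np.array([0.5, 0.5]),
}
word_freq = {"patient": 5, "diagnosis": 3, "football": 4, "the": 100}

# Centroid of in-domain (medical) terms; candidate web documents are then
# ranked by SIF distance, and the closest ones are kept for LM training.
domain_centroid = sif_embedding(["patient", "diagnosis"], word_vectors, word_freq)
medical_doc = sif_embedding(["the", "patient", "diagnosis"], word_vectors, word_freq)
other_doc = sif_embedding(["the", "football"], word_vectors, word_freq)

d_medical = cosine_distance(medical_doc, domain_centroid)
d_other = cosine_distance(other_doc, domain_centroid)
```

Scaling the selection, as the abstract describes, would correspond to moving the distance threshold (or the number of top-ranked documents kept) up or down.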
Computer interfaces for the visually impaired
Information access via computer terminals extends to blind and low-vision persons employed in many technical and nontechnical disciplines. Two aspects of providing computer technology for persons with a vision-related handicap are detailed. First, research was conducted into the most effective means of integrating existing adaptive technologies into information systems, with the goal of combining off-the-shelf products with adaptive equipment into cohesive, integrated information processing systems. Details are included that describe the type of functionality required in software to facilitate its incorporation into a speech and/or braille system. The second aspect is research into providing audible and tactile interfaces to graphics-based interfaces. Parameters are included for the design and development of the Mercator Project, which will develop a prototype system for audible access to graphics-based interfaces. The system is being built within the public-domain architecture of X Windows to show that it is possible to provide access to text-based applications within a graphical environment. This information will be valuable to suppliers of ADP equipment, since new legislation requires manufacturers to provide electronic access for the visually impaired.
Exploration of audiovisual heritage using audio indexing technology
This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings, as well as the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the requirements for successful tuning and improvement of the available tools for indexing heterogeneous A/V collections from the cultural heritage domain are reviewed. Finally, the paper argues that research is needed to cope with the varying information needs of different types of users.
Golan v. Holder: Copyright in the Image of the First Amendment
Does copyright violate the First Amendment? Professor Melville Nimmer asked this question forty years ago, and then answered it by concluding that copyright itself is affirmatively speech protective. Despite ample reason to doubt Nimmer's response, the Supreme Court has avoided an independent, thoughtful, plenary review of the question. Copyright has come to enjoy an all-but-categorical immunity to First Amendment constraints. Now, however, the Court faces a new challenge to its back-of-the-hand treatment of this vital conflict. In Golan v. Holder the Tenth Circuit considered legislation (enacted pursuant to the Berne Convention and TRIPS) "restoring" copyright protection to millions of foreign works previously thought to belong to the public domain. The Tenth Circuit upheld the legislation, but not without noting that it appeared to raise important First Amendment concerns. The Supreme Court granted certiorari. This article addresses the issues in the Golan case, literally on the eve of oral argument before the Court. This article first considers the Copyright and Treaty Clauses, and then addresses the relationship between copyright and the First Amendment. The discussion endorses an understanding of that relationship in which the Amendment is newly seen as paramount, and copyright is newly seen in the image of the Amendment.