
    DolphinAttack: Inaudible Voice Commands

    Speech recognition (SR) systems such as Siri or Google Now have become an increasingly popular human-computer interaction method, and have turned various systems into voice controllable systems (VCS). Prior work on attacking VCS shows that hidden voice commands that are incomprehensible to people can control the systems. Hidden voice commands, though hidden, are nonetheless audible. In this work, we design a completely inaudible attack, DolphinAttack, that modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve inaudibility. By leveraging the nonlinearity of the microphone circuits, the modulated low-frequency audio commands can be successfully demodulated, recovered, and more importantly interpreted by the speech recognition systems. We validate DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana and Alexa. By injecting a sequence of inaudible voice commands, we show a few proof-of-concept attacks, which include activating Siri to initiate a FaceTime call on an iPhone, activating Google Now to switch the phone to airplane mode, and even manipulating the navigation system in an Audi automobile. We propose hardware and software defense solutions. We validate that it is feasible to detect DolphinAttack by classifying the audio using a support vector machine (SVM), and suggest re-designing voice controllable systems to be resilient to inaudible voice command attacks. Comment: 15 pages, 17 figures
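The core trick described above (amplitude-modulating a baseband voice command onto an ultrasonic carrier so that the microphone's nonlinearity later demodulates the envelope) can be sketched as follows. This is a minimal illustration; the 192 kHz sample rate, 25 kHz carrier, and modulation depth are assumed parameters for demonstration, not the paper's settings:

```python
import numpy as np

def modulate_ultrasonic(command, fs=192_000, f_c=25_000, depth=1.0):
    """Amplitude-modulate a baseband command onto an ultrasonic carrier.

    Hypothetical parameters: fs and f_c are illustrative; the paper
    only requires the carrier to lie above roughly 20 kHz.
    """
    n = np.arange(len(command))
    carrier = np.cos(2 * np.pi * f_c * n / fs)
    # Standard AM: the quadratic term of the microphone's nonlinear
    # response later recovers the baseband envelope (1 + depth*command).
    return (1.0 + depth * command) * carrier

# A 1 kHz tone standing in for a voice command, one second at 192 kHz.
fs = 192_000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 1_000 * t)
ultrasonic = modulate_ultrasonic(tone, fs=fs)
print(ultrasonic.shape)  # (192000,)
```

Because the carrier sits above the audible range, the emitted signal is silent to listeners, while any quadratic distortion in the receiving circuit reproduces the command in the audible band.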

    Towards responsive Sensitive Artificial Listeners

    This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners – conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust real-time recognition and generation of non-verbal behaviour, both when the agent is speaking and when it is listening. We report on data collection and on the design of a system architecture in view of real-time responsiveness.

    A comparison of technologies for recording speech and the effects of speaker age

    The purpose of this study was to compare recordings of speech samples obtained with a dedicated recording device to those from more readily available devices. Speech recordings are used by professionals in the field of communication sciences and disorders to identify and transcribe features of an individual’s speech, regardless of his or her age. This process requires high-quality recordings, but devices intended to produce such recordings are often expensive, not easily accessible, and have few other uses. This research addressed the following questions: (A) What combination of microphones and recording devices provides the clearest speech sample based on signal-to-noise ratio? (B) Does peak clipping distort recorded speech samples for certain combinations of microphones and recording devices? (C) Are differences between the quality of recordings for different devices dependent on the subject’s age (i.e., child or adult)? Participants included 4 male and 5 female adults, ages 18-26, and 4 male and 4 female children, ages 5-10. Participants identified English as their first language, had hearing within normal limits, and had no communication disorders. Each participant was recorded performing three speech tasks. Speech was recorded simultaneously on a dedicated recording device, a personal computer, and an iPad, each paired with an omnidirectional condenser lavalier microphone. Recordings were analyzed for signal-to-noise ratio (SNR) and instances of peak clipping. Results were compared by device and age group. Analyses revealed that the iPad yielded the highest average SNR, but had the most variable values in most comparisons. The dedicated recording device generally had the lowest average SNR, but was the least variable. Peak clipping was not a significant factor for any device. No adult participants produced clipped signals, while three children did. Results suggest that readily available devices such as personal computers and iPads are appropriate for the types of acoustic analysis performed in this study.
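The two measurements reported above, SNR and peak clipping, can be computed along these lines. This is a generic sketch using a standard power-ratio SNR and a simple full-scale threshold for clipping; the study's exact procedure may differ:

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, from a speech segment and a
    noise-only segment (one common definition)."""
    p_sig = np.mean(np.asarray(signal, dtype=float) ** 2)
    p_noise = np.mean(np.asarray(noise, dtype=float) ** 2)
    return 10.0 * np.log10(p_sig / p_noise)

def clipped_fraction(samples, full_scale=1.0, tol=1e-6):
    """Fraction of samples at or beyond full scale (peak clipping)."""
    return float(np.mean(np.abs(samples) >= full_scale - tol))

# Synthetic example: a 440 Hz tone over low-level noise, 1 s at 16 kHz.
rng = np.random.default_rng(0)
speech = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16_000) / 16_000)
noise = 0.005 * rng.standard_normal(16_000)
print(snr_db(speech + noise, noise))
print(clipped_fraction(np.clip(speech * 3, -1, 1)))
```

Comparing these two numbers across devices, as the study does, separates loudness-related quality (SNR) from hard distortion (clipping), which is why both were reported.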

    The Validation of Speech Corpora

    1.2 Intended audience

    Design and Implementation of a Podcast Recording Studio for Business Communications

    Podcasting is the creation of recorded information and its delivery over the Internet, using web syndication technology. Businesses around the world are increasing their use of podcasts as a means for delivering information to their customers and employees. The purpose of this project paper is to examine the business need for podcasting, and to demonstrate that businesses can assemble an inexpensive recording studio to create podcasts. The review of previous literature included an examination of why podcasting has become important to business communications; how some businesses are currently using podcasting as a communications tool; the technology involved in podcasting; and the tools needed to create podcasts and how web syndication is used to distribute the finished recordings to end users. The paper includes information on the audio recording software that is necessary to record and edit the podcast. Also included is a discussion of the additional audio hardware, such as microphones and mixers, that is required to record audio. Configuring the equipment in the podcasting studio is described. This includes setting up the computer to be used for recording, attaching the audio mixer to the computer, and connecting the microphone to the audio mixer. The installation of the recording software is also described. The paper concludes with recommendations for businesses to find ways to track the results of podcasts. Also included is a recommendation for further academic study of the uses of podcasting.

    Assistive Listening Devices: A Guide

    Objective: The purpose of this research was to develop a guide on assistive listening devices (ALDs) describing the various types of ALDs, the basic underlying concepts, their advantages and disadvantages, the instrumentation and its components, and the setup and procedures for specification/evaluation of ALDs in accordance with national standards or guidelines issued by professional organizations in our field. This guide is intended for audiologists, hearing scientists, and audiology and hearing science students. Method: A thorough review of the previous ALD literature, including national and international standards for set-up and installation, specification/evaluation, and verification of ALDs; guidelines from professional audiology and acoustic and hearing sciences organizations for ALD set-up and installation, specification/evaluation, and verification; peer-reviewed studies on ALDs; textbook chapters and books on ALDs; and ALD websites from professional organizations. Results: This guide was organized by ALD type, and was subcategorized by the basic underlying concepts, their advantages and disadvantages, the instrumentation and components, and the setup/installation and procedures for specification/evaluation and verification. A comparative analysis was also performed on the relative benefits of various ALDs in a real-world application setting. Discussion: This guide demonstrates that ALDs facilitate communicative efficiency in persons with hearing loss in adverse listening environments. Selection of an appropriate ALD should be based on the intended system use and the intended listening environment. Appropriately selected and fitted ALDs help individuals detect environmental sounds or improve their speech recognition in specific listening settings. Also, ALDs can enable higher levels of communicative performance than would be obtained with the use of individual hearing technology alone. 
    Conclusion: The research findings demonstrate that ALDs improve audibility and overall listening benefit for individuals with hearing loss, especially those with compatible hearing technology. The guide can help one ensure optimal ALD performance to maximize communicative benefit; it serves as a resource for audiologists, hearing scientists, and audiology and hearing science students to develop a better understanding of topics related to ALDs, the appropriate ALDs to recommend to persons with hearing loss for various listening situations, the set-up and installation of ALDs, and the evaluation and verification of ALD performance.

    End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

    Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea, proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% under practical scenarios over a VAD baseline using only audio, implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when evaluated using the sensors from a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high-definition camera and a close-talking microphone). Comment: Submitted to Speech Communication
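The fusion idea can be illustrated with a toy recurrent step that combines an audio feature vector and a visual feature vector into a single hidden state per frame. This is a minimal numpy sketch of the bimodal concept only, not the paper's BRNN architecture; all dimensions and weights here are made up:

```python
import numpy as np

def bimodal_rnn_step(h, x_audio, x_video, W_a, W_v, W_h, b):
    """One recurrent step fusing audio and visual features into a
    shared hidden state (toy illustration of the bimodal idea)."""
    return np.tanh(W_a @ x_audio + W_v @ x_video + W_h @ h + b)

# Hypothetical sizes: 8-dim audio frames, 4-dim visual frames,
# 6-dim hidden state, 10 time steps.
rng = np.random.default_rng(1)
d_a, d_v, d_h, T = 8, 4, 6, 10
W_a = 0.1 * rng.standard_normal((d_h, d_a))
W_v = 0.1 * rng.standard_normal((d_h, d_v))
W_h = 0.1 * rng.standard_normal((d_h, d_h))
b = np.zeros(d_h)

h = np.zeros(d_h)
for _ in range(T):
    h = bimodal_rnn_step(h, rng.standard_normal(d_a),
                         rng.standard_normal(d_v), W_a, W_v, W_h, b)
# A linear readout on h would give the per-frame speech/non-speech score.
print(h.shape)  # (6,)
```

The point of the shared recurrence is that the hidden state carries temporal context from both modalities, so a visual cue (moving lips) can keep the detector on during acoustically noisy frames.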

    Development of a Field-Deployable Voice-Controlled Ultrasound Scanner System

    Modern ultrasound scanners are portable and have become very useful for clinical diagnosis. However, they have limitations for field use, primarily because they occupy both hands of the physician who performs the scanning. The goal of this thesis is to develop a wearable voice-controlled ultrasound scanner system that would enable the physician to provide a fast and efficient diagnosis. This is expected to become very useful for emergency and trauma applications. A commercially available ultrasound scanner system, Terason 2000, was chosen as the basis for development. This system consists of a laptop, a hardware unit containing the RF beamforming and signal processing chips, and the ultrasound transducer. In its commercial version, the control of the ultrasound system is performed via a Graphical User Interface with a Windows-application look and feel. In the system we developed, a command-and-control speech recognition engine and a noise-canceling microphone are selected to control the scanner using voice commands. A mini-joystick is attached to the top of the ultrasound transducer for distance and area measurements and to perform zooming of the ultrasound images. An eye-wear viewer connected to the laptop enables the user to view the ultrasound images directly. Power management features are incorporated into the ultrasound system in order to conserve battery power. A wireless connection is set up with a remote laptop to enable real-time wireless transmission of images. The result is a truly untethered, voice-controlled ultrasound system enclosed in a backpack and monitored by the eye-wear viewer. (In the second generation of this system, the laptop is replaced by an embedded PC and is incorporated into a photographer’s vest.) The voice-controlled system has to be made reliable under various forms of background noise. 
    Three command-and-control speech recognition systems were selected and their recognition performance was determined under different types and levels of ambient noise. The variation of recognition rates was also analyzed across six different speakers. Detailed testing was also conducted to identify the ideal combination of microphone and speech recognition engine for the ultrasound scanner system. Six different microphones, each with its own unique method of implementing noise cancellation, were chosen as candidates for this analysis. The testing was conducted by making recordings inside a highly reverberant, noisy acoustic chamber, and the recordings were fed to the automatic speech recognition engines offline for performance evaluation. The speech recognition engine and microphone selected as a result of this extensive testing were then incorporated into the wearable ultrasound scanner system. This thesis also discusses the implementation of the human-speech interface, which also plays a major role in the effectiveness of the voice-controlled ultrasound scanner system.
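The offline evaluation described above amounts to tallying a recognition rate for every microphone/engine pair and picking the best combination. A minimal sketch, assuming a hypothetical trial-record format of (microphone, engine, recognized_correctly) tuples:

```python
from collections import defaultdict

def recognition_rates(trials):
    """Recognition rate per (microphone, engine) pair.

    `trials` is a list of (microphone, engine, recognized_correctly)
    records; this format is an assumption for illustration, not the
    thesis's actual data layout.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for mic, engine, correct in trials:
        totals[(mic, engine)] += 1
        hits[(mic, engine)] += int(correct)
    return {pair: hits[pair] / totals[pair] for pair in totals}

# Toy data: two microphones tested with one engine.
trials = [
    ("mic_A", "engine_1", True),
    ("mic_A", "engine_1", False),
    ("mic_B", "engine_1", True),
    ("mic_B", "engine_1", True),
]
rates = recognition_rates(trials)
best = max(rates, key=rates.get)
print(best)  # ('mic_B', 'engine_1')
```

In practice each trial would also carry the noise type and level, so the same tally can be broken down per condition, as the thesis does across speakers and ambient-noise levels.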