    Efficient Embedded Speech Recognition for Very Large Vocabulary Mandarin Car-Navigation Systems

    Automatic speech recognition (ASR) for a very large vocabulary of isolated words is a difficult task on a resource-limited embedded device. This paper presents a novel fast decoding algorithm for a Mandarin speech recognition system that can simultaneously process hundreds of thousands of items while maintaining high recognition accuracy. The proposed algorithm constructs a semi-tree search network based on Mandarin pronunciation rules to avoid duplicate syllable matching and eliminate redundant memory. Building on a two-stage fixed-width beam-search baseline system, the algorithm employs a variable beam-width pruning strategy and a frame-synchronous word-level pruning strategy to significantly reduce recognition time. The algorithm targets in-car navigation systems in China and was simulated on a standard PC workstation. The experimental results show that, compared with the baseline system, the proposed method reduces recognition time nearly 6-fold and memory size nearly 2-fold, with less than 1% accuracy degradation on a 200,000-word recognition task.
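
    The idea of sharing syllables in a search tree and pruning with a variable beam can be illustrated with a minimal Python sketch. It is not the paper's algorithm: a plain prefix tree stands in for the semi-tree network, each input step is treated as one syllable slot with precomputed log-likelihoods (real decoders score acoustic frames against HMM states), and the adaptive rule that halves the beam whenever more than `max_active` hypotheses survive is a hypothetical choice. The names `Node`, `build_tree`, `prune`, and `decode` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    # Prefix-tree node: words that share leading syllables share these nodes,
    # so a shared syllable is matched once instead of once per word.
    children: Dict[str, "Node"] = field(default_factory=dict)
    word: Optional[str] = None  # set when a complete word ends at this node

def build_tree(lexicon: Dict[str, List[str]]) -> Node:
    root = Node()
    for word, syllables in lexicon.items():
        node = root
        for syl in syllables:
            node = node.children.setdefault(syl, Node())
        node.word = word
    return root

def prune(hyps: List[Tuple[float, Node]], beam: float, max_active: int):
    # Keep hypotheses within `beam` of the best log-score. If too many survive,
    # halve the beam (hypothetical variable-width rule) until at most
    # `max_active` remain or the beam collapses onto ties with the best score.
    best = max(score for score, _ in hyps)
    kept = [h for h in hyps if h[0] >= best - beam]
    while len(kept) > max_active and beam > 1e-9:
        beam /= 2
        kept = [h for h in hyps if h[0] >= best - beam]
    return kept

def decode(root: Node, steps: List[Dict[str, float]],
           beam: float = 5.0, max_active: int = 3) -> Optional[str]:
    # Each step maps candidate syllables to log-likelihoods.
    hyps: List[Tuple[float, Node]] = [(0.0, root)]
    for step in steps:
        expanded = [(score + step[syl], child)
                    for score, node in hyps
                    for syl, child in node.children.items() if syl in step]
        if not expanded:
            return None  # every path was pruned or unmatched
        hyps = prune(expanded, beam, max_active)
    finished = [(score, node.word) for score, node in hyps if node.word is not None]
    return max(finished)[1] if finished else None
```

    For a toy lexicon `{"Beijing": ["bei", "jing"], "Beihai": ["bei", "hai"], "Nanjing": ["nan", "jing"]}`, the root has only two children because "bei" is shared, and `decode(root, [{"bei": -0.1, "nan": -2.0}, {"jing": -0.2, "hai": -1.5}])` returns `"Beijing"`, even with `max_active=1`.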

    The influence of personal navigation devices on drivers' visual attention on the road ahead and driving performance

    Personal navigation devices (PNDs) that provide GPS-based directions are now widespread in vehicles. These devices typically display the real-time location of the vehicle on a map and play spoken prompts when drivers need to turn. While such devices are less distracting than paper directions, their graphical display may distract users from their primary task of driving. This thesis investigates the influence of two PNDs on driving performance and visual attention. In experiments conducted with a high-fidelity driving simulator, we found that drivers using a navigation system with a graphical display indeed spent less time looking at the road than those using a navigation system with spoken directions only. Furthermore, glancing at the display was correlated with higher variance in driving performance measures. We discuss the implications of these findings for the design of in-vehicle PNDs.

    Speech Interfaces and Pilot Performance: A Meta-Analysis

    As the aviation industry modernizes, new technology and interfaces must support growing aircraft complexity without increasing pilot workload. Natural language processing offers just such a simple and intuitive interface, yet its performance implications for pilots remain unknown. A meta-analysis was conducted to understand the performance effects of using speech and voice interfaces in a series of pilot-task analogs. The inclusion criteria selected studies in which participants performed a demanding primary task, such as driving, while interacting with a vehicle system to enter numbers, dial radios, or enter a navigation destination. Compared to manual system interfaces, voice interfaces reduced subjective workload and inattention. The time needed to complete interactions with the interface appears to vary with the individual system, with systems that require prompts or provide confirmation taking the longest. Implications of introducing natural language systems in aviation are discussed, and recommendations for future study are provided.

    DolphinAttack: Inaudible Voice Commands

    Speech recognition (SR) systems such as Siri or Google Now have become an increasingly popular human-computer interaction method and have turned various systems into voice controllable systems (VCS). Prior work on attacking VCS shows that hidden voice commands that are incomprehensible to people can control these systems. Hidden voice commands, though hidden, are nonetheless audible. In this work, we design a completely inaudible attack, DolphinAttack, that modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve inaudibility. By leveraging the nonlinearity of the microphone circuits, the modulated low-frequency audio commands can be successfully demodulated, recovered, and, more importantly, interpreted by the speech recognition systems. We validate DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana, and Alexa. By injecting a sequence of inaudible voice commands, we demonstrate a few proof-of-concept attacks, including activating Siri to initiate a FaceTime call on an iPhone, activating Google Now to switch the phone to airplane mode, and even manipulating the navigation system in an Audi automobile. We propose hardware and software defense solutions. We validate that it is feasible to detect DolphinAttack by classifying the audio with a support vector machine (SVM), and suggest redesigning voice controllable systems to be resilient to inaudible voice command attacks. Comment: 15 pages, 17 figures
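
    The demodulation effect that the abstract relies on can be checked numerically. The Python sketch below is a simplified model, not the authors' code: it assumes the microphone front end is approximated by the second-order polynomial y = a1*x + a2*x^2, amplitude-modulates a 1 kHz test tone (a stand-in for a voice command) onto a 25 kHz carrier, and reads the 1 kHz component from a single DFT bin. The sample rate, frequencies, coefficients, and function names are illustrative assumptions.

```python
import math

FS = 192_000  # simulation sample rate (Hz), high enough to represent the carrier
FC = 25_000   # ultrasonic carrier (Hz), above the ~20 kHz limit of human hearing
FM = 1_000    # baseband test tone (Hz) standing in for a voice command
N = 1_920     # 10 ms window: a whole number of cycles of every tone involved

def am_signal(depth: float = 0.5) -> list:
    # Amplitude modulation: (1 + depth*m(t)) * carrier. The spectrum holds only
    # FC and the sidebands FC +/- FM, so nothing sits at FM itself.
    return [(1 + depth * math.sin(2 * math.pi * FM * n / FS))
            * math.cos(2 * math.pi * FC * n / FS) for n in range(N)]

def mic_front_end(x: list, a1: float = 1.0, a2: float = 0.1) -> list:
    # Assumed nonlinearity model: the squared term mixes the sidebands with the
    # carrier and yields a baseband copy of the tone with amplitude a2*depth.
    return [a1 * v + a2 * v * v for v in x]

def tone_amplitude(x: list, freq: float) -> float:
    # Amplitude of one sinusoidal component, read from a single DFT bin.
    re = sum(v * math.cos(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)

received = am_signal()
print(f"1 kHz before front end: {tone_amplitude(received, FM):.4f}")
print(f"1 kHz after front end:  {tone_amplitude(mic_front_end(received), FM):.4f}")
```

    With these values the first line prints 0.0000 and the second 0.0500 (a2 times the modulation depth). Passing `a2=0.0`, i.e. a perfectly linear front end, makes the 1 kHz component vanish, which is why the attack depends on the microphone's nonlinearity; the remaining ultrasonic terms would be removed by the device's own low-pass filtering.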

    Next-Generation Smart Cars: Towards a More Intelligent Interactive Infotainment System

    Today, in a world of automation, the impact of artificial intelligence can be seen in every aspect of our lives: from smart homes to self-driving cars, everything is run using intelligent, adaptive technologies. This thesis attempts to analyze the correlation between driving quality and the use of the car infotainment system, in both directions, and hence driver distraction. Various internal and external driving factors have been identified to understand the dependency and seriousness of driver distraction caused by the car infotainment system. A number of UI/UX changes and speech recognition advancements have been introduced in cars to reduce distraction. But reducing the number of road casualties remains a persistent problem, as the cognitive load on the driver is considered one of the primary causes of the distractions that lead to casualties. This research provides a pathway toward building an artificially intelligent, adaptive, and interactive infotainment system that is trained to behave differently by analyzing driving quality without the intervention of the driver. The aim is not only to shift the driver's focus from the screen to the street, but also to change the inherent behavior of the infotainment system based on the driving statistics at that point in time, without the need for driver intervention. Dissertation/Thesis; Masters Thesis, Software Engineering 201

    Multimodal MP3 Jukebox

    This report, prepared for Worcester Polytechnic Institute, describes the design, testing, and analysis of a voice- and button-controlled MP3 Jukebox for automotive applications. The project, sponsored by the Bose Corporation, incorporated modern software components and complete interaction features. An inexpensive, high-function driving simulation system was developed and used in conjunction with the Peripheral Detection Task (PDT) to measure driver distraction. In tests with four subjects, Jukebox interaction increased the overall median reaction time by 133 milliseconds.

    A framework for evaluating in-vehicle applications regarding safety

    Driver distraction is a serious and growing hazard to road safety. With the rapid introduction of new information, communication, and entertainment technologies, this problem is set to become more threatening in the coming years. For instance, drivers who use mobile phones while driving are more likely to be involved in a car crash than those who do not. However, using technologies such as mobile phones and navigation systems in a vehicle can have personal, social, practical, and psychological advantages that outweigh the risk. Therefore, a number of metrics and methods exist for evaluating in-vehicle technologies, their services, applications, and functionalities, in order to improve them and make them safer. The purpose of this thesis was to investigate and develop a new framework, consisting of a safety evaluation library and a server, for evaluating in-vehicle applications with respect to safety. Using the framework, Original Equipment Manufacturers (OEMs) and third-party developers can get safety feedback on their in-vehicle applications in real driving situations. Different metrics were investigated for the safety analysis. Owing to time restrictions, and based on the information that could be collected from an application and a vehicle, four metrics were used in the safety analysis: total task time, number of interactions, speed, and completed task rate. In addition, the framework was deployed and tested with two case-study applications, and some interesting results were discovered. Moreover, the safety library was found to be quite easy for third-party developers to use, which is one of the most important factors for usability.
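
    As an illustration of how the four metrics named above might be computed from logged task data, here is a minimal Python sketch. The `TaskLog` record, its field names, and the `safety_metrics` function are hypothetical; the thesis's actual library interface is not described in this abstract, and averaging each metric across tasks is an assumed aggregation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

@dataclass
class TaskLog:
    # Hypothetical per-task record; field names are illustrative, not the thesis's API.
    start_s: float                  # task start timestamp (seconds)
    end_s: float                    # task end timestamp (seconds)
    interactions: int               # taps, button presses, or voice commands during the task
    speed_samples_kmh: List[float]  # vehicle speed samples recorded during the task
    completed: bool                 # whether the driver finished the task

def safety_metrics(tasks: List[TaskLog]) -> Dict[str, float]:
    # Aggregate the four metrics across all logged tasks.
    return {
        "mean_total_task_time_s": mean(t.end_s - t.start_s for t in tasks),
        "mean_interactions": mean(t.interactions for t in tasks),
        "mean_speed_kmh": mean(s for t in tasks for s in t.speed_samples_kmh),
        "completed_task_rate": sum(t.completed for t in tasks) / len(tasks),
    }
```

    For two logged tasks lasting 12 s and 8 s, with 4 and 2 interactions and only the first one completed, the sketch reports a mean task time of 10 s, 3 interactions per task, and a completed task rate of 0.5.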