4 research outputs found

    Conversational System Responses

    The disclosed system and method match the verbosity of a machine’s responses to the verbosity or brevity of the user’s query. The system includes a semantic parser connected to an audio input and output device to process and respond to queries. The method includes building a simple statistical model of the mean (M) and standard deviation (SD) of the audio lengths of the user’s query utterances and of the multiple variants of the generated text-to-speech (TTS) output for each query type, action, or intent. The system may extract fluff or slot-filling words from verbose queries to use in formulating the response. The system then matches the M and SD between the query and the response to pick an optimum response. The system is more conversational and more dynamically reactive to the user’s input. This system retrieves and presents relevant information faster and may be more user-friendly for accessibility users.
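The matching step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `z_score` and `pick_response`, the use of durations in seconds, and the choice to compare z-scores are all hypothetical. The abstract only says that M and SD are matched between query and response.

```python
from statistics import mean, pstdev


def z_score(value, values):
    """Position of `value` within `values`, in standard deviations from the mean."""
    m = mean(values)
    sd = pstdev(values)
    # Assumption: treat a zero-variance history as "exactly average".
    return 0.0 if sd == 0 else (value - m) / sd


def pick_response(query_len, past_query_lens, variants):
    """Pick the response variant whose relative length best matches the query's.

    query_len        -- audio length of the current query (seconds, assumed unit)
    past_query_lens  -- lengths of the user's earlier queries
    variants         -- list of (text, tts_duration) pairs for one intent
    """
    target = z_score(query_len, past_query_lens)
    durations = [duration for _, duration in variants]
    best = min(variants, key=lambda v: abs(z_score(v[1], durations) - target))
    return best[0]


# Hypothetical data: three earlier queries and three TTS variants for one intent.
history = [1.0, 2.0, 3.0]
variants = [
    ("Yes.", 0.5),
    ("Yes, the lights are on.", 1.5),
    ("Yes, the living room lights are currently on.", 2.5),
]
print(pick_response(3.0, history, variants))
print(pick_response(1.0, history, variants))
```

With this sketch, a query that is long relative to the user's history selects the longest variant, and a brief query selects the shortest one.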

    Client-side masking for voice queries

    Many voice-based assistive technologies transmit the voice input received from users to a server for processing. The transmitted audio includes the speaker’s voice, which can identify the person. Users of such technologies therefore face a tradeoff between convenient voice interfaces with reduced privacy and less convenient non-voice input with higher privacy. Techniques described herein mask a user’s voice by locally processing the voice input received by a device. The masked voice cannot personally identify the user, yet it still enables server-side processing that recognizes spoken phrases. Application of the proposed techniques provides the user with greater privacy without diminishing the user experience for voice input in terms of recognition, latency, and other operational characteristics.

    Health Diagnostics Using User Utterances

    Respiratory illnesses can be hard to track and diagnose. Obtaining useful clinical data on these illnesses is difficult because it requires physical interaction, e.g., via nasal or sinus swab. It is known that respiratory illness can impact speech pathways. To this end, this disclosure describes techniques that use readily accessible software to obtain and classify potentially useful data. With user permission, utterances of the user, e.g., activation of a speech-activated device via a hotword, are analyzed to form speaker-ID models. These models are evaluated against additional utterances of the user in a sequential manner. The evaluation scores, along with the timestamps and details of the models, are aggregated to determine whether there is an interval of time during which the user’s speaker-ID models are unstable, inconsistent, or lacking self-similarity. This signal can be used as a proxy for detection or as a motivating factor for clinical investigation.
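The aggregation of evaluation scores over time might look something like the sketch below. It assumes each evaluation yields a similarity score between 0 and 1 with a timestamp, and it flags sliding windows whose mean score falls below a threshold. The function name `unstable_intervals`, the window size, and the threshold are hypothetical choices for illustration. The abstract does not specify how instability is measured.

```python
from statistics import mean


def unstable_intervals(scored, window=3, threshold=0.7):
    """Return (start_ts, end_ts) for each window with a low mean score.

    scored    -- list of (timestamp, score) pairs, sorted by timestamp
    window    -- number of consecutive evaluations per window (assumed)
    threshold -- mean score below which a window is flagged (assumed)
    """
    flagged = []
    for i in range(len(scored) - window + 1):
        chunk = scored[i:i + window]
        if mean(score for _, score in chunk) < threshold:
            flagged.append((chunk[0][0], chunk[-1][0]))
    return flagged


# Hypothetical scores: the speaker-ID model matches poorly around timestamps 3 to 5.
scores = [(1, 0.9), (2, 0.9), (3, 0.5), (4, 0.4), (5, 0.5), (6, 0.9), (7, 0.95)]
print(unstable_intervals(scores))
```

Overlapping flagged windows could be merged into a single interval before being surfaced as a signal for further investigation.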

    Secure audio processing

    Automatic speech recognizers (ASR) are now nearly ubiquitous, finding application in smart assistants, smartphones, smart speakers, and other devices. An attack on an ASR that triggers such a device into carrying out false instructions can lead to severe consequences. Typically, speech recognition is performed using machine learning models, e.g., neural networks, whose intermediate outputs are not always fully concealed. Exposing such intermediate outputs makes it easier to craft malicious input audio. This disclosure describes techniques that thwart attacks on speech recognition systems by moving model inference processing to a secure computing enclave. The secure enclave’s memory and signals are inaccessible to the user and to untrusted processes, and are therefore resistant to attacks.