4 research outputs found

Conversational System Responses

Author: Kracun Aleksandar
Waters Austin
Publication venue: Technical Disclosure Commons
Publication date: 09/06/2017

The disclosed system and method match the verbosity of a machine’s responses to the verbosity or brevity of the user’s query. The system includes a semantic parser connected to an audio input and output device to process and respond to queries. The method includes building a simple statistical model of mean (M) and standard deviation (SD) for the lengths of the audio of the user’s query utterances and the multiple variants of the generated text-to-speech (TTS) output for each query type or action or intent. The system may extract fluff or slot filling words from verbose queries to use in formulating the response. The system then matches the M and SD between the query and the response to pick an optimum response. The system is more conversational and more dynamically reactive to the user’s input. This system retrieves and presents relevant information faster and may be more user-friendly for accessibility users.
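
The matching step described in this abstract can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes that "matching the M and SD" means standardizing the query length against the user's past query lengths, standardizing each TTS variant's duration against the other variants, and picking the variant whose standardized length is closest. The names `z_score` and `pick_response` and the sample durations are hypothetical.

```python
from statistics import mean, stdev


def z_score(value, samples):
    """Standardize value against samples (assumed scoring, not from the source)."""
    m = mean(samples)
    sd = stdev(samples) if len(samples) > 1 else 0.0
    # With no spread there is nothing to compare against, so treat as average.
    return 0.0 if sd == 0 else (value - m) / sd


def pick_response(query_len, past_query_lens, variants):
    """Pick the response text whose relative length best matches the query.

    query_len: audio length of the current query, in seconds.
    past_query_lens: audio lengths of the user's earlier queries.
    variants: list of (text, tts_duration_seconds) for one intent.
    """
    durations = [d for _, d in variants]
    query_z = z_score(query_len, past_query_lens)
    best = min(variants, key=lambda v: abs(z_score(v[1], durations) - query_z))
    return best[0]


# Hypothetical data: three TTS variants for a "lights on" intent.
variants = [
    ("Done.", 0.5),
    ("Okay, the lights are on.", 1.5),
    ("Sure thing, I have turned the living room lights on for you.", 2.5),
]
history = [1.0, 2.0, 3.0]

terse = pick_response(1.0, history, variants)
verbose = pick_response(3.0, history, variants)
```

Under these assumptions a short query selects the tersest variant and a long query selects the most verbose one.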

Technical Disclosure Common

Client-side masking for voice queries

Author: Kracun Aleksandar
Sharifi Matthew
Publication venue: Technical Disclosure Commons
Publication date: 27/03/2019

Many voice-based assistive technologies transmit the voice input received from users to a server for processing. The transmitted audio includes the speaker’s voice, which can identify the person. Users of such technologies therefore face a tradeoff between convenient voice interfaces with reduced privacy and less convenient non-voice input with higher privacy. Techniques described herein mask a user’s voice by locally processing the voice input received by a device. The masked voice cannot personally identify the user while still enabling server-side processing that allows recognition of spoken phrases. Application of the proposed techniques provides the user with greater privacy without diminishing the user experience for voice input in terms of recognition, latency, and other operational characteristics.

Health Diagnostics Using User Utterances

Author: Kracun Aleksandar
Moreno Ignacio Lopez
Publication venue: Technical Disclosure Commons
Publication date: 07/05/2020

Respiratory illnesses can be hard to track and diagnose. Obtaining useful clinical data on these illnesses is difficult because it requires physical interaction, e.g., via nasal or sinus swab. It is known that respiratory illness can impact speech pathways. To this end, this disclosure describes techniques to use readily accessible software to obtain and classify potentially useful data. With user permission, utterances of the user, e.g., activation of a speech-activated device via a hotword, are analyzed to form speaker-ID models. These models are evaluated against additional utterances of the user in a sequential manner. The evaluation scores, along with the timestamps and details of the models, are aggregated to determine whether the user has an interval of time where their speaker-ID models are unstable, inconsistent, or lacking self-similarity. This signal can be used as a proxy for detection or as a motivating factor for clinical investigation.
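
The aggregation step in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed method and not a diagnostic tool: it assumes each later utterance has already been scored against the user's speaker-ID model with a similarity in [0, 1], and it flags a time interval when several consecutive scores fall below a threshold. The function name, the threshold of 0.6, the minimum run of 3, and the sample scores are all hypothetical.

```python
def unstable_intervals(scored, threshold=0.6, min_run=3):
    """Find intervals where self-similarity scores stay low.

    scored: list of (timestamp, score) pairs sorted by timestamp, where
        score is an assumed speaker-ID similarity in [0, 1].
    Returns a list of (start_timestamp, end_timestamp) for every run of
    at least min_run consecutive scores below threshold.
    """
    intervals = []
    run = []  # timestamps of the current run of low scores
    for timestamp, score in scored:
        if score < threshold:
            run.append(timestamp)
        else:
            if len(run) >= min_run:
                intervals.append((run[0], run[-1]))
            run = []
    # Close a run that extends to the end of the data.
    if len(run) >= min_run:
        intervals.append((run[0], run[-1]))
    return intervals


# Hypothetical scores: day index and similarity to the enrolled model.
scores = [(1, 0.9), (2, 0.5), (3, 0.4), (4, 0.55), (5, 0.8), (6, 0.5)]
flagged = unstable_intervals(scores)
```

Here days 2 through 4 form one flagged interval, while the single low score on day 6 is too short a run to count.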

Secure audio processing

Author: Hughes Thad
Kracun Aleksandar
Lopez Moreno Ignacio
Moreno Pedro
Publication venue: Technical Disclosure Commons
Publication date: 10/04/2018

Automatic speech recognizers (ASR) are now nearly ubiquitous, finding application in smart assistants, smartphones, smart speakers, and other devices. An attack on an ASR that triggers such a device into carrying out false instructions can lead to severe consequences. Typically, speech recognition is performed using machine learning models, e.g., neural networks, whose intermediate outputs are not always fully concealed. Exposing such intermediate outputs makes the crafting of malicious input audio easier. This disclosure describes techniques that thwart attacks on speech recognition systems by moving model inference processing to a secure computing enclave. The memory and signals of the secure enclave are inaccessible to the user and to untrusted processes and are therefore resistant to attacks.
