Paralinguistic Privacy Protection at the Edge

Abstract

Voice user interfaces and digital assistants are rapidly entering our lives and becoming singular touch points spanning our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and subsequent actions. Our voices and raw audio signals collected through these devices contain a host of sensitive paralinguistic information that is transmitted to service providers regardless of deliberate or false triggers. As our emotional patterns and sensitive attributes like our identity, gender, mental well-being, are easily inferred using deep acoustic models, we encounter a new generation of privacy risks by using these services. One approach to mitigate the risk of paralinguistic-based privacy breaches is to exploit a combination of cloud-based processing with privacy-preserving, on-device paralinguistic information learning and filtering before transmitting voice data. In this paper we introduce EDGY, a configurable, lightweight, disentangled representation learning framework that transforms and filters high-dimensional voice data to identify and contain sensitive attributes at the edge prior to offloading to the cloud. We evaluate EDGY's on-device performance and explore optimization techniques, including model quantization and knowledge distillation, to enable private, accurate and efficient representation learning on resource-constrained devices. Our results show that EDGY runs in tens of milliseconds with 0.2% relative improvement in ABX score or minimal performance penalties in learning linguistic representations from raw voice signals, using a CPU and a single-core ARM processor without specialized hardware.Comment: 14 pages, 7 figures. arXiv admin note: text overlap with arXiv:2007.1506

    Similar works