Unlocking Foundation Models for Privacy-Enhancing Speech Understanding:
  An Early Study on Low Resource Speech Training Leveraging Label-guided
  Synthetic Speech Content

Bose, Digbalay; Feng, Tiantian; Narayanan, Shrikanth; Shi, Xuan

Authors: Digbalay Bose, Tiantian Feng, Shrikanth Narayanan, Xuan Shi
Publication date: 13 June 2023

Abstract

Automatic Speech Understanding (ASU) leverages the power of deep learning models for accurate interpretation of human speech, leading to a wide range of speech applications that enrich the human experience. However, training a robust ASU model requires the curation of a large number of speech samples, creating risks of privacy breaches. In this work, we investigate using foundation models to assist privacy-enhancing speech computing. Unlike conventional works that focus primarily on data perturbation or distributed algorithms, our work studies the possibility of using pre-trained generative models to synthesize speech content as training data with only label guidance. We show that zero-shot learning with label-guided synthetic speech content as training data remains a challenging task. On the other hand, our results demonstrate that a model trained with synthetic speech samples provides an effective initialization point for low-resource ASU training. This result reveals the potential to enhance privacy by reducing user data collection and instead using label-guided synthetic speech content.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2306.07791

Last time updated on 16/06/2023