50 research outputs found

    Unintended Memorization in Large ASR Models, and How to Mitigate It

    It is well known that neural networks can unintentionally memorize their training examples, raising privacy concerns. However, auditing memorization in large non-auto-regressive automatic speech recognition (ASR) models has been challenging due to the high compute cost of existing methods such as hardness calibration. In this work, we design a simple auditing method that measures memorization in large ASR models without the extra compute overhead. Concretely, we speed up randomly generated utterances to create a mapping between vocal and text information that is difficult to learn from typical training examples. Accurate predictions only on sped-up training examples therefore serve as clear evidence of memorization, and the corresponding accuracy can be used to measure it. Using the proposed method, we showcase memorization in state-of-the-art ASR models. To mitigate memorization, we apply gradient clipping during training to bound the influence of any individual example on the final model. We empirically show that clipping each example's gradient can mitigate memorization for sped-up training examples with up to 16 repetitions in the training set. Furthermore, we show that in large-scale distributed training, clipping the average gradient on each compute core maintains neutral model quality and compute cost while providing strong privacy protection.
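    The per-example gradient clipping described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-example gradients are available as NumPy vectors, and the clipping threshold `clip_norm` is a hypothetical parameter.

    ```python
    import numpy as np

    def clip_per_example(grads, clip_norm):
        """Clip each example's gradient to L2 norm `clip_norm`, then average.

        Bounding each example's gradient norm limits any single training
        example's influence on the model update, which is the mechanism
        the abstract credits with mitigating memorization.
        """
        clipped = []
        for g in grads:
            norm = np.linalg.norm(g)
            # Scale down only gradients whose norm exceeds the threshold.
            scale = min(1.0, clip_norm / (norm + 1e-12))
            clipped.append(g * scale)
        return np.mean(clipped, axis=0)
    ```

    In the distributed variant the abstract mentions, the same clipping would be applied to the average gradient on each compute core rather than to every individual example.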

    Improved Federated Learning for Handling Long-tail Words

    Automatic speech recognition (ASR) machine learning models are deployed on client devices that include speech interfaces. ASR models can benefit from continuous learning and adaptation to large-scale changes, e.g., as new words are added to the vocabulary. While federated learning can be utilized to enable continuous learning for ASR models in a privacy-preserving manner, the trained model can perform poorly on rarely occurring, long-tail words if the distribution of data used to train the model is skewed and does not adequately represent long-tail words. This disclosure describes federated learning techniques to improve ASR model quality when interpreting long-tail words given an imbalanced data distribution. Two different approaches - probabilistic sampling and client loss weighting - are described herein. In probabilistic sampling, federated clients that include fewer long-tail words are less likely to be selected during training. In client loss weighting, incorrect predictions on long-tail words are penalized more heavily than those on other words.
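    The two approaches above can be sketched as follows. This is an illustrative sketch, not the disclosure's implementation: the proportional-to-count sampling rule and the `tail_weight` value are assumptions, and per-word losses are modeled as plain NumPy arrays.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_clients(tail_counts, num_select):
        """Probabilistic sampling: select clients for a training round with
        probability proportional to how many long-tail words they hold, so
        clients with fewer long-tail words are less likely to be chosen."""
        probs = np.asarray(tail_counts, dtype=float)
        probs = probs / probs.sum()
        return rng.choice(len(tail_counts), size=num_select, replace=False, p=probs)

    def weighted_client_loss(losses, is_long_tail, tail_weight=5.0):
        """Client loss weighting: penalize losses on long-tail words more
        heavily than losses on other words (tail_weight is illustrative)."""
        weights = np.where(is_long_tail, tail_weight, 1.0)
        return float(np.sum(weights * np.asarray(losses)) / np.sum(weights))
    ```

    In a federated round, the server would first call `sample_clients`, and each selected client would then report the weighted loss (or its gradient) for aggregation.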