73 research outputs found
Catch Me If You Can: Blackbox Adversarial Attacks on Automatic Speech Recognition using Frequency Masking
Automatic speech recognition (ASR) models are prevalent, particularly in
applications for voice navigation and voice control of domestic appliances. The
computational core of ASRs are deep neural networks (DNNs), which have been
shown to be susceptible to adversarial perturbations that attackers can easily
exploit to generate malicious outputs. To help test the security and robustness
of ASRs, we propose techniques that generate blackbox (agnostic to the DNN),
untargeted adversarial attacks that are portable across ASRs. This contrasts
with existing work, which focuses on whitebox targeted attacks that are
time-consuming and lack portability.
Our techniques generate adversarial attacks with no human-audible difference
from the original audio by manipulating the audio signal using a psychoacoustic
model that keeps the perturbations below the thresholds of human perception. We
evaluate the portability and effectiveness of our techniques on three popular
ASRs and two input audio datasets using four metrics: Word Error Rate (WER) of
the output transcription, similarity to the original audio, attack success rate
across different ASRs, and detection score from a defense system. We found that
our adversarial attacks were portable across ASRs, were not easily detected by
a state-of-the-art defense system, and produced significantly different output
transcriptions while sounding similar to the original audio.
Comment: 11 pages, 7 figures and 3 tables
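The masking idea described above can be sketched in a few lines. This is not the paper's actual attack: the frame length, the relative threshold standing in for a real psychoacoustic masking model, and the uniform-phase noise are all illustrative assumptions.

```python
import numpy as np

def masked_perturbation(audio, frame_len=512, rel_threshold=0.05, seed=0):
    """Add random noise to each frame's spectrum, scaled per bin to stay
    below a fraction of that bin's own magnitude -- a crude stand-in for
    a psychoacoustic masking threshold (louder components hide larger
    perturbations)."""
    rng = np.random.default_rng(seed)
    out = np.copy(audio).astype(np.float64)
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        spec = np.fft.rfft(out[start:start + frame_len])
        # Bound the noise magnitude by rel_threshold * |signal bin|,
        # so silent bins receive no perturbation at all.
        bound = rel_threshold * np.abs(spec)
        noise = bound * np.exp(1j * rng.uniform(0, 2 * np.pi, spec.shape))
        out[start:start + frame_len] = np.fft.irfft(spec + noise, n=frame_len)
    return out
```

A real attack would replace the fixed `rel_threshold` with frequency-dependent masking thresholds and search for noise that maximizes the WER of a target ASR, but the constraint structure is the same.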
Building trust in deep learning-based immune response predictors with interpretable explanations
The ability to predict whether a peptide will get presented on Major Histocompatibility Complex (MHC) class I molecules has profound implications in designing vaccines. Numerous deep learning-based predictors for peptide presentation on MHC class I molecules exist with high levels of accuracy. However, these MHC class I predictors are treated as black-box functions, providing little insight into their decision making. To build trust in these predictors, it is crucial to understand the rationale behind their decisions with human-interpretable explanations. We present MHCXAI, a set of eXplainable AI (XAI) techniques to help interpret the outputs from MHC class I predictors in terms of input peptide features. In our experiments, we explain the outputs of four state-of-the-art MHC class I predictors over a large dataset of peptides and MHC alleles. Additionally, we evaluate the reliability of the explanations by comparing against ground truth and checking their robustness. MHCXAI seeks to increase understanding of deep learning-based predictors in the immune response domain and build trust with validated explanations.
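One simple way to attribute a predictor's output to input peptide features, in the spirit of the per-feature explanations described above, is positional occlusion. This is a generic sketch, not the MHCXAI method: the mask token and the toy scoring function in the usage below are illustrative assumptions.

```python
def occlusion_attribution(peptide, score_fn, mask_char="X"):
    """Per-position attribution for a peptide predictor: replace each
    residue with a mask token and record the drop in predicted score.
    A large positive attribution means the residue supported the
    prediction; near-zero means it was irrelevant."""
    base = score_fn(peptide)
    attributions = []
    for i in range(len(peptide)):
        masked = peptide[:i] + mask_char + peptide[i + 1:]
        attributions.append(base - score_fn(masked))
    return attributions
```

Usage with a toy score (fraction of leucines, a hypothetical stand-in for an MHC binding predictor): `occlusion_attribution("SLYNTVATL", lambda p: p.count("L") / len(p))` gives positive attributions exactly at the leucine positions.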
Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition
Explainable AI (XAI) techniques have been widely used to help explain and understand the output of deep learning models in fields such as image classification and Natural Language Processing. Interest in using XAI techniques to explain deep learning-based Automatic Speech Recognition (ASR) is emerging, but there is not enough evidence on whether these explanations can be trusted. To address this, we adapt a state-of-the-art XAI technique from the image classification domain, Local Interpretable Model-Agnostic Explanations (LIME), to a model trained for a TIMIT-based phoneme recognition task. This simple task provides a controlled setting for evaluation while also providing expert-annotated ground truth to assess the quality of explanations. We find that a variant of LIME based on time-partitioned audio segments, which we propose in this paper, produces the most reliable explanations, containing the ground truth 96% of the time in its top three audio segments.
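The LIME variant over time-partitioned segments can be sketched as follows: zero out random subsets of segments, query the model on each perturbed signal, and fit a linear surrogate whose weights rank the segments. This is a minimal illustration, not the paper's implementation; the segment count, sample count, and zeroing-based perturbation are assumptions, and real LIME also weights samples by proximity to the original input.

```python
import numpy as np

def lime_audio_segments(audio, predict_fn, n_segments=8, n_samples=200, seed=0):
    """LIME over time-partitioned segments: mask random subsets of equal
    time segments, query the model, and fit a least-squares linear
    surrogate whose coefficients serve as per-segment importances."""
    rng = np.random.default_rng(seed)
    bounds = np.linspace(0, len(audio), n_segments + 1).astype(int)
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    scores = np.empty(n_samples)
    for s, mask in enumerate(masks):
        perturbed = np.copy(audio)
        for k in range(n_segments):
            if mask[k] == 0:  # segment switched off -> silence it
                perturbed[bounds[k]:bounds[k + 1]] = 0.0
        scores[s] = predict_fn(perturbed)
    # Linear surrogate: scores ~= masks @ w + b; w ranks the segments.
    X = np.column_stack([masks, np.ones(n_samples)])
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return w[:-1]
```

The "top three audio segments" in the evaluation would then be the three largest entries of the returned weight vector.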