BERT Probe: A Python package for probing attention-based robustness evaluation of BERT models
Transformer models built on attention-based architectures have been highly successful in establishing
state-of-the-art results in natural language processing (NLP). However, recent work on the adversarial
robustness of attention-based models shows that they are susceptible to adversarial inputs that cause
spurious outputs, raising questions about the trustworthiness of such models. In this paper, we present
BERT Probe, a Python package for evaluating the robustness of attention attribution under character-level
and word-level evasion attacks and for empirically quantifying potential vulnerabilities on sequence
classification tasks. Additionally, BERT Probe provides two out-of-the-box defenses against
character-level, attention attribution-based evasion attacks.
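The character-level evasion attacks described above can be illustrated with a minimal, self-contained sketch. Note this is a hypothetical illustration, not the BERT Probe API: the `toy_score` classifier and the `evasion_attack` helper below are stand-ins for a real BERT sequence classifier and an attribution-guided attack loop. The idea is to perturb the tokens with the highest attention attribution first, keeping each character swap only if it lowers the model's score:

```python
import random

def char_swap(word: str, rng: random.Random) -> str:
    """Swap two adjacent characters, a common character-level perturbation."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def evasion_attack(tokens, attributions, score_fn, budget=3, seed=0):
    """Greedy character-level evasion: attack the highest-attribution tokens
    first, keeping a perturbation only when it reduces the classifier score."""
    rng = random.Random(seed)
    tokens = list(tokens)
    # Visit tokens in descending order of (attention) attribution.
    order = sorted(range(len(tokens)), key=lambda i: -attributions[i])
    best = score_fn(tokens)
    for i in order[:budget]:
        candidate = tokens.copy()
        candidate[i] = char_swap(candidate[i], rng)
        s = score_fn(candidate)
        if s < best:  # keep only score-reducing perturbations
            tokens, best = candidate, s
    return tokens, best

# Toy scorer: counts "positive" keywords; a stand-in for a BERT classifier.
POSITIVE = {"great", "excellent", "good"}
def toy_score(tokens):
    return sum(t.lower() in POSITIVE for t in tokens) / max(len(tokens), 1)

adv, score = evasion_attack(
    ["great", "movie", "excellent", "acting"],
    attributions=[0.9, 0.1, 0.8, 0.2],  # hypothetical attribution weights
    score_fn=toy_score,
)
```

Because misspelled keywords no longer match, the greedy loop drives the toy classifier's score down while changing only a few characters, which is the essence of a character-level evasion attack; a real attack would use attention attributions extracted from the model itself.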