Guarantees on learning depth-2 neural networks under a data-poisoning attack
Many state-of-the-art machine learning models have recently been shown
to be fragile to adversarial attacks. In this work we build theoretical
understanding of adversarially robust learning with neural nets. We
demonstrate a specific class of finite-size neural networks and a
non-gradient stochastic algorithm that recovers the weights of the net
generating the realizable true labels, even in the presence of an oracle
applying a bounded amount of malicious additive distortion to the labels. We
prove (nearly optimal) trade-offs among the magnitude of the adversarial
attack, the accuracy, and the confidence achieved by the proposed algorithm.
Exacerbating Algorithmic Bias through Fairness Attacks
Algorithmic fairness has attracted significant attention in recent years,
with many quantitative measures suggested for characterizing the fairness of
different machine learning algorithms. Despite this interest, the robustness of
these fairness measures to intentional adversarial attacks has not been
properly addressed. Indeed, most work in adversarial machine learning has
focused on the impact of malicious attacks on a system's accuracy, without
regard for the system's fairness. We propose new types of data
poisoning attacks where an adversary intentionally targets the fairness of a
system. Specifically, we propose two families of attacks that target fairness
measures. In the anchoring attack, we skew the decision boundary by placing
poisoned points near specific target points to bias the outcome. In the
influence attack on fairness, we aim to maximize the covariance between the
sensitive attributes and the decision outcome and affect the fairness of the
model. We conduct extensive experiments demonstrating the effectiveness of our
proposed attacks.
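The objective of the influence attack on fairness described above can be sketched with a toy example: the attacker's target quantity is the covariance between a binary sensitive attribute and the model's decision outcome. This is only an illustration of the stated objective, not the paper's attack procedure; the data, seed, and helper name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: binary sensitive attribute s and binary decision outcomes yhat.
n = 1000
s = rng.integers(0, 2, size=n).astype(float)
yhat = rng.integers(0, 2, size=n).astype(float)   # independent of s here

# Quantity the attack aims to maximize: covariance between the sensitive
# attribute and the decision outcome (larger magnitude => less fair model).
def decision_covariance(s, yhat):
    return float(np.mean((s - s.mean()) * (yhat - yhat.mean())))

base = decision_covariance(s, yhat)    # near zero: decisions ignore s
worst = decision_covariance(s, s)      # decisions equal s: maximally unfair
```

An anchoring attack, by contrast, would not optimize this covariance directly but would place poisoned points near chosen target points to skew the decision boundary toward biased outcomes.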