Blockwise p-Tampering Attacks on Cryptographic Primitives, Extractors, and Learners
Austrin, Chung, Mahmoody, Pass and Seth (Crypto'14) studied the notion of bitwise p-tampering attacks over randomized algorithms, in which an efficient 'virus' gets to control each bit of the randomness with independent probability p in an online way. The work of Austrin et al. showed how to break certain 'privacy primitives' (e.g., encryption, commitments, etc.) through bitwise p-tampering, by giving a bitwise p-tampering biasing attack that increases the average of any efficient function f by Ω(p · Var[f]), where Var[f] is the variance of f.
In this work, we revisit and extend the bitwise tampering model of Austrin et al. to the blockwise setting, where blocks of randomness become tamperable with independent probability p. Our main result is an efficient blockwise p-tampering attack that biases the average of any efficient function f mapping arbitrary Σ^n to [−1, 1] by Ω(p · Var[f]), regardless of how the input of f is partitioned into individually tamperable blocks. Relying on previous works, our main biasing attack immediately implies efficient attacks against the privacy primitives as well as seedless multi-source extractors, in a model where the attacker gets to tamper with each block (or source) of the randomness with independent probability p. Further, we show how to increase the classification error of deterministic learners in the so-called 'targeted poisoning' attack model under Valiant's adversarial noise. In this model, an attacker has a 'target' test data point in mind and wishes to increase the error of classifying it, while she gets to tamper with each training example with independent probability p in an online way.
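The bitwise case can be illustrated with a toy greedy tamperer: whenever the virus controls a bit, it fixes the bit to whichever value has the higher estimated conditional mean of f. The function names, the Monte-Carlo estimator, and the toy target f below are our own illustrative choices; the paper's efficient attack is more refined.

```python
import random

def greedy_p_tamper(f, n, p, trials=200, rng=random):
    """Draw one n-bit sample where each bit is attacker-controlled w.p. p.
    The (illustrative, brute-force) attacker fixes a controlled bit to the
    value with the higher Monte-Carlo estimate of E[f | prefix, bit]."""
    prefix = []
    for i in range(n):
        if rng.random() < p:  # the virus controls bit i
            def est(b):  # estimate E[f | prefix, x_i = b] over uniform suffixes
                total = 0.0
                for _ in range(trials):
                    suffix = [rng.randint(0, 1) for _ in range(n - i - 1)]
                    total += f(prefix + [b] + suffix)
                return total / trials
            prefix.append(1 if est(1) >= est(0) else 0)
        else:  # honest, uniformly random bit
            prefix.append(rng.randint(0, 1))
    return f(prefix)

random.seed(0)
f = lambda x: sum(x) / len(x)  # toy target: fraction of 1-bits
n, p = 12, 0.3
baseline = sum(f([random.randint(0, 1) for _ in range(n)]) for _ in range(400)) / 400
tampered = sum(greedy_p_tamper(f, n, p) for _ in range(200)) / 200
```

For this f the untampered average sits near 0.5, while the greedy tamperer pushes it up by roughly p/2, consistent with a bias that grows with p.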
Truth inference at scale: A Bayesian model for adjudicating highly redundant crowd annotations
Crowd-sourcing is a cheap and popular means of creating training and evaluation datasets for machine learning; however, it poses the problem of 'truth inference', as individual workers cannot be wholly trusted to provide reliable annotations. Research into models of annotation aggregation attempts to infer a latent 'true' annotation, which has been shown to improve the utility of crowd-sourced data. However, existing techniques beat simple baselines only in low-redundancy settings, where the number of annotations per instance is low (≤ 3), or in situations where workers are unreliable and produce low-quality annotations (e.g., through spamming, random, or adversarial behaviours). As we show, datasets produced by crowd-sourcing are often not of this type: the data is highly redundantly annotated (≥ 5 annotations per instance), and the vast majority of workers produce high-quality outputs. In these settings, the majority vote heuristic performs very well, and most truth inference models underperform this simple baseline. We propose a novel technique, based on a Bayesian graphical model with conjugate priors, and simple iterative expectation-maximisation inference. Our technique performs competitively with the state-of-the-art benchmark methods, and is the only method that significantly outperforms the majority vote heuristic, as shown by significance tests at the one-sided 0.025 level. Moreover, our technique is simple, is implemented in only 50 lines of code, and trains in seconds.
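The interplay between majority voting and EM-based truth inference can be sketched with a minimal one-coin worker model (each worker is correct with some unknown probability). This is an illustrative sketch, not the paper's conjugate-prior graphical model; all names and the simulation are our own.

```python
import numpy as np

def em_truth_inference(votes, n_iter=50):
    """Adjudicate binary crowd labels under a one-coin worker model.

    votes: (n_items, n_workers) array of 0/1 labels; -1 marks a missing vote.
    Returns predicted labels and the estimated per-worker accuracies.
    """
    mask = votes >= 0
    n_workers = votes.shape[1]
    # Start every worker at the same accuracy, so the first E-step
    # reduces to a (soft) majority vote over the observed labels.
    acc = np.full(n_workers, 0.7)
    for _ in range(n_iter):
        # E-step: posterior P(true label = 1) per item, uniform label prior.
        log1 = np.where(mask, np.where(votes == 1, np.log(acc), np.log(1 - acc)), 0.0).sum(1)
        log0 = np.where(mask, np.where(votes == 0, np.log(acc), np.log(1 - acc)), 0.0).sum(1)
        post = 1.0 / (1.0 + np.exp(log0 - log1))
        # M-step: a worker's accuracy is how often they agree with the posterior.
        agree = np.where(votes == 1, post[:, None], 1.0 - post[:, None])
        acc = np.clip((agree * mask).sum(0) / mask.sum(0), 1e-3, 1 - 1e-3)
    return (post > 0.5).astype(int), acc

# Simulated check: four reliable workers, one near-spammer, one adversary.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, 40)
worker_acc = np.array([0.9, 0.9, 0.9, 0.9, 0.55, 0.3])
flip = rng.random((40, 6)) > worker_acc
votes = np.where(flip, 1 - truth[:, None], truth[:, None])
pred, est_acc = em_truth_inference(votes)
```

Because the reliable workers dominate this simulation, plain majority vote is already strong here, which mirrors the paper's observation about high-redundancy, high-quality crowds; EM additionally learns to down-weight (and effectively invert) the adversarial worker.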
Legion: Best-first concolic testing (competition contribution)
Legion is a grey-box coverage-based concolic tool that aims to balance the complementary nature of fuzzing and symbolic execution to achieve the best of both worlds. It proposes a variation of Monte Carlo tree search (MCTS) that formulates program exploration as sequential decision-making under uncertainty, guided by the best-first search strategy. It relies on approximate path-preserving fuzzing, a novel instance of constrained random testing, which quickly generates many diverse inputs that likely target program parts of interest. In Test-Comp 2020 [1], the prototype performed within 90% of the best score in 9 of 22 categories.
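The best-first MCTS selection step can be sketched with a standard UCB1 score over frontier nodes; the scoring details and node bookkeeping below are generic placeholders, not Legion's actual implementation.

```python
import math

def ucb1(node, c=math.sqrt(2)):
    """UCB1 score: trade off a node's observed reward rate (e.g. how often
    fuzzing it yielded new coverage) against how rarely it has been tried.
    Unvisited nodes score +inf, so they are explored first."""
    if node["visits"] == 0:
        return math.inf
    exploit = node["reward"] / node["visits"]
    explore = c * math.sqrt(math.log(node["parent_visits"]) / node["visits"])
    return exploit + explore

def select(frontier):
    """Best-first step: pick the frontier node with the highest score as the
    next target for (approximate path-preserving) fuzzing."""
    return max(frontier, key=ucb1)

frontier = [
    {"name": "branch_a", "visits": 10, "reward": 2.0, "parent_visits": 30},
    {"name": "branch_b", "visits": 5, "reward": 3.0, "parent_visits": 30},
    {"name": "branch_c", "visits": 0, "reward": 0.0, "parent_visits": 30},
]
chosen = select(frontier)
```

The exploration term is what lets the search hedge between exploiting well-rewarded program locations and probing rarely visited ones, which is the sequential decision-making under uncertainty the abstract describes.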
Attacking Data Transforming Learners at Training Time
While machine learning systems are known to be vulnerable to data-manipulation attacks at both training and deployment time, little is known about how to adapt attacks when the defender transforms data prior to model estimation. We consider the setting where the defender Bob first transforms the data then learns a model from the result; Alice, the attacker, perturbs Bob’s input data prior to him transforming it. We develop a general-purpose “plug and play” framework for gradient-based attacks based on matrix differentials, focusing on ordinary least-squares linear regression. This allows learning algorithms and data transformations to be paired and composed arbitrarily: attacks can be adapted through the use of the chain rule—analogous to backpropagation on neural network parameters—to compositional learning maps. Best-response attacks can be computed through matrix multiplications from a library of attack matrices for transformations and learners. Our treatment of linear regression extends state-of-the-art attacks at training time, by permitting the attacker to affect both features and targets optimally and simultaneously. We explore several transformations broadly used across machine learning, with a driving motivation for our work being autoregressive modeling. There, Bob transforms a univariate time series into a matrix of observations and a vector of target values, which can then be fed into standard learners. Under this learning reduction, a perturbation from Alice to a single value of the time series affects features of several data points along with target values.
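The autoregressive learning reduction can be made concrete with a lag-matrix construction; the function name, shapes, and toy numbers below are our own illustration of the reduction, not the paper's notation.

```python
import numpy as np

def lag_matrix(series, k):
    """Reduce a univariate series to supervised data for order-k autoregression:
    each row holds k consecutive values, and the target is the next value."""
    series = np.asarray(series, dtype=float)
    n = len(series) - k
    X = np.stack([series[i:i + k] for i in range(n)])  # feature matrix
    y = series[k:]                                     # target vector
    return X, y

# One perturbed series value appears in up to k feature rows *and* one target,
# which is why a single-point perturbation by Alice touches several data points.
s = np.arange(10.0)
X, y = lag_matrix(s, k=3)
s2 = s.copy()
s2[5] += 1.0                     # Alice perturbs one value of the series
X2, y2 = lag_matrix(s2, k=3)
rows_changed = int((X != X2).any(axis=1).sum())
targets_changed = int((y != y2).sum())
```

Here the single perturbation at index 5 alters three feature rows and one target simultaneously, which is exactly the coupled feature-and-target effect the abstract describes.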
Exploring misconceptions as a trigger for enhancing student learning
This article addresses the importance of confronting misconceptions in the teaching of the STEM disciplines. First, we review the central place of threshold concepts in many disciplines and the threat misconceptions pose to quality education. Second, we offer approaches for confronting misconceptions in the classroom in different contexts. Finally, we discuss what we can learn from these approaches and the common threads running through the successful ones. These steps have been explored in relation to four case studies across diverse disciplines. From these case studies, a set of principles about how best to address misconceptions in STEM disciplines has been distilled. As conceptual knowledge increases in importance in higher education, effective strategies for helping students develop accurate conceptual understanding will also be increasingly critical.