A Provable Defense for Deep Residual Networks
We present a training system that can provably defend significantly larger
neural networks than previously possible, including ResNet-34 and DenseNet-100.
Our approach is based on differentiable abstract interpretation and introduces
two novel concepts: (i) abstract layers for fine-tuning the precision and
scalability of the abstraction, and (ii) a flexible domain-specific language (DSL)
for describing training objectives that combine abstract and concrete losses
with arbitrary specifications. Our training method is implemented in the DiffAI
system.
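To illustrate the underlying idea, here is a minimal sketch of differentiable interval (box) propagation through a single affine layer, combined with a loss that mixes a concrete cross-entropy term and an abstract worst-case term. The helper names (box_affine, combined_loss) and the mixing weight lam are illustrative assumptions, not DiffAI's actual API.

import torch
import torch.nn.functional as F

def box_affine(center, radius, weight, bias):
    # Interval (box) arithmetic through an affine layer x -> x W^T + b.
    return center @ weight.t() + bias, radius @ weight.abs().t()

def combined_loss(x, radius, weight, bias, label, lam=0.5):
    # Concrete cross-entropy on the center point.
    concrete = F.cross_entropy(x @ weight.t() + bias, label)
    # Abstract term: cross-entropy on the worst-case corner of the output box
    # (true-class logit pushed down, all other logits pushed up).
    c, r = box_affine(x, radius, weight, bias)
    true_class = F.one_hot(label, num_classes=c.shape[1]).bool()
    worst = torch.where(true_class, c - r, c + r)
    abstract = F.cross_entropy(worst, label)
    return (1 - lam) * concrete + lam * abstract

Increasing lam shifts the objective from standard accuracy toward provable robustness, which is the kind of trade-off the DSL is meant to let users express.
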
Rehearsal: A Configuration Verification Tool for Puppet
Large-scale data centers and cloud computing have turned system configuration
into a challenging problem. Several widely-publicized outages have been blamed
not on software bugs, but on configuration bugs. To cope, thousands of
organizations use system configuration languages to manage their computing
infrastructure. Of these, Puppet is the most widely used with thousands of
paying customers and many more open-source users. The heart of Puppet is a
domain-specific language that describes the state of a system. Puppet already
performs some basic static checks, but they only prevent a narrow range of
errors. Furthermore, testing is ineffective because many errors are only
triggered under specific machine states that are difficult to predict and
reproduce. With several examples, we show that a key problem with Puppet is
that configurations can be non-deterministic.
This paper presents Rehearsal, a verification tool for Puppet configurations.
Rehearsal implements a sound, complete, and scalable determinacy analysis for
Puppet. To develop it, we (1) present a formal semantics for Puppet, (2) use
several analyses to shrink our models to a tractable size, and (3) frame
determinism-checking as decidable formulas for an SMT solver. Rehearsal then
leverages the determinacy analysis to check other important properties, such as
idempotency. Finally, we apply Rehearsal to several real-world Puppet
configurations.
Comment: In proceedings of the ACM SIGPLAN Conference on Programming Language
Design and Implementation (PLDI) 201
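To give a feel for how determinacy checking can be framed as a decidable SMT query, the toy encoding below (using Z3's Python bindings; this is not Rehearsal's actual formalization) models two file resources that both write the same file and asks the solver whether the two execution orders can disagree on the final state.

from z3 import Strings, String, Solver, sat

def final_state(order):
    # Final contents of a single managed file after applying the resources in
    # `order`; each 'file' resource simply overwrites the previous contents.
    state = String('initial_contents')
    for contents in order:
        state = contents
    return state

a, b = Strings('a b')        # contents written by resources A and B
s = Solver()
s.add(a != b)                                          # the resources differ
s.add(final_state([a, b]) != final_state([b, a]))      # the two orders disagree
print("non-deterministic" if s.check() == sat else "deterministic")
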
Scalable Certified Segmentation via Randomized Smoothing
We present a new certification method for image and point cloud segmentation
based on randomized smoothing. The method leverages a novel scalable algorithm
for prediction and certification that correctly accounts for multiple testing,
necessary for ensuring statistical guarantees. The key to our approach is
reliance on established multiple-testing correction mechanisms as well as the
ability to abstain from classifying single pixels or points while still
robustly segmenting the overall input. Our experimental evaluation on synthetic
data and challenging datasets, such as Pascal Context, Cityscapes, and
ShapeNet, shows that our algorithm can achieve, for the first time, competitive
accuracy and certification guarantees on real-world segmentation tasks. We
provide an implementation at https://github.com/eth-sri/segmentation-smoothing.
Comment: ICML'2
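As a rough sketch of the certification idea (not the released implementation; the per-pixel hard classifier f, the noise scale sigma, and the use of a Bonferroni correction are illustrative assumptions), one can vote over noisy copies of the input, correct for the many simultaneous per-pixel tests, and abstain wherever significance is not reached:

import numpy as np
from scipy.stats import binomtest

ABSTAIN = -1

def certify_segmentation(f, x, sigma=0.25, n=100, alpha=0.001):
    # f maps an image of shape (H, W, C) to integer class labels of shape (H, W).
    H, W = x.shape[:2]
    votes = np.stack([f(x + sigma * np.random.randn(*x.shape)) for _ in range(n)])
    top = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    counts = (votes == top[None]).sum(axis=0)
    # Bonferroni correction: split the error budget across all H*W pixel tests.
    threshold = alpha / (H * W)
    out = np.full((H, W), ABSTAIN, dtype=np.int64)
    for i in range(H):
        for j in range(W):
            p = binomtest(int(counts[i, j]), n, 0.5, alternative='greater').pvalue
            if p <= threshold:
                out[i, j] = top[i, j]
    return out    # per-pixel certified class, or ABSTAIN where not significant

Abstaining on pixels whose vote counts are not significant after the correction is what keeps the statistical guarantee valid across all pixels simultaneously.
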
Programmable Synthetic Tabular Data Generation
Large amounts of tabular data remain underutilized due to privacy, data
quality, and data sharing limitations. While training a generative model
producing synthetic data resembling the original distribution addresses some of
these issues, most applications require additional constraints from the
generated data. Existing synthetic data approaches are limited as they
typically only handle specific constraints, e.g., differential privacy (DP) or
increased fairness, and lack an accessible interface for declaring general
specifications. In this work, we introduce ProgSyn, the first programmable
synthetic tabular data generation algorithm that allows for comprehensive
customization over the generated data. To ensure high data quality while
adhering to custom specifications, ProgSyn pre-trains a generative model on the
original dataset and fine-tunes it on a differentiable loss automatically
derived from the provided specifications. These can be programmatically
declared using statistical and logical expressions, supporting a wide range of
requirements (e.g., DP or fairness, among others). We conduct an extensive
experimental evaluation of ProgSyn on a number of constraints, achieving a new
state-of-the-art on some, while remaining general. For instance, at the same
fairness level we achieve 2.3% higher downstream accuracy than the
state-of-the-art in fair synthetic data generation on the Adult dataset.
Overall, ProgSyn provides a versatile and accessible framework for generating
constrained synthetic tabular data, allowing for specifications that generalize
beyond the capabilities of prior work.
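For intuition, the hedged sketch below shows how one declared specification, statistical parity between two groups, could be turned into a differentiable penalty added to the fine-tuning loss. Names such as generator, base_loss_fn, and lambda_spec are placeholders, not ProgSyn's actual interface.

import torch

def parity_penalty(samples, group_col, label_col):
    # Soft demographic-parity gap between the two groups in a batch of
    # (relaxed, differentiable) synthetic rows.
    group = samples[:, group_col]
    label = samples[:, label_col]
    rate_a = (label * group).sum() / (group.sum() + 1e-8)
    rate_b = (label * (1 - group)).sum() / ((1 - group).sum() + 1e-8)
    return (rate_a - rate_b).abs()

def finetune_step(generator, optimizer, base_loss_fn, lambda_spec=1.0):
    samples = generator.sample(1024)      # assumed to yield differentiable samples
    loss = base_loss_fn(samples) + lambda_spec * parity_penalty(samples, 0, 1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
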
Watermark Stealing in Large Language Models
LLM watermarking has attracted attention as a promising way to detect
AI-generated content, with some works suggesting that current schemes may
already be fit for deployment. In this work we dispute this claim, identifying
watermark stealing (WS) as a fundamental vulnerability of these schemes. We
show that querying the API of the watermarked LLM to approximately
reverse-engineer a watermark enables practical spoofing attacks, as
hypothesized in prior work, but also greatly boosts scrubbing attacks, an effect
that previously went unnoticed. We are the first to propose an automated WS algorithm
and use it in the first comprehensive study of spoofing and scrubbing in
realistic settings. We show that for under $50 an attacker can both spoof and
scrub state-of-the-art schemes previously considered safe, with an average success
rate of over 80%. Our findings challenge common beliefs about LLM watermarking,
stressing the need for more robust schemes. We make all our code and additional
examples available at https://watermark-stealing.org.
Comment: ICML 202
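For intuition only, the sketch below shows the statistical core of such an attack under strong simplifying assumptions (a context-independent green list, and API access abstracted into a list of collected outputs); it is not the paper's code. Tokens that are over-represented in the watermarked model's outputs relative to a reference corpus are scored as likely "green".

from collections import Counter

def estimate_green_scores(watermarked_texts, reference_texts, tokenize):
    # Score each token by how over-represented it is in the watermarked model's
    # outputs relative to a reference corpus; high scores suggest "green" tokens.
    wm, ref = Counter(), Counter()
    for t in watermarked_texts:
        wm.update(tokenize(t))
    for t in reference_texts:
        ref.update(tokenize(t))
    wm_total = sum(wm.values()) or 1
    ref_total = sum(ref.values()) or 1
    return {tok: (wm[tok] / wm_total) / (ref.get(tok, 1) / ref_total) for tok in wm}

A spoofing attacker would then bias its own generations toward high-scoring tokens so the victim's detector flags the text as watermarked, while a scrubbing attacker would paraphrase away from them.
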
