Data Leakage in Tabular Federated Learning
While federated learning (FL) promises to preserve privacy in distributed
training of deep learning models, recent work in the image and NLP domains
showed that training updates leak private data of participating clients. At the
same time, most high-stakes applications of FL (e.g., legal and financial) use
tabular data. Compared to the NLP and image domains, reconstruction of tabular
data poses several unique challenges: (i) categorical features introduce a
significantly more difficult mixed discrete-continuous optimization problem,
(ii) the mix of categorical and continuous features causes high variance in the
final reconstructions, and (iii) structured data makes it difficult for the
adversary to judge reconstruction quality. In this work, we tackle these
challenges and propose the first comprehensive reconstruction attack on tabular
data, called TabLeak. TabLeak is based on three key ingredients: (i) a softmax
structural prior, implicitly converting the mixed discrete-continuous
optimization problem into an easier fully continuous one, (ii) a way to reduce
the variance of our reconstructions through a pooled ensembling scheme
exploiting the structure of tabular data, and (iii) an entropy measure which
can successfully assess reconstruction quality. Our experimental evaluation
demonstrates the effectiveness of TabLeak, reaching state-of-the-art results on four
popular tabular datasets. For instance, on the Adult dataset, we improve attack
accuracy by 10% compared to the baseline on the practically relevant batch size
of 32 and further obtain non-trivial reconstructions for batch sizes as large
as 128. Our findings are important as they show that performing FL on tabular
data, which often poses high privacy risks, is highly vulnerable.
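The softmax prior and the entropy measure mentioned in this abstract can be illustrated with a minimal sketch. It is not the TabLeak implementation: the function names, the mean-based pooling rule, and the example scores are assumptions made for illustration only. The sketch relaxes one categorical feature into a probability vector, pools several hypothetical attack restarts, and uses Shannon entropy as a confidence score (lower entropy suggests a more certain reconstruction).

```python
import math

def softmax(logits):
    """Map raw scores for one categorical feature to a probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; lower means a more confident reconstruction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pool_mean(prob_vectors):
    """Average several reconstructions of one feature (assumed pooling rule)."""
    n = len(prob_vectors)
    return [sum(col) / n for col in zip(*prob_vectors)]

def decode(prob_vector):
    """Return (most likely category index, entropy of the distribution)."""
    idx = max(range(len(prob_vector)), key=prob_vector.__getitem__)
    return idx, entropy(prob_vector)

# Three hypothetical attack restarts for a single 3-category feature.
restarts = [[0.2, 3.0, 0.1], [0.0, 2.5, 0.4], [1.0, 1.2, 0.9]]
pooled = pool_mean([softmax(r) for r in restarts])
category, uncertainty = decode(pooled)
```

Because the optimization variable is the unconstrained score vector rather than a one-hot choice, every quantity above is continuous, which is the point of the relaxation described in the abstract.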
Scalable Inference of Symbolic Adversarial Examples
We present a novel method for generating symbolic adversarial examples: input
regions guaranteed to only contain adversarial examples for the given neural
network. These regions can generate real-world adversarial examples as they
summarize trillions of adversarial examples.
We theoretically show that computing optimal symbolic adversarial examples is
computationally expensive. We present a method for approximating optimal
examples in a scalable manner. Our method first selectively uses adversarial
attacks to generate a candidate region and then prunes this region with
hyperplanes that fit points obtained via specialized sampling. It iterates
until arriving at a symbolic adversarial example for which it can prove, via
state-of-the-art convex relaxation techniques, that the region only contains
adversarial examples. Our experimental results demonstrate that our method is
practically effective: it only needs a few thousand attacks to infer symbolic
summaries guaranteed to contain adversarial examples.
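The idea of proving that a whole input region is adversarial can be sketched on a toy case. The example below is a simplification under stated assumptions: it uses a linear margin function and an axis-aligned box, for which interval bounds are exact, and it shrinks the box instead of pruning it with hyperplanes as the abstract describes. All names and numbers are hypothetical.

```python
def margin_lower_bound(w, b, lo, hi):
    """Smallest value of w.x + b over the box lo <= x <= hi."""
    total = b
    for wi, l, h in zip(w, lo, hi):
        # A positive weight is minimized at the lower corner, a negative one at the upper.
        total += wi * l if wi >= 0 else wi * h
    return total

def box_is_adversarial(w, b, lo, hi):
    """True if the (assumed) misclassification margin is positive on the whole box."""
    return margin_lower_bound(w, b, lo, hi) > 0

def shrink_until_certified(w, b, center, radius, factor=0.5, max_iters=20):
    """Shrink a box around `center` until it can be certified, or give up."""
    for _ in range(max_iters):
        lo = [c - radius for c in center]
        hi = [c + radius for c in center]
        if box_is_adversarial(w, b, lo, hi):
            return lo, hi
        radius *= factor
    return None

# Margin x0 - x1 > 0 stands in for "the wrong class beats the true class".
region = shrink_until_certified([1.0, -1.0], 0.0, center=[2.0, 0.0], radius=4.0)
```

For a real network the lower bound would come from a convex relaxation of the network rather than from a closed-form expression, and it would generally be loose rather than exact.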
Data Leakage in Federated Averaging
Recent attacks have shown that user data can be recovered from FedSGD
updates, thus breaking privacy. However, these attacks are of limited practical
relevance as federated learning typically uses the FedAvg algorithm. Compared
to FedSGD, recovering data from FedAvg updates is much harder as: (i) the
updates are computed at unobserved intermediate network weights, (ii) a large
number of batches are used, and (iii) labels and network weights vary
simultaneously across client steps. In this work, we propose a new
optimization-based attack which successfully attacks FedAvg by addressing the
above challenges. First, we solve the optimization problem using automatic
differentiation through a simulation of the client's update: the simulation
generates the unobserved parameters from the recovered labels and inputs and is
forced to match the received client update. Second, we address the large number of batches by
relating images from different epochs with a permutation invariant prior.
Third, we recover the labels by estimating the parameters of existing FedSGD
attacks at every FedAvg step. On the popular FEMNIST dataset, we demonstrate
that on average we successfully recover >45% of the client's images from
realistic FedAvg updates computed on 10 local epochs of 10 batches each with 5
images, compared to only <10% using the baseline. Our findings show many
real-world federated learning implementations based on FedAvg are vulnerable.
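The difficulty that FedAvg updates are computed at unobserved intermediate weights can be seen in a small sketch. It assumes a one-parameter least-squares model and hypothetical data, and it is not the attack from the paper: it only shows how an attacker might simulate the client's local steps for candidate data and score the mismatch against the observed update.

```python
def grad(w, x, y):
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x

def fedavg_client_update(w0, batches, lr):
    """Run one local SGD step per batch and return the total weight change."""
    w = w0
    for x, y in batches:
        w -= lr * grad(w, x, y)  # each step starts from the previous step's weight
    return w - w0

def update_mismatch(w0, candidate_batches, observed_update, lr):
    """Squared distance between the simulated and the observed client update."""
    simulated = fedavg_client_update(w0, candidate_batches, lr)
    return (simulated - observed_update) ** 2

# Hypothetical private client data and the update the server would observe.
private = [(1.0, 2.0), (2.0, 1.0)]
observed = fedavg_client_update(0.0, private, lr=0.1)

good_guess = update_mismatch(0.0, private, observed, lr=0.1)
bad_guess = update_mismatch(0.0, [(1.0, 0.0), (2.0, 3.0)], observed, lr=0.1)
```

In an optimization-based attack the mismatch would be minimized with respect to the candidate data by differentiating through the simulated steps; here the two candidates are simply compared.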
FARE: Provably Fair Representation Learning with Practical Certificates
Fair representation learning (FRL) is a popular class of methods aiming to
produce fair classifiers via data preprocessing. Recent regulatory directives
stress the need for FRL methods that provide practical certificates, i.e.,
provable upper bounds on the unfairness of any downstream classifier trained on
preprocessed data, which directly provides assurance in a practical scenario.
Creating such FRL methods is an important challenge that remains unsolved. In
this work, we address that challenge and introduce FARE (Fairness with
Restricted Encoders), the first FRL method with practical fairness
certificates. FARE is based on our key insight that restricting the
representation space of the encoder enables the derivation of practical
guarantees, while still permitting favorable accuracy-fairness tradeoffs for
suitable instantiations, such as one we propose based on fair trees. To produce
a practical certificate, we develop and apply a statistical procedure that
computes a finite sample high-confidence upper bound on the unfairness of any
downstream classifier trained on FARE embeddings. In our comprehensive
experimental evaluation, we demonstrate that FARE produces practical
certificates that are tight and often even comparable with purely empirical
results obtained by prior methods, which establishes the practical value of our
approach. Comment: ICML 202
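A finite-sample, high-confidence upper bound of the kind this abstract refers to can be illustrated with a standard one-sided Hoeffding bound on an empirical rate. This is a generic sketch, not the statistical procedure used by FARE; the function name and the example numbers are assumptions.

```python
import math

def hoeffding_upper_bound(empirical_rate, n, delta):
    """Upper bound on a true rate that holds with probability at least 1 - delta.

    Assumes n independent samples of a quantity bounded in [0, 1].
    """
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, empirical_rate + slack)

# Hypothetical: an unfairness estimate of 0.10 measured on 1000 held-out samples.
bound = hoeffding_upper_bound(0.10, n=1000, delta=0.05)
```

The slack term shrinks as the sample size grows, so a larger held-out set yields a tighter certificate at the same confidence level.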
RSC remodeling of oligo-nucleosomes: an atomic force microscopy study
RSC is an essential chromatin remodeling factor that is required for the
control of several processes including transcription, repair and replication.
The ability of RSC to relocate centrally positioned mononucleosomes at the end
of nucleosomal DNA is firmly established, but the data on RSC action on
oligo-nucleosomal templates remain scarce. By using Atomic Force
Microscopy (AFM) imaging, we have quantitatively studied the RSC-induced
mobilization of positioned di- and trinucleosomes as well as the directionality
of mobilization on a mononucleosomal template labeled at one end with
streptavidin. AFM imaging showed only a limited set of distinct configurational
states for the remodeling products. No stepwise or preferred directionality of
the nucleosome motion was observed. Analysis of the corresponding reaction
pathways allows deciphering the mechanistic features of RSC-induced nucleosome
relocation. The final outcome of RSC remodeling of oligosome templates is the
packing of the nucleosomes at the edge of the template, providing large
stretches of DNA depleted of nucleosomes. This feature of RSC may be used by
the cell to overcome the barrier imposed by the presence of nucleosomes.
Modelling the water budget and the riverflows of the Maritsa basin in Bulgaria
A soil-vegetation-atmosphere transfer model coupled with a macroscale distributed hydrological model was used to simulate the water cycle for a large region in Bulgaria. To do so, an atmospheric forcing was built for two hydrological years (1 October 1995 to 30 September 1997), at an eight km resolution. The impact of the human activities on the rivers (especially hydropower or irrigation) was taken into account. An improvement of the hydrometeorological model was made: for better simulation of summer riverflow, two additional reservoirs were added to simulate the slow component of the runoff. Those reservoirs were calibrated using the observed data of the 1st year, while the 2nd year was used for validation. 56 hydrologic stations and 12 dams were used for the model calibration while 41 river gauges were used for the validation of the model. The results compare well with the daily-observed discharges, with good results obtained over more than 25% of the river gauges. The simulated snow depth was compared to daily measurements at 174 stations and the evolution of the snow water equivalent was validated at 5 sites. The process of melting and refreezing of snow was found to be important in this region. The comparison of the normalized values of simulated versus measured soil moisture showed good correlation. The surface water budget shows large spatial variations due to the elevation influence on the precipitation, soil properties and vegetation variability. An inter-annual difference was observed in the water cycle as the first year was more influenced by Mediterranean climate, while the second year was characterised by continental influence. The energy budget shows a dominating sensible heat component in summer, due to the fact that the water stress limits the evaporation.
This study is a first step for the implementation of an operational hydrometeorological model that could be used for real time monitoring and forecasting of water budget components and river flow in Bulgaria.
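The abstract does not say how the two additional reservoirs are formulated. As an illustration only, the sketch below uses the textbook linear reservoir, in which outflow is a fixed fraction of storage, to show how such a store delays and smooths runoff. The function name, the release coefficient, and the inflow series are assumptions.

```python
def linear_reservoir(inflows, k, storage=0.0):
    """Route inflows through a linear reservoir with release fraction k per step.

    Returns the outflow series and the storage left at the end.
    """
    outflows = []
    for q_in in inflows:
        storage += q_in          # water entering the store this step
        q_out = k * storage      # release proportional to current storage
        storage -= q_out
        outflows.append(q_out)
    return outflows, storage

# Hypothetical pulse of runoff followed by two dry steps.
slow_flow, remaining = linear_reservoir([10.0, 0.0, 0.0], k=0.5)
```

A smaller release fraction spreads the same volume over more time steps, which is how a slow store can sustain simulated river flow through a dry summer.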