    Trade liberalization: the case of Bulgaria

    Data Leakage in Tabular Federated Learning

    While federated learning (FL) promises to preserve privacy in distributed training of deep learning models, recent work in the image and NLP domains has shown that training updates leak the private data of participating clients. At the same time, most high-stakes applications of FL (e.g., legal and financial) use tabular data. Compared to the NLP and image domains, reconstructing tabular data poses several unique challenges: (i) categorical features introduce a significantly harder mixed discrete-continuous optimization problem, (ii) the mix of categorical and continuous features causes high variance in the final reconstructions, and (iii) structured data makes it difficult for the adversary to judge reconstruction quality. In this work, we tackle these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. TabLeak is based on three key ingredients: (i) a softmax structural prior, which implicitly converts the mixed discrete-continuous optimization problem into an easier, fully continuous one, (ii) a pooled ensembling scheme that exploits the structure of tabular data to reduce the variance of our reconstructions, and (iii) an entropy measure that successfully assesses reconstruction quality. Our experimental evaluation demonstrates the effectiveness of TabLeak, which achieves state-of-the-art results on four popular tabular datasets. For instance, on the Adult dataset, we improve attack accuracy by 10% over the baseline at the practically relevant batch size of 32, and we further obtain non-trivial reconstructions for batch sizes as large as 128. Our findings are important because they show that performing FL on tabular data, which often poses high privacy risks, is highly vulnerable.
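A minimal NumPy sketch of how the three ingredients could fit together for a single categorical feature. Beyond what the abstract states, it assumes that the ensemble members' reconstructions are already aligned row by row, that pooling uses a majority vote for categorical features and a median for continuous ones, and that the entropy is computed over the members' votes. The function names are hypothetical, and the gradient-matching optimization itself is omitted.

```python
import numpy as np


def relaxed_one_hot(logits: np.ndarray) -> np.ndarray:
    """Softmax relaxation of a categorical feature (last axis = categories).

    During reconstruction the optimizer updates unconstrained logits; the
    softmax keeps every candidate on the probability simplex, so the
    discrete one-hot search becomes a continuous problem."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def pool_categorical(member_logits: np.ndarray):
    """Pool one categorical feature (k >= 2 categories) over an ensemble.

    member_logits has shape (n_members, n_rows, k). Assumption: members are
    already matched row by row. Returns the majority-vote category per row
    and the vote entropy normalized to [0, 1]; low entropy means the members
    agree, which is treated here as higher confidence."""
    n_members, _, k = member_logits.shape
    # argmax of the relaxed vector (softmax is monotone, so this equals argmax of the logits)
    votes = relaxed_one_hot(member_logits).argmax(axis=-1)  # (n_members, n_rows)
    counts = np.stack([(votes == c).sum(axis=0) for c in range(k)], axis=-1)
    freqs = counts / n_members
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(freqs > 0, freqs * np.log(freqs), 0.0)
    entropy = -plogp.sum(axis=-1) / np.log(k)
    return counts.argmax(axis=-1), entropy


def pool_continuous(member_values: np.ndarray) -> np.ndarray:
    """Median over ensemble members (axis 0); robust to a few outlier members."""
    return np.median(member_values, axis=0)
```

The median and the majority vote are chosen here because a single badly converged member cannot drag the pooled value far, which is one plausible way to obtain the variance reduction the abstract describes.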

    Scalable Inference of Symbolic Adversarial Examples

    We present a novel method for generating symbolic adversarial examples: input regions guaranteed to contain only adversarial examples for the given neural network. Because these regions summarize trillions of adversarial examples, they can be used to generate real-world adversarial examples. We show theoretically that computing optimal symbolic adversarial examples is computationally expensive, and we present a method for approximating optimal examples in a scalable manner. Our method first selectively uses adversarial attacks to generate a candidate region and then prunes this region with hyperplanes fitted to points obtained via specialized sampling. It iterates until it arrives at a symbolic adversarial example for which it can prove, via state-of-the-art convex relaxation techniques, that the region contains only adversarial examples. Our experimental results demonstrate that the method is practically effective: it needs only a few thousand attacks to infer symbolic summaries guaranteed to contain ≈10^258 adversarial examples.
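The final certification step can be illustrated with interval bound propagation, the simplest (box) convex relaxation. This is a hedged sketch, not the paper's verifier: it swaps in plain interval arithmetic for the state-of-the-art relaxations the abstract mentions, assumes a fully connected ReLU network given as hypothetical `(W, b)` pairs, and handles only axis-aligned boxes rather than hyperplane-pruned regions.

```python
import numpy as np


def interval_affine(lo, hi, W, b):
    """Tightest box containing {W @ x + b : lo <= x <= hi}."""
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r


def box_is_adversarial(lo, hi, layers, target):
    """True only if every input in the box [lo, hi] is provably assigned `target`.

    layers: hypothetical list of (W, b) pairs; ReLU follows every layer but the
    last. Interval bounds are sound but loose, so False means "not proven",
    not "contains a non-adversarial point"."""
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
    # Sound check: the target logit's lower bound beats every other logit's upper bound.
    return all(lo[target] > hi[j] for j in range(len(lo)) if j != target)
```

In an iterative scheme like the one described above, a candidate region for which this check returns False would be pruned further (or checked with a tighter relaxation) and then verified again.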

    Data Leakage in Federated Averaging

    Recent attacks have shown that user data can be recovered from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance, as federated learning typically uses the FedAvg algorithm. Compared to FedSGD, recovering data from FedAvg updates is much harder because (i) the updates are computed at unobserved intermediate network weights, (ii) a large number of batches are used, and (iii) labels and network weights vary simultaneously across client steps. In this work, we propose a new optimization-based attack that successfully attacks FedAvg by addressing these challenges. First, we solve the optimization problem using automatic differentiation through a simulation of the client's update, which generates the unobserved parameters from the recovered labels and inputs and forces the simulated update to match the received client update. Second, we handle the large number of batches by relating images from different epochs with a permutation-invariant prior. Third, we recover the labels by estimating the parameters of existing FedSGD attacks at every FedAvg step. On the popular FEMNIST dataset, we demonstrate that, on average, we successfully recover >45% of the client's images from realistic FedAvg updates computed on 10 local epochs of 10 batches each with 5 images, compared to only <10% using the baseline. Our findings show that many real-world federated learning implementations based on FedAvg are vulnerable.
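The first challenge arises because the server sees only the net change `w0 - w_final`, never the intermediate weights. A minimal NumPy sketch of what simulating the client's update means, assuming a linear least-squares model, plain SGD, and a fixed (unshuffled) batch order; all function names are hypothetical. An attack of the kind described above would differentiate the mismatch with respect to the candidate inputs and labels using automatic differentiation (for example JAX or PyTorch), which is omitted here.

```python
import numpy as np


def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def simulate_fedavg_update(w0, X, y, epochs, batch_size, lr):
    """Replay a client's local FedAvg training and return w0 - w_final.

    The intermediate weights are never sent to the server; this is the part
    an attacker has to re-simulate. Assumption: fixed batch order, no shuffling."""
    w = w0.copy()
    n = len(y)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            w = w - lr * mse_grad(w, Xb, yb)
    return w0 - w


def update_mismatch(candidate_X, candidate_y, w0, observed_update, **train_kwargs):
    """Squared L2 distance between the simulated and the observed client update."""
    simulated = simulate_fedavg_update(w0, candidate_X, candidate_y, **train_kwargs)
    return float(np.sum((simulated - observed_update) ** 2))
```

With `epochs=1` and a single batch covering all data, the simulated update reduces to `lr` times the full gradient, i.e. the FedSGD case. Each additional local step makes the update a nonlinear function of the inputs, which is why FedSGD attacks do not transfer directly.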

    FARE: Provably Fair Representation Learning with Practical Certificates

    Fair representation learning (FRL) is a popular class of methods that aim to produce fair classifiers via data preprocessing. Recent regulatory directives stress the need for FRL methods that provide practical certificates, i.e., provable upper bounds on the unfairness of any downstream classifier trained on preprocessed data, which directly provide assurance in a practical scenario. Creating such FRL methods is an important challenge that remains unsolved. In this work, we address that challenge and introduce FARE (Fairness with Restricted Encoders), the first FRL method with practical fairness certificates. FARE is based on our key insight that restricting the representation space of the encoder enables the derivation of practical guarantees, while still permitting favorable accuracy-fairness tradeoffs for suitable instantiations, such as the one we propose based on fair trees. To produce a practical certificate, we develop and apply a statistical procedure that computes a finite-sample, high-confidence upper bound on the unfairness of any downstream classifier trained on FARE embeddings. In our comprehensive experimental evaluation, we demonstrate that FARE produces practical certificates that are tight and often even comparable with purely empirical results obtained by prior methods, which establishes the practical value of our approach. Comment: ICML 202
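Why a restricted, finite representation space makes certification tractable can be sketched as follows, assuming demographic parity as the fairness notion and an encoder that maps each input to one of k cells (for example, the leaves of a tree). For such an encoder, the largest demographic-parity gap any downstream classifier can reach equals the total variation distance between the two groups' cell distributions. The finite-sample slack below uses the L1 deviation inequality of Weissman et al. (2003) with a union bound; this is a swapped-in textbook bound, not FARE's own statistical procedure, and the function names are hypothetical.

```python
import math

import numpy as np


def empirical_cell_distribution(cells, k):
    """Empirical distribution over k representation cells labeled 0..k-1."""
    counts = np.bincount(cells, minlength=k)
    return counts / counts.sum()


def dp_upper_bound(cells_s0, cells_s1, k, delta=0.05):
    """Empirical TV distance and a (1 - delta)-confidence upper bound on the
    demographic-parity gap of ANY classifier that only sees the cell id.

    TV(p0, p1) <= TV(p0_hat, p1_hat) + 0.5 * (||p0 - p0_hat||_1 + ||p1 - p1_hat||_1),
    and P(||p - p_hat||_1 >= eps) <= (2^k - 2) * exp(-n * eps^2 / 2)
    (Weissman et al., 2003), applied with delta / 2 per group."""
    assert k >= 2, "need at least two cells"
    p0 = empirical_cell_distribution(cells_s0, k)
    p1 = empirical_cell_distribution(cells_s1, k)
    tv_hat = 0.5 * np.abs(p0 - p1).sum()

    def l1_eps(n):
        # Solve (2^k - 2) * exp(-n * eps^2 / 2) = delta / 2 for eps.
        return math.sqrt(2.0 / n * math.log((2 ** k - 2) / (delta / 2)))

    bound = tv_hat + 0.5 * (l1_eps(len(cells_s0)) + l1_eps(len(cells_s1)))
    return tv_hat, min(1.0, bound)  # a DP gap can never exceed 1
```

The slack grows with k through the 2^k term, which illustrates, under these assumptions, why keeping the representation space small is what makes a tight certificate possible.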

    RSC remodeling of oligo-nucleosomes: an atomic force microscopy study

    RSC is an essential chromatin remodeling factor that is required for the control of several processes, including transcription, repair and replication. The ability of RSC to relocate centrally positioned mononucleosomes to the end of nucleosomal DNA is firmly established, but data on RSC action on oligo-nucleosomal templates remain scarce. Using atomic force microscopy (AFM) imaging, we have quantitatively studied the RSC-induced mobilization of positioned di- and trinucleosomes, as well as the directionality of mobilization on a mononucleosomal template labeled at one end with streptavidin. AFM imaging showed only a limited set of distinct configurational states for the remodeling products. No stepwise or preferred directionality of the nucleosome motion was observed. Analysis of the corresponding reaction pathways allows us to decipher the mechanistic features of RSC-induced nucleosome relocation. The final outcome of RSC remodeling of oligosome templates is the packing of the nucleosomes at the edge of the template, leaving large stretches of DNA depleted of nucleosomes. The cell may use this feature of RSC to overcome the barrier imposed by the presence of nucleosomes.

    Modelling the water budget and the riverflows of the Maritsa basin in Bulgaria

    A soil-vegetation-atmosphere transfer model coupled with a macroscale distributed hydrological model was used to simulate the water cycle for a large region in Bulgaria. To do so, an atmospheric forcing was built for two hydrological years (1 October 1995 to 30 September 1997) at an 8 km resolution. The impact of human activities on the rivers (especially hydropower and irrigation) was taken into account. The hydrometeorological model was improved: to better simulate summer riverflow, two additional reservoirs were added to represent the slow component of the runoff. These reservoirs were calibrated using the observed data of the first year, while the second year was used for validation. In total, 56 hydrologic stations and 12 dams were used for model calibration, while 41 river gauges were used for model validation. The results compare well with the observed daily discharges, with good results obtained at more than 25% of the river gauges. The simulated snow depth was compared with daily measurements at 174 stations, and the evolution of the snow water equivalent was validated at 5 sites. The melting and refreezing of snow was found to be an important process in this region. The normalized values of simulated and measured soil moisture showed good correlation. The surface water budget shows large spatial variations due to the influence of elevation on precipitation, as well as to soil properties and vegetation variability. An inter-annual difference was observed in the water cycle: the first year was more influenced by the Mediterranean climate, while the second year was characterised by continental influence. The energy budget shows a dominant sensible heat component in summer, because water stress limits evaporation. This study is a first step toward the implementation of an operational hydrometeorological model that could be used for real-time monitoring and forecasting of water budget components and river flow in Bulgaria.
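The abstract does not specify the form of the two additional reservoirs. A common choice in conceptual hydrology is the linear reservoir, dS/dt = I - S/k, with a time constant k. The sketch below assumes two such reservoirs in series, with hypothetical parameter names, and adds the Nash-Sutcliffe efficiency as an assumed, standard score for comparing simulated and observed daily discharges. It illustrates the idea only and is not the model used in the study.

```python
import math


def linear_reservoir(inflows, k_days, s0=0.0, dt=1.0):
    """Route inflows through a linear reservoir dS/dt = I - S/k.

    Uses the exact solution for inflow held constant over each step, so the
    scheme is stable for any k_days > 0 and conserves mass."""
    decay = math.exp(-dt / k_days)
    s, out = s0, []
    for i in inflows:
        s_new = s * decay + i * k_days * (1.0 - decay)
        out.append(s + i * dt - s_new)  # mass balance: outflow during the step
        s = s_new
    return out


def two_reservoir_baseflow(drainage, k_fast, k_slow):
    """Assumption: drainage passes through two reservoirs in series; the
    second one's outflow is the slow (baseflow) runoff component."""
    return linear_reservoir(linear_reservoir(drainage, k_fast), k_slow)


def nash_sutcliffe(sim, obs):
    """1 is a perfect fit; 0 is no better than the mean observed discharge."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den
```

A long time constant for the second reservoir spreads a single drainage pulse over weeks, which is the kind of delayed, slow component that can sustain summer riverflow.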