    DP-HyPO: An Adaptive Private Hyperparameter Optimization Framework

    Hyperparameter optimization, also known as hyperparameter tuning, is a widely recognized technique for improving model performance. Regrettably, when training private ML models, many practitioners often overlook the privacy risks associated with hyperparameter optimization, which could potentially expose sensitive information about the underlying dataset. Currently, the sole existing approach to allow privacy-preserving hyperparameter optimization is to uniformly and randomly select hyperparameters for a number of runs, subsequently reporting the best-performing hyperparameter. In contrast, in non-private settings, practitioners commonly utilize "adaptive" hyperparameter optimization methods such as Gaussian process-based optimization, which select the next candidate based on information gathered from previous outputs. This substantial contrast between private and non-private hyperparameter optimization underscores a critical concern. In our paper, we introduce DP-HyPO, a pioneering framework for "adaptive" private hyperparameter optimization, aiming to bridge the gap between private and non-private hyperparameter optimization. To accomplish this, we provide a comprehensive differential privacy analysis of our framework. Furthermore, we empirically demonstrate the effectiveness of DP-HyPO on a diverse set of real-world and synthetic datasets.
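
    The non-adaptive baseline mentioned in this abstract (draw hyperparameters uniformly at random for a number of runs, then report the best) can be sketched as follows. This is a minimal illustration only: the names `random_search`, `train_and_score`, and `num_runs` are hypothetical, the scoring function is assumed to be supplied by the caller (and to be differentially private itself), and no privacy accounting is performed here. It is not the DP-HyPO algorithm.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple


def random_search(
    candidates: Sequence[float],
    train_and_score: Callable[[float], float],
    num_runs: int,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Sample candidates uniformly at random and return the best (candidate, score).

    Assumption: `train_and_score` is a caller-supplied function that trains a
    model with the given hyperparameter and returns a score (higher is better).
    This sketch does no privacy accounting of its own.
    """
    if not candidates or num_runs <= 0:
        raise ValueError("need at least one candidate and one run")
    rng = rng or random.Random()

    best_candidate = None
    best_score = -math.inf
    for _ in range(num_runs):
        # Non-adaptive step: each draw ignores the outcomes of earlier runs.
        candidate = rng.choice(candidates)
        score = train_and_score(candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score
```

    An adaptive method would instead use the scores seen so far to decide which candidate to try next, which is the gap the abstract describes.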

    Privacy-preserving Inference of Group Mean Difference in Zero-inflated Right Skewed Data with Partitioning and Censoring

    We examine privacy-preserving inferences of group mean differences in zero-inflated right-skewed (zirs) data. Zero inflation and right skewness are typical characteristics of ad click and purchase data collected from e-commerce and social media platforms, where we also want to preserve user privacy to ensure that individual data is protected. In this work, we develop likelihood-based and model-free approaches to analyzing zirs data with formal privacy guarantees. We first apply partitioning and censoring (PAC) to "regularize" zirs data to get the PAC data. We expect inferences based on PAC to have better inferential properties and more robust privacy considerations compared to analyzing the raw data directly. We conduct theoretical analysis to establish the MSE consistency of the privacy-preserving estimators from the proposed approaches based on the PAC data and examine the rate of convergence in the number of partitions and privacy loss parameters. The theoretical results also suggest that it is the sampling error of PAC data rather than the sanitization error that is the limiting factor in the convergence rate. We conduct extensive simulation studies to compare the inferential utility of the proposed approach for different types of zirs data, sample size and partition size combinations, censoring scenarios, mean differences, privacy budgets, and privacy loss composition schemes. We also apply the methods to obtain privacy-preserving inference for the group mean difference in a real digital ads click-through data set. Based on the theoretical and empirical results, we make recommendations regarding the usage of these methods in practice.
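
    To make the role of censoring concrete, here is a minimal sketch of a censored group mean difference released with the standard Laplace mechanism. It is not the PAC procedure from the paper (partitioning is omitted entirely); the function names, the `cap` parameter, and the choice of Laplace noise are assumptions made for illustration. It further assumes non-negative data, public group sizes, and disjoint groups, so that each censored mean has sensitivity `cap / n`.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_censored_mean(
    values: Sequence[float], cap: float, epsilon: float, rng: random.Random
) -> float:
    """Censor values to [0, cap], average them, and add Laplace noise."""
    if cap <= 0 or epsilon <= 0 or len(values) == 0:
        raise ValueError("cap and epsilon must be positive; values must be non-empty")
    censored = [min(max(v, 0.0), cap) for v in values]
    # Changing one record moves the censored mean by at most cap / n.
    sensitivity = cap / len(censored)
    return sum(censored) / len(censored) + laplace_noise(sensitivity / epsilon, rng)


def private_mean_difference(
    group_a: Sequence[float],
    group_b: Sequence[float],
    cap: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Noisy difference of censored group means (group A minus group B)."""
    rng = rng or random.Random()
    mean_a = private_censored_mean(group_a, cap, epsilon, rng)
    mean_b = private_censored_mean(group_b, cap, epsilon, rng)
    return mean_a - mean_b
```

    A smaller `cap` lowers the noise scale but biases the mean of right-skewed data downward, which is the kind of trade-off between sanitization error and estimation error the abstract refers to. Floating-point Laplace sampling of this kind is for illustration and is not hardened for production use.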

    Federated Linear Contextual Bandits with User-level Differential Privacy

    This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework, and investigate the fundamental trade-offs between the learning regrets and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed \robin and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\varepsilon$ by deriving nearly-matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret blow-up factor of at least $\min\{1/\varepsilon, M\}$ or $\min\{1/\sqrt{\varepsilon}, \sqrt{M}\}$ under different conditions. Comment: Accepted by ICML 202
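
    As a quick numerical reading of the two LDP lower-bound factors quoted in this abstract, the following sketch simply evaluates both expressions for a given privacy budget and client count. The function name is hypothetical, and the abstract does not say which conditions select which factor, so both are returned.

```python
import math
from typing import Tuple


def ldp_regret_blowup_factors(epsilon: float, num_clients: int) -> Tuple[float, float]:
    """Evaluate min{1/eps, M} and min{1/sqrt(eps), sqrt(M)} from the abstract."""
    if epsilon <= 0 or num_clients <= 0:
        raise ValueError("epsilon and num_clients must be positive")
    first = min(1.0 / epsilon, num_clients)
    second = min(1.0 / math.sqrt(epsilon), math.sqrt(num_clients))
    return first, second


# With epsilon = 0.25 and M = 100 clients: min(4, 100) and min(2, 10).
print(ldp_regret_blowup_factors(0.25, 100))  # (4.0, 2.0)
```

    For a fixed number of clients, both factors grow as the privacy budget shrinks until they are capped by $M$ or $\sqrt{M}$.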

    Asymmetric versus symmetric $\mathrm{HgTe/Cd_{x}Hg_{1-x}Te}$ double quantum wells: Band gap tuning without electric field

    We investigate the electron states in double asymmetric $\mathrm{HgTe/Cd_{x}Hg_{1-x}Te}$ quantum wells grown along the $[001]$ direction. The subbands are computed by means of the envelope function approximation applied to the 8-band Kane $\mathbf{k}\cdot\mathbf{p}$ model. The asymmetry of the confining potential of the double quantum wells results in a gap opening which is absent in the symmetric system where it can only be induced by an applied electric field. The band gap and the subbands are affected by spin-orbit coupling which is a consequence of the asymmetry of the confining potential. The electron-like and hole-like states are mainly confined in different quantum wells, and the enhanced hybridization between them opens a spin-dependent hybridization gap at a finite in-plane wavevector. We show that both the ratio of the widths of the two quantum wells and the mole fraction of the $\mathrm{Cd_{x}Hg_{1-x}Te}$ barrier control both the energy gap between the hole-like states and the hybridization gap. The energy subbands are shown to exhibit inverted ordering, and therefore a nontrivial topological phase could emerge in the system. Comment: 16 pages, 5 figures, The following article has been accepted by Journal of Applied Physics; After it is published, it will be found at https://doi.org/10.1063/5.001606

    Practical, Label Private Deep Learning Training based on Secure Multiparty Computation and Differential Privacy

    Secure Multiparty Computation (MPC) is an invaluable tool for training machine learning models when the training data cannot be directly accessed by the model trainer. Unfortunately, complex algorithms, such as deep learning models, have their computational complexities increased by orders of magnitude when performed using MPC protocols. In this contribution, we study how to efficiently train an important class of machine learning problems by using MPC where features are known by one of the computing parties and only the labels are private. We propose new protocols combining differential privacy (DP) and MPC in order to privately and efficiently train a deep learning model in such a scenario. More specifically, we release differentially private information during the MPC computation to dramatically reduce the training time. None of the released information compromises the privacy of the labels at the individual level. Our protocols can have running times that are orders of magnitude better than a straightforward use of MPC at a moderate cost in model accuracy.

    RefineNet: multi-path refinement networks for dense prediction

    Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense prediction problems such as semantic segmentation and depth estimation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments on semantic segmentation, which is a dense classification problem, and achieve good performance on seven public datasets. We further apply our method for depth estimation and demonstrate the effectiveness of our method on dense regression problems. Guosheng Lin, Fayao Liu, Anton Milan, Chunhua Shen, and Ian Rei

    Impact of Genetic Ancestry on Outcomes in ECOG-ACRIN-E5103

    Purpose: Racial disparity in breast cancer outcomes exists between African American and Caucasian women in the United States. We have evaluated the impact of genetically determined ancestry on disparity in efficacy and therapy-induced toxicity for breast cancer patients in the context of a randomized, phase III adjuvant trial. Patients and Methods: This study compared outcomes between 386 patients of African ancestry (AA) and 2473 patients of European ancestry (EA) in a randomized, phase III breast cancer trial, ECOG-ACRIN-E5103. The primary efficacy endpoint, invasive disease-free survival (DFS), and clinically significant toxicities were compared, including anthracycline-induced congestive heart failure (CHF), taxane-induced peripheral neuropathy (TIPN), and bevacizumab-induced hypertension. Results: Overall, AAs had significantly inferior DFS (p=0.002; HR=1.5) compared with EAs. This was significant in the estrogen receptor-positive subgroup (p=0.03), with a similar, non-significant trend for those who had triple negative breast cancer (TNBC; p=0.12). AAs also had significantly more grade 3-4 TIPN (OR=2.9; p=2.4 × 10⁻¹¹) and grade 3-4 bevacizumab-induced hypertension (OR=1.6; p=0.02), with a trend for more CHF (OR=1.8; p=0.08). AAs had significantly more dose reductions for paclitaxel (p=6.6 × 10⁻⁶). In AAs, dose reductions in paclitaxel had a significant negative impact on DFS (p=0.03), whereas in EAs, dose reductions did not impact outcome (p=0.35). Conclusion: AAs had inferior DFS with more clinically important toxicities in ECOG-ACRIN-E5103. The altered risk to benefit ratio for adjuvant breast cancer chemotherapy should lead to additional research with the focus centered on the impact of genetic ancestry on both efficacy and toxicity. Strategies to minimize dose reductions for paclitaxel, especially due to TIPN, are warranted for this population.

    Charcot-Marie-Tooth gene, SBF2, associated with taxane-induced peripheral neuropathy in African Americans

    PURPOSE: Taxane-induced peripheral neuropathy (TIPN) is one of the most important survivorship issues for cancer patients. African Americans (AA) have previously been shown to have an increased risk for this toxicity. Germline predictive biomarkers were evaluated to help identify a priori which patients might be at extraordinarily high risk for this toxicity. EXPERIMENTAL DESIGN: Whole exome sequencing was performed using germline DNA from 213 AA patients who received a standard dose and schedule of paclitaxel in the adjuvant, randomized phase III breast cancer trial, E5103. Cases were defined as those with either grade 3-4 (n=64) or grade 2-4 (n=151) TIPN and were compared to controls (n=62) that were not reported to have experienced TIPN. We retained for analysis rare variants with a minor allele frequency <3% and which were predicted to be deleterious by protein prediction programs. A gene-based, case-control analysis using SKAT was performed to identify genes that harbored an imbalance of deleterious variants associated with increased risk of TIPN. RESULTS: Five genes had a p-value < 10⁻⁴ for the grade 3-4 TIPN analysis and three genes had a p-value < 10⁻⁴ for the grade 2-4 TIPN analysis. For the grade 3-4 TIPN analysis, SET binding factor 2 (SBF2) was significantly associated with TIPN (p-value=4.35 × 10⁻⁶). Five variants were predicted to be deleterious in SBF2. Inherited mutations in SBF2 have previously been associated with autosomal recessive, Type 4B2 Charcot-Marie-Tooth (CMT) disease. CONCLUSION: Rare variants in SBF2, a CMT gene, predict an increased risk of TIPN in AA patients receiving paclitaxel.

    The Gaseous Environment of High-z Galaxies: Precision Measurements of Neutral Hydrogen in the Circumgalactic Medium of z ~ 2-3 Galaxies in the Keck Baryonic Structure Survey

    We present results from the Keck Baryonic Structure Survey (KBSS), a unique spectroscopic survey designed to explore the connection between galaxies and intergalactic baryons. The KBSS is optimized for the redshift range z ~ 2-3, combining S/N ~ 100 Keck/HIRES spectra of 15 hyperluminous QSOs with densely sampled galaxy redshift surveys surrounding each QSO sightline. We perform Voigt profile decomposition of all 6000 HI absorbers within the full Lya forest in the QSO spectra. Here we present the distribution, column density, kinematics, and absorber line widths of HI surrounding 886 star-forming galaxies with 2.0 < z < 2.8 and within 3 Mpc of a QSO sightline. We find that N_HI and the multiplicity of HI components increase rapidly near galaxies. The strongest HI absorbers within ~ 100 physical kpc of galaxies have N_HI ~ 3 dex higher than those near random locations in the IGM. The circumgalactic zone of most enhanced HI absorption (CGM) is found within 300 kpc and 300 km/s of galaxies. Nearly half of absorbers with log(N_HI) > 15.5 are found within the CGM of galaxies meeting our photometric selection, while their CGM occupy only 1.5% of the cosmic volume. The spatial covering fraction, multiplicity of absorption components, and characteristic N_HI remain elevated to transverse distances of 2 physical Mpc. Absorbers with log(N_HI) > 14.5 are tightly correlated with the positions of galaxies, while absorbers with lower N_HI are correlated only on Mpc scales. Redshift anisotropies on Mpc scales indicate coherent infall toward galaxies, while on scales of ~100 physical kpc peculiar velocities of 260 km/s are indicated. The median Doppler widths of absorbers within 1-3 virial radii of galaxies are ~50% larger than randomly chosen absorbers of the same N_HI, suggesting higher gas temperatures and/or increased turbulence likely caused by accretion shocks and/or galactic winds. Comment: Accepted to Ap