31 research outputs found
A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms"
Data valuation is a growing research field that studies the influence of
individual data points on machine learning (ML) models. Data Shapley, inspired
by cooperative game theory and economics, is an effective method for data
valuation. However, it is well-known that the Shapley value (SV) can be
computationally expensive. Fortunately, Jia et al. (2019) showed that for
K-Nearest Neighbors (KNN) models, the computation of Data Shapley is
surprisingly simple and efficient.
In this note, we revisit the work of Jia et al. (2019) and propose a more
natural and interpretable utility function that better reflects the performance
of KNN models. We derive the corresponding calculation procedure for the Data
Shapley of KNN classifiers/regressors with the new utility functions. Our new
approach, dubbed soft-label KNN-SV, achieves the same time complexity as the
original method. We further provide an efficient approximation algorithm for
soft-label KNN-SV based on locality sensitive hashing (LSH). Our experimental
results demonstrate that soft-label KNN-SV outperforms the original method on
most datasets in the task of mislabeled data detection, making it a better
baseline for future work on data valuation.
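To make the "surprisingly simple and efficient" computation concrete, the following is a minimal sketch of the original hard-label recursion from Jia et al. (2019) for a single test point and an unweighted KNN classifier. It is not the soft-label KNN-SV variant proposed in the note, whose utility function is not spelled out in this abstract. The function names, the tie-breaking by index, and the restriction K <= N are assumptions made for illustration; the brute-force routine is included only to cross-check the recursion on tiny inputs.

```python
from itertools import permutations


def knn_shapley(distances, labels, y_test, K):
    """Exact Shapley values for one test point via the Jia et al. (2019) recursion.

    Assumes 1 <= K <= N; ties in distance are broken by index (an assumption).
    """
    n = len(labels)
    if not 1 <= K <= n:
        raise ValueError("this sketch assumes 1 <= K <= N")
    # Training indices sorted from nearest to farthest from the test point.
    order = sorted(range(n), key=lambda i: (distances[i], i))
    match = [1.0 if labels[i] == y_test else 0.0 for i in order]
    values = [0.0] * n
    # Base case: the farthest point.
    s = match[-1] / n
    values[order[-1]] = s
    # Recurse from the second-farthest point back to the nearest one.
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank of the point at position pos
        s += (match[pos] - match[pos + 1]) / K * min(K, rank) / rank
        values[order[pos]] = s
    return values


def brute_force_shapley(distances, labels, y_test, K):
    """Reference implementation that averages marginal gains over all orderings."""
    n = len(labels)

    def utility(subset):
        # Fraction of the K votes that agree with the test label.
        nearest = sorted(subset, key=lambda i: (distances[i], i))[:K]
        return sum(labels[i] == y_test for i in nearest) / K

    values = [0.0] * n
    perms = list(permutations(range(n)))
    for perm in perms:
        seen, prev = [], 0.0
        for i in perm:
            seen.append(i)
            cur = utility(seen)
            values[i] += cur - prev
            prev = cur
    return [v / len(perms) for v in values]
```

The recursion needs one sort plus a linear pass per test point, whereas the brute-force check enumerates all N! orderings and is only usable for a handful of points.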
Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning
Propose-Test-Release (PTR) is a differential privacy framework that works
with local sensitivity of functions, instead of their global sensitivity. This
framework is typically used for releasing robust statistics such as median or
trimmed mean in a differentially private manner. While PTR is a common
framework introduced over a decade ago, using it in applications such as robust
SGD where we need many adaptive robust queries is challenging. This is mainly
due to the lack of Renyi Differential Privacy (RDP) analysis, an essential
ingredient underlying the moments accountant approach for differentially
private deep learning. In this work, we generalize the standard PTR and derive
the first RDP bound for it when the target function has bounded global
sensitivity. We show that our RDP bound for PTR yields tighter DP guarantees
than the directly analyzed (ε, δ)-DP. We also derive the
algorithm-specific privacy amplification bound of PTR under subsampling. We
show that our bound is much tighter than the general upper bound and close to
the lower bound. Our RDP bounds enable tighter privacy loss calculation for the
composition of many adaptive runs of PTR. As an application of our analysis, we
show that PTR and our theoretical results can be used to design differentially
private variants of Byzantine-robust training algorithms that use robust
statistics for gradient aggregation. We conduct experiments on the settings of
label, feature, and gradient corruption across different datasets and
architectures. We show that the PTR-based private and robust training algorithm
significantly improves utility compared with the baseline.
Comment: NeurIPS 202
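As a rough illustration of why an RDP bound helps when many adaptive queries are composed, the sketch below shows the generic accounting pattern: RDP costs add up across runs at a fixed order, and the total is converted to an (ε, δ) guarantee at the end. It uses the standard Gaussian-mechanism RDP curve as a stand-in; it is not the PTR bound derived in the paper, which is not given in this abstract. All function names and the grid of orders are assumptions.

```python
import math


def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    """RDP of order alpha for the Gaussian mechanism (stand-in for a PTR bound)."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def rdp_to_dp(rdp_eps, alpha, delta):
    """Standard conversion from (alpha, rdp_eps)-RDP to (eps, delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def composed_epsilon(sigma, steps, delta, alphas):
    """Compose `steps` runs by adding RDP costs, then take the best order."""
    return min(
        rdp_to_dp(steps * gaussian_rdp(a, sigma), a, delta) for a in alphas
    )


if __name__ == "__main__":
    orders = [1.5, 2, 3, 4, 8, 16, 32, 64]  # hypothetical grid of RDP orders
    print(composed_epsilon(sigma=4.0, steps=100, delta=1e-5, alphas=orders))
```

Swapping `gaussian_rdp` for a mechanism-specific RDP curve leaves the composition and conversion steps unchanged, which is the property the moments accountant approach relies on.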
Optimizing active surveillance strategies to balance the competing goals of early detection of grade progression and minimizing harm from biopsies
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/142555/1/cncr31101.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/142555/2/cncr31101_am.pd
Comparative Study on the Early Stage of Skid Resistance Development between Polyurethane-Bound Porous Mixture and Asphalt Mixture
Polyurethane-bound porous mixture (PPM) is a new type of pavement material that has shown some potential for overcoming the common mechanical failures of asphalt mixtures. However, little research has been done on its skid resistance performance. This work presents a comparative study of the early-stage skid resistance development of PPM and asphalt mixtures. Three mixtures were prepared, each bonded with a different binder: polyurethane, 70# virgin bitumen, or styrene-butadiene-styrene (SBS) modified asphalt. To distinguish them, we named the mixtures PPM, BAM, and SAM, respectively. A Taber abraser was used to test the polishing property of the binders. A third-scale model mobile loading simulator (MMLS3) was used to simulate traffic loading on the mixtures, and a British pendulum tester was used to measure the skid resistance of the three mixtures during loading. The binder polishing test results show a good linear relationship between the binder's mass loss and the number of polishing cycles. The slope of the line fitted to these two parameters was defined as the binder coefficient (BC), which characterizes the polishing property of the binder. The mixture test results show that the skid resistance of the three mixtures develops along a similar trend: it first increases, then decreases, and finally levels off. However, the peak and stable British pendulum numbers of PPM are lower than those of SAM. The order of the number of loading times at peak (NLTP) for the three mixtures is SAM > PPM > BAM. Another good linear relationship is found between BC and NLTP, with an R² of 0.85 for the fitted model, which indicates that the polishing property of the binder is an effective predictor of when the mixture's skid resistance peaks.
Accepted Author Manuscript
Urban Studie
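The binder coefficient described above is the slope of a straight-line fit of mass loss against polishing cycles, and the BC-NLTP relationship is judged by the R² of a second such fit. A minimal sketch of that calculation, using ordinary least squares, is given below. The function name is hypothetical, and the numbers in the usage example are made up purely to exercise the code; they are not measurements from the study.

```python
def linear_fit(x, y):
    """Ordinary least-squares line through (x, y); returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Synthetic, perfectly linear illustration (not data from the paper).
    cycles = [0, 500, 1000, 1500, 2000]
    mass_loss = [0.0, 1.0, 2.0, 3.0, 4.0]
    bc, _, r2 = linear_fit(cycles, mass_loss)
    print(bc, r2)
```

Under this reading, BC is the first returned value when the inputs are polishing cycles and binder mass loss; the same function applied to BC and NLTP values across binders would give the R² reported for that relationship.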
Fast growth of inch-sized single-crystalline graphene from a controlled single nucleus on Cu-Ni alloys
Wafer-scale single-crystalline graphene monolayers are highly sought after as an ideal platform for electronic and other applications(1-3). At present, state-of-the-art growth methods based on chemical vapour deposition allow the synthesis of one-centimetre-sized single-crystalline graphene domains in ~12 h, by suppressing nucleation events on the growth substrate(4). Here we demonstrate an efficient strategy for achieving large-area single-crystalline graphene by letting a single nucleus evolve into a monolayer at a fast rate. By locally feeding carbon precursors to a desired position of a substrate composed of an optimized Cu-Ni alloy, we synthesized a ~1.5-inch-large graphene monolayer in 2.5 h. Localized feeding induces the formation of a single nucleus on the entire substrate, and the optimized alloy activates an isothermal segregation mechanism that greatly expedites the growth rate(5,6). This approach may also prove effective for the synthesis of wafer-scale single-crystalline monolayers of other two-dimensional materials.