Extremal Mechanisms for Local Differential Privacy
Local differential privacy has recently surfaced as a strong measure of
privacy in contexts where personal information remains private even from data
analysts. Working in a setting where both the data providers and data analysts
want to maximize the utility of statistical analyses performed on the released
data, we study the fundamental trade-off between local differential privacy and
utility. This trade-off is formulated as a constrained optimization problem:
maximize utility subject to local differential privacy constraints. We
introduce a combinatorial family of extremal privatization mechanisms, which we
call staircase mechanisms, and show that it contains the optimal privatization
mechanisms for a broad class of information theoretic utilities such as mutual
information and f-divergences. We further prove that for any utility function
and any privacy level, solving the privacy-utility maximization problem is
equivalent to solving a finite-dimensional linear program, the outcome of which
is the optimal staircase mechanism. However, solving this linear program can be
computationally expensive since it has a number of variables that is
exponential in the size of the alphabet the data lives in. To account for this,
we show that two simple privatization mechanisms, the binary and randomized
response mechanisms, are universally optimal in the low and high privacy
regimes, and closely approximate the intermediate regime.
Comment: 52 pages, 10 figures in JMLR 201
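The randomized response mechanism highlighted above is easy to state concretely. The following is a minimal sketch of generic k-ary randomized response under the standard local DP definition, not the paper's staircase formulation:

```python
import math
import random

def randomized_response(x, alphabet, epsilon):
    """k-ary randomized response: keep the true value with probability
    e^eps / (e^eps + k - 1), otherwise report a uniformly random other
    symbol. The ratio of any two output probabilities is at most e^eps,
    so the mechanism satisfies epsilon-local differential privacy."""
    k = len(alphabet)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_keep:
        return x
    return random.choice([a for a in alphabet if a != x])
```

In the high-privacy regime (small epsilon), p_keep approaches 1/k, so the output is nearly uniform and reveals little about x; in the low-privacy regime (large epsilon), the true value is reported almost surely.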
DP-Image: Differential Privacy for Image Data in Feature Space
The excessive use of images in social networks, government databases, and
industrial applications has posed great privacy risks and raised serious
concerns from the public. Even though differential privacy (DP) is a widely
accepted criterion that can provide a provable privacy guarantee, the
application of DP on unstructured data such as images is not trivial due to the
lack of a clear quantification of the meaningful difference between any two
images. In this paper, for the first time, we introduce a novel notion of
image-aware differential privacy, referred to as DP-image, that can protect
users' personal information in images from both human and AI adversaries. The
DP-Image definition is formulated as an extended version of traditional
differential privacy, considering the distance measurements between feature
space vectors of images. Then we propose a mechanism to achieve DP-Image by
adding noise to an image feature vector. Finally, we conduct experiments with a
case study on face image privacy. Our results show that the proposed DP-Image
method provides excellent DP protection on images, with a controllable
distortion to faces.
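The noise-adding step can be illustrated with a generic Laplace mechanism over the feature vector. This is a sketch of the general idea, not the paper's exact DP-Image mechanism; `sensitivity` is an assumed bound on the feature-space distance between any two images:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize_features(features, epsilon, sensitivity):
    """Add i.i.d. Laplace noise to each coordinate of an image's feature
    vector. The noise scale is sensitivity / epsilon, the standard
    Laplace-mechanism calibration: higher sensitivity or stronger privacy
    (smaller epsilon) means more noise."""
    b = sensitivity / epsilon
    return [f + laplace_noise(b) for f in features]
```

The privatized feature vector can then be decoded back into an image, trading reconstruction fidelity against privacy through epsilon.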
The Limits of Post-Selection Generalization
While statistics and machine learning offer numerous methods for ensuring
generalization, these methods often fail in the presence of adaptivity---the
common practice in which the choice of analysis depends on previous
interactions with the same dataset. A recent line of work has introduced
powerful, general purpose algorithms that ensure post hoc generalization (also
called robust or post-selection generalization), which says that, given the
output of the algorithm, it is hard to find any statistic for which the data
differs significantly from the population it came from.
In this work we show several limitations on the power of algorithms
satisfying post hoc generalization. First, we show a tight lower bound on the
error of any algorithm that satisfies post hoc generalization and answers
adaptively chosen statistical queries, showing a strong barrier to progress in
post-selection data analysis. Second, we show that post hoc generalization is
not closed under composition, despite many examples of such algorithms
exhibiting strong composition properties.
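The statistical-query setting of the first result can be made concrete: a statistical query asks for the mean of a bounded function over the dataset, and noise addition is one standard way such mechanisms attempt to preserve generalization. A minimal sketch follows; the noise level `sigma` is an illustrative choice, not a value from the paper:

```python
import random

def answer_statistical_query(data, query, sigma=0.05):
    """Answer a statistical query (the mean of `query` over the dataset)
    with additive Gaussian noise. The lower bound discussed in the text
    limits how accurately any algorithm satisfying post hoc generalization
    can answer many such adaptively chosen queries."""
    true_mean = sum(query(x) for x in data) / len(data)
    return true_mean + random.gauss(0.0, sigma)
```

Adaptivity enters when the analyst chooses the next `query` after seeing previous noisy answers, which is exactly the regime where naive reuse of the dataset overfits.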
Information-theoretic limitations of distributed information processing
In a generic distributed information processing system, a number of agents connected by communication channels aim to accomplish a task collectively through local communications. The fundamental limits of distributed information processing problems depend not only on the intrinsic difficulty of the task, but also on the communication constraints due to the distributedness. In this thesis, we reveal these dependencies quantitatively under information-theoretic frameworks.
We consider three typical distributed information processing problems: decentralized parameter estimation, distributed function computation, and statistical learning under adaptive composition. For the first two problems, we derive converse results on the Bayes risk and the computation time, respectively. For the last problem, we first study the relationship between the generalization capability of a learning algorithm and its stability property measured by the mutual information between its input and output, and then derive achievability results on the generalization error of adaptively composed learning algorithms. In all cases, we obtain general results on the fundamental limits with respect to a general model of the problem, so that the results can be applied to various specific scenarios. Our information-theoretic analyses also provide general approaches to inferring global properties of a distributed information processing system from local properties of its components.
Mitigating Group Bias in Federated Learning for Heterogeneous Devices
Federated Learning is emerging as a privacy-preserving model training
approach in distributed edge applications. However, most edge deployments are
heterogeneous in nature, i.e., their sensing capabilities and environments vary
across deployments. This edge heterogeneity violates the independent and
identically distributed (IID) assumption on local data across clients and
produces biased global models, i.e., models that contribute to unfair
decision-making and discrimination against a particular community or group. Existing bias
mitigation techniques only focus on bias generated from label heterogeneity in
non-IID data without accounting for domain variations due to feature
heterogeneity, and do not address the global group-fairness property.
Our work proposes a group-fair FL framework that minimizes group-bias while
preserving privacy and without resource utilization overhead. Our main idea is
to leverage average conditional probabilities to compute cross-domain group
importance weights derived from heterogeneous training data, which are used to
optimize the performance of the worst-performing group via a modified
multiplicative weights update method. Additionally, we propose regularization
techniques to minimize the difference between the worst and best-performing
groups, while our thresholding mechanism strikes a balance
between bias reduction and group performance degradation. Our evaluation on
human emotion recognition and image classification benchmarks assesses the fair
decision-making of our framework in real-world heterogeneous settings.
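The multiplicative weights idea described above can be sketched generically: groups with higher loss are upweighted each round so that subsequent training focuses on the worst-performing group. This is a plain multiplicative-weights sketch, not the paper's exact modified update, and the step size `eta` is an assumed hyperparameter:

```python
import math

def update_group_weights(weights, group_losses, eta=0.5):
    """One multiplicative-weights step: multiply each group's weight by
    exp(eta * loss) and renormalize, shifting training emphasis toward
    the worst-performing group."""
    scaled = {g: w * math.exp(eta * group_losses[g])
              for g, w in weights.items()}
    total = sum(scaled.values())
    return {g: w / total for g, w in scaled.items()}
```

Iterating this update approximately minimizes the maximum group loss, which is the group-fairness objective the framework targets.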
PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers
Data-free quantization can potentially address data privacy and security
concerns in model compression, and thus has been widely investigated. Recently,
PSAQ-ViT designs a relative value metric, patch similarity, to generate data
from pre-trained vision transformers (ViTs), achieving the first attempt at
data-free quantization for ViTs. In this paper, we propose PSAQ-ViT V2, a more
accurate and general data-free quantization framework for ViTs, built on top of
PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT,
we introduce an adaptive teacher-student strategy, which facilitates the
constant cyclic evolution of the generated samples and the quantized model
(student) in a competitive and interactive fashion under the supervision of the
full-precision model (teacher), thus significantly improving the accuracy of
the quantized model. Moreover, without the auxiliary category guidance, we
employ the task- and model-independent prior information, making the
general-purpose scheme compatible with a broad range of vision tasks and
models. Extensive experiments are conducted on various models on image
classification, object detection, and semantic segmentation tasks, and PSAQ-ViT
V2, with the naive quantization strategy and without access to real-world data,
consistently achieves competitive results, showing potential as a powerful
baseline on data-free quantization for ViTs. For instance, with Swin-S as the
(backbone) model, 8-bit quantization reaches 82.13% top-1 accuracy on ImageNet,
50.9 box AP and 44.1 mask AP on COCO, and 47.2 mIoU on ADE20K. We hope that
accurate and general PSAQ-ViT V2 can serve as a practical solution
in real-world applications involving sensitive data. Code is released and
merged at: https://github.com/zkkli/PSAQ-ViT.
Comment: Accepted by TNNLS 202
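The "naive quantization strategy" the abstract refers to is, in spirit, plain uniform quantization. A minimal sketch follows; the symmetric scale choice is an illustrative assumption, not the paper's exact scheme:

```python
def quantize_uniform(x, n_bits=8):
    """Symmetric uniform quantization of a list of floats: map each value
    onto an integer grid of 2^n_bits levels and back, returning both the
    integer codes and the dequantized approximation."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = (max(abs(v) for v in x) / qmax) or 1.0  # avoid a zero scale
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in x]
    dequant = [c * scale for c in codes]
    return codes, dequant
```

Data-free quantization frameworks such as PSAQ-ViT V2 supply the calibration samples this step would normally require by generating them from the pre-trained model itself.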