LIPIcs, Volume 251, ITCS 2023, Complete Volume
Smooth Lower Bounds for Differentially Private Algorithms via Padding-and-Permuting Fingerprinting Codes
Fingerprinting arguments, first introduced by Bun, Ullman, and Vadhan (STOC
2014), are the most widely used method for establishing lower bounds on the
sample complexity or error of approximately differentially private (DP)
algorithms. Still, there are many problems in differential privacy for which
suitable lower bounds are not known, and even where they are, the lower bounds
are not smooth: they typically become vacuous once the error exceeds some
threshold.
In this work, we present a simple method to generate hard instances by
applying a padding-and-permuting transformation to a fingerprinting code. We
illustrate the applicability of this method by providing new lower bounds in
various settings:
1. A tight lower bound for DP averaging in the low-accuracy regime, which in
particular implies a new lower bound for the private 1-cluster problem
introduced by Nissim, Stemmer, and Vadhan (PODS 2016).
2. A lower bound on the additive error of DP algorithms for approximate
k-means clustering, as a function of the multiplicative error, which is tight
for a constant multiplicative error.
3. A lower bound for estimating the top singular vector of a matrix under DP
in low-accuracy regimes, which is a special case of DP subspace estimation
studied by Singhal and Steinke (NeurIPS 2021).
Our main technique is to apply a padding-and-permuting transformation to a
fingerprinting code. However, rather than proving our results using black-box
access to an existing fingerprinting code (e.g., Tardos' code), we develop a
new fingerprinting lemma that is stronger than those of Dwork et al. (FOCS
2015) and Bun et al. (SODA 2017), and prove our lower bounds directly from the
lemma. Our lemma, in particular, gives a simpler fingerprinting code
construction with optimal rate (up to polylogarithmic factors) that is of
independent interest.
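The padding-and-permuting transformation can be sketched as follows (a toy illustration of the generic idea, not the authors' construction; the pad widths and the use of constant-1 and constant-0 columns here are illustrative assumptions):

```python
import random

def pad_and_permute(codewords, pad_ones, pad_zeros, seed=0):
    """Toy padding-and-permuting transform (illustrative only).

    Each codeword (a list of 0/1 bits) is padded with columns of
    constant 1s and constant 0s, then the columns of the whole
    matrix are shuffled by a single random permutation shared by
    every row, hiding where the original code columns sit.
    """
    rng = random.Random(seed)
    width = len(codewords[0]) + pad_ones + pad_zeros
    perm = list(range(width))
    rng.shuffle(perm)
    padded = [row + [1] * pad_ones + [0] * pad_zeros for row in codewords]
    # Apply the same column permutation to every padded row.
    return [[row[perm[j]] for j in range(width)] for row in padded]

code = [[0, 1, 1], [1, 0, 1]]
hard_instance = pad_and_permute(code, pad_ones=2, pad_zeros=2)
```

The key point the paper exploits is that the permutation is shared across rows, so per-column statistics of the original code survive the transformation while an algorithm cannot tell padding columns from code columns.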
Beam scanning by liquid-crystal biasing in a modified SIW structure
A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment requires full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing; the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW), modified to work as a Groove Gap Waveguide with radiating slots etched on the upper broad wall, so that it radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, making it possible to lay several antennas in parallel and achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.
A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings
We propose a Neighbourhood-Aware Differential Privacy (NADP) mechanism
considering the neighbourhood of a word in a pretrained static word embedding
space to determine the minimal amount of noise required to guarantee a
specified privacy level. We first construct a nearest neighbour graph over the
words using their embeddings, and factorise it into a set of connected
components (i.e. neighbourhoods). We then separately apply different levels of
Gaussian noise to the words in each neighbourhood, determined by the set of
words in that neighbourhood. Experiments show that our proposed NADP mechanism
consistently outperforms multiple previously proposed DP mechanisms such as
Laplacian, Gaussian, and Mahalanobis in multiple downstream tasks, while
guaranteeing higher levels of privacy.
Comment: Accepted to IJCNLP-AACL 202
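The pipeline described above can be sketched in pure Python (a minimal illustration, not the paper's implementation; the distance threshold for the neighbour graph and the diameter-based noise scale are assumptions standing in for the paper's privacy-calibrated noise levels):

```python
import math
import random

def connected_components(words, emb, radius):
    """Build a neighbour graph (edge iff Euclidean distance <= radius)
    over the words and return its connected components, i.e. the
    'neighbourhoods'."""
    dist = lambda a, b: math.dist(emb[a], emb[b])
    seen, comps = set(), []
    for w in words:
        if w in seen:
            continue
        comp, stack = [], [w]
        seen.add(w)
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            comp.append(u)
            for v in words:
                if v not in seen and dist(u, v) <= radius:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def noisy_embeddings(emb, radius, seed=0):
    """Add Gaussian noise per neighbourhood, one scale per component."""
    rng = random.Random(seed)
    words = sorted(emb)
    out = {}
    for comp in connected_components(words, emb, radius):
        # Hypothetical noise rule: scale sigma with the neighbourhood's
        # diameter (the paper derives the scale from its privacy analysis).
        diam = max((math.dist(emb[a], emb[b]) for a in comp for b in comp),
                   default=0.0)
        sigma = max(diam, 1e-3)
        for w in comp:
            out[w] = [x + rng.gauss(0, sigma) for x in emb[w]]
    return out
```

The design choice being illustrated is that words in dense, tight neighbourhoods receive less noise than words in sparse ones, because the noise scale is a function of each component rather than a single global parameter.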
University of Windsor Graduate Calendar 2023 Spring
Tabular and latent space synthetic data generation: a literature review
Fonseca, J., & Bacao, F. (2023). Tabular and latent space synthetic data generation: a literature review. Journal of Big Data, 10, 1-37. [115]. https://doi.org/10.1186/s40537-023-00792-7

This research was supported by two research grants of the Portuguese Foundation for Science and Technology ("Fundação para a Ciência e a Tecnologia"), references SFRH/BD/151473/2021 and DSAIPA/DS/0116/2019, and by project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC).

The generation of synthetic data can be used for anonymization, regularization, oversampling, semi-supervised learning, self-supervised learning, and several other tasks. Such broad potential motivated the development of new algorithms, specialized in data generation for specific data formats and Machine Learning (ML) tasks. However, one of the most common data formats used in industrial applications, tabular data, is generally overlooked: literature analyses are scarce, state-of-the-art methods are spread across domains or ML tasks, and there is little to no distinction among the main types of mechanism underlying synthetic data generation algorithms. In this paper, we analyze tabular and latent space synthetic data generation algorithms. Specifically, we propose a unified taxonomy as an extension and generalization of previous taxonomies, review 70 generation algorithms across six ML problems, distinguish the main generation mechanisms identified into six categories, describe each type of generation mechanism, discuss metrics to evaluate the quality of synthetic data, and provide recommendations for future research. We expect this study to assist researchers and practitioners in identifying relevant gaps in the literature and designing better and more informed practices with synthetic data.
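The simplest generation mechanism such a taxonomy has to account for is independent per-column resampling; the sketch below (a naive baseline of our own for illustration, not an algorithm from the review) shows both what it preserves and what it loses:

```python
import random

def independent_resampler(rows, n_samples, seed=0):
    """Naive tabular synthesizer: draw each column independently from
    its empirical marginal distribution.

    This preserves per-column statistics but destroys cross-column
    correlations -- precisely the structure that more sophisticated
    generators (copulas, VAEs, GANs) are designed to retain.
    """
    rng = random.Random(seed)
    cols = list(zip(*rows))  # column-wise view of the table
    return [[rng.choice(col) for col in cols] for _ in range(n_samples)]

real = [[25, "yes"], [37, "no"], [52, "yes"]]
fake = independent_resampler(real, 5)
```

Evaluating how much of the joint (not just marginal) structure survives is exactly the kind of quality metric the review discusses.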
The Role of Synthetic Data in Improving Supervised Learning Methods: The Case of Land Use/Land Cover Classification
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management
In remote sensing, Land Use/Land Cover (LULC) maps constitute important assets for
various applications, promoting environmental sustainability and good resource management.
However, their production remains a challenging task. There are various factors
that contribute towards the difficulty of generating accurate, timely updated LULC maps,
both via automatic or photo-interpreted LULC mapping. Data preprocessing, being a
crucial step for any Machine Learning task, is particularly important in the remote sensing
domain due to the overwhelming amount of raw, unlabeled data continuously gathered
from multiple remote sensing missions. However, a significant part of the state-of-the-art
focuses on scenarios with full access to labeled training data with relatively balanced class
distributions. This thesis focuses on the challenges found in automatic LULC classification
tasks, specifically in data preprocessing tasks. We focus on the development of novel
Active Learning (AL) and imbalanced learning techniques, to improve ML performance in
situations with limited training data and/or the existence of rare classes. We also show
that many of the contributions presented are successful not only in remote sensing problems,
but also in various other multidisciplinary classification problems. The work presented
in this thesis used open access datasets to test the contributions made in imbalanced
learning and AL. All the data pulling, preprocessing and experiments are made available at
https://github.com/joaopfonseca/publications. The algorithmic implementations are made
available in the Python package ml-research at https://github.com/joaopfonseca/ml-research
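The core Active Learning loop the thesis builds on can be sketched generically (a minimal pool-based uncertainty-sampling step under assumed simplifications: a nearest-centroid classifier and a margin-based uncertainty score, neither of which is claimed to be the thesis's method):

```python
import math

def nearest_centroid_margins(x_pool, centroids):
    """Margin = difference between the distances to the two nearest
    class centroids; a small margin means the point is uncertain."""
    margins = []
    for x in x_pool:
        d = sorted(math.dist(x, c) for c in centroids.values())
        margins.append(d[1] - d[0] if len(d) > 1 else float("inf"))
    return margins

def uncertainty_sampling(x_pool, centroids, budget):
    """One pool-based AL step: return the indices of the `budget`
    pool points the current model is least certain about, which
    are the ones to send for labeling."""
    margins = nearest_centroid_margins(x_pool, centroids)
    order = sorted(range(len(x_pool)), key=lambda i: margins[i])
    return order[:budget]
```

Iterating this step (label the selected points, refit, reselect) is what lets AL reach a target accuracy with far fewer labels than random sampling, which is the regime of limited training data the thesis targets.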
Adversarial Robustness in Unsupervised Machine Learning: A Systematic Review
As the adoption of machine learning models increases, ensuring they are robust
against adversarial attacks is increasingly important. With unsupervised
machine learning gaining more attention, ensuring its robustness against
attacks is vital. This paper conducts a systematic literature review on the robustness
of unsupervised learning, collecting 86 papers. Our results show that most
research focuses on privacy attacks, which have effective defenses; however,
many attacks lack effective and general defensive measures. Based on these
results, we formulate a model of the properties of an attack on unsupervised
learning, providing a model for future research to build on.
Comment: 38 pages, 11 figures
Independent Distribution Regularization for Private Graph Embedding
Learning graph embeddings is a crucial task in graph mining. An
effective graph embedding model can learn low-dimensional representations from
graph-structured data for data publishing, benefiting various downstream
applications such as node classification, link prediction, etc. However, recent
studies have revealed that graph embeddings are susceptible to attribute
inference attacks, which allow attackers to infer private node attributes from
the learned graph embeddings. To address these concerns, privacy-preserving
graph embedding methods have emerged, aiming to simultaneously consider primary
learning and privacy protection through adversarial learning. However, most
existing methods assume that representation models have access to all sensitive
attributes in advance during the training stage, which is not always the case
due to diverse privacy preferences. Furthermore, the commonly used adversarial
learning technique in privacy-preserving representation learning suffers from
unstable training issues. In this paper, we propose a novel approach called
Private Variational Graph AutoEncoders (PVGAE) with the aid of independent
distribution penalty as a regularization term. Specifically, we split the
original variational graph autoencoder (VGAE) to learn sensitive and
non-sensitive latent representations using two sets of encoders. Additionally,
we introduce a novel regularization to enforce the independence of the
encoders. We prove the theoretical effectiveness of regularization from the
perspective of mutual information. Experimental results on three real-world
datasets demonstrate that PVGAE outperforms other baselines in private
embedding learning regarding utility performance and privacy protection.
Comment: Accepted by CIKM 202
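An independence regularizer of the kind described above can be illustrated with a cross-covariance penalty (a sketch of one standard stand-in; the paper motivates its penalty via mutual information, and this simpler quantity is only a necessary condition for independence):

```python
def cross_covariance_penalty(z_s, z_n):
    """Independence-style regularizer between two latent batches:
    the sum of squared entries of the cross-covariance matrix
    between the sensitive latent z_s and the non-sensitive latent
    z_n (each a list of equal-length sample rows).

    Zero cross-covariance is necessary, but not sufficient, for the
    two representations to be independent.
    """
    n = len(z_s)
    ds, dn = len(z_s[0]), len(z_n[0])
    # Per-dimension means of each latent batch.
    mu_s = [sum(row[i] for row in z_s) / n for i in range(ds)]
    mu_n = [sum(row[j] for row in z_n) / n for j in range(dn)]
    penalty = 0.0
    for i in range(ds):
        for j in range(dn):
            cov = sum((z_s[k][i] - mu_s[i]) * (z_n[k][j] - mu_n[j])
                      for k in range(n)) / n
            penalty += cov * cov
    return penalty
```

Added to the reconstruction loss of the two-encoder VGAE, a term like this pushes the sensitive and non-sensitive encoders toward decorrelated outputs without the unstable min-max training that adversarial approaches require.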
University of Windsor Graduate Calendar 2023 Winter