On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory
To cluster, classify and represent are three fundamental objectives of
learning from high-dimensional data with intrinsic structure. To this end, this
paper introduces three interpretable approaches: segmentation (clustering) via
the Minimum Lossy Coding Length criterion, classification via the Minimum
Incremental Coding Length criterion, and representation via the Maximal Coding
Rate Reduction criterion. These are derived within the lossy data coding and
compression framework, based on the rate-distortion principle of information
theory. The algorithms are particularly suitable for finite-sample data
(allowed to be sparse or almost degenerate) drawn from mixtures of Gaussian
distributions or subspaces. The theoretical value and attractive features of
these methods are summarized by comparison with other learning methods and
evaluation criteria. This summary note aims to provide a theoretical guide for
researchers (and engineers) interested in understanding 'white-box' machine
(deep) learning methods.
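As a rough illustration of the coding-rate quantity these criteria share, here is a minimal numpy sketch of the lossy coding rate and the rate-reduction objective for labeled features; the function names, the nats-to-bits conversion, and the default distortion eps are my choices for illustration, not the paper's code:

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    # R(Z, eps): average number of bits per vector needed to code the
    # columns of the d x n matrix Z up to mean squared distortion eps^2.
    d, n = Z.shape
    logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)[1]
    return 0.5 * logdet / np.log(2)  # convert nats -> bits

def rate_reduction(Z, labels, eps=0.5):
    # Coding rate of the whole dataset minus the (sample-weighted) rate
    # when each class is coded separately; the representation criterion
    # seeks features that make this difference as large as possible.
    d, n = Z.shape
    per_class = 0.0
    for j in np.unique(labels):
        Zj = Z[:, labels == j]
        per_class += (Zj.shape[1] / n) * coding_rate(Zj, eps)
    return coding_rate(Z, eps) - per_class

Z = np.random.randn(32, 200)        # 32-dim features, 200 samples (toy data)
labels = np.repeat([0, 1], 100)
print(rate_reduction(Z, labels))
```

Segmentation and classification by coding length build on the same logdet quantity: a sample is assigned to whichever group increases the total coding length the least.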
Joint Task and Data Oriented Semantic Communications: A Deep Separate Source-channel Coding Scheme
Semantic communications are expected to accomplish various semantic tasks
with relatively little spectrum resource by exploiting the semantic features of
the source data. To simultaneously serve both data transmission and semantic
tasks, joint data compression and semantic analysis has become a pivotal issue in
semantic communications. This paper proposes a deep separate source-channel
coding (DSSCC) framework for the joint task and data oriented semantic
communications (JTD-SC) and utilizes the variational autoencoder approach to
solve the rate-distortion problem with semantic distortion. First, by analyzing
the Bayesian model of the DSSCC framework, we derive a novel rate-distortion
optimization problem via the Bayesian inference approach for general data
distributions and semantic tasks. Next, for a typical application of joint
image transmission and classification, we combine the variational autoencoder
approach with a forward adaption scheme to effectively extract image features
and adaptively learn the density information of the obtained features. Finally,
an iterative training algorithm is proposed to tackle the overfitting issue of
deep learning models. Simulation results reveal that, in most scenarios, the
proposed scheme achieves better coding gain as well as better data recovery and
classification performance than classical compression schemes and emerging deep
joint source-channel coding schemes.
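To make the rate-distortion-with-semantic-distortion idea concrete, a minimal PyTorch sketch of such a joint objective might look as follows; the module names (encoder, decoder, classifier, entropy_model) and the weights lam_data and lam_sem are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def jtd_sc_loss(encoder, decoder, classifier, entropy_model, x, y,
                lam_data=1.0, lam_sem=1.0):
    # Variational rate-distortion objective with two distortion terms:
    # one for data recovery and one for the semantic (classification) task.
    z = encoder(x)                                       # features to transmit
    # Rate: negative log-likelihood of the features under a learned prior
    # (an adaptively learned density model), in nats per example.
    rate = -entropy_model.log_prob(z).sum() / x.shape[0]
    x_hat = decoder(z)
    data_distortion = F.mse_loss(x_hat, x)               # data recovery
    sem_distortion = F.cross_entropy(classifier(z), y)   # semantic task
    return rate + lam_data * data_distortion + lam_sem * sem_distortion
```

Varying lam_data and lam_sem traces out different operating points, trading reconstruction quality against task accuracy for a given rate.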
Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval
We propose to extend the concept of private information retrieval by allowing
for distortion in the retrieval process and relaxing the perfect privacy
requirement at the same time. In particular, we study the tradeoff between
download rate, distortion, and user privacy leakage, and show that in the limit
of large file sizes this tradeoff can be captured via a novel
information-theoretic formulation for datasets with a known distribution.
Moreover, for scenarios where the statistics of the dataset are unknown, we
propose a new deep learning framework by leveraging a generative adversarial
network approach, which allows the user to learn efficient schemes from the
data itself, minimizing the download cost. We evaluate the performance of the
scheme on a synthetic Gaussian dataset as well as on both the MNIST and
CIFAR-10 datasets. For the MNIST dataset, the data-driven approach
significantly outperforms a non-learning-based scheme that combines source
coding with multiple file downloads, and its performance on CIFAR-10 is also
notably better.
Comment: Submitted to IEEE for possible publication. This paper was presented
in part at the NeurIPS 2020 Workshop on Privacy Preserving Machine Learning -
PRIML and PPML Joint Edition.
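To illustrate the generative-adversarial formulation, here is a minimal PyTorch sketch of the privacy game described above; the retriever and adversary networks, their interfaces (including retriever.reconstruct and the returned download cost), and the loss weights are all illustrative assumptions rather than the authors' code:

```python
import torch
import torch.nn.functional as F

def training_step(retriever, adversary, opt_r, opt_a, files, idx,
                  lam_rate=0.1, lam_priv=1.0):
    # One adversarial round. `retriever` maps the desired file index to a
    # query plus an estimated download cost; `adversary` tries to recover
    # the index from what the server observes (the query).
    query, download_cost = retriever(idx)
    recon = retriever.reconstruct(query, files)     # lossy retrieval
    distortion = F.mse_loss(recon, files[idx])

    # Adversary step: learn to infer the requested index from the query.
    adv_loss = F.cross_entropy(adversary(query.detach()), idx)
    opt_a.zero_grad(); adv_loss.backward(); opt_a.step()

    # Retriever step: trade off distortion, download rate, and privacy
    # leakage; the negated cross-entropy rewards queries that fool the
    # current adversary, i.e. that leak little about the index.
    leakage = -F.cross_entropy(adversary(query), idx)
    loss = distortion + lam_rate * download_cost.mean() + lam_priv * leakage
    opt_r.zero_grad(); loss.backward(); opt_r.step()
    return distortion.item(), adv_loss.item()
```

In this min-max game the adversary's ability to classify the requested index serves as an empirical proxy for user privacy leakage, mirroring how the tradeoff is stated information-theoretically when the dataset distribution is known.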