    On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory

    Clustering, classification and representation are three fundamental objectives of learning from high-dimensional data with intrinsic structure. To this end, this paper introduces three interpretable approaches: segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion, and representation via the Maximal Coding Rate Reduction criterion. All three are derived from the lossy data coding and compression framework, which rests on the rate-distortion principle of information theory. These algorithms are particularly suitable for finite-sample data (which may be sparse or almost degenerate) drawn from mixed Gaussian distributions or subspaces. The theoretical value and attractive features of these methods are summarized by comparison with other learning methods and evaluation criteria. This summary note aims to provide a theoretical guide for researchers and engineers interested in understanding 'white-box' machine (deep) learning methods.

    Joint Task and Data Oriented Semantic Communications: A Deep Separate Source-channel Coding Scheme

    Semantic communications are expected to accomplish various semantic tasks with relatively fewer spectrum resources by exploiting the semantic features of source data. To serve both data transmission and semantic tasks simultaneously, joint data compression and semantic analysis has become a pivotal issue in semantic communications. This paper proposes a deep separate source-channel coding (DSSCC) framework for joint task and data oriented semantic communications (JTD-SC) and uses the variational autoencoder approach to solve the rate-distortion problem with semantic distortion. First, by analyzing the Bayesian model of the DSSCC framework, we derive a novel rate-distortion optimization problem via the Bayesian inference approach for general data distributions and semantic tasks. Next, for a typical application of joint image transmission and classification, we combine the variational autoencoder approach with a forward adaption scheme to effectively extract image features and adaptively learn the density information of the obtained features. Finally, an iterative training algorithm is proposed to tackle the overfitting issue of deep learning models. Simulation results reveal that, in most scenarios, the proposed scheme achieves better coding gain as well as better data recovery and classification performance than classical compression schemes and emerging deep joint source-channel schemes.

    Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval

    We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process and relaxing the perfect privacy requirement at the same time. In particular, we study the trade-off between download rate, distortion, and user privacy leakage, and show that in the limit of large file sizes this trade-off can be captured via a novel information-theoretical formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset are unknown, we propose a new deep learning framework that leverages a generative adversarial network approach, which allows the user to learn efficient schemes from the data itself, minimizing the download cost. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on both the MNIST and CIFAR-10 datasets. For the MNIST dataset, the data-driven approach significantly outperforms a non-learning based scheme which combines source coding with multiple file download, while the CIFAR-10 performance is notably better. Comment: Submitted to IEEE for possible publication. This paper was presented in part at the NeurIPS 2020 Workshop on Privacy Preserving Machine Learning - PRIML and PPML Joint Editio