249 research outputs found

    Database Matching Under Adversarial Column Deletions

    Full text link
    The de-anonymization of users from anonymized microdata through matching or aligning with publicly-available correlated databases has been of scientific interest recently. While most of the rigorous analyses of database matching have focused on random-distortion models, the adversarial-distortion models have been wanting in the relevant literature. In this work, motivated by synchronization errors in the sampling of time-indexed microdata, matching (alignment) of random databases under adversarial column deletions is investigated. It is assumed that a constrained adversary, which observes the anonymized database, can delete up to a δ\delta fraction of the columns (attributes) to hinder matching and preserve privacy. Column histograms of the two databases are utilized as permutation-invariant features to detect the column deletion pattern chosen by the adversary. The detection of the column deletion pattern is then followed by an exact row (user) matching scheme. The worst-case analysis of this two-phase scheme yields a sufficient condition for the successful matching of the two databases, under the near-perfect recovery condition. A more detailed investigation of the error probability leads to a tight necessary condition on the database growth rate, and in turn, to a single-letter characterization of the adversarial matching capacity. This adversarial matching capacity is shown to be significantly lower than the random matching capacity, where the column deletions occur randomly. Overall, our results analytically demonstrate the privacy-wise advantages of adversarial mechanisms over random ones during the publication of anonymized time-indexed data

    Distributed Deep Joint Source-Channel Coding with Decoder-Only Side Information

    Full text link
    We consider low-latency image transmission over a noisy wireless channel when correlated side information is present only at the receiver side (the Wyner-Ziv scenario). In particular, we are interested in developing practical schemes using a data-driven joint source-channel coding (JSCC) approach, which has been previously shown to outperform conventional separation-based approaches in the practical finite blocklength regimes, and to provide graceful degradation with channel quality. We propose a novel neural network architecture that incorporates the decoder-only side information at multiple stages at the receiver side. Our results demonstrate that the proposed method succeeds in integrating the side information, yielding improved performance at all channel noise levels in terms of the various distortion criteria considered here, especially at low channel signal-to-noise ratios (SNRs) and small bandwidth ratios (BRs). We also provide the source code of the proposed method to enable further research and reproducibility of the results.Comment: 7 pages, 4 figure

    Neural Distributed Compressor Discovers Binning

    Full text link
    We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.Comment: draft of a journal version of our previous ISIT 2023 paper (available at: arXiv:2305.04380). arXiv admin note: substantial text overlap with arXiv:2305.0438

    Precoding-oriented Massive MIMO CSI Feedback Design

    Full text link
    Downlink massive multiple-input multiple-output (MIMO) precoding algorithms in frequency division duplexing (FDD) systems rely on accurate channel state information (CSI) feedback from users. In this paper, we analyze the tradeoff between the CSI feedback overhead and the performance achieved by the users in systems in terms of achievable rate. The final goal of the proposed system is to determine the beamforming information (i.e., precoding) from channel realizations. We employ a deep learning-based approach to design the end-to-end precoding-oriented feedback architecture, that includes learned pilots, users' compressors, and base station processing. We propose a loss function that maximizes the sum of achievable rates with minimal feedback overhead. Simulation results show that our approach outperforms previous precoding-oriented methods, and provides more efficient solutions with respect to conventional methods that separate the CSI compression blocks from the precoding processing.Comment: 6 pages, IEEE ICC 202

    Non-Coherent Active Device Identification for Massive Random Access

    Full text link
    Massive Machine-Type Communications (mMTC) is a key service category in the current generation of wireless networks featuring an extremely high density of energy and resource-limited devices with sparse and sporadic activity patterns. In order to enable random access in such mMTC networks, base station needs to identify the active devices while operating within stringent access delay constraints. In this paper, an energy efficient active device identification protocol is proposed in which active devices transmit On-Off Keying (OOK) modulated preambles jointly and base station employs non-coherent energy detection avoiding channel estimation overheads. The minimum number of channel-uses required by the active user identification protocol is characterized in the asymptotic regime of total number of devices \ell when the number of active devices kk scales as k=Θ(1)k=\Theta(1) along with an achievability scheme relying on the equivalence of activity detection to a group testing problem. Several practical schemes based on Belief Propagation (BP) and Combinatorial Orthogonal Matching Pursuit (COMP) are also proposed. Simulation results show that BP strategies outperform COMP significantly and can operate close to the theoretical achievability bounds. In a partial-recovery setting where few misdetections are allowed, BP continues to perform well

    Database Matching Under Noisy Synchronization Errors

    Full text link
    The re-identification or de-anonymization of users from anonymized data through matching with publicly available correlated user data has raised privacy concerns, leading to the complementary measure of obfuscation in addition to anonymization. Recent research provides a fundamental understanding of the conditions under which privacy attacks, in the form of database matching, are successful in the presence of obfuscation. Motivated by synchronization errors stemming from the sampling of time-indexed databases, this paper presents a unified framework considering both obfuscation and synchronization errors and investigates the matching of databases under noisy entry repetitions. By investigating different structures for the repetition pattern, replica detection and seeded deletion detection algorithms are devised and sufficient and necessary conditions for successful matching are derived. Finally, the impacts of some variations of the underlying assumptions, such as the adversarial deletion model, seedless database matching, and zero-rate regime, on the results are discussed. Overall, our results provide insights into the privacy-preserving publication of anonymized and obfuscated time-indexed data as well as the closely related problem of the capacity of synchronization channels

    Distribution-Agnostic Database De-Anonymization Under Synchronization Errors

    Full text link
    There has recently been an increased scientific interest in the de-anonymization of users in anonymized databases containing user-level microdata via multifarious matching strategies utilizing publicly available correlated data. Existing literature has either emphasized practical aspects where underlying data distribution is not required, with limited or no theoretical guarantees, or theoretical aspects with the assumption of complete availability of underlying distributions. In this work, we take a step towards reconciling these two lines of work by providing theoretical guarantees for the de-anonymization of random correlated databases without prior knowledge of data distribution. Motivated by time-indexed microdata, we consider database de-anonymization under both synchronization errors (column repetitions) and obfuscation (noise). By modifying the previously used replica detection algorithm to accommodate for the unknown underlying distribution, proposing a new seeded deletion detection algorithm, and employing statistical and information-theoretic tools, we derive sufficient conditions on the database growth rate for successful matching. Our findings demonstrate that a double-logarithmic seed size relative to row size ensures successful deletion detection. More importantly, we show that the derived sufficient conditions are the same as in the distribution-aware setting, negating any asymptotic loss of performance due to unknown underlying distributions

    Seeded Database Matching Under Noisy Column Repetitions

    Full text link
    The re-identification or de-anonymization of users from anonymized data through matching with publicly-available correlated user data has raised privacy concerns, leading to the complementary measure of obfuscation in addition to anonymization. Recent research provides a fundamental understanding of the conditions under which privacy attacks are successful, either in the presence of obfuscation or synchronization errors stemming from the sampling of time-indexed databases. This paper presents a unified framework considering both obfuscation and synchronization errors and investigates the matching of databases under noisy column repetitions. By devising replica detection and seeded deletion detection algorithms, and using information-theoretic tools, sufficient conditions for successful matching are derived. It is shown that a seed size logarithmic in the row size is enough to guarantee the detection of all deleted columns. It is also proved that this sufficient condition is necessary, thus characterizing the database matching capacity of database matching under noisy column repetitions and providing insights on privacy-preserving publication of anonymized and obfuscated time-indexed data

    Matching of Markov Databases Under Random Column Repetitions

    Full text link
    Matching entries of correlated shuffled databases have practical applications ranging from privacy to biology. In this paper, motivated by synchronization errors in the sampling of time-indexed databases, matching of random databases under random column repetitions and deletions is investigated. It is assumed that for each entry (row) in the database, the attributes (columns) are correlated, which is modeled as a Markov process. Column histograms are proposed as a permutation-invariant feature to detect the repetition pattern, whose asymptotic-uniqueness is proved using information-theoretic tools. Repetition detection is then followed by a typicality-based row matching scheme. Considering this overall scheme, sufficient conditions for successful matching of databases in terms of the database growth rate are derived. A modified version of Fano's inequality leads to a tight necessary condition for successful matching, establishing the matching capacity under column repetitions. This capacity is equal to the erasure bound, which assumes the repetition locations are known a-priori. Overall, our results provide insights on privacy-preserving publication of anonymized time-indexed data
    corecore