Adaptive Distributed Source Coding Based on Bayesian Inference

Abstract

Distributed Source Coding (DSC) is an important topic for both in information theory and communication. DSC utilizes the correlations among the sources to compress data, and it has the advantages of being simple and easy to carry out. In DSC, Slepian-Wolf (S-W) and Wyner-Ziv (W-Z) are two important problems, which can be classified as lossless compression and loss compression, respectively. Although the lower bounds of the S-W and W-Z problems have been known to researchers for many decades, the code design to achieve the lower bounds is still an open problem. This dissertation focuses on three DSC problems: the adaptive Slepian-Wolf decoding for two binary sources (ASWDTBS) problem, the compression of correlated temperature data of sensor network (CCTDSN) problem and the streamlined genome sequence compression using distributed source coding (SGSCUDSC) problem. For the CCTDSN and SGSCUDSC problems, sources will be converted into the binary expression as the sources in ASWDTBS problem for encoding. The Bayesian inference will be applied to all of these three problems. To efficiently solve these Bayesian inferences, message passing algorithm will be applied. For a discrete variable that takes a small number of values, the belief propagation (BP) algorithm is able to implement the message passing algorithm efficiently. However, the complexity of the BP algorithm increases exponentially with the number of values of the variable. Therefore, the BP algorithm can only deal with discrete variable that takes a small number of values and limited continuous variables. For the more complex variables, deterministic approximation methods are used. These methods, such as the variational Bayes (VB) method and expectation propagation (EP) method, can efficiently incorporated into the message passing algorithm. A virtual binary asymmetric channel (BAC) channel was introduced to model the correlation between the source data and the side information (SI) in ASWDTBS problem, in which two parameters are required to be learned. The two parameters correspond to the crossover probabilities that are 0->1 and 1->0. Based on this model, a factor graph was established that includes LDPC code, source data, SI and both of the crossover probabilities. Since the crossover probabilities are continuous variables, the deterministic approximate inference methods will be incorporated into the message passing algorithm. The proposed algorithm was applied to the synthetic data, and the results showed that the VB-based algorithm achieved much better performance than the performances of the EP-based algorithm and the standard BP algorithm. The poor performance of the EP-based algorithm was also analyzed. For the CCTDSN problem, the temperature data were collected by crossbow sensors. Four sensors were established in different locations of the laboratory and their readings were sent to the common destination. The data from one sensor were used as the SI, and the data from the other 3 sensors were compressed. The decoding algorithm considers both spatial and temporal correlations, which are in the form of Kalman filter in the factor graph. To deal with the mixtures of the discrete messages and the continuous messages (Gaussians) in the Kalman filter region of the factor graph, the EP algorithm was implemented so that all of the messages were approximated by the Gaussian distribution. The testing results on the wireless network have indicated that the proposed algorithm outperforms the prior algorithm. The SGSCUDSC consists of developing a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy-client (encoder side) cannot be applied. To tackle this challenge, the DSC theory was carefully examined, and a customized reference-based genome compression protocol was developed to meet the low-complexity need at the client side. Based on the variation between the source and the SI, this protocol will adaptively select either syndrome coding or hash coding to compress variable lengths of code subsequences. The experimental results of the proposed method showed promising performance when compared with the state of the art algorithm (GRS)

    Similar works