21 research outputs found

    Block Belief Propagation for Parameter Learning in Markov Random Fields

    Full text link
    Traditional learning methods for training Markov random fields require performing inference over all variables to compute the likelihood gradient, so their iteration complexity scales with the size of the graphical model. In this paper, we propose \emph{block belief propagation learning} (BBPL), which uses block-coordinate updates of approximate marginals to compute approximate gradients, removing the need to run inference on the entire graphical model. Thus, the iteration complexity of BBPL does not scale with the size of the graphs. We prove that the method converges to the same solution as that obtained by using full inference per iteration, despite these approximations, and we empirically demonstrate its scalability improvements over standard training methods. Comment: Accepted to AAAI 201
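
    The core of BBPL is pairing one block of message updates with one approximate gradient step, so stale marginals stand in for full inference. Below is a minimal runnable sketch of that pattern on a toy Ising chain with a single shared coupling; the model, the block split, the step size and all names are illustrative assumptions, not the paper's implementation.

    ```python
    import numpy as np

    # Toy Ising chain p(x) ∝ exp(theta * sum_i x_i x_{i+1}), x_i in {-1,+1}.
    n = 6
    edges = [(i, i + 1) for i in range(n - 1)]
    blocks = [edges[:2], edges[2:]]  # two blocks of edges, updated in turn

    def bp_sweep(theta, msgs, block):
        """Belief propagation updates restricted to one block of edges."""
        pot = np.exp(theta * np.outer([-1.0, 1.0], [-1.0, 1.0]))
        for (i, j) in block:
            for (s, t) in [(i, j), (j, i)]:
                inc = np.ones(2)  # product of messages into s, excluding t
                for (a, b) in edges:
                    for (u, v) in [(a, b), (b, a)]:
                        if v == s and u != t:
                            inc = inc * msgs[(u, v)]
                m = pot.T @ inc
                msgs[(s, t)] = m / m.sum()
        return msgs

    def model_moment(theta, msgs):
        """Average edge moment E[x_i x_j] under the current beliefs."""
        vals = np.array([-1.0, 1.0])
        pot = np.exp(theta * np.outer(vals, vals))
        total = 0.0
        for (i, j) in edges:
            inc_i, inc_j = np.ones(2), np.ones(2)
            for (a, b) in edges:
                for (u, v) in [(a, b), (b, a)]:
                    if v == i and u != j:
                        inc_i = inc_i * msgs[(u, v)]
                    if v == j and u != i:
                        inc_j = inc_j * msgs[(u, v)]
            belief = pot * np.outer(inc_i, inc_j)
            belief /= belief.sum()
            total += (belief * np.outer(vals, vals)).sum()
        return total / len(edges)

    # BBPL-style loop: refresh messages on ONE block, then take a gradient
    # step using the (possibly stale) marginals of the whole model.
    empirical = 0.4  # pretend average empirical moment from the data
    theta = 0.0
    msgs = {(u, v): np.ones(2) / 2 for (a, b) in edges for (u, v) in [(a, b), (b, a)]}
    for it in range(200):
        msgs = bp_sweep(theta, msgs, blocks[it % len(blocks)])
        theta += 0.5 * (empirical - model_moment(theta, msgs))
    print(theta)  # approaches atanh(0.4) ≈ 0.424, the full-inference solution
    ```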

    Low-complexity iterative receiver design for high spectral efficiency communication systems

    Full text link
    University of Technology Sydney, Faculty of Engineering and Information Technology.
    With the rapid development of modern society, people have a growing demand for higher data rates. Given the limited available bandwidth, improving spectral efficiency is a key issue in next-generation wireless systems. Recent research shows that, compared to conventional orthogonal communication systems, non-orthogonal systems can transmit more information with the same resources by introducing non-orthogonality. A non-orthogonal communication system can be realised by using faster-than-Nyquist (FTN) signaling to transmit more data symbols in the same time period. On the other hand, by designing appropriate codebooks, a sparse code multiple access (SCMA) system can support more users on the same resource elements. These new technologies make receiver design challenging, and the challenge becomes more severe in complex channel environments. This thesis studies receiver design for high-spectral-efficiency communication systems. The main contributions are as follows:
    1. A hybrid message passing algorithm is proposed for FTN signaling, which solves the problem of joint data detection and channel estimation when the channel coefficients are unknown. To fully exploit the known ISI imposed by FTN signaling, the interference induced by FTN signaling and that induced by channel fading are intentionally separated.
    2. Gaussian message passing and variational-inference-based estimation algorithms are proposed for FTN signaling detection in doubly selective channels. Iterative receivers using mean-field and Bethe approximations within the variational inference framework are proposed, along with a novel Gaussian message passing based FTN signaling detection algorithm.
    3. An energy-minimisation-based SCMA decoding algorithm is proposed and its convergence is analysed. Following optimisation theory and the variational free energy framework, the posterior distribution of the data symbols is derived in closed form, and the convergence property of the proposed algorithm is then analysed.
    4. A stretched factor graph is designed for the MIMO-SCMA system to reduce receiver complexity, and a convergence-guaranteed message passing algorithm is obtained by convexifying the Bethe free energy. Finally, cooperative communication methods based on belief consensus and the alternating direction method of multipliers are proposed.
    5. A low-complexity detection algorithm is proposed for the faster-than-Nyquist SCMA system, enabling joint channel estimation, decoding and user activity detection in grant-free systems. The combination of FTN signaling with SCMA to further enhance spectral efficiency is considered for the first time, and a merged belief propagation and expectation propagation algorithm is proposed to estimate the channel state and perform SCMA decoding.
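
    For flavour, the sketch below runs a generic soft interference cancellation receiver for BPSK over a known short ISI channel, the simplest instance of the iterative-detection pattern the thesis builds on. The channel taps, noise level and iteration count are assumptions, and this is not any of the thesis's specific algorithms (those handle unknown and doubly selective channels, SCMA codebooks, and joint estimation).

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, sigma = 64, 0.3
    h = np.array([1.0, 0.6, 0.3])        # assumed FTN-like ISI taps
    x = rng.choice([-1.0, 1.0], size=n)  # BPSK symbols
    y = np.convolve(x, h)[:n] + sigma * rng.normal(size=n)

    # Banded matrix H with y ≈ H @ x, i.e. (H @ x)[i] = sum_k h[k] * x[i-k].
    H = np.zeros((n, n))
    for k, hk in enumerate(h):
        H += hk * np.eye(n, k=-k)

    x_soft = np.zeros(n)                 # soft symbol estimates E[x_i]
    for _ in range(10):                  # iterative receiver loop
        for i in range(n):
            # cancel the soft interference of all other symbols ...
            resid = y - H @ x_soft + H[:, i] * x_soft[i]
            # ... then matched-filter the residual for symbol i
            energy = H[:, i] @ H[:, i]
            z = (H[:, i] @ resid) / energy
            x_soft[i] = np.tanh(z * energy / sigma**2)  # BPSK posterior mean

    print("bit errors:", int((np.sign(x_soft) != x).sum()))
    ```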

    Blending Learning and Inference in Conditional Random Fields

    Get PDF
    Conditional random fields maximize the log-likelihood of training labels given the training data, e.g., objects given images. In many cases the training labels are structures that consist of a set of variables, and the computational complexity of estimating their likelihood is exponential in the number of variables. Learning algorithms relax this computational burden using approximate inference nested as a sub-procedure. In this paper we describe the objective function for nested learning and inference in conditional random fields. The devised objective maximizes the log-beliefs: probability distributions over subsets of the training variables that agree on their marginal probabilities. This objective is concave and consists of two types of variables, related to the learning and inference tasks respectively. Importantly, we then show how to blend the learning and inference procedures and reach the same optimum much faster. The proposed algorithm achieves state-of-the-art results in various computer vision applications.
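
    The blending idea, interleaving a single inference update with each parameter update rather than nesting inference to convergence, can be shown on a toy model. In the sketch below, mean-field updates stand in for the paper's belief updates over its concave objective; the two-label model, features and step sizes are all assumptions for illustration.

    ```python
    import numpy as np

    # Toy CRF: two binary labels (as spins s_i in {-1,+1}) with one shared
    # unary weight w_u on scalar features f_i and one pairwise weight w_p.
    f = np.array([0.8, -0.5])   # features of a single training example
    s = np.array([1.0, -1.0])   # observed labels
    m = np.zeros(2)             # mean-field means E[s_i]
    w_u, w_p = 0.0, 0.0

    for _ in range(300):
        # inference step: ONE mean-field sweep, not run to convergence
        m[0] = np.tanh(w_u * f[0] + w_p * m[1])
        m[1] = np.tanh(w_u * f[1] + w_p * m[0])
        # learning step: gradient = empirical statistics - expected statistics
        w_u += 0.1 * ((s * f).sum() - (m * f).sum())
        w_p += 0.1 * (s[0] * s[1] - m[0] * m[1])

    print(w_u, w_p, m)  # beliefs m are pulled toward the observed labels s
    ```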

    On the Stability of Structured Prediction

    Get PDF
    Many important applications of artificial intelligence, such as image segmentation, part-of-speech tagging and network classification, are framed as multiple, interdependent prediction tasks. These structured prediction problems are typically modeled using some form of joint inference over the outputs to exploit the relational dependencies. Joint reasoning can significantly improve predictive accuracy, but it introduces a complication in the analysis of structured models: the stability of inference. In optimizations involving multiple interdependent variables, such as joint inference, a small change to the input or parameters can induce drastic changes in the solution. In this dissertation, I investigate the impact of stability in structured prediction. I explore two topics, connected by the stability of inference. First, I provide generalization bounds for learning from a limited number of examples with large internal structure. The effective learning rate can be significantly sharper than rates given in related work. Under certain conditions on the data distribution and the stability of the predictor, the bounds decrease with both the number of examples and the size of each example, meaning one could potentially learn from a single giant example. Second, I investigate the benefits of learning with strongly convex variational inference. Using the duality between strong convexity and stability, I demonstrate, both theoretically and empirically, that learning with a strongly convex free energy can result in significantly more accurate marginal probabilities. One consequence of this work is a new technique that "strongly convexifies" many free energies used in practice. These two seemingly unrelated threads are tied by the idea that stable inference leads to lower error, particularly in the limited-example setting, demonstrating that inference stability is of critical importance to the study and practice of structured prediction.
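
    The duality between strong convexity and stability has a one-variable illustration: the minimizer of a linear objective over an interval jumps discontinuously as the parameter crosses zero, while adding a strongly convex term makes the minimizer Lipschitz in the parameter. The sketch below is only a stand-in for "strongly convexifying" a free energy, not the dissertation's construction.

    ```python
    import numpy as np

    def argmin_linear(theta):
        # min over tau in [0, 1] of -theta * tau: jumps from 0 to 1 at theta = 0
        return 1.0 if theta > 0 else 0.0

    def argmin_strongly_convex(theta, c=1.0):
        # min over tau in [0, 1] of -theta * tau + (c / 2) * (tau - 0.5)**2:
        # the solution moves at most |d theta| / c, i.e. it is (1/c)-Lipschitz
        return float(np.clip(0.5 + theta / c, 0.0, 1.0))

    for theta in (-0.01, 0.01):
        print(theta, argmin_linear(theta), argmin_strongly_convex(theta))
    # the linear minimizer flips 0 -> 1; the strongly convex one moves by 0.02
    ```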

    Methods for Inference in Graphical Models

    Get PDF
    Graphical models provide a flexible, powerful and compact way to model relationships between random variables, and have been applied with great success in many domains. Combining prior beliefs with observed evidence to form a prediction is called inference. Problems of great interest include finding a configuration with highest probability (MAP inference) or solving for the distribution over a subset of variables (marginal inference). Further, these methods are often critical subroutines for learning the relationships. However, inference is computationally intractable in general. Hence, much effort has focused on two themes: finding subdomains where exact inference is efficiently solvable, and identifying approximate methods that work well. We explore both themes, restricting attention to undirected graphical models with discrete variables.

    First we address exact MAP inference by advancing the recent method of reducing the problem to finding a maximum weight stable set (MWSS) on a derived graph, which, if perfect, admits polynomial-time inference. We derive new results for this approach, including a general decomposition theorem for models of any order and number of labels, extensions of results for binary pairwise models with submodular cost functions to higher order, and a characterization of which binary pairwise models can be efficiently solved with this method. This clarifies the power of the approach on this class of models, improves our toolbox and provides insight into the range of tractable models.

    Next we consider methods of approximate inference, with particular emphasis on the Bethe approximation, which is in widespread use and has proved remarkably effective, yet is still far from completely understood. We derive new formulations and properties of the derivatives of the Bethe free energy, then use these to establish an algorithm that computes the log of the optimal Bethe partition function to arbitrary epsilon-accuracy. Further, if the model is attractive, we obtain a fully polynomial-time approximation scheme (FPTAS), an important theoretical result, and demonstrate its practical applications. We then explore ways to tease apart the two aspects of the Bethe approximation, i.e. the polytope relaxation and the entropy approximation. We derive analytic results, show how optimization may be explored over various polytopes in practice, even for large models, and remark on the observed performance compared to the true distribution and the tree-reweighted (TRW) approximation. This reveals important novel observations and helps guide inference in practice.

    Finally, we present results related to clamping a selection of variables in a model. We derive novel lower bounds on an array of approximate partition functions based only on the model's topology. Further, we show that in an attractive binary pairwise model, clamping any variable and summing over the approximate sub-partition functions can only increase (hence improve) the Bethe approximation, and use this to provide a new, short proof that the Bethe partition function lower bounds the true value for this class of models. The bulk of this work focuses on the class of binary pairwise models, but several results apply more generally.
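
    The clamping results rest on the exact identity Z = Z(x_k = -1) + Z(x_k = +1), which is easy to verify by brute force, as in the sketch below on an assumed small attractive binary pairwise model. The thesis's contribution is the analogous inequality for the Bethe approximation (clamping can only increase it on attractive models), which requires running belief propagation and is not reproduced here.

    ```python
    import numpy as np
    from itertools import product

    n = 4
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    rng = np.random.default_rng(3)
    theta = {e: rng.uniform(0.1, 0.5) for e in edges}  # attractive couplings

    def Z(clamp=None):
        """Brute-force partition function, optionally clamping x[k] = v."""
        total = 0.0
        for x in product([-1, 1], repeat=n):
            if clamp is not None and x[clamp[0]] != clamp[1]:
                continue
            total += np.exp(sum(t * x[i] * x[j] for (i, j), t in theta.items()))
        return total

    # The sub-partition functions of the clamped models sum to the original Z.
    print(Z(), Z((0, -1)) + Z((0, 1)))  # equal up to floating-point rounding
    ```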