95 research outputs found

    Hybrid Random Fields

    Full text link
    This book presents an exciting new synthesis of directed and undirected, discrete and continuous graphical models. Combining elements of Bayesian networks and Markov random fields, the newly introduced hybrid random fields are an interesting approach to get the best of both these worlds, with an added promise of modularity and scalability. The authors have written an enjoyable book---rigorous in the treatment of the mathematical background, but also enlivened by interesting and original historical and philosophical perspectives. -- Manfred Jaeger, Aalborg Universitet The book not only marks a

    Nonparametric Hidden Markov Models: Principles and Applications to Speech Recognition

    No full text
    Continuous-density hidden Markov models (HMMs) are a popular approach to modeling sequential data, e.g. in automatic speech recognition (ASR), off-line handwritten text recognition, and bioinformatics. HMMs rely on strong statistical assumptions, e.g. the arbitrary parametric form assumed for the emission probability density functions (pdfs). This chapter proposes a nonparametric HMM based on connectionist estimates of the emission pdfs, featuring a global gradient-ascent training algorithm over the maximum-likelihood criterion. Robustness to noise may be further increased by relying on a soft parameter grouping technique, namely the introduction of adaptive amplitudes of the activation functions. Applications to ASR tasks are presented and analyzed, evaluating the behavior of the proposed paradigm and allowing for a comparison with standard HMMs with Gaussian mixtures, as well as with other state-of-the-art neural net/HMM hybrids
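
    As a rough illustration of the idea, the sketch below (PyTorch, not the chapter's implementation) pairs an HMM with a shared MLP that emits one unnormalized log-emission score per state; the sequence log-likelihood is computed with a differentiable forward recursion and maximized by gradient ascent over the MLP weights and the transition/initial logits. Treating raw network outputs as log-emission scores, and all names such as NeuralHMM, are simplifying assumptions of this sketch rather than details from the chapter.

    # Minimal sketch (assumptions stated above): HMM with connectionist emission scores,
    # trained by gradient ascent on the sequence log-likelihood.
    import torch
    import torch.nn as nn

    class NeuralHMM(nn.Module):
        def __init__(self, n_states: int, obs_dim: int, hidden: int = 64):
            super().__init__()
            # Unconstrained logits; log_softmax turns them into valid (log-)probabilities.
            self.init_logits = nn.Parameter(torch.zeros(n_states))
            self.trans_logits = nn.Parameter(torch.zeros(n_states, n_states))
            # Shared MLP mapping one observation to one log-emission score per state.
            self.emission_net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_states)
            )

        def log_likelihood(self, obs: torch.Tensor) -> torch.Tensor:
            """obs: (T, obs_dim) -> scalar log-likelihood via the forward recursion."""
            log_emis = self.emission_net(obs)                 # (T, n_states)
            log_pi = torch.log_softmax(self.init_logits, dim=0)
            log_A = torch.log_softmax(self.trans_logits, dim=1)
            alpha = log_pi + log_emis[0]                      # forward variable in log space
            for t in range(1, obs.shape[0]):
                alpha = torch.logsumexp(alpha.unsqueeze(1) + log_A, dim=0) + log_emis[t]
            return torch.logsumexp(alpha, dim=0)

    # Usage: global gradient ascent on the maximum-likelihood criterion.
    model = NeuralHMM(n_states=3, obs_dim=13)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    sequence = torch.randn(50, 13)                            # stand-in for one feature-vector sequence
    for _ in range(100):
        loss = -model.log_likelihood(sequence)                # maximize the log-likelihood
        optimizer.zero_grad(); loss.backward(); optimizer.step()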

    Activation functions with learnable amplitude

    No full text
    Network training algorithms have heavily concentrated on the learning of connection weights. Little effort has been made to learn the amplitude of the activation functions, which defines the range of values that the function can take. This paper introduces novel algorithms to learn the amplitudes of non-linear activations in layered networks, without any assumption on their analytical form. Three instances of the algorithms are developed: (i) a common amplitude is shared among all the non-linear units; (ii) each layer has its own amplitude; (iii) neuron-specific amplitudes are allowed. Experimental results validate the approach to a large extent, showing a dramatic improvement in performance over the nets with fixed amplitude
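
    A minimal sketch of the idea, assuming scheme (ii), a tanh activation, and PyTorch as the framework (the paper makes no assumption on the activation's analytical form, and the class name AmplitudeTanhLayer is invented here): the amplitude is just another trainable parameter, so ordinary backpropagation updates it alongside the connection weights.

    # Sketch under the stated assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class AmplitudeTanhLayer(nn.Module):
        """Scheme (ii): one shared, trainable amplitude per layer."""
        def __init__(self, in_features: int, out_features: int, init_amplitude: float = 1.0):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            self.amplitude = nn.Parameter(torch.tensor(init_amplitude))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # y = lambda * tanh(W x + b); lambda receives gradients like any weight.
            return self.amplitude * torch.tanh(self.linear(x))

    # Any optimizer built from model.parameters() will update the amplitude too.
    layer = AmplitudeTanhLayer(16, 8)
    out = layer(torch.randn(4, 16))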

    A Novel Connectionist-Oriented Feature Normalization Technique

    No full text

    Asymptotic Convergence of Soft-Constrained Neural Networks for Density Estimation

    No full text
    A soft-constrained neural network for density estimation (SC-NN-4pdf) has recently been introduced to tackle the issues arising from the application of neural networks to density estimation problems (in particular, the satisfaction of the second Kolmogorov axiom). Although the SC-NN-4pdf has been shown to outperform parametric and non-parametric approaches (from both the machine learning and the statistics areas) over a variety of univariate and multivariate density estimation tasks, no clear rationale behind its performance has been put forward so far. Neither has there been any analysis of the fundamental theoretical properties of the SC-NN-4pdf. This paper narrows these gaps, delivering a formal statement of the class of density functions that can be modeled to any degree of precision by SC-NN-4pdfs, as well as a proof of asymptotic convergence in probability of the SC-NN-4pdf training algorithm under mild conditions for a popular class of neural architectures. These properties of the SC-NN-4pdf lay the groundwork for understanding the strong estimation capabilities that SC-NN-4pdfs have so far only exhibited empirically

    Learning the Width of Activations in Neural Networks

    No full text
    This report introduces a novel algorithm to learn the width of non-linear activation functions (of arbitrary analytical form) in layered networks. The algorithm is based on a steepest gradient-descent technique, and relies on the inductive proof of a theorem that involves the novel concept of the expansion function of the activation associated with a given unit of the neural net. Experimental results obtained in a speaker normalization task with a mixture of Multilayer Perceptrons show a dramatic improvement of performance with respect to standard Back-Propagation training
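
    One plausible reading of "width" is sketched below, under the assumption that it horizontally rescales the activation's argument (the report's expansion-function machinery is not reproduced, and the parametrization and class name are invented here); the width parameter is then learned by the same gradient-descent step as the weights.

    # Sketch under the stated assumption, not the report's algorithm.
    import torch
    import torch.nn as nn

    class WidthTanhLayer(nn.Module):
        def __init__(self, in_features: int, out_features: int, init_width: float = 1.0):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            self.width = nn.Parameter(torch.tensor(init_width))  # trainable horizontal scale

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # y = tanh((W x + b) / width): a larger width flattens the activation,
            # a smaller one sharpens it; the width is updated by gradient descent.
            return torch.tanh(self.linear(x) / self.width)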

    A Neural Probabilistic Graphical Model for Learning and Decision Making in Evolving Structured Environments

    No full text
    A difficult and open problem in artificial intelligence is the development of agents that can operate in complex environments that change over time. The present communication introduces the formal notions, the architecture, and the training algorithm of a machine capable of learning and decision-making in evolving structured environments. These environments are defined as sets of evolving relations among evolving entities. The proposed machine relies on a probabilistic graphical model whose time-dependent latent variables obey a Markov assumption. The likelihood of such variables given the structured environment is estimated via a probabilistic variant of the recursive neural network

    Soft-Constrained Nonparametric Density Estimation with Artificial Neural Networks

    No full text
    The estimation of probability density functions (pdf) from unlabeled data samples is a relevant (and still open) issue in pattern recognition and machine learning. Statistical parametric and nonparametric approaches present severe drawbacks. Only a few instances of neural networks for pdf estimation are found in the literature, due to the intrinsic difficulty of unsupervised learning under the necessary integral-equals-one constraint. In turn, such neural networks also suffer from serious limitations. The paper introduces a soft-constrained algorithm for training a multilayer perceptron (MLP) to estimate pdfs empirically. A variant of the Metropolis-Hastings algorithm (exploiting the very probabilistic nature of the MLP) is used to satisfy numerically the constraint on the integral of the function learned by the MLP. The preliminary outcomes of a simulation on data drawn from a mixture of Fisher-Tippett pdfs are reported and compared graphically with the estimates yielded by statistical techniques, showing the viability of the approach

    Soft-constrained neural networks for nonparametric density estimation

    No full text
    The paper introduces a robust connectionist technique for the empirical nonparametric estimation of multivariate probability density functions (pdf) from unlabeled data samples (still an open issue in pattern recognition and machine learning). To this end, a soft-constrained unsupervised algorithm for training a multilayer perceptron (MLP) is proposed. A variant of the Metropolis–Hastings algorithm (exploiting the very probabilistic nature of the present MLP) is used to guarantee a model that satisfies numerically Kolmogorov’s second axiom of probability. The approach overcomes the major limitations of the established statistical and connectionist pdf estimators. Graphical and quantitative experimental results show that the proposed technique can offer estimates that improve significantly over parametric and nonparametric approaches, regardless of (1) the complexity of the underlying pdf, (2) the dimensionality of the feature space, and (3) the amount of data available for training
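
    A minimal sketch of the soft-constrained idea, under stated assumptions: the MLP output is made non-negative with a softplus, the data-fit term is the mean log of the modeled density at the samples, and the unit-integral constraint (Kolmogorov's second axiom) is enforced as a quadratic penalty on a plain Monte Carlo estimate of the integral over a bounded box, in place of the Metropolis-Hastings variant the paper actually uses. Class and function names are invented for illustration.

    # Sketch under the stated assumptions, not the authors' algorithm.
    import torch
    import torch.nn as nn

    class DensityMLP(nn.Module):
        def __init__(self, dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, 1), nn.Softplus(),   # non-negative output p(x)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x).squeeze(-1)

    def soft_constrained_loss(model, samples, low, high, n_mc=4096, penalty=10.0):
        """-mean log p(x_i)  +  penalty * (integral of p - 1)^2, integral by Monte Carlo."""
        log_p = torch.log(model(samples) + 1e-12).mean()
        dim = samples.shape[1]
        # Uniform Monte Carlo estimate of the integral over the box [low, high]^dim.
        u = low + (high - low) * torch.rand(n_mc, dim)
        volume = (high - low) ** dim
        integral = volume * model(u).mean()
        return -log_p + penalty * (integral - 1.0) ** 2

    # Usage on synthetic 2-D data (a stand-in for the unlabeled sample).
    data = torch.randn(1000, 2)
    model = DensityMLP(dim=2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):
        loss = soft_constrained_loss(model, data, low=-5.0, high=5.0)
        optimizer.zero_grad(); loss.backward(); optimizer.step()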

    Networks with trainable amplitude of activation functions

    No full text
    Network training algorithms have heavily concentrated on the learning of connection weights. Little effort has been made to learn the amplitude of activation functions, which defines the range of values that the function can take. This paper introduces novel algorithms to learn the amplitudes of nonlinear activations in layered networks, without any assumption on their analytical form. Three instances of the algorithms are developed: (i) a common amplitude is shared among all nonlinear units; (ii) each layer has its own amplitude; and (iii) neuron-specific amplitudes are allowed. The algorithms can also be seen as a particular double-step gradient-descent procedure, as gradient-driven adaptive learning rate schemes, or as weight-grouping techniques that are consistent with known scaling laws for regularization with weight decay. As a side effect, a self-pruning mechanism of redundant neurons may emerge. Experimental results on function approximation, classification, and regression tasks, with synthetic and real-world data, validate the approach and show that the algorithms speed up convergence and modify the search path in the weight space, possibly reaching deeper minima that may also improve generalization
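
    To illustrate scheme (iii) and the self-pruning side effect mentioned above, the sketch below gives each hidden neuron its own trainable amplitude and then flags units whose learned amplitude has collapsed toward zero; the pruning threshold and all names are illustrative assumptions, not taken from the paper.

    # Sketch under the stated assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class PerNeuronAmplitudeLayer(nn.Module):
        def __init__(self, in_features: int, out_features: int):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            self.amplitudes = nn.Parameter(torch.ones(out_features))  # one amplitude per neuron

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.amplitudes * torch.tanh(self.linear(x))

        def prunable_units(self, threshold: float = 1e-2):
            """Indices of neurons whose learned amplitude is effectively zero."""
            return (self.amplitudes.detach().abs() < threshold).nonzero(as_tuple=True)[0]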