A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms
The benefits of automating design cycles for Bayesian inference-based
algorithms are becoming increasingly recognized by the machine learning
community. As a result, interest in probabilistic programming frameworks has
much increased over the past few years. This paper explores a specific
probabilistic programming paradigm, namely message passing in Forney-style
factor graphs (FFGs), in the context of automated design of efficient Bayesian
signal processing algorithms. To this end, we developed "ForneyLab"
(https://github.com/biaslab/ForneyLab.jl) as a Julia toolbox for message
passing-based inference in FFGs. We show by example how ForneyLab enables
automatic derivation of Bayesian signal processing algorithms, including
algorithms for parameter estimation and model comparison. Crucially, due to the
modular makeup of the FFG framework, both the model specification and inference
methods are readily extensible in ForneyLab. In order to test this framework,
we compared variational message passing as implemented by ForneyLab with
automatic differentiation variational inference (ADVI) and Monte Carlo methods
as implemented by state-of-the-art tools "Edward" and "Stan". In terms of
performance, extensibility, and stability, ForneyLab appears to hold an edge
over its competitors for automated inference in state-space models.

Comment: Accepted for publication in the International Journal of Approximate Reasoning.
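The message-passing inference that ForneyLab automates can be illustrated in miniature. The sketch below (plain NumPy, not ForneyLab itself, which is a Julia toolbox) runs one sum-product message through a two-factor discrete factor graph to recover a marginal; the factor values are purely illustrative.

```python
# Minimal sketch: sum-product message passing on a tiny discrete factor
# graph  p(x, y) = f1(x) * f2(x, y), computing the marginal p(y).
import numpy as np

f1 = np.array([0.6, 0.4])                 # prior factor over x (2 states)
f2 = np.array([[0.9, 0.1],                # conditional factor f2(x, y)
               [0.2, 0.8]])

# The message from f1 to x is f1 itself; the variable node x forwards it.
msg_x_to_f2 = f1
# f2 multiplies in the incoming message and sums out x -> message to y.
msg_f2_to_y = msg_x_to_f2 @ f2
# The marginal of y is the (normalized) incoming message.
p_y = msg_f2_to_y / msg_f2_to_y.sum()
print(p_y)  # prints [0.62 0.38]
```

On tree-structured graphs this schedule computes exact marginals; ForneyLab's contribution is deriving and scheduling such messages (including variational ones) automatically from a model specification.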
Framework for the configuration and training of neural networks using Bayesian Optimization
Neural networks have existed for decades, having first been introduced in the 1940s by two scientists who modelled a simple neural network using electrical circuits. Since then, many advances have been made in the field with the aim of adapting neural networks to solve increasingly complex tasks, in turn leading to gradually more intricate architectures. This progression has made it harder for users to improve the quality of neural networks, as ever more hyperparameters (i.e., architectural components) require tweaking in an attempt to increase accuracy. To overcome this issue, the concept of hyperparameter optimization emerged, in which each hyperparameter of a neural network is adjusted, manually or automatically, so as to find the architecture with the best results. This thesis delves into this subject, presenting a solution that employs Bayesian optimization as its hyperparameter optimization algorithm to automatically configure any type of neural network. The developed system not only optimizes the hyperparameters of neural networks, but can also pinpoint the most relevant features in a dataset (also known as feature selection) and learn how each hyperparameter and feature affects the performance of the network, making it useful for predicting the performance of a neural network configuration without even having to train and test it in the first place.
The results observed in the evaluation of the system showcase its strong learning capabilities and its ability to balance the exploitation of configurations with a high chance of performing well against the exploration of unknown configurations with unpredictable performance, so as to avoid settling for a merely good configuration and instead find the best one. Both the undertaken case study and the optimization of a convolutional neural network demonstrate the system's ability to adapt to different types of neural networks and to obtain positive results in both scenarios. The system's evaluation demonstrates its potential; with future work it could reach a level of quality and performance at which it finds configurations that surpass those of both existing manual and automatic approaches.
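The Bayesian-optimization loop the thesis builds on can be sketched as follows. This is a minimal, assumed form (a Gaussian-process surrogate with an RBF kernel and an expected-improvement acquisition, with a toy 1-D objective standing in for validation accuracy), not the thesis's actual system.

```python
# Hedged sketch of a Bayesian-optimization loop: fit a GP surrogate to past
# evaluations, maximize expected improvement, evaluate, repeat.
import numpy as np
from math import erf, sqrt, pi

def f(x):                       # toy "validation accuracy" to maximize
    return -(x - 2.0) ** 2

def rbf(a, b, ls=1.0):          # RBF kernel between two 1-D point sets
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum('ij,ji->i', Ks.T @ Kinv, Ks)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    s = np.sqrt(var)
    z = (mu - best) / s
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)        # standard normal pdf
    Phi = 0.5 * (1 + np.vectorize(erf)(z / sqrt(2)))  # standard normal cdf
    return (mu - best) * Phi + s * phi

X = np.array([0.0, 4.0]); y = f(X)          # two initial evaluations
grid = np.linspace(0.0, 4.0, 101)           # candidate hyperparameter values
for _ in range(10):                          # BO loop: fit, acquire, evaluate
    mu, var = gp_posterior(X, y, grid)
    xn = grid[np.argmax(expected_improvement(mu, var, y.max()))]
    X = np.append(X, xn); y = np.append(y, f(xn))
print(X[np.argmax(y)])  # converges to the optimum at x = 2
```

The exploration/exploitation balance described in the abstract lives entirely in the acquisition function: the `(mu - best) * Phi` term rewards configurations expected to do well, while the `s * phi` term rewards configurations the surrogate is still uncertain about.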
Efficient Variational Inference for Hierarchical Models of Images, Text, and Networks
Variational inference provides a general optimization framework to approximate the posterior distributions of latent variables in probabilistic models. Although effective in simple scenarios, variational inference may be inaccurate or infeasible when the data is high-dimensional, the model structure is complicated, or variable relationships are non-conjugate. We propose solutions to these problems through the smart design and leverage of model structures, the rigorous derivation of variational bounds, and the creation of flexible algorithms for various models with rich, non-conjugate dependencies.
Concretely, we first design an interpretable generative model for natural images, in which the hundreds of thousands of pixels per image are split into small patches represented by Gaussian mixture models. Through structured variational inference, the evidence lower bound of this model automatically recovers the popular expected patch log-likelihood method for image processing. A nonparametric extension using hierarchical Dirichlet processes further enables self-similarities to be captured and image-specific clusters to be created during inference, boosting image denoising and inpainting accuracy.
Then we move on to text data, and design hierarchical topic graphs that generalize the bipartite noisy-OR models previously used for medical diagnosis. We derive auxiliary bounds to overcome the non-conjugacy of noisy-OR conditionals, and use stochastic variational inference to efficiently train on datasets with hundreds of thousands of documents. We dramatically increase the algorithm's speed through a constrained family of variational bounds, so that only the ancestors of the sparse observed tokens of each document need to be considered.
Finally, we propose a general-purpose Monte Carlo variational inference strategy that is directly applicable to any model with discrete variables. Compared to REINFORCE-style stochastic gradient updates, our coordinate-ascent updates have lower variance and converge much faster. Compared to auxiliary-variable bounds crafted for each individual model, our algorithm is simpler to derive and may be easily integrated into probabilistic programming languages for broader use. By avoiding auxiliary variables, we also tighten likelihood bounds and increase robustness to local optima. Extensive experiments on real-world models of images, text, and networks illustrate these appealing advantages.
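The coordinate-ascent updates contrasted with REINFORCE above can be illustrated on a toy model with two coupled binary variables; the model and its parameters below are an illustrative choice of ours, not one from the dissertation.

```python
# Mean-field coordinate ascent for a model with discrete variables.
# Toy model: p(z1, z2) proportional to exp(t1*z1 + t2*z2 + J*z1*z2),
# z_i in {0, 1}, approximated by q(z1, z2) = q1(z1) q2(z2).
import math

t1, t2, J = 0.5, -0.3, 1.2           # illustrative natural parameters

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

q1 = q2 = 0.5                         # q_i = q(z_i = 1), uniform init
for _ in range(100):
    # Each update sets q_i to its exact coordinate-wise optimum,
    # q_i* proportional to exp(E_{q_j}[log p]) -- no gradient noise at all.
    q1 = sigmoid(t1 + J * q2)
    q2 = sigmoid(t2 + J * q1)

# Compare against the exact marginal p(z1 = 1) by enumeration.
w = {(a, b): math.exp(t1*a + t2*b + J*a*b) for a in (0, 1) for b in (0, 1)}
Z = sum(w.values())
exact_p1 = (w[(1, 0)] + w[(1, 1)]) / Z
print(q1, exact_p1)  # mean-field estimate vs. exact marginal
```

Because each update is a closed-form expectation rather than a sampled score-function gradient, the iteration is deterministic and converges in a handful of steps, which is the variance and speed advantage the abstract claims over REINFORCE-style updates.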
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation, and manifold learning.
Online Multi-Stage Deep Architectures for Feature Extraction and Object Recognition
Multi-stage visual architectures have recently found success in achieving high classification accuracies over image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures are typically composed of one or more layers of preprocessing, feature encoding, and pooling to extract features from raw images. Training these components traditionally relies on large sets of patches that are extracted from a potentially large image dataset. In this context, high-dimensional feature space representations are often helpful for obtaining the best classification performance and providing a higher degree of invariance to object transformations. Large datasets with high-dimensional features complicate the implementation of visual architectures in memory-constrained environments. This dissertation constructs online learning replacements for the components within a multi-stage architecture and demonstrates that the proposed replacements (namely fuzzy competitive clustering, an incremental covariance estimator, and a multi-layer neural network) can offer performance competitive with their offline batch counterparts while providing a reduced memory footprint. The online nature of this solution allows for the development of a method for adjusting parameters within the architecture via stochastic gradient descent. Testing over multiple datasets shows the potential benefits of this methodology when appropriate priors on the initial parameters are unknown. Alternatives to batch-based decompositions for a whitening preprocessing stage, which take advantage of natural image statistics and allow simple dictionary learners to work well in the problem domain, are also explored. Expansions of the architecture using additional pooling statistics and multiple layers are presented and indicate that larger codebook sizes are not the only step forward to higher classification accuracies.
Experimental results from these expansions further indicate the important role of sparsity and appropriate encodings within multi-stage visual feature extraction architectures.
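The whitening preprocessing stage discussed above, in its standard batch (ZCA) form, can be sketched as follows; the dissertation explores online alternatives to exactly this kind of batch eigendecomposition, so this is only the baseline, and the patch data here is synthetic.

```python
# Batch ZCA whitening of image patches: decorrelate dimensions and equalize
# their variances while staying as close as possible to the original pixels.
import numpy as np

rng = np.random.default_rng(0)
patches = rng.normal(size=(1000, 16))          # 1000 flattened 4x4 patches

X = patches - patches.mean(axis=0)             # center each dimension
cov = X.T @ X / len(X)                         # empirical covariance
vals, vecs = np.linalg.eigh(cov)               # batch eigendecomposition
eps = 1e-5                                     # regularizer for tiny eigenvalues
W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T   # ZCA whitening matrix
Xw = X @ W

# After whitening, the empirical covariance is (approximately) identity.
print(np.allclose(Xw.T @ Xw / len(Xw), np.eye(16), atol=1e-2))  # True
```

The memory pressure motivating the dissertation is visible here: the batch form needs every patch in memory to build `cov` before a single whitened feature can be produced, which is what an incremental covariance estimator avoids.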
DeepCare: A Deep Dynamic Memory Model for Predictive Medicine
Personalized predictive medicine necessitates the modeling of patient illness
and care processes, which inherently have long-term temporal dependencies.
Healthcare observations, recorded in electronic medical records, are episodic
and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural
network that reads medical records, stores previous illness history, infers
current illness states and predicts future medical outcomes. At the data level,
DeepCare represents care episodes as vectors in space and models patient health
state trajectories through explicit memory of historical records. Built on Long
Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle
irregular timed events by moderating the forgetting and consolidation of memory
cells. DeepCare also incorporates medical interventions that change the course
of illness and shape future medical risk. Moving up to the health state level,
historical and present health states are then aggregated through multiscale
temporal pooling, before passing through a neural network that estimates future
outcomes. We demonstrate the efficacy of DeepCare for disease progression
modeling, intervention recommendation, and future risk prediction. On two
important cohorts with heavy social and economic burden -- diabetes and mental
health -- the results show improved modeling and risk prediction accuracy.

Comment: Accepted at JBI under the new name: "Predicting healthcare
trajectories from medical records: A deep learning approach".
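The idea of moderating an LSTM's forgetting by the elapsed time between episodes can be sketched as below. The decay factor 1/log(e + Δt) is one simple choice, and the weights, dimensions, and zero-valued event are illustrative rather than DeepCare's actual parameterization.

```python
# Illustrative LSTM step whose forget gate is scaled down as the gap dt
# between care episodes grows: the longer the gap, the more the memory
# cell decays before the new event is absorbed.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, dt, P):
    """One LSTM step with a time-decayed forget gate."""
    z = np.concatenate([x, h])
    f = sigmoid(P["Wf"] @ z + P["bf"]) / np.log(np.e + dt)  # decay with dt
    i = sigmoid(P["Wi"] @ z + P["bi"])                      # input gate
    g = np.tanh(P["Wg"] @ z + P["bg"])                      # candidate cell
    o = sigmoid(P["Wo"] @ z + P["bo"])                      # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(1)
d, dx = 4, 3                                   # hidden and input sizes
P = {k: rng.normal(scale=0.1, size=(d, d + dx)) for k in ("Wf", "Wi", "Wg", "Wo")}
P.update({b: np.zeros(d) for b in ("bf", "bi", "bg", "bo")})

h, c = np.zeros(d), np.ones(d)                 # prior memory of the patient
x = np.zeros(dx)                               # zero event isolates the time effect
h_short, c_short = lstm_step(x, h, c, dt=1.0,   P=P)   # recent episode
h_long,  c_long  = lstm_step(x, h, c, dt=365.0, P=P)   # year-long gap
print(np.abs(c_long).sum() < np.abs(c_short).sum())    # prints True
```

With identical inputs, only the gap length differs, and the year-long gap leaves a much fainter memory trace, which is the mechanism DeepCare uses to cope with episodic, irregularly timed records.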