
    Design of clustering algorithms for neural network regularisation and relevant feature learning

    In this work, we explore novel representation learning techniques. We analyse the recently introduced capsule network and its regularisation methods. We present an information-visualisation technique for convolutional neural networks in which we overlay spatial activations with their corresponding receptive fields; this lets us see the factors according to which the network separates information. We propose a new method for clustering the activations of the last layer of classifier networks based on a margin cost, and we demonstrate its usefulness for obtaining robust uncertainty measures over the classifier's decisions. We adopt a Bayesian probabilistic framework, proposing a novel variational autoencoder algorithm: by conditioning some latent variables on discrete values, we capture multimodally distributed features of the data. We show how this algorithm yields more disentangled, higher-quality representations than those proposed in the variational autoencoder literature. Finally, we propose a method for comparing the fidelity of generative models by training a classifier on datasets augmented with generated samples, and we validate experimentally that our model generates more informative new samples than comparable models from the literature.
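
    As an illustration of one contribution above, here is a minimal PyTorch sketch of a margin-based clustering loss on last-layer activations. All names (`margin_cluster_loss`, `centroids`) are illustrative assumptions, not the thesis' actual code.

```python
import torch
import torch.nn.functional as F

def margin_cluster_loss(features, labels, centroids, margin=1.0):
    """Hinge loss that pulls each activation towards its class centroid
    and pushes it at least `margin` closer than any other centroid."""
    # Distance from each last-layer activation to every centroid: (N, C)
    dists = torch.cdist(features, centroids)
    # Distance to the assigned centroid
    pos = dists.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Distance to the closest wrong centroid
    neg = dists.scatter(1, labels.unsqueeze(1), float("inf")).min(dim=1).values
    return F.relu(pos - neg + margin).mean()
```

    At test time, the distance from an input's activation to its nearest centroid can then serve as the kind of uncertainty signal the abstract describes.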

    Depth Uncertainty Networks for Active Learning

    In active learning, the size and complexity of the training dataset changes over time. Simple models that are well specified by the amount of data available at the start of active learning might suffer from bias as more points are actively sampled. Flexible models that might be well suited to the full dataset can suffer from overfitting towards the start of active learning. We tackle this problem using Depth Uncertainty Networks (DUNs), a BNN variant in which the depth of the network, and thus its complexity, is inferred. We find that DUNs outperform other BNN variants on several active learning tasks. Importantly, we show that on the tasks in which DUNs perform best they present notably less overfitting than baselines.
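
    A minimal sketch of the depth-marginalisation idea behind DUNs: a single forward pass yields a prediction at every depth, and the predictive distribution is a mixture under a categorical posterior over depths. The class and variable names are illustrative, and the paper infers the depth posterior with variational inference rather than treating it as a free parameter as done here.

```python
import torch
import torch.nn as nn

class DepthUncertaintyNet(nn.Module):
    def __init__(self, in_dim, dim=64, n_layers=5, n_out=10):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_layers)
        )
        self.head = nn.Linear(dim, n_out)                        # shared output head
        self.depth_logits = nn.Parameter(torch.zeros(n_layers))  # q(depth)

    def forward(self, x):
        h, per_depth = self.embed(x), []
        for block in self.blocks:
            h = block(h)
            per_depth.append(self.head(h))                  # prediction if stopped here
        probs = torch.stack(per_depth).softmax(dim=-1)      # (depth, batch, class)
        q = self.depth_logits.softmax(dim=0)                # posterior over depths
        return torch.einsum("d,dbc->bc", q, probs)          # depth-marginal predictive
```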

    Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent

    Gaussian processes are a powerful framework for quantifying uncertainty and for sequential decision-making but are limited by the requirement of solving linear systems. In general, this has a cubic cost in dataset size and is sensitive to conditioning. We explore stochastic gradient algorithms as a computationally efficient method of approximately solving these linear systems: we develop low-variance optimization objectives for sampling from the posterior and extend these to inducing points. Counterintuitively, stochastic gradient descent often produces accurate predictions, even in cases where it does not converge quickly to the optimum. We explain this through a spectral characterization of the implicit bias from non-convergence. We show that stochastic gradient descent produces predictive distributions close to the true posterior both in regions with sufficient data coverage, and in regions sufficiently far away from the data. Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks. Its uncertainty estimates match the performance of significantly more expensive baselines on a large-scale Bayesian optimization task.
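
    A minimal NumPy sketch of the idea: draw an approximate posterior sample by pathwise conditioning, replacing the exact linear solve with stochastic gradient steps (here, simple randomized block updates on the quadratic objective). The paper's low-variance objectives and inducing-point extension are not reproduced; all names and hyperparameters are illustrative.

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def sgd_solve(K, noise, rhs, steps=5000, batch=64, lr=0.01, seed=0):
    """Approximately solve (K + noise*I) v = rhs with stochastic block
    updates on the objective 0.5 * v^T (K + noise*I) v - v^T rhs."""
    rng = np.random.default_rng(seed)
    v = np.zeros_like(rhs)
    for _ in range(steps):
        idx = rng.choice(len(rhs), size=batch, replace=False)
        v[idx] -= lr * (K[idx] @ v + noise * v[idx] - rhs[idx])
    return v

def posterior_sample(X, y, Xs, noise=0.01, seed=1):
    """One pathwise-conditioned sample at test inputs Xs:
    f_post = f_prior + K(*, X) (K + noise*I)^{-1} (y - f_prior(X) - eps)."""
    rng = np.random.default_rng(seed)
    Xa = np.concatenate([X, Xs])
    Ka = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))
    fa = np.linalg.cholesky(Ka) @ rng.standard_normal(len(Xa))  # joint prior draw
    f, fs = fa[:len(X)], fa[len(X):]
    eps = np.sqrt(noise) * rng.standard_normal(len(X))
    v = sgd_solve(rbf(X, X), noise, y - f - eps)  # the SGD linear-solve step
    return fs + rbf(Xs, X) @ v
```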

    Image Reconstruction via Deep Image Prior Subspaces

    Deep learning has been widely used for solving image reconstruction tasks but its deployability has been held back due to the shortage of high-quality training data. Unsupervised learning methods, such as the deep image prior (DIP), naturally fill this gap, but bring a host of new issues: susceptibility to overfitting due to a lack of robust early-stopping strategies, and unstable convergence. We present a novel approach to tackle these issues by restricting DIP optimisation to a sparse linear subspace of its parameters, employing a synergy of dimensionality reduction techniques and second order optimisation methods. The low-dimensionality of the subspace reduces DIP's tendency to fit noise and allows the use of stable second order optimisation methods, e.g., natural gradient descent or L-BFGS. Experiments across both image restoration and tomographic tasks of different geometry and ill-posedness show that second order optimisation within a low-dimensional subspace is favourable in terms of the trade-off between optimisation stability and reconstruction fidelity.
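
    A minimal PyTorch sketch of the subspace restriction: parameters are expressed as theta0 + P @ c for a frozen basis P (e.g. obtained from an SVD of early DIP parameter snapshots), and only the low-dimensional coefficients c are optimised. `theta0`, `P`, and all names are illustrative assumptions, not the paper's code.

```python
import torch
from torch.func import functional_call

def subspace_loss(model, theta0, P, c, inp, target, loss_fn):
    """Loss of `model` evaluated at parameters theta0 + P @ c."""
    flat = theta0 + P @ c                             # full flat parameter vector
    params, offset = {}, 0
    for name, p in model.named_parameters():
        params[name] = flat[offset:offset + p.numel()].view(p.shape)
        offset += p.numel()
    out = functional_call(model, params, (inp,))      # stateless forward pass
    return loss_fn(out, target)

# Only c requires gradients, so second order methods become practical, e.g.:
# c = torch.zeros(k, requires_grad=True)
# opt = torch.optim.LBFGS([c])
# def closure():
#     opt.zero_grad()
#     loss = subspace_loss(model, theta0, P, c, inp, target, loss_fn)
#     loss.backward()
#     return loss
# opt.step(closure)
```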

    SE(3) Equivariant Augmented Coupling Flows

    Coupling normalizing flows allow for fast sampling and density evaluation, making them the tool of choice for probabilistic modeling of physical systems. However, the standard coupling architecture precludes endowing flows that operate on the Cartesian coordinates of atoms with the SE(3) and permutation invariances of physical systems. This work proposes a coupling flow that preserves SE(3) and permutation equivariance by performing coordinate splits along additional augmented dimensions. At each layer, the flow maps atoms' positions into learned SE(3) invariant bases, where we apply standard flow transformations, such as monotonic rational-quadratic splines, before returning to the original basis. Crucially, our flow preserves fast sampling and density evaluation, and may be used to produce unbiased estimates of expectations with respect to the target distribution via importance sampling. When trained on the DW4, LJ13 and QM9-positional datasets, our flow is competitive with equivariant continuous normalizing flows, while allowing sampling two orders of magnitude faster. Moreover, to the best of our knowledge, we are the first to learn the full Boltzmann distribution of alanine dipeptide by only modeling the Cartesian positions of its atoms. Lastly, we demonstrate that our flow can be trained to approximately sample from the Boltzmann distribution of the DW4 and LJ13 particle systems using only their energy functions.
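
    A minimal sketch of the importance-sampling use mentioned above: flow samples are reweighted by the target density. The self-normalised variant shown here is consistent rather than strictly unbiased (unbiasedness requires a normalised target and raw weights); `flow.sample_and_log_prob` and `log_target` are assumed interfaces, not the paper's API.

```python
import torch

def importance_expectation(flow, log_target, f, n_samples=10_000):
    """Estimate E_p[f] using samples from the flow q and weights p/q."""
    x, log_q = flow.sample_and_log_prob(n_samples)  # fast for coupling flows
    log_w = log_target(x) - log_q                   # unnormalised log-weights
    w = torch.softmax(log_w, dim=0)                 # self-normalised weights
    return (w * f(x)).sum(dim=0)                    # f maps samples to scalars
```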

    Stochastic Gradient Descent for Gaussian Processes Done Right

    We study the optimisation problem associated with Gaussian process regression using squared loss. The most common approach to this problem is to apply an exact solver, such as conjugate gradient descent, either directly or to a reduced-order version of the problem. Recently, driven by successes in deep learning, stochastic gradient descent has gained traction as an alternative. In this paper, we show that when done right, by which we mean using specific insights from the optimisation and kernel communities, this approach is highly effective. We thus introduce a particular stochastic dual gradient descent algorithm that can be implemented with a few lines of code using any deep learning framework. We explain our design decisions by illustrating their advantage against alternatives with ablation studies, and we show that the new method is highly competitive. Our evaluations on standard regression benchmarks and a Bayesian optimisation task set our approach apart from preconditioned conjugate gradients, variational Gaussian process approximations, and a previous version of stochastic gradient descent for Gaussian processes. On a molecular binding affinity prediction task, our method places Gaussian process regression on par in terms of performance with state-of-the-art graph neural networks.
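
    A sketch of the "few lines of code" flavour of the method: stochastic block updates on the dual objective 0.5 * a^T (K + noise*I) a - a^T y, with momentum and geometric iterate averaging, which the paper identifies as key ingredients. Hyperparameters and names here are illustrative, not the paper's tuned choices.

```python
import torch

def stochastic_dual_descent(K, y, noise, steps=5000, batch=128,
                            lr=1e-2, momentum=0.9, avg=0.999):
    n = y.shape[0]
    alpha = torch.zeros(n)          # dual variables
    velocity = torch.zeros(n)
    alpha_avg = torch.zeros(n)
    for _ in range(steps):
        idx = torch.randint(0, n, (batch,))
        grad = torch.zeros(n)
        grad[idx] = K[idx] @ alpha + noise * alpha[idx] - y[idx]
        velocity = momentum * velocity + grad               # heavy-ball momentum
        alpha = alpha - lr * velocity
        alpha_avg = avg * alpha_avg + (1 - avg) * alpha     # averaged iterate
    return alpha_avg  # predictive mean at test inputs X*: K(X*, X) @ alpha_avg
```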

    Evaluation of two treatment strategies for the prevention of preterm birth in women identified as at risk by ultrasound (PESAPRO Trial): Study protocol for a randomized controlled trial

    Background: Premature birth is considered one of the main problems in modern obstetrics. It causes more than 50% of neonatal mortality, is responsible for a large proportion of infant morbidity, and incurs very high economic costs. Cervical length, which can be measured accurately by ultrasound, has an inverse relationship with the risk of preterm birth. As a result, an effective intervention for asymptomatic patients with a short cervix could reduce prematurity. Although recently published data demonstrate the effectiveness of vaginal progesterone and of the cervical pessary, these treatments have never been compared to one another. Methods/Design: The PESAPRO study is a noncommercial, multicenter, open-label, randomized clinical trial (RCT) in pregnant women with a short cervix as identified by transvaginal ultrasonography at 19 to 22 weeks of gestation. Patients are randomized (1:1) to either daily vaginal progesterone or a cervical pessary until the 37th week of gestation or delivery, whichever comes first. During the trial, women visit every 4 weeks for routine questions and tests. The primary outcome is the proportion of spontaneous preterm deliveries before 34 weeks of gestation. A sample size of 254 pregnant women will be included at 29 participating hospitals in order to demonstrate noninferiority of placing a pessary versus vaginal progesterone. The first patient was randomized in August 2012, and recruitment of study subjects will continue until the end of December 2015. Discussion: This trial assesses the comparative efficacy and safety of two accepted treatments, cervical pessary versus vaginal progesterone, and it will provide evidence for establishing clinical recommendations. The study has been funded by two national grants from the Spanish Ministry of Health and ISCIII.