15 research outputs found
Design of clustering algorithms for neural network regularisation and relevant feature learning
In this work, we explore novel representation learning techniques. We analyse the recently introduced capsule network and its regularisation methods. We present an information-visualisation technique for convolutional neural networks in which we overlay spatial activations with their corresponding receptive fields; this lets us see the factors along which our network separates information. We propose a new clustering method for the activations of the last layer of classifier networks, based on a margin loss, and demonstrate its usefulness for obtaining robust uncertainty measures over the classifier's decisions. Adopting a Bayesian probabilistic framework, we propose a novel variational autoencoder algorithm: by conditioning some latent variables on discrete values, we capture multimodally distributed features of the data. We show how this algorithm yields more disentangled, higher-quality representations than those proposed in the variational autoencoder literature. Finally, we propose a method for comparing the fidelity of generative models by training a classifier on datasets augmented with generated samples, and validate experimentally that our model generates new samples that are more informative than those of comparable models in the literature.
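The clustering-for-uncertainty idea above can be illustrated with a minimal sketch: score a test activation by its distance to per-class centroids of last-layer features, and read off both a soft assignment and a nearest-centroid distance as an uncertainty signal. All numbers and names here are hypothetical toys, not the thesis's margin-based training objective.

```python
import numpy as np

# Hypothetical toy: one centroid per class in a 2-D last-layer feature space.
# The thesis trains these clusters with a margin loss; here they are fixed.
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])

def cluster_confidence(z, temperature=1.0):
    """Soft class assignment from distances to centroids, plus the
    nearest-centroid distance as a raw uncertainty signal."""
    d = np.linalg.norm(centroids - z, axis=1)   # distance to each centroid
    logits = -d / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return p, d.min()

p_in, d_in = cluster_confidence(np.array([0.2, -0.1]))   # near a centroid
p_out, d_out = cluster_confidence(np.array([2.0, 2.0]))  # between clusters
```

A point lying between clusters ends up both farther from every centroid and with a flatter assignment, which is the sense in which cluster distances give a robust confidence measure.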
Depth Uncertainty Networks for Active Learning
In active learning, the size and complexity of the training dataset changes
over time. Simple models that are well specified by the amount of data
available at the start of active learning might suffer from bias as more points
are actively sampled. Flexible models that might be well suited to the full
dataset can suffer from overfitting towards the start of active learning. We
tackle this problem using Depth Uncertainty Networks (DUNs), a BNN variant in
which the depth of the network, and thus its complexity, is inferred. We find
that DUNs outperform other BNN variants on several active learning tasks.
Importantly, we show that on the tasks in which DUNs perform best they present
notably less overfitting than baselines.
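The depth-marginalisation idea behind DUNs can be sketched in a few lines: given class probabilities computed from the features at each depth and an approximate posterior over depth, the predictive distribution averages over depths, and the disagreement between depths yields an epistemic-uncertainty signal. The numbers below are hypothetical; a real DUN infers the depth posterior variationally while training a single network.

```python
import numpy as np

# Hypothetical per-depth class probabilities for one input (rows: depths 1-3).
per_depth_probs = np.array([
    [0.70, 0.30],
    [0.55, 0.45],
    [0.20, 0.80],
])
q_depth = np.array([0.2, 0.3, 0.5])   # approximate posterior over depth

predictive = q_depth @ per_depth_probs            # marginalise depth out

entropy = lambda p: -(p * np.log(p)).sum(-1)
total = entropy(predictive)                       # total predictive uncertainty
aleatoric = q_depth @ entropy(per_depth_probs)    # expected per-depth entropy
epistemic = total - aleatoric                     # mutual information I(y; depth)
```

Because the depths disagree, the epistemic term is strictly positive here; for an input on which all depths agree it collapses towards zero.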
Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent
Gaussian processes are a powerful framework for quantifying uncertainty and
for sequential decision-making but are limited by the requirement of solving
linear systems. In general, this has a cubic cost in dataset size and is
sensitive to conditioning. We explore stochastic gradient algorithms as a
computationally efficient method of approximately solving these linear systems:
we develop low-variance optimization objectives for sampling from the posterior
and extend these to inducing points. Counterintuitively, stochastic gradient
descent often produces accurate predictions, even in cases where it does not
converge quickly to the optimum. We explain this through a spectral
characterization of the implicit bias from non-convergence. We show that
stochastic gradient descent produces predictive distributions close to the true
posterior both in regions with sufficient data coverage, and in regions
sufficiently far away from the data. Experimentally, stochastic gradient
descent achieves state-of-the-art performance on sufficiently large-scale or
ill-conditioned regression tasks. Its uncertainty estimates match the
performance of significantly more expensive baselines on a large-scale Bayesian
optimization task.
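The core linear-system claim can be made concrete with a toy sketch: the GP posterior mean needs the representer weights v* = (K + noise·I)⁻¹y, normally a cubic-cost direct solve, but v* is also the unique minimiser of the convex quadratic L(v) = ½ vᵀ(K + noise·I)v − yᵀv, which minibatch SGD can attack with cheap unbiased gradient estimates. This is a sketch of the idea only, with made-up data; the paper develops lower-variance objectives and extends them to posterior samples and inducing points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D regression problem with an RBF kernel.
n, batch, lr = 200, 50, 2e-3
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)
K = np.exp(-0.5 * (X - X.T) ** 2 / 0.5**2)
A = K + 0.1**2 * np.eye(n)          # K + noise * I

v = np.zeros(n)
for _ in range(4000):
    idx = rng.choice(n, size=batch, replace=False)
    # Unbiased estimate of grad L(v) = A v - y from a random block of rows.
    v[idx] -= lr * (n / batch) * (A[idx] @ v - y[idx])

posterior_mean = K @ v              # GP posterior mean at the training inputs
```

Even without running the iteration to full convergence, the components of v along the large kernel eigendirections (which dominate the predictions) converge quickly, which is the benign-non-convergence effect the abstract describes.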
Image Reconstruction via Deep Image Prior Subspaces
Deep learning has been widely used for solving image reconstruction tasks but
its deployability has been held back due to the shortage of high-quality
training data. Unsupervised learning methods, such as the deep image prior
(DIP), naturally fill this gap, but bring a host of new issues: the
susceptibility to overfitting due to a lack of robust early stopping strategies
and unstable convergence. We present a novel approach to tackle these issues by
restricting DIP optimisation to a sparse linear subspace of its parameters,
employing a synergy of dimensionality reduction techniques and second order
optimisation methods. The low-dimensionality of the subspace reduces DIP's
tendency to fit noise and allows the use of stable second order optimisation
methods, e.g., natural gradient descent or L-BFGS. Experiments across both
image restoration and tomographic tasks of different geometry and ill-posedness
show that second order optimisation within a low-dimensional subspace is
favourable in terms of the optimisation-stability/reconstruction-fidelity
trade-off.
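The subspace restriction can be sketched with a linear stand-in for the reconstruction problem: write the parameters as an affine map theta = theta0 + P·c with an orthonormal basis P of a low-dimensional subspace, and optimise only c. In few dimensions a full second-order (Newton) step is cheap and stable, which is the point the abstract makes for natural gradient descent and L-BFGS. Every quantity below (the operator, dimensions, warm-start) is a hypothetical toy, not the DIP networks of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "reconstruction": measurements meas = Op @ theta (plus noise).
d_full, d_sub, m = 200, 10, 120
Op = rng.standard_normal((m, d_full))
theta_true = rng.standard_normal(d_full)
meas = Op @ theta_true + 0.01 * rng.standard_normal(m)

theta0 = np.zeros(d_full)                                  # e.g. after a warm-up phase
P, _ = np.linalg.qr(rng.standard_normal((d_full, d_sub)))  # orthonormal subspace basis

# Least-squares loss restricted to the subspace: Jacobian, gradient, Hessian in c.
J = Op @ P                                # m x d_sub
grad0 = J.T @ (Op @ theta0 - meas)
hess = J.T @ J                            # tiny d_sub x d_sub Hessian
c = -np.linalg.solve(hess, grad0)         # one exact (second-order) step
theta = theta0 + P @ c
```

Because the subspace has only ten dimensions, forming and inverting the Hessian is trivial, and a single second-order step lands exactly on the subspace-restricted least-squares solution.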
SE(3) Equivariant Augmented Coupling Flows
Coupling normalizing flows allow for fast sampling and density evaluation,
making them the tool of choice for probabilistic modeling of physical systems.
However, the standard coupling architecture precludes endowing flows that
operate on the Cartesian coordinates of atoms with the SE(3) and permutation
invariances of physical systems. This work proposes a coupling flow that
preserves SE(3) and permutation equivariance by performing coordinate splits
along additional augmented dimensions. At each layer, the flow maps atoms'
positions into learned SE(3) invariant bases, where we apply standard flow
transformations, such as monotonic rational-quadratic splines, before returning
to the original basis. Crucially, our flow preserves fast sampling and density
evaluation, and may be used to produce unbiased estimates of expectations with
respect to the target distribution via importance sampling. When trained on the
DW4, LJ13 and QM9-positional datasets, our flow is competitive with equivariant
continuous normalizing flows, while allowing sampling two orders of magnitude
faster. Moreover, to the best of our knowledge, we are the first to learn the
full Boltzmann distribution of alanine dipeptide by only modeling the Cartesian
positions of its atoms. Lastly, we demonstrate that our flow can be trained to
approximately sample from the Boltzmann distribution of the DW4 and LJ13
particle systems using only their energy functions.
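The importance-sampling claim rests only on the flow providing both fast samples and exact densities q(x). Given those, expectations under an unnormalised target can be estimated by self-normalised importance sampling. The sketch below uses a hypothetical 1-D stand-in (a Gaussian "flow" proposal and a Gaussian target) rather than an SE(3) flow over atomic coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Flow" proposal q = N(0, 1.5^2); unnormalised target p~ proportional to N(1, 1).
def log_q(x):
    return -0.5 * (x / 1.5) ** 2 - np.log(1.5 * np.sqrt(2 * np.pi))

def log_p_tilde(x):
    return -0.5 * (x - 1.0) ** 2            # target energy, normaliser unknown

x = 1.5 * rng.standard_normal(100_000)      # fast sampling from the "flow"
log_w = log_p_tilde(x) - log_q(x)           # importance log-weights
w = np.exp(log_w - log_w.max())
w /= w.sum()                                # self-normalise away the constant
mean_est = (w * x).sum()                    # estimate of E_p[x] (true value: 1)
```

The same recipe applies with a trained flow: sample configurations, evaluate their flow density and Boltzmann energy, and reweight, which is how the paper debiases expectations with respect to the target distribution.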
Stochastic Gradient Descent for Gaussian Processes Done Right
We study the optimisation problem associated with Gaussian process regression
using squared loss. The most common approach to this problem is to apply an
exact solver, such as conjugate gradient descent, either directly, or to a
reduced-order version of the problem. Recently, driven by successes in deep
learning, stochastic gradient descent has gained traction as an alternative. In
this paper, we show that when done right, by which we mean using
specific insights from the optimisation and kernel
communities, this approach is highly effective. We thus
introduce a particular stochastic dual gradient descent algorithm that may be
implemented with a few lines of code using any deep learning framework. We
explain our design decisions by illustrating their advantage against
alternatives with ablation studies and show that the new method is highly
competitive. Our evaluations on standard regression benchmarks and a Bayesian
optimisation task set our approach apart from preconditioned conjugate
gradients, variational Gaussian process approximations, and a previous version
of stochastic gradient descent for Gaussian processes. On a molecular binding
affinity prediction task, our method places Gaussian process regression on par
in terms of performance with state-of-the-art graph neural networks.
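The dual view can be sketched as follows: with squared loss, the representer weights alpha solve (K + noise·I)alpha = y, and each dual coordinate admits a cheap randomised exact-minimisation update. This toy sketch conveys the general idea only; the paper's actual algorithm adds ingredients such as momentum and iterate averaging, and the data below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D GP regression problem with an RBF kernel.
n = 100
X = rng.uniform(-2.0, 2.0, size=(n, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)
K = np.exp(-0.5 * (X - X.T) ** 2 / 0.5**2)
noise = 0.1**2

alpha = np.zeros(n)
for _ in range(20_000):                        # many cheap O(n) coordinate steps
    i = rng.integers(n)
    residual_i = y[i] - K[i] @ alpha - noise * alpha[i]
    alpha[i] += residual_i / (K[i, i] + noise)  # exact minimisation along e_i

posterior_mean = K @ alpha                     # GP posterior mean at the inputs
```

Each update touches one row of the kernel matrix, so the method needs no full factorisation and, as the abstract notes for the real algorithm, fits in a few lines of any deep learning framework.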
Evaluation of two treatment strategies for the prevention of preterm birth in women identified as at risk by ultrasound (PESAPRO Trial): Study protocol for a randomized controlled trial
Background: Premature birth is considered one of the main problems in modern obstetrics. It causes more than
50% of neonatal mortality, is responsible for a large proportion of infant morbidity, and incurs very high
economic costs. Cervical length, which can be accurately measured by ultrasound, has an inverse relationship with
the risk of preterm birth. As a result, an effective intervention for asymptomatic patients with a short cervix
could reduce prematurity. Although recently published data demonstrate the effectiveness of vaginal
progesterone and cervical pessary, these treatments have never been compared to one another.
Methods/Design: The PESAPRO study is a noncommercial, multicenter, open-label, randomized clinical trial (RCT)
in pregnant women with a short cervix as identified by transvaginal ultrasonography at 19 to 22 weeks of
gestation. Patients are randomized (1:1) to either daily vaginal progesterone or cervical pessary until the 37th week
of gestation or delivery, whichever comes first. During the trial, women visit every 4 weeks for routine questions
and tests. The primary outcome is the proportion of spontaneous preterm deliveries before 34 weeks of gestation.
A sample size of 254 pregnant women will be included at 29 participating hospitals in order to demonstrate
noninferiority of placing a pessary versus vaginal progesterone. The first patient was randomized in August 2012,
and recruitment of study subjects will continue until the end of December 2015.
Discussion: This trial assesses the comparative efficacy and safety of two accepted treatments, cervical
pessary versus vaginal progesterone, and it will provide evidence with which to establish clinical recommendations. The study has been funded by two national grants from the Spanish Ministry
of Health and ISCIII.