Speeding Up BatchBALD: A k-BALD Family of Approximations for Active Learning
Active learning is a powerful method for training machine learning models
with limited labeled data. One commonly used technique for active learning is
BatchBALD, which uses Bayesian neural networks to find the most informative
points to label in a pool set. However, BatchBALD can be very slow to compute,
especially for larger datasets. In this paper, we propose a new approximation,
k-BALD, which uses k-wise mutual information terms to approximate BatchBALD,
making it much less expensive to compute. Results on the MNIST dataset show
that k-BALD is significantly faster than BatchBALD while maintaining similar
performance. Additionally, we propose a dynamic approach for choosing k
based on the quality of the approximation, making it more efficient for larger
datasets.
Comment: 5 pages, workshop preprint.
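The construction is easiest to see for k = 2. Below is a minimal sketch, assuming Monte Carlo class probabilities from posterior samples and a greedy selection loop; the function names and the exact penalty form are illustrative, not the paper's reference implementation.

import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; p is a probability distribution along `axis`."""
    return -np.sum(p * np.log(np.clip(p, 1e-12, None)), axis=axis)

def bald_scores(probs):
    """probs: [K, N, C] class probabilities from K posterior samples.
    Returns the BALD score I[y_i; theta] for each of the N pool points."""
    mean = probs.mean(axis=0)                       # predictive distribution, [N, C]
    return entropy(mean) - entropy(probs).mean(axis=0)

def pairwise_mi(probs, i, j):
    """I[y_i; y_j]: mutual information between the predictions at points i and j,
    with the model parameters marginalised out via the K samples."""
    joint = np.einsum('kc,kd->cd', probs[:, i], probs[:, j]) / probs.shape[0]
    marg = np.outer(probs[:, i].mean(axis=0), probs[:, j].mean(axis=0))
    return np.sum(joint * (np.log(np.clip(joint, 1e-12, None))
                           - np.log(np.clip(marg, 1e-12, None))))

def two_bald_batch(probs, batch_size):
    """Greedy selection under a 2-BALD-style objective: individual BALD scores
    minus the pairwise redundancy with points already in the batch."""
    scores = bald_scores(probs)
    chosen = []
    for _ in range(batch_size):
        penalty = np.array([sum(pairwise_mi(probs, i, j) for j in chosen)
                            for i in range(probs.shape[1])], dtype=float)
        penalty[chosen] = np.inf                    # never pick a point twice
        chosen.append(int(np.argmax(scores - penalty)))
    return chosen

The pairwise terms penalise redundancy between candidates, which is what makes the sum cheap compared with BatchBALD's joint entropy estimate over the whole batch.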
Inverse problems for abstract evolution equations II: higher order differentiability for viscoelasticity
In this follow-up of [Inverse Problems 32 (2016) 085001] we generalize our previous abstract results so that they can be applied to the viscoelastic wave equation, which serves as a forward model for full waveform inversion (FWI) in seismic imaging, including dispersion and attenuation. FWI is the nonlinear inverse problem of identifying parameter functions of the viscoelastic wave equation from measurements of the reflected wave field. Here we rigorously derive rather explicit analytic expressions for the Fréchet derivative and its adjoint (adjoint state method) of the underlying parameter-to-solution map. These quantities enter crucially into Newton-like gradient descent solvers for FWI. Moreover, we provide the second Fréchet derivative and a related adjoint as ingredients for second-degree solvers.
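Schematically, the role of the adjoint in such solvers can be written as a single update step (a generic sketch; the paper supplies the concrete viscoelastic expressions for F' and its adjoint):

% Schematic gradient step for FWI via the adjoint state method; F is the
% parameter-to-solution map and u^\delta the measured (reflected) wave field.
\[
  p_{k+1} \;=\; p_k \;-\; \alpha_k\, F'(p_k)^{*}\bigl(F(p_k) - u^{\delta}\bigr),
\]
% where applying the adjoint F'(p_k)^* amounts to one additional (adjoint) wave
% simulation rather than assembling the derivative explicitly.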
Black-Box Batch Active Learning for Regression
Batch active learning is a popular approach for efficiently training machine
learning models on large, initially unlabelled datasets by repeatedly acquiring
labels for batches of data points. However, many recent batch active learning
methods are white-box approaches and are often limited to differentiable
parametric models: they score unlabeled points using acquisition functions
based on model embeddings or first- and second-order derivatives. In this
paper, we propose black-box batch active learning for regression tasks as an
extension of white-box approaches. Crucially, our method only relies on model
predictions. This approach is compatible with a wide range of machine learning
models, including regular and Bayesian deep learning models and
non-differentiable models such as random forests. It is rooted in Bayesian
principles and utilizes recent kernel-based approaches. This allows us to
extend a wide range of existing state-of-the-art white-box batch active
learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the
effectiveness of our approach through extensive experimental evaluations on
regression datasets, achieving surprisingly strong performance compared to
white-box approaches for deep learning models.
Comment: 12 pages + 11 pages appendix.
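As a rough illustration of the prediction-only idea, here is a minimal sketch assuming an ensemble of independently trained models and a simple greedy-variance acquisition rule (a stand-in for the BADGE/BAIT/LCMD-style rules the paper actually extends; all names are illustrative):

import numpy as np

def prediction_kernel(preds):
    """preds: [E, N] predictions of an ensemble of E models on N pool points.
    Centering over the ensemble yields an empirical predictive covariance kernel,
    built from predictions alone (no gradients or embeddings needed)."""
    phi = preds - preds.mean(axis=0, keepdims=True)
    return phi.T @ phi / (preds.shape[0] - 1)       # [N, N]

def greedy_variance_batch(K, batch_size, noise=1e-2):
    """Greedily pick the point with the largest predictive variance, then
    condition the kernel on it (Gaussian rank-one update) and repeat."""
    C = K.copy()
    chosen = []
    for _ in range(batch_size):
        var = np.diag(C).copy()
        var[chosen] = -np.inf                       # exclude already-chosen points
        i = int(np.argmax(var))
        chosen.append(i)
        c = C[:, i].copy()
        C -= np.outer(c, c) / (C[i, i] + noise)     # condition on observing point i
    return chosen

Because the kernel is built purely from predictions, the same selection loop works unchanged for random forests or any other non-differentiable regressor.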
Advanced deep active learning & data subset selection: unifying principles with information-theory intuitions
At its core, this thesis aims to enhance the practicality of deep learning by improving the label and training efficiency of deep learning models.
To this end, we investigate data subset selection techniques, specifically active learning and active sampling, grounded in information-theoretic principles.
Active learning improves label efficiency, while active sampling enhances training efficiency.
Supervised deep learning models often require extensive training with labeled data. Label acquisition can be expensive and time-consuming, and training large models is resource-intensive, hindering adoption outside academic research and "big tech."
Existing methods for data subset selection in deep learning often rely on heuristics or lack a principled information-theoretic foundation. In contrast, this thesis examines several objectives for data subset selection and their applications within deep learning, striving for a more principled approach inspired by information theory.
We begin by disentangling epistemic and aleatoric uncertainty in single forward-pass deep neural networks, which provides helpful intuitions and insights into different forms of uncertainty and their relevance for data subset selection. We then propose and investigate various approaches for active learning and data subset selection in (Bayesian) deep learning. Finally, we relate various existing and proposed approaches to approximations of information quantities in weight or prediction space.
Underpinning this work is a principled and practical notation for information-theoretic quantities that includes both random variables and observed outcomes. This thesis demonstrates the benefits of working from a unified perspective and highlights the potential impact of our contributions to the practical application of deep learning.
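One concrete intuition behind the uncertainty disentanglement mentioned above is the standard information-theoretic split of predictive uncertainty. A minimal sketch, assuming classification and Monte Carlo probabilities from posterior or ensemble samples:

import numpy as np

def uncertainty_decomposition(probs):
    """probs: [K, N, C] class probabilities from K posterior/ensemble samples.
    Returns (total, aleatoric, epistemic) per point, using the standard split:
      total     = H[mean_k p]          predictive entropy
      aleatoric = mean_k H[p]          expected entropy under the posterior
      epistemic = total - aleatoric    mutual information I[y; theta | x]
    """
    def H(p):
        return -np.sum(p * np.log(np.clip(p, 1e-12, None)), axis=-1)
    total = H(probs.mean(axis=0))
    aleatoric = H(probs).mean(axis=0)
    return total, aleatoric, total - aleatoric

The epistemic term is exactly the BALD quantity used for active learning: it is large where the posterior samples disagree, i.e. where a label would be most informative about the model.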
Does "Deep Learning on a Data Diet" reproduce? Overall yes, but GraNd at Initialization does not
The paper 'Deep Learning on a Data Diet' by Paul et al. (2021) introduces two
innovative metrics for pruning datasets during the training of neural networks.
While we are able to replicate the results for the EL2N score at epoch 20, the
same cannot be said for the GraNd score at initialization. The GraNd scores
later in training provide useful pruning signals, however. The GraNd score at
initialization calculates the average gradient norm of an input sample across
multiple randomly initialized models before any training has taken place. Our
analysis reveals a strong correlation between the GraNd score at initialization
and the input norm of a sample, suggesting that the latter could have been a
cheap new baseline for data pruning. Unfortunately, neither the GraNd score at
initialization nor the input norm surpasses random pruning in performance. This
contradicts one of the findings in Paul et al. (2021). We were unable to
reproduce their CIFAR-10 results using both an updated version of the original
JAX repository and in a newly implemented PyTorch codebase. An investigation of
the underlying JAX/FLAX code from 2021 surfaced a bug in the checkpoint
restoring code that was fixed in April 2021
(https://github.com/google/flax/commit/28fbd95500f4bf2f9924d2560062fa50e919b1a5).
Comment: 5 pages.
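For reference, the GraNd-at-initialization quantity and the input-norm baseline described above can be sketched as follows (per-sample loop for clarity; `make_model` is a hypothetical factory for freshly initialised networks, and a practical implementation would vectorise the per-sample gradients):

import torch

def grand_at_init(make_model, loss_fn, xs, ys, n_inits=10):
    """GraNd at initialization as described above: the per-sample loss-gradient
    norm, averaged over several freshly initialised (untrained) models."""
    scores = torch.zeros(len(xs))
    for _ in range(n_inits):
        model = make_model()                         # new random initialisation
        for i in range(len(xs)):
            model.zero_grad()
            loss = loss_fn(model(xs[i:i + 1]), ys[i:i + 1])
            loss.backward()
            g2 = sum((p.grad ** 2).sum() for p in model.parameters()
                     if p.grad is not None)
            scores[i] += g2.sqrt() / n_inits         # running average of norms
    return scores

def input_norm_baseline(xs):
    """The cheap baseline the analysis points to: the L2 norm of each input."""
    return xs.flatten(1).norm(dim=1)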
A scattering problem for a local perturbation of an open periodic waveguide
In this paper we consider the propagation of waves in an open waveguide in $\mathbb{R}^2$ where the index of refraction is a local perturbation of a function which is periodic along the axis of the waveguide (which we choose to be the $x_1$-axis) and equal to one for $|x_2| \ge h_0$ for some $h_0 > 0$. Motivated by the limiting absorption principle (proven in [17] for the case of an open waveguide in the half space), we formulate a radiation condition which allows the existence of propagating modes and prove uniqueness, existence, and stability of a solution under the assumption that no bound states exist. In the second part we determine the order of decay of the radiating part of the solution in the direction of the layer and in the direction orthogonal to it. Finally, we show that it satisfies the classical Sommerfeld radiation condition and allows the definition of a far field pattern.
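For reference, the classical Sommerfeld radiation condition invoked at the end reads, in two dimensions:

\[
  \lim_{r \to \infty} \sqrt{r}\,\left(\frac{\partial u}{\partial r} - i k u\right) = 0,
  \qquad r = |x|,
\]
% uniformly with respect to all directions \hat{x} = x/|x|.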
On the scattering of a plane wave by a perturbed open periodic waveguide
We consider the scattering of a plane wave by a locally perturbed periodic (with respect to $x_1$) medium. If there is no perturbation it is usually assumed that the scattered wave is quasi-periodic with the same parameter as the incident plane wave. As is well known, one can show existence under this condition but not necessarily uniqueness. Uniqueness fails for certain incident directions (if the wavenumber is kept fixed), and it is not clear which additional condition has to be assumed in this case. In this paper we will analyze three concepts. For the Limiting Absorption Principle (LAP) we replace the refractive index $n$ by $n + i\varepsilon$ in a layer of finite width and consider the limiting case $\varepsilon \to 0$. This will give an unsatisfactory condition. In a second approach we require continuity of the field with respect to the incident direction. This will give the same satisfactory condition as the third approach, where we approximate the incident plane wave by an incident point source and let the location of the source tend to infinity.
Direct and inverse time-harmonic scattering by Dirichlet periodic curves with local perturbations
This is a continuation of the authors' previous work (A. Kirsch, Math. Meth.
Appl. Sci., 45 (2022): 5737-5773.) on well-posedness of time-harmonic
scattering by locally perturbed periodic curves of Dirichlet kind. The
scattering interface is supposed to be given by a non-self-intersecting
Lipschitz curve. We study properties of the Green's function and prove new
well-posedness results for scattering of plane waves at a propagative wave
number. In such a case there exist guided waves to the unperturbed problem,
which are also known as Bound States in the Continuum (BICs) in physics. In
this paper, uniqueness of the forward scattering problem follows from an
orthogonality constraint imposed on the total field of the unperturbed
scattering problem. This constraint, which is also valid under the Neumann
boundary condition, is derived from singular perturbation arguments and
also from the approach of approximating a plane wave by point source waves. For
the inverse problem of determining the defect, we prove several uniqueness
results using a finite or infinite number of point source and plane waves,
depending on whether a priori information on the size and height of the defect
is available.