
    Speeding Up BatchBALD: A k-BALD Family of Approximations for Active Learning

    Active learning is a powerful method for training machine learning models with limited labeled data. One commonly used technique for active learning is BatchBALD, which uses Bayesian neural networks to find the most informative points to label in a pool set. However, BatchBALD can be very slow to compute, especially for larger datasets. In this paper, we propose a new approximation, k-BALD, which uses k-wise mutual information terms to approximate BatchBALD, making it much less expensive to compute. Results on the MNIST dataset show that k-BALD is significantly faster than BatchBALD while maintaining similar performance. Additionally, we propose a dynamic approach for choosing k based on the quality of the approximation, making it more efficient for larger datasets. Comment: 5 pages, workshop preprint.
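    A minimal sketch of the ingredients, assuming Monte Carlo samples of class probabilities from the Bayesian neural network in an array of shape (K, N, C) (K posterior samples, N pool points, C classes). The helper names are illustrative and this is not the paper's implementation; k-BALD combines per-point BALD scores with such k-wise mutual information terms via a truncated inclusion-exclusion expansion, and for k = 2 the pairwise terms already capture the redundancy between batch points that plain top-k BALD ignores.

        import numpy as np

        def entropy(p, axis=-1, eps=1e-12):
            # Shannon entropy of (a batch of) discrete distributions.
            return -np.sum(p * np.log(p + eps), axis=axis)

        def bald_scores(probs):
            # BALD per point: I(y_i; w) = H[E_w p(y_i|w)] - E_w H[p(y_i|w)].
            mean_p = probs.mean(axis=0)                        # (N, C)
            return entropy(mean_p) - entropy(probs).mean(axis=0)

        def pairwise_mi(probs, i, j):
            # Pairwise term I(y_i; y_j) with the joint marginalized over
            # the posterior: p(y_i, y_j) = E_w[ p(y_i|w) p(y_j|w) ].
            K = probs.shape[0]
            joint = np.einsum('kc,kd->cd', probs[:, i], probs[:, j]) / K
            p_i, p_j = joint.sum(axis=1), joint.sum(axis=0)
            return entropy(p_i) + entropy(p_j) - entropy(joint.ravel())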

    Inverse problems for abstract evolution equations II: higher order differentiability for viscoelasticity

    In this follow-up of [Inverse Problems 32 (2016) 085001] we generalize our previous abstract results so that they can be applied to the viscoelastic wave equation, which serves as a forward model for full waveform inversion (FWI) in seismic imaging including dispersion and attenuation. FWI is the nonlinear inverse problem of identifying parameter functions of the viscoelastic wave equation from measurements of the reflected wave field. Here we rigorously derive rather explicit analytic expressions for the Fréchet derivative and its adjoint (adjoint state method) of the underlying parameter-to-solution map. These quantities crucially enter Newton-like and gradient descent solvers for FWI. Moreover, we provide the second Fréchet derivative and a related adjoint as ingredients for second-degree solvers.
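    Schematically (suppressing the paper's precise function-space setting, so this is a generic sketch rather than the authors' exact statement), the role of these derivatives is visible from the least-squares misfit for the parameter-to-solution map $F$ and measured data $d$:

        J(m) = \tfrac{1}{2}\,\lVert F(m) - d \rVert^{2}, \qquad
        J'(m)h = \langle F(m) - d,\; F'(m)h \rangle = \langle F'(m)^{*}(F(m) - d),\; h \rangle .

    The gradient $F'(m)^{*}(F(m) - d)$ is thus computable with one additional adjoint (backward) solve, which is the content of the adjoint state method; Newton-type schemes additionally use the second derivative $F''(m)$.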

    Black-Box Batch Active Learning for Regression

    Batch active learning is a popular approach for efficiently training machine learning models on large, initially unlabelled datasets by repeatedly acquiring labels for batches of data points. However, many recent batch active learning methods are white-box approaches and are often limited to differentiable parametric models: they score unlabeled points using acquisition functions based on model embeddings or first- and second-order derivatives. In this paper, we propose black-box batch active learning for regression tasks as an extension of white-box approaches. Crucially, our method only relies on model predictions. This approach is compatible with a wide range of machine learning models, including regular and Bayesian deep learning models and non-differentiable models such as random forests. It is rooted in Bayesian principles and utilizes recent kernel-based approaches. This allows us to extend a wide range of existing state-of-the-art white-box batch active learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models. Comment: 12 pages + 11 pages appendix.
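    A hedged sketch of the black-box ingredient, assuming only an array preds of shape (M, N) holding predictions of M ensemble members (or posterior samples) on N pool points; the covariance kernel and the greedy farthest-point selection below are one simple kernel-based instantiation, not the paper's exact BADGE/BAIT/LCMD extensions.

        import numpy as np

        def prediction_kernel(preds):
            # Black-box feature map: each pool point is represented by its
            # centered prediction vector across the M models, so the kernel
            # is the empirical covariance of the predictions.
            f = preds - preds.mean(axis=0, keepdims=True)      # (M, N)
            return f.T @ f / preds.shape[0]                    # (N, N)

        def greedy_batch(K, batch_size):
            # Greedy farthest-point selection in the kernel-induced metric
            # d(i, j)^2 = K_ii + K_jj - 2 K_ij, seeded at the point with
            # the largest predictive variance K_ii.
            diag = np.diag(K)
            selected = [int(np.argmax(diag))]
            d2 = diag + diag[selected[0]] - 2 * K[selected[0]]
            for _ in range(batch_size - 1):
                i = int(np.argmax(d2))
                selected.append(i)
                d2 = np.minimum(d2, diag + diag[i] - 2 * K[i])
            return selected

        # e.g. batch = greedy_batch(prediction_kernel(preds), 16)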

    Advanced deep active learning & data subset selection: unifying principles with information-theory intuitions

    At its core, this thesis aims to enhance the practicality of deep learning by improving the label and training efficiency of deep learning models. To this end, we investigate data subset selection techniques, specifically active learning and active sampling, grounded in information-theoretic principles. Active learning improves label efficiency, while active sampling enhances training efficiency. Supervised deep learning models often require extensive training with labeled data. Label acquisition can be expensive and time-consuming, and training large models is resource-intensive, hindering adoption outside academic research and "big tech". Existing methods for data subset selection in deep learning often rely on heuristics or lack a principled information-theoretic foundation. In contrast, this thesis examines several objectives for data subset selection and their applications within deep learning, striving for a more principled approach inspired by information theory. We begin by disentangling epistemic and aleatoric uncertainty in single forward-pass deep neural networks, which provides helpful intuitions and insights into different forms of uncertainty and their relevance for data subset selection. We then propose and investigate various approaches for active learning and data subset selection in (Bayesian) deep learning. Finally, we relate various existing and proposed approaches to approximations of information quantities in weight or prediction space. Underpinning this work is a principled and practical notation for information-theoretic quantities that includes both random variables and observed outcomes. This thesis demonstrates the benefits of working from a unified perspective and highlights the potential impact of our contributions to the practical application of deep learning.
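    One standard identity behind the epistemic/aleatoric disentanglement discussed here (written in generic Bayesian notation with parameters $\omega$ and data $\mathcal{D}$, not the thesis's full notation for random variables and outcomes):

        \underbrace{H\big[\,\mathbb{E}_{p(\omega \mid \mathcal{D})}\, p(y \mid x, \omega)\,\big]}_{\text{total predictive uncertainty}}
        \;=\;
        \underbrace{\mathbb{E}_{p(\omega \mid \mathcal{D})}\, H\big[\,p(y \mid x, \omega)\,\big]}_{\text{aleatoric}}
        \;+\;
        \underbrace{I(Y ; \Omega \mid x, \mathcal{D})}_{\text{epistemic}} .

    The epistemic term, the mutual information between the prediction and the model parameters, is exactly the BALD-style quantity that underlies many of the active learning and data subset selection objectives investigated in the thesis.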

    Does "Deep Learning on a Data Diet" reproduce? Overall yes, but GraNd at Initialization does not

    The paper 'Deep Learning on a Data Diet' by Paul et al. (2021) introduces two innovative metrics for pruning datasets during the training of neural networks. While we are able to replicate the results for the EL2N score at epoch 20, the same cannot be said for the GraNd score at initialization. The GraNd scores later in training provide useful pruning signals, however. The GraNd score at initialization calculates the average gradient norm of an input sample across multiple randomly initialized models before any training has taken place. Our analysis reveals a strong correlation between the GraNd score at initialization and the input norm of a sample, suggesting that the latter could have been a cheap new baseline for data pruning. Unfortunately, neither the GraNd score at initialization nor the input norm surpasses random pruning in performance. This contradicts one of the findings in Paul et al. (2021). We were unable to reproduce their CIFAR-10 results using both an updated version of the original JAX repository and a newly implemented PyTorch codebase. An investigation of the underlying JAX/FLAX code from 2021 surfaced a bug in the checkpoint restoring code that was fixed in April 2021 (https://github.com/google/flax/commit/28fbd95500f4bf2f9924d2560062fa50e919b1a5). Comment: 5 pages.
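    A hedged PyTorch sketch of the two scores compared above; make_model is an assumed factory returning a freshly initialized classifier, and the per-sample loop is the naive (slow) way to obtain per-sample gradient norms.

        import torch
        import torch.nn.functional as F

        def grand_at_init(x, y, make_model, n_inits=10):
            # GraNd at initialization: average loss-gradient norm of each
            # sample over several randomly initialized, untrained models.
            scores = torch.zeros(len(x))
            for _ in range(n_inits):
                model = make_model()  # fresh random initialization
                for i in range(len(x)):
                    model.zero_grad()
                    loss = F.cross_entropy(model(x[i:i+1]), y[i:i+1])
                    loss.backward()
                    g2 = sum((p.grad ** 2).sum() for p in model.parameters())
                    scores[i] += g2.sqrt().item()
            return scores / n_inits

        def input_norm(x):
            # The cheap baseline suggested by the correlation: the norm
            # of the (flattened) input itself.
            return x.flatten(1).norm(dim=1)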

    A scattering problem for a local perturbation of an open periodic waveguide

    In this paper we consider the propagation of waves in an open waveguide in $\mathbb{R}^2$ where the index of refraction is a local perturbation of a function which is periodic along the axis of the waveguide (which we choose to be the $x_1$-axis) and equal to one for $|x_2| > h_0$ for some $h_0 > 0$. Motivated by the limiting absorption principle (proven in [17] for the case of an open waveguide in the half space $\mathbb{R} \times (0, \infty)$) we formulate a radiation condition which allows the existence of propagating modes and prove uniqueness, existence, and stability of a solution under the assumption that no bound states exist. In the second part we determine the order of decay of the radiating part of the solution in the direction of the layer and in the direction orthogonal to it. Finally, we show that it satisfies the classical Sommerfeld radiation condition and allows the definition of a far field pattern.
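    For reference, in two dimensions the classical Sommerfeld radiation condition and the resulting far field pattern $u^{\infty}$ take the following standard forms (stated generically here; the paper's layered-medium setting refines this for the radiating part of the solution):

        \frac{\partial u}{\partial r}(x) - i k\, u(x) = o\big(r^{-1/2}\big), \qquad
        u(x) = \frac{e^{ikr}}{\sqrt{r}} \Big( u^{\infty}(\hat{x}) + O(r^{-1}) \Big), \qquad r = |x| \to \infty .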

    On the scattering of a plane wave by a perturbed open periodic waveguide

    We consider the scattering of a plane wave by a locally perturbed periodic (with respect to $x_1$) medium. If there is no perturbation it is usually assumed that the scattered wave is quasi-periodic with the same parameter as the incident plane wave. As is well known, one can show existence under this condition but not necessarily uniqueness. Uniqueness fails for certain incident directions (if the wavenumber is kept fixed), and it is not clear which additional condition has to be assumed in this case. In this paper we will analyze three concepts. For the Limiting Absorption Principle (LAP) we replace the refractive index $n = n(x)$ by $n(x) + i\varepsilon$ in a layer of finite width and consider the limiting case $\varepsilon \to 0$. This will give an unsatisfactory condition. In a second approach we require continuity of the field with respect to the incident direction. This will give the same satisfactory condition as the third approach, where we approximate the incident plane wave by an incident point source and let the location of the source tend to infinity.
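    Schematically, and only as an illustration of the words above (the paper works with the full scattering problem, not this bare form), the LAP selects the solution as a limit of absorbing problems: with $\chi$ the indicator function of the absorbing layer of finite width,

        \Delta u_{\varepsilon} + k^{2}\big( n(x) + i\,\varepsilon\,\chi(x) \big)\, u_{\varepsilon} = 0, \qquad
        u := \lim_{\varepsilon \to 0^{+}} u_{\varepsilon} .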

    Direct and inverse time-harmonic scattering by Dirichlet periodic curves with local perturbations

    This is a continuation of the authors' previous work (A. Kirsch, Math. Meth. Appl. Sci., 45 (2022): 5737-5773) on well-posedness of time-harmonic scattering by locally perturbed periodic curves of Dirichlet kind. The scattering interface is supposed to be given by a non-self-intersecting Lipschitz curve. We study properties of the Green's function and prove new well-posedness results for scattering of plane waves at a propagative wave number. In such a case there exist guided waves of the unperturbed problem, which are also known as Bound States in the Continuum (BICs) in physics. In this paper uniqueness of the forward scattering problem follows from an orthogonality constraint imposed on the total field of the unperturbed scattering problem. This constraint, which is also valid under the Neumann boundary condition, is derived from singular perturbation arguments and also from the approach of approximating a plane wave by point source waves. For the inverse problem of determining the defect, we prove several uniqueness results using a finite or infinite number of point sources and plane waves, depending on whether a priori information on the size and height of the defect is available.