
    Probabilistic Numerical Linear Algebra for Machine Learning

    Machine learning models are becoming increasingly essential in domains where critical decisions must be made under uncertainty, such as public policy, medicine or robotics. For a model to be useful for decision-making, it must convey a degree of certainty in its predictions. Bayesian models are well-suited to such settings due to their principled uncertainty quantification, given a set of assumptions about the problem and data-generating process. While inference in a Bayesian model is fully specified in theory, in practice numerical approximations have a significant impact on the resulting posterior. Therefore, model-based decisions are not just determined by the data but also by the numerical method. This raises the question of how we can account for the adverse impact of numerical approximations on inference. Arguably, the most common numerical task in scientific computing is the solution of linear systems, which arise in probabilistic inference, graph theory, differential equations and optimization. In machine learning, these systems are typically large-scale, subject to noise and arise from generative processes. These unique characteristics call for specialized solvers. In this thesis, we propose a class of probabilistic linear solvers, which infer the solution to a linear system and can be interpreted as learning algorithms themselves. Importantly, they can leverage problem structure and propagate their error to the prediction of the underlying probabilistic model. Next, we apply such solvers to accelerate Gaussian process inference. While Gaussian processes are a principled and flexible model class, for large datasets inference is computationally prohibitive in both time and memory due to the required computations with the kernel matrix. We show that by approximating the posterior with a probabilistic linear solver, we can invest an arbitrarily small amount of computation and still obtain a provably coherent prediction that quantifies uncertainty exactly. Finally, we demonstrate that Gaussian process hyperparameter optimization can similarly be accelerated by leveraging structural prior knowledge in the model via preconditioning of iterative methods. Combined with modern parallel hardware, this enables training Gaussian process models on datasets with hundreds of thousands of data points. In summary, we demonstrate that interpreting numerical methods in linear algebra as probabilistic learning algorithms unlocks significant performance improvements for Gaussian process models. Crucially, we show how to account for the impact of numerical approximations on model predictions via uncertainty quantification. This enables an explicit trade-off between computational resources and confidence in a prediction. The techniques developed in this thesis have advanced the understanding of probabilistic linear solvers, shifted the goalposts of what can be expected from Gaussian process approximations, and defined how large-scale Gaussian process hyperparameter optimization is performed in GPyTorch, arguably the most popular library for Gaussian processes in Python.
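    As a rough illustration of the core idea (not the exact algorithm developed in the thesis), the sketch below conditions a Gaussian belief over the solution of a symmetric positive definite system Ax = b on one projection of the residual per iteration. The function name, the identity prior covariance and the residual-based policy are choices made here purely for illustration; the returned covariance quantifies the remaining numerical error, which is the quantity one would propagate into a downstream prediction such as a Gaussian process posterior.

```python
import numpy as np

def probabilistic_linear_solve(A, b, num_iters, prior_cov=None):
    """Minimal sketch of a probabilistic linear solver for symmetric positive definite A x = b.

    A Gaussian belief x ~ N(m, S) over the solution is conditioned on one noise-free
    projection s_i^T A x = s_i^T b per iteration, using the current residual as the
    action s_i. The posterior covariance S quantifies the error of the approximate
    solution after the chosen number of iterations.
    """
    n = b.shape[0]
    m = np.zeros(n)                                  # prior mean over the solution
    S = np.eye(n) if prior_cov is None else prior_cov.copy()
    for _ in range(num_iters):
        s = b - A @ m                                # action: current residual
        if np.linalg.norm(s) < 1e-12:
            break                                    # system solved to numerical precision
        h = A @ s                                    # observation operator: h^T x = s^T A x
        gain = S @ h / (h @ S @ h)                   # Kalman-style gain for a scalar observation
        m = m + gain * (s @ b - h @ m)               # condition on s^T A x = s^T b
        S = S - np.outer(gain, S @ h)                # uncertainty collapses along the explored direction
    return m, S                                      # posterior mean and covariance over the solution
```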

    Accelerating Generalized Linear Models by Trading off Computation for Uncertainty

    Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, thus requiring approximations. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to modern parallel computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training by explicitly trading off reduced computation for increased uncertainty.
    Comment: Main text: 10 pages, 6 figures; Supplements: 13 pages, 2 figures.

    Chemistry with Photons, Protons, and Electrons

    This is an account of the research activities of our group during the first two years of its existence. First results from our work on proton-coupled electron transfer and long-range charge tunneling reactions are presented. This includes a hydrogen-bonded cation–anion pair in which a proton-coupled electron transfer process can be phototriggered and followed by simple optical spectroscopic means, as well as a series of rigid rod-like donor-bridge-acceptor molecules which we use to investigate physical phenomena associated with the tunneling of electrons or holes. A unifying feature of this research is the use of light (photons) to induce proton and/or electron transfer.

    Large-Scale Gaussian Processes via Alternating Projection

    Gaussian process (GP) hyperparameter optimization requires repeatedly solving linear systems with $n \times n$ kernel matrices. To address the prohibitive $\mathcal{O}(n^3)$ time complexity, recent work has employed fast iterative numerical methods, like conjugate gradients (CG). However, as datasets increase in magnitude, the corresponding kernel matrices become increasingly ill-conditioned and still require $\mathcal{O}(n^2)$ space without partitioning. Thus, while CG increases the size of datasets GPs can be trained on, modern datasets reach scales beyond its applicability. In this work, we propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling mini-batching. Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets. Theoretically, we prove our method enjoys linear convergence, and empirically we demonstrate its robustness to ill-conditioning. On large-scale benchmark datasets with up to four million datapoints, our approach accelerates training by a factor of $2\times$ to $27\times$ compared to CG.
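    The sketch below conveys the block-wise access pattern under stated assumptions (it is not the paper's exact algorithm): it sweeps over randomly permuted blocks of the regularized system (K + σ²I)v = y, forming only one b-by-n kernel sub-block at a time and solving each block of equations exactly, in the style of block Gauss-Seidel. The names `blockwise_gp_solve` and `kernel_fn`, the block size and the sweep schedule are illustrative choices. Each block update costs O(n · b) time and memory, so the full n × n kernel matrix is never materialized.

```python
import numpy as np

def blockwise_gp_solve(kernel_fn, X, y, noise, block_size=512, num_sweeps=10, seed=0):
    """Solve (K + noise * I) v = y while only ever forming b-by-n kernel sub-blocks.

    Each update projects the current iterate onto the solution set of one block of
    equations (a block Gauss-Seidel step), so per-update time and memory scale with
    n * block_size rather than n^2 for the full kernel matrix.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    v = np.zeros(n)
    for _ in range(num_sweeps):
        order = rng.permutation(n)                        # randomized sweep over blocks
        for start in range(0, n, block_size):
            idx = order[start:start + block_size]
            K_block = kernel_fn(X[idx], X)                # (b, n) sub-block, computed on the fly
            r = y[idx] - (K_block @ v + noise * v[idx])   # residual of this block of equations
            K_bb = K_block[:, idx] + noise * np.eye(len(idx))
            v[idx] += np.linalg.solve(K_bb, r)            # satisfy this block exactly
    return v

# Illustrative usage with a toy RBF kernel:
rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
X, y = np.random.randn(2000, 3), np.random.randn(2000)
v = blockwise_gp_solve(rbf, X, y, noise=1e-2, block_size=256, num_sweeps=20)
```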

    Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning

    Gaussian processes remain popular as a flexible and expressive model class, but the computational cost of kernel hyperparameter optimization stands as a major limiting factor to their scaling and broader adoption. Recent work has made great strides combining stochastic estimation with iterative numerical techniques, essentially boiling down GP inference to the cost of (many) matrix-vector multiplies. Preconditioning -- a highly effective step for any iterative method involving matrix-vector multiplication -- can be used to accelerate convergence and thus reduce bias in hyperparameter optimization. Here, we prove that preconditioning has an additional benefit that has been previously unexplored. It not only reduces the bias of the $\log$-marginal likelihood estimator and its derivatives, but it can also simultaneously reduce variance at essentially negligible cost. We leverage this result to derive sample-efficient algorithms for GP hyperparameter optimization requiring as few as $\mathcal{O}(\log(\varepsilon^{-1}))$ instead of $\mathcal{O}(\varepsilon^{-2})$ samples to achieve error $\varepsilon$. Our theoretical results enable provably efficient and scalable optimization of kernel hyperparameters, which we validate empirically on a set of large-scale benchmark problems. There, variance reduction via preconditioning results in an order-of-magnitude speedup in hyperparameter optimization of exact GPs.
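    The mechanism can be illustrated on the log-determinant term of the log-marginal likelihood. Splitting $\log|\hat{K}| = \log|P| + \operatorname{tr}(\log(P^{-1/2}\hat{K}P^{-1/2}))$ for a preconditioner $P$, the first term is computed exactly and only the correction is estimated stochastically; the closer $P$ is to $\hat{K}$, the smaller the correction and the variance of its estimate. The dense sketch below is purely illustrative (scalable implementations estimate the correction with matrix-vector products and stochastic Lanczos quadrature rather than an eigendecomposition), and the function name and probe count are arbitrary choices.

```python
import numpy as np

def hutchinson_logdet(K_hat, P, num_probes=32, seed=0):
    """Illustration (dense, not scalable): estimate log|K_hat| via the split

        log|K_hat| = log|P| + tr(log(P^{-1/2} K_hat P^{-1/2})),

    computing log|P| exactly and estimating only the correction with Hutchinson
    probes. If P is close to K_hat, the whitened matrix is close to the identity,
    so both the correction and the variance of its estimator are small.
    """
    rng = np.random.default_rng(seed)
    n = K_hat.shape[0]
    L = np.linalg.cholesky(P)
    logdet_P = 2.0 * np.log(np.diag(L)).sum()

    M = np.linalg.solve(L, np.linalg.solve(L, K_hat).T).T   # L^{-1} K_hat L^{-T}: same spectrum as P^{-1/2} K_hat P^{-1/2}
    w, V = np.linalg.eigh((M + M.T) / 2.0)                  # symmetrize for numerical safety
    log_M = (V * np.log(w)) @ V.T                           # matrix logarithm via eigendecomposition

    probes = rng.choice([-1.0, 1.0], size=(num_probes, n))  # Rademacher probe vectors
    correction = np.mean(np.einsum("ij,jk,ik->i", probes, log_M, probes))
    return logdet_P + correction
```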

    Gas/particle partitioning of carbonyls in the photooxidation of isoprene and 1,3,5-trimethylbenzene

    A new denuder-filter sampling technique has been used to investigate the gas/particle partitioning behaviour of the carbonyl products from the photooxidation of isoprene and 1,3,5-trimethylbenzene. A series of experiments was performed in two atmospheric simulation chambers at atmospheric pressure and ambient temperature in the presence of NOx and at a relative humidity of approximately 50%. The denuder and filter were both coated with the derivatizing agent O-(2,3,4,5,6-pentafluorobenzyl)-hydroxylamine (PFBHA) to enable the efficient collection of gas- and particle-phase carbonyls, respectively. The tubes and filters were extracted and the carbonyls identified as their oxime derivatives by GC-MS. The carbonyl products identified in the experiments accounted for around 5% and 10% of the mass of secondary organic aerosol (SOA) formed from the photooxidation of isoprene and 1,3,5-trimethylbenzene, respectively. Experimental gas/particle partitioning coefficients were determined for a wide range of carbonyl products formed from the photooxidation of isoprene and 1,3,5-trimethylbenzene and compared with the theoretical values based on standard absorptive partitioning theory. Photooxidation products with a single carbonyl moiety were not observed in the particle phase, but dicarbonyls, and in particular glyoxal and methylglyoxal, exhibited gas/particle partitioning coefficients several orders of magnitude higher than expected theoretically. These findings support the importance of heterogeneous and particle-phase chemical reactions for SOA formation and growth during the atmospheric degradation of anthropogenic and biogenic hydrocarbons.
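    For context, the theoretical values referred to here are typically computed from the Pankow-type absorptive partitioning coefficient; one common form (stated as general background, with symbol definitions as usually given in the literature rather than taken from this study) is

```latex
K_{p,i} \;=\; \frac{F_i/\mathrm{TSP}}{A_i}
        \;=\; \frac{760 \, R \, T \, f_{\mathrm{om}}}{\mathrm{MW}_{\mathrm{om}} \, 10^{6} \, \zeta_i \, p^{\circ}_{L,i}}
```

    where K_p,i (m^3 ug^-1) is the partitioning coefficient of compound i, F_i and A_i are its particle- and gas-phase concentrations (ng m^-3), TSP is the particle mass concentration (ug m^-3), R = 8.206 x 10^-5 m^3 atm mol^-1 K^-1, T is temperature (K), f_om is the fraction of absorbing organic matter, MW_om its mean molecular weight (g mol^-1), zeta_i the activity coefficient and p°_L,i the sub-cooled liquid vapour pressure (torr).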

    Book Reviews


    Ellipsis as a Marker of Interaction in Spoken Discourse

    In this article, we discuss strategies for interaction in spoken discourse, focusing on ellipsis phenomena in English. The data comes from the VOICE corpus of English as a Lingua Franca, and we analyse education data in the form of seminar and workshop discussions, working group meetings, interviews and conversations. The functions ellipsis carries in the data are Intersubjectivity, where participants develop and maintain an understanding in discourse; Continuers, which are examples of back-channel support; Correction, both self- and other-initiated; Repetition; and Comments, which are similar to Continuers but do not have a back-channel support function. We see that the first of these, Intersubjectivity, is by far the most frequent, followed by Repetitions and Comments. These results are explained as consequences of the nature of the texts themselves, as some are discussions of presentations and so can be expected to contain many Repetitions, for example. The speech event is also an important factor, as events with asymmetrical power relations, like interviews, do not contain as many Continuers. Our clear conclusion is that the use of ellipsis is a strong marker of interaction in spoken discourse.