
    Probabilistic Numerical Linear Algebra for Machine Learning

    Machine learning models are becoming increasingly essential in domains where critical decisions must be made under uncertainty, such as public policy, medicine or robotics. For a model to be useful for decision-making, it must convey a degree of certainty in its predictions. Bayesian models are well-suited to such settings due to their principled uncertainty quantification, given a set of assumptions about the problem and data-generating process. While inference in a Bayesian model is fully specified in theory, in practice numerical approximations have a significant impact on the resulting posterior. Therefore, model-based decisions are not just determined by the data but also by the numerical method. This raises the question of how we can account for the adverse impact of numerical approximations on inference. Arguably, the most common numerical task in scientific computing is the solution of linear systems, which arise in probabilistic inference, graph theory, differential equations and optimization. In machine learning, these systems are typically large-scale, subject to noise and arise from generative processes. These unique characteristics call for specialized solvers. In this thesis, we propose a class of probabilistic linear solvers, which infer the solution to a linear system and can be interpreted as learning algorithms themselves. Importantly, they can leverage problem structure and propagate their error to the prediction of the underlying probabilistic model. Next, we apply such solvers to accelerate Gaussian process inference. While Gaussian processes are a principled and flexible model class, for large datasets inference is computationally prohibitive in both time and memory due to the required computations with the kernel matrix. We show that by approximating the posterior with a probabilistic linear solver, we can invest an arbitrarily small amount of computation and still obtain a provably coherent prediction that quantifies uncertainty exactly. Finally, we demonstrate that Gaussian process hyperparameter optimization can similarly be accelerated by leveraging structural prior knowledge in the model via preconditioning of iterative methods. Combined with modern parallel hardware, this enables training Gaussian process models on datasets with hundreds of thousands of data points. In summary, we demonstrate that interpreting numerical methods in linear algebra as probabilistic learning algorithms unlocks significant performance improvements for Gaussian process models. Crucially, we show how to account for the impact of numerical approximations on model predictions via uncertainty quantification. This enables an explicit trade-off between computational resources and confidence in a prediction. The techniques developed in this thesis have advanced the understanding of probabilistic linear solvers, shifted the goalposts of what can be expected from Gaussian process approximations, and defined how large-scale Gaussian process hyperparameter optimization is performed in GPyTorch, arguably the most popular library for Gaussian processes in Python.
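    As a rough illustration of the core idea (not the exact algorithm developed in the thesis), the sketch below conditions a Gaussian belief over the solution of a symmetric positive definite system Ax = b on one projection of the residual per iteration. The function name, the identity prior covariance and the residual-based policy are choices made here purely for illustration; the returned covariance quantifies the remaining numerical error, which is the quantity one would propagate into a downstream prediction such as a Gaussian process posterior.

```python
import numpy as np

def probabilistic_linear_solve(A, b, num_iters, prior_cov=None):
    """Minimal sketch of a probabilistic linear solver for symmetric positive definite A x = b.

    A Gaussian belief x ~ N(m, S) over the solution is conditioned on one noise-free
    projection s_i^T A x = s_i^T b per iteration, using the current residual as the
    action s_i. The posterior covariance S quantifies the error of the approximate
    solution after the chosen number of iterations.
    """
    n = b.shape[0]
    m = np.zeros(n)                                  # prior mean over the solution
    S = np.eye(n) if prior_cov is None else prior_cov.copy()
    for _ in range(num_iters):
        s = b - A @ m                                # action: current residual
        if np.linalg.norm(s) < 1e-12:
            break                                    # system solved to numerical precision
        h = A @ s                                    # observation operator: h^T x = s^T A x
        gain = S @ h / (h @ S @ h)                   # Kalman-style gain for a scalar observation
        m = m + gain * (s @ b - h @ m)               # condition on s^T A x = s^T b
        S = S - np.outer(gain, S @ h)                # uncertainty collapses along the explored direction
    return m, S                                      # posterior mean and covariance over the solution
```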

    Accelerating Generalized Linear Models by Trading off Computation for Uncertainty

    Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, thus requiring approximations. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to modern parallel computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training by explicitly trading off reduced computation for increased uncertainty.
    Comment: Main text: 10 pages, 6 figures; Supplements: 13 pages, 2 figures.

    Chemistry with Photons, Protons, and Electrons

    This is an account of the research activities of our group during the first two years of its existence. First results from our work on proton-coupled electron transfer and long-range charge tunneling reactions are presented. This includes a hydrogen-bonded cation–anion pair in which a proton-coupled electron transfer process can be phototriggered and followed by simple optical spectroscopic means, as well as a series of rigid rod-like donor-bridge-acceptor molecules which we use to investigate physical phenomena associated with the tunneling of electrons or holes. A unifying feature of this research is the use of light (photons) to induce proton and/or electron transfer.

    Large-Scale Gaussian Processes via Alternating Projection

    Gaussian process (GP) hyperparameter optimization requires repeatedly solving linear systems with $n \times n$ kernel matrices. To address the prohibitive $\mathcal{O}(n^3)$ time complexity, recent work has employed fast iterative numerical methods, like conjugate gradients (CG). However, as datasets increase in magnitude, the corresponding kernel matrices become increasingly ill-conditioned and still require $\mathcal{O}(n^2)$ space without partitioning. Thus, while CG increases the size of datasets GPs can be trained on, modern datasets reach scales beyond its applicability. In this work, we propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling mini-batching. Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets. Theoretically, we prove our method enjoys linear convergence, and empirically we demonstrate its robustness to ill-conditioning. On large-scale benchmark datasets with up to four million datapoints, our approach accelerates training by a factor of $2\times$ to $27\times$ compared to CG.
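    The sketch below conveys the block-wise access pattern under stated assumptions (it is not the paper's exact algorithm): it sweeps over randomly permuted blocks of the regularized system (K + σ²I)v = y, forming only one b-by-n kernel sub-block at a time and solving each block of equations exactly, in the style of block Gauss-Seidel. The names `blockwise_gp_solve` and `kernel_fn`, the block size and the sweep schedule are illustrative choices. Each block update costs O(n · b) time and memory, so the full n × n kernel matrix is never materialized.

```python
import numpy as np

def blockwise_gp_solve(kernel_fn, X, y, noise, block_size=512, num_sweeps=10, seed=0):
    """Solve (K + noise * I) v = y while only ever forming b-by-n kernel sub-blocks.

    Each update projects the current iterate onto the solution set of one block of
    equations (a block Gauss-Seidel step), so per-update time and memory scale with
    n * block_size rather than n^2 for the full kernel matrix.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    v = np.zeros(n)
    for _ in range(num_sweeps):
        order = rng.permutation(n)                        # randomized sweep over blocks
        for start in range(0, n, block_size):
            idx = order[start:start + block_size]
            K_block = kernel_fn(X[idx], X)                # (b, n) sub-block, computed on the fly
            r = y[idx] - (K_block @ v + noise * v[idx])   # residual of this block of equations
            K_bb = K_block[:, idx] + noise * np.eye(len(idx))
            v[idx] += np.linalg.solve(K_bb, r)            # satisfy this block exactly
    return v

# Illustrative usage with a toy RBF kernel:
rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
X, y = np.random.randn(2000, 3), np.random.randn(2000)
v = blockwise_gp_solve(rbf, X, y, noise=1e-2, block_size=256, num_sweeps=20)
```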

    Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning

    Gaussian processes remain popular as a flexible and expressive model class, but the computational cost of kernel hyperparameter optimization stands as a major limiting factor to their scaling and broader adoption. Recent work has made great strides combining stochastic estimation with iterative numerical techniques, essentially boiling down GP inference to the cost of (many) matrix-vector multiplies. Preconditioning -- a highly effective step for any iterative method involving matrix-vector multiplication -- can be used to accelerate convergence and thus reduce bias in hyperparameter optimization. Here, we prove that preconditioning has an additional benefit that has been previously unexplored. It not only reduces the bias of the $\log$-marginal likelihood estimator and its derivatives, but it can also simultaneously reduce variance at essentially negligible cost. We leverage this result to derive sample-efficient algorithms for GP hyperparameter optimization requiring as few as $\mathcal{O}(\log(\varepsilon^{-1}))$ instead of $\mathcal{O}(\varepsilon^{-2})$ samples to achieve error $\varepsilon$. Our theoretical results enable provably efficient and scalable optimization of kernel hyperparameters, which we validate empirically on a set of large-scale benchmark problems. There, variance reduction via preconditioning results in an order-of-magnitude speedup in hyperparameter optimization of exact GPs.
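    The mechanism can be illustrated on the log-determinant term of the log-marginal likelihood. Splitting $\log|\hat{K}| = \log|P| + \operatorname{tr}(\log(P^{-1/2}\hat{K}P^{-1/2}))$ for a preconditioner $P$, the first term is computed exactly and only the correction is estimated stochastically; the closer $P$ is to $\hat{K}$, the smaller the correction and the variance of its estimate. The dense sketch below is purely illustrative (scalable implementations estimate the correction with matrix-vector products and stochastic Lanczos quadrature rather than an eigendecomposition), and the function name and probe count are arbitrary choices.

```python
import numpy as np

def hutchinson_logdet(K_hat, P, num_probes=32, seed=0):
    """Illustration (dense, not scalable): estimate log|K_hat| via the split

        log|K_hat| = log|P| + tr(log(P^{-1/2} K_hat P^{-1/2})),

    computing log|P| exactly and estimating only the correction with Hutchinson
    probes. If P is close to K_hat, the whitened matrix is close to the identity,
    so both the correction and the variance of its estimator are small.
    """
    rng = np.random.default_rng(seed)
    n = K_hat.shape[0]
    L = np.linalg.cholesky(P)
    logdet_P = 2.0 * np.log(np.diag(L)).sum()

    M = np.linalg.solve(L, np.linalg.solve(L, K_hat).T).T   # L^{-1} K_hat L^{-T}: same spectrum as P^{-1/2} K_hat P^{-1/2}
    w, V = np.linalg.eigh((M + M.T) / 2.0)                  # symmetrize for numerical safety
    log_M = (V * np.log(w)) @ V.T                           # matrix logarithm via eigendecomposition

    probes = rng.choice([-1.0, 1.0], size=(num_probes, n))  # Rademacher probe vectors
    correction = np.mean(np.einsum("ij,jk,ik->i", probes, log_M, probes))
    return logdet_P + correction
```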

    Gas/particle partitioning of carbonyls in the photooxidation of isoprene and 1,3,5-trimethylbenzene

    A new denuder-filter sampling technique has been used to investigate the gas/particle partitioning behaviour of the carbonyl products from the photooxidation of isoprene and 1,3,5-trimethylbenzene. A series of experiments was performed in two atmospheric simulation chambers at atmospheric pressure and ambient temperature in the presence of NOx and at a relative humidity of approximately 50%. The denuder and filter were both coated with the derivatizing agent O-(2,3,4,5,6-pentafluorobenzyl)-hydroxylamine (PFBHA) to enable the efficient collection of gas- and particle-phase carbonyls, respectively. The tubes and filters were extracted and the carbonyls identified as their oxime derivatives by GC-MS. The carbonyl products identified in the experiments accounted for around 5% and 10% of the mass of secondary organic aerosol (SOA) formed from the photooxidation of isoprene and 1,3,5-trimethylbenzene, respectively. Experimental gas/particle partitioning coefficients were determined for a wide range of carbonyl products formed from the photooxidation of isoprene and 1,3,5-trimethylbenzene and compared with the theoretical values based on standard absorptive partitioning theory. Photooxidation products with a single carbonyl moiety were not observed in the particle phase, but dicarbonyls, and in particular glyoxal and methylglyoxal, exhibited gas/particle partitioning coefficients several orders of magnitude higher than expected theoretically. These findings support the importance of heterogeneous and particle-phase chemical reactions for SOA formation and growth during the atmospheric degradation of anthropogenic and biogenic hydrocarbons.
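    For context, the theoretical values referred to here are typically computed from the Pankow-type absorptive partitioning coefficient; one common form (stated as general background, with symbol definitions as usually given in the literature rather than taken from this study) is

```latex
K_{p,i} \;=\; \frac{F_i/\mathrm{TSP}}{A_i}
        \;=\; \frac{760 \, R \, T \, f_{\mathrm{om}}}{\mathrm{MW}_{\mathrm{om}} \, 10^{6} \, \zeta_i \, p^{\circ}_{L,i}}
```

    where K_p,i (m^3 ug^-1) is the partitioning coefficient of compound i, F_i and A_i are its particle- and gas-phase concentrations (ng m^-3), TSP is the particle mass concentration (ug m^-3), R = 8.206 x 10^-5 m^3 atm mol^-1 K^-1, T is temperature (K), f_om is the fraction of absorbing organic matter, MW_om its mean molecular weight (g mol^-1), zeta_i the activity coefficient and p°_L,i the sub-cooled liquid vapour pressure (torr).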

    Book Reviews


    Ellipsis as a Marker of Interaction in Spoken Discourse

    In this article, we discuss strategies for interaction in spoken discourse, focusing on ellipsis phenomena in English. The data comes from the VOICE corpus of English as a Lingua Franca, and we analyse education data in the form of seminar and workshop discussions, working group meetings, interviews and conversations. The functions ellipsis carries in the data are Intersubjectivity, where participants develop and maintain an understanding in discourse; Continuers, which are examples of back-channel support; Correction, both self- and other-initiated; Repetition; and Comments, which are similar to Continuers but do not have a back-channel support function. We see that the first of these, Intersubjectivity, is by far the most frequent, followed by Repetitions and Comments. These results are explained as consequences of the nature of the texts themselves, as some are discussions of presentations and so can be expected to contain many Repetitions, for example. The speech event is also an important factor, as events with asymmetrical power relations, like interviews, do not contain as many Continuers. Our clear conclusion is that the use of ellipsis is a strong marker of interaction in spoken discourse.