Evaluating Probabilistic Classifiers: The Triptych
Probability forecasts for binary outcomes, often referred to as probabilistic
classifiers or confidence scores, are ubiquitous in science and society, and
methods for evaluating and comparing them are in great demand. We propose and
study a triptych of diagnostic graphics that focus on distinct and
complementary aspects of forecast performance: The reliability diagram
addresses calibration, the receiver operating characteristic (ROC) curve
diagnoses discrimination ability, and the Murphy diagram visualizes overall
predictive performance and value. A Murphy curve shows a forecast's mean
elementary scores, including the widely used misclassification rate, and the
area under a Murphy curve equals the mean Brier score. For a calibrated
forecast, the reliability curve lies on the diagonal, and for competing
calibrated forecasts, the ROC and Murphy curves share the same number of
crossing points. We invoke the recently developed CORP (Consistent, Optimally
binned, Reproducible, and Pool-Adjacent-Violators (PAV) algorithm based)
approach to craft reliability diagrams and decompose a mean score into
miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) components.
Plots of the DSC measure of discrimination ability versus the calibration
metric MCB visualize classifier performance across multiple competitors. The
proposed tools are illustrated in empirical examples from astrophysics,
economics, and social science.
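The CORP decomposition described above can be illustrated concretely. Below is a minimal numpy sketch (not the authors' implementation) that recalibrates forecasts with the Pool-Adjacent-Violators algorithm and splits the mean Brier score into miscalibration (MCB), discrimination (DSC), and uncertainty (UNC); the toy forecasts and outcomes are invented, and tied forecast values are handled by stable sort order for simplicity:

```python
import numpy as np

def pav(y):
    """Pool-Adjacent-Violators: non-decreasing least-squares fit to y."""
    blocks = []  # each block: [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge adjacent blocks while their means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def corp_brier_decomposition(p, y):
    """Decompose the mean Brier score as S = MCB - DSC + UNC."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    order = np.argsort(p, kind="stable")   # ties kept in input order (simplification)
    p, y = p[order], y[order]
    cal = pav(y)                           # PAV-recalibrated forecast
    brier = lambda f: np.mean((f - y) ** 2)
    S, S_cal, S_clim = brier(p), brier(cal), brier(np.full_like(y, y.mean()))
    return S, S - S_cal, S_clim - S_cal, S_clim   # S, MCB, DSC, UNC

# illustrative data: four forecasts for four binary outcomes
S, MCB, DSC, UNC = corp_brier_decomposition([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Because the PAV fit minimizes squared error among all monotone recalibrations, MCB and DSC are guaranteed non-negative, which is the key property of the CORP decomposition.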
Estimating the Rate-Distortion Function by Wasserstein Gradient Descent
In the theory of lossy compression, the rate-distortion (R-D) function
describes how much a data source can be compressed (in bit-rate) at any given
level of fidelity (distortion). Obtaining the R-D function for a given data source
establishes the fundamental performance limit for all compression algorithms.
We propose a new method to estimate the R-D function from the perspective of optimal
transport. Unlike the classic Blahut--Arimoto algorithm which fixes the support
of the reproduction distribution in advance, our Wasserstein gradient descent
algorithm learns the support of the optimal reproduction distribution by moving
particles. We prove its local convergence and analyze the sample complexity of
our R-D estimator based on a connection to entropic optimal transport.
Experimentally, we obtain comparable or tighter bounds than state-of-the-art
neural network methods on low-rate sources while requiring considerably less
tuning and computation effort. We also highlight a connection to
maximum-likelihood deconvolution and introduce a new class of sources that can
be used as test cases with known solutions to the R-D problem.
Comment: Accepted as a conference paper at NeurIPS 202
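For context, the fixed-support Blahut--Arimoto baseline that the paper contrasts with can be sketched for a finite alphabet. The Bernoulli source, Hamming distortion, and slope parameter below are illustrative choices, not taken from the paper:

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, iters=500, tol=1e-12):
    """Blahut--Arimoto iteration for one point of the R-D curve.

    The reproduction support (the columns of `dist`) is fixed in advance,
    which is exactly the restriction the particle-based method lifts.
    Returns (rate in bits, expected distortion) at slope parameter beta.
    """
    n, m = dist.shape
    q = np.full(m, 1.0 / m)                  # reproduction marginal
    for _ in range(iters):
        A = q * np.exp(-beta * dist)         # unnormalized Q(x_hat | x)
        Q = A / A.sum(axis=1, keepdims=True)
        q_new = p_x @ Q                      # marginal update
        if np.max(np.abs(q_new - q)) < tol:
            q = q_new
            break
        q = q_new
    A = q * np.exp(-beta * dist)
    Q = A / A.sum(axis=1, keepdims=True)
    joint = p_x[:, None] * Q
    mask = joint > 0
    rate = np.sum(joint[mask] * np.log2((Q / q)[mask]))  # mutual information
    return rate, float(np.sum(joint * dist))

# Bernoulli(1/2) source, Hamming distortion, beta = ln 9  ->  D = 0.1,
# where the true R-D function is R(D) = 1 - H_2(D) bits
rate, D = blahut_arimoto(np.array([0.5, 0.5]),
                         np.array([[0.0, 1.0], [1.0, 0.0]]),
                         beta=np.log(9.0))
```

The binary-symmetric example is one of the rare sources with a closed-form R(D), which is why such cases are valuable as test beds for estimators.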
Statistical and Computational Aspects of Learning with Complex Structure
The recent explosion of data that is routinely collected has led scientists to contemplate more and more sophisticated structural assumptions. Understanding how to harness and exploit such structure is key to improving the prediction accuracy of various statistical procedures. The ultimate goal of this line of research is to develop a set of tools that leverage underlying complex structures to pool information across observations and ultimately improve statistical accuracy as well as computational efficiency of the deployed methods. The workshop focused on recent developments in regression and matrix estimation under various complex constraints such as physical, computational, privacy, sparsity, or robustness constraints. Optimal-transport based techniques for geometric data analysis were also a main topic of the workshop.
The Impact of an Instructional Intervention Designed to Support Development of Stochastic Understanding of Probability Distribution
Stochastic understanding of probability distribution undergirds development of conceptual connections between probability and statistics and supports development of a principled understanding of statistical inference. This study investigated the impact of an instructional course intervention designed to support development of stochastic understanding of probability distribution. Instructional supports consisted of supplemental lab assignments comprising anticipatory tasks designed to engage students in coordinating thinking about complementary probabilistic and statistical notions. These tasks utilized dynamic software simulations to elicit stochastic conceptions and to support development of conceptual connections between empirical distributions and theoretical probability distribution models along a hypothetical learning trajectory undergirding stochastic understanding of probability distribution. The study employed a treatment-control design, using a mix of quantitative and qualitative research methods to examine students' understanding after a one-semester course. Participants were 184 undergraduate students enrolled in a lecture/recitation, calculus-based, introductory probability and statistics course who completed lab assignments addressing either calculus review (control) or stochastic conceptions of probability distribution (treatment). Data sources consisted of a student background survey, a conceptual assessment, ARTIST assessment items, and final course examinations. Student interviews provided insight into the nature of students' reasoning and facilitated examination of the validity of the stochastic conceptual assessment. Logistic regression analysis revealed that completion of supplemental assignments designed to undergird development of stochastic conceptions had a statistically significant impact on students' understanding of probability distribution.
Students who held stochastic conceptions demonstrated integrated reasoning related to probability, variability, and distribution and presented images that support a principled understanding of statistical inference.
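The logistic regression analysis mentioned above can be illustrated on synthetic data; the group sizes and success probabilities below are invented for illustration, not the study's data. With a single binary treatment indicator, the fitted slope equals the sample log odds ratio exactly, which is a useful sanity check on the fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400                                      # hypothetical students per group
treat = np.repeat([0, 1], n)                 # 0 = control lab, 1 = treatment lab
p_true = np.where(treat == 1, 0.65, 0.45)    # assumed success probabilities
y = rng.binomial(1, p_true)                  # 1 = correct on assessment item

# Newton-Raphson (IRLS) for the logistic MLE: intercept + treatment indicator
X = np.column_stack([np.ones(2 * n), treat])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (y - p))

# 2x2 table: with one binary predictor the slope is the sample log odds ratio
a = y[treat == 1].sum(); b = n - a           # treatment successes / failures
c = y[treat == 0].sum(); d = n - c           # control successes / failures
log_or = np.log((a * d) / (b * c))
```

Here `np.exp(beta[1])` is the estimated odds ratio of answering correctly for treatment versus control.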
Modelling of Viral Disease Risk
Covid-19 has had a significant impact on daily life since the initial outbreak of the global pandemic in late 2019. Countries have been affected to varying degrees, depending on government actions and country characteristics such as infrastructure and demographics. Using Norway and Germany as a case study, this thesis aims to determine which factors influence the risk of infection in each country, using Bayesian modelling and a non-Bayesian machine learning approach. Specifically, the relationship between infection rates and demographic and infrastructural characteristics in a municipality at a fixed point in time is investigated, and the effectiveness of a Bayesian model in this context is compared with a machine learning algorithm. In addition, temporal modelling is used to assess the usefulness of government interventions, the impact of changes in mobility behaviour, and the prevalence of different strains of Covid-19 in relation to infection numbers. The results show that a spatial model is more useful than a machine learning model in this context. For Germany, it is found that the logarithmic trade tax in a municipality, the share of the vote for the right-wing AfD party, and the population density have a positive influence on the infection figures. For Norway, the number of immigrants in a municipality, the number of unemployed immigrants in a municipality, and population density are found to have a positive association with infection rates, while the proportion of women in a municipality is negatively associated with infection rates. The temporal models identify higher workplace mobility as a factor significantly influencing the risk of infection in Germany and Norway.
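The covariate-versus-infection-rate modelling described above can be caricatured with a Poisson log-linear regression fitted by Newton-Raphson. The covariates and coefficients below are synthetic stand-ins, not the thesis's data or model (which is Bayesian and spatial), but they show the basic count-regression machinery such analyses build on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for municipality data: intercept plus two
# standardized covariates (think log population density, mobility)
n = 5000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([0.5, 0.3, -0.2])       # illustrative effect sizes
y = rng.poisson(np.exp(X @ beta_true))       # simulated case counts

# Newton-Raphson (equivalently IRLS) for the Poisson log-linear MLE
beta = np.zeros(3)
for _ in range(50):
    mu = np.exp(X @ beta)                    # fitted means
    step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    beta += step
    if np.max(np.abs(step)) < 1e-10:
        break
```

A positive fitted coefficient means the covariate multiplies the expected case count by `exp(beta)` per standard deviation, the sense in which the thesis reports "positive influence" of covariates on infection figures.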