The Parameter Houlihan: a solution to high-throughput identifiability indeterminacy for brutally ill-posed problems
One way to inject knowledge into clinically impactful forecasting is to use data assimilation, a nonlinear regression that projects data onto a mechanistic physiologic model rather than onto a generic set of functions such as neural networks. Such regressions have the advantage of remaining useful with particularly sparse, non-stationary clinical data. However, physiological models are often nonlinear and can have many parameters, leading to potential problems with parameter identifiability, i.e., the ability to find a unique set of parameters that minimizes forecasting error. Identifiability problems can be reduced or eliminated by estimating fewer parameters, but estimating fewer parameters also reduces the flexibility of the model and hence increases forecasting error. We propose a method, the parameter Houlihan, that combines traditional machine learning techniques with data assimilation to select the set of model parameters that minimizes forecasting error while limiting identifiability problems. The method worked well: for our cohort, the data-assimilation-based glucose forecasts and estimates obtained with the Houlihan-selected parameter sets generally achieve lower forecasting errors than other parameter-selection approaches, such as selecting parameters by hand. Nevertheless, the forecast with the lowest error does not always represent the physiology accurately; further refinements of the algorithm provide a path toward improved physiologic fidelity as well. Our hope is that this methodology represents a first step toward combining machine learning with data assimilation and provides a lower-threshold entry point for using data assimilation with clinical data by helping select the right parameters to estimate.
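As a rough illustration of the kind of parameter-subset search involved (not the paper's actual Houlihan algorithm), the Python sketch below greedily grows the set of estimated parameters, keeping an addition only if it lowers forecast error. Here `forecast_error` is a hypothetical stand-in for running data assimilation with a given subset estimated and all other parameters held at defaults, and the toy error surface in the usage example is made up.

```python
# Illustrative sketch only: greedy forward selection of which model parameters
# to estimate in a data-assimilation forecaster. This is NOT the paper's exact
# Houlihan algorithm; `forecast_error` is a hypothetical stand-in for "run data
# assimilation estimating this parameter subset and return the forecast error".
from typing import Callable, FrozenSet, List

def greedy_parameter_selection(
    candidate_params: List[str],
    forecast_error: Callable[[FrozenSet[str]], float],
    max_estimated: int = 3,
) -> FrozenSet[str]:
    """Grow the estimated-parameter set one parameter at a time, keeping the
    addition that most reduces forecast error; stop early when no addition
    helps or when the identifiability-motivated cap `max_estimated` is hit."""
    selected: FrozenSet[str] = frozenset()
    best_err = forecast_error(selected)      # error with all parameters at defaults
    while len(selected) < max_estimated:
        trials = {p: forecast_error(selected | {p})
                  for p in candidate_params if p not in selected}
        best_p, err = min(trials.items(), key=lambda kv: kv[1])
        if err >= best_err:                  # no improvement -> stop
            break
        selected, best_err = selected | {best_p}, err
    return selected

# Toy usage with a made-up error surface standing in for a real DA run:
if __name__ == "__main__":
    truth = {"insulin_sensitivity", "glucose_decay"}
    toy_error = lambda s: 10.0 - 4.0 * len(s & truth) + 0.5 * len(s - truth)
    print(greedy_parameter_selection(
        ["insulin_sensitivity", "glucose_decay", "meal_rate", "basal_rate"],
        toy_error))
```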
Stability
Reproducibility is imperative for any scientific discovery. More often than
not, modern scientific findings rely on statistical analysis of
high-dimensional data. At a minimum, reproducibility manifests itself in
stability of statistical results relative to "reasonable" perturbations to data
and to the model used. Jackknife, bootstrap, and cross-validation are based on
perturbations to data, while robust statistics methods deal with perturbations
to models. In this article, a case is made for the importance of stability in
statistics. Firstly, we motivate the necessity of stability for interpretable
and reliable encoding models from brain fMRI signals. Secondly, we find strong
evidence in the literature to demonstrate the central role of stability in
statistical inference, such as sensitivity analysis and effect detection.
Thirdly, a smoothing parameter selector based on estimation stability (ES),
ES-CV, is proposed for Lasso, in order to bring stability to bear on
cross-validation (CV). ES-CV is then utilized in the encoding models to reduce
the number of predictors by 60% with almost no loss (1.3%) of prediction
performance across over 2,000 voxels. Last, a novel "stability" argument is
seen to drive new results that shed light on the intriguing interactions
between sample-to-sample variability and heavier-tailed error distributions (e.g., double-exponential) in high-dimensional regression models with p predictors and n independent samples. In particular, when p/n is large and the error distribution is double-exponential, the Ordinary Least Squares (OLS) estimator is better than the Least Absolute Deviation (LAD) estimator.
Comment: Published in the Bernoulli (http://isi.cbs.nl/bernoulli/) at http://dx.doi.org/10.3150/13-BEJSP14 by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
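As a rough sketch of the estimation-stability idea (not the exact ES-CV statistic or selection rule from the paper), the snippet below refits a Lasso on cross-validation-style subsamples for each candidate penalty and scores the penalty by how much the resulting fitted values disagree, normalized by their average squared size. The simulated data and the use of scikit-learn's `Lasso` are illustrative assumptions.

```python
# Rough sketch of an estimation-stability (ES) criterion for choosing the Lasso
# penalty, in the spirit of ES-CV. Not the paper's exact rule: for each penalty
# we refit on K CV-style subsamples and measure how much the fits disagree.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

def es_statistic(X, y, alpha, n_splits=5, seed=0):
    """Variance of predictions across refits, normalized by their mean squared size."""
    preds = []
    for train_idx, _ in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        model = Lasso(alpha=alpha, max_iter=10_000).fit(X[train_idx], y[train_idx])
        preds.append(model.predict(X))            # predictions on the full design
    preds = np.stack(preds)                       # shape (n_splits, n_samples)
    mean_pred = preds.mean(axis=0)
    var_hat = ((preds - mean_pred) ** 2).mean()   # average squared disagreement
    return var_hat / np.mean(mean_pred ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    beta = np.zeros(50); beta[:5] = 2.0
    y = X @ beta + rng.laplace(size=200)          # double-exponential noise
    alphas = np.logspace(-2, 0.5, 20)
    es = [es_statistic(X, y, a) for a in alphas]
    print("alpha minimizing ES statistic:", alphas[int(np.argmin(es))])
```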
Development of a land use regression model for black carbon using mobile monitoring data and its application to pollution-avoiding routing
Black carbon is often used as an indicator for combustion-related air pollution. In urban environments, on-road black carbon concentrations have a large spatial variability, suggesting that a cyclist's personal exposure to black carbon can depend heavily on the route chosen to reach a destination. In this paper, we describe the development of a cyclist routing procedure that minimizes personal exposure to black carbon. Firstly, a land use regression model for predicting black carbon concentrations in an urban environment is developed using mobile monitoring data collected by cyclists. The optimal model is selected and validated using a spatially stratified cross-validation scheme. The resulting model is integrated into a dedicated routing procedure that minimizes personal exposure to black carbon during cycling. The best model obtains a coefficient of multiple correlation of R = 0.520. Simulations with the black carbon exposure minimizing routing procedure indicate that the inhaled amount of black carbon is reduced by 1.58% on average compared to the shortest-path route, with extreme cases where a reduction of up to 13.35% is obtained. Moreover, we observe that the average exposure to black carbon and the exposure to local peak concentrations on a route are competing objectives, and we propose a parametrized cost function for the routing problem that allows a gradual transition from routes that minimize average exposure to routes that minimize peak exposure.
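The abstract does not give the functional form of the parametrized cost, so the sketch below assumes a hypothetical blend controlled by a parameter `beta`: at beta = 0 an edge's cost is its concentration times length (an inhaled-dose proxy tied to average exposure), while at beta = 1 the cost is dominated by a penalty on edges whose predicted concentration exceeds a peak threshold. The edge attributes `bc` and `length_m` and the toy network are likewise illustrative.

```python
# Illustrative sketch of exposure-aware cyclist routing on a street graph.
# `beta` is a hypothetical stand-in for the paper's parametrized cost:
# beta = 0 weights only cumulative (average-like) exposure, beta = 1 strongly
# penalizes edges whose predicted black-carbon concentration exceeds a peak
# threshold. Edge attributes: length_m (metres), bc (predicted concentration).
import networkx as nx

def add_exposure_cost(G, beta=0.5, peak_threshold=10_000.0):
    """Attach a routing cost to every edge: inhaled-dose proxy plus peak penalty."""
    for u, v, data in G.edges(data=True):
        dose = data["bc"] * data["length_m"]              # concentration x distance
        peak = data["length_m"] if data["bc"] > peak_threshold else 0.0
        data["cost"] = (1.0 - beta) * dose + beta * peak_threshold * peak

if __name__ == "__main__":
    G = nx.Graph()
    # Tiny toy network: a short, heavily polluted street vs. a longer, cleaner detour.
    G.add_edge("A", "B", length_m=200, bc=12_000)         # short but above threshold
    G.add_edge("A", "C", length_m=600, bc=3_000)
    G.add_edge("C", "B", length_m=600, bc=3_000)
    for beta in (0.0, 1.0):
        add_exposure_cost(G, beta=beta)
        print(beta, nx.shortest_path(G, "A", "B", weight="cost"))
```

In this toy example the direct street wins at beta = 0 (lower total dose) and the detour wins at beta = 1 (it avoids the above-threshold edge), mirroring the trade-off between average and peak exposure described in the abstract.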