Prediction of survival probabilities with Bayesian Decision Trees

Author: Bailey
Becalick
Bishop
Bouamra
Boyd
Breiman
Chawda
Chipman
Clermont
Denison
Dietterich
DiRusso
Domingos
Duda
Green
Hadfield
Hall
Hilden
Hunter
Jaimes
Jakaite
Kilgo
Koshy
Kreke
Krzanowski
Kuncheva
Li
Livia Jakaite
Lunn
Millham
Oakland
Osler
Patil
Quinlan
Robert
Rogers
Schetinin
Silva
Steyerberg
Stojadinovic
Sujin
Vapnik
Vitaly Schetinin
Wojtek J. Krzanowski
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Practitioners use Trauma and Injury Severity Score (TRISS) models to predict the survival probability of an injured patient. The accuracy of TRISS predictions is acceptable for patients with up to three typical injuries, but unacceptable for patients with a larger number of injuries or with atypical injuries. Because it is based on a regression model, the TRISS methodology does not provide the predictive density required for accurate assessment of risk. Moreover, the regression model is difficult to interpret. We therefore consider Bayesian inference for estimating the predictive distribution of survival. The inference is based on decision tree models, which recursively split data along explanatory variables, so practitioners can understand these models. We propose a Bayesian method for estimating the predictive density and show that it outperforms the TRISS method in terms of both goodness-of-fit and classification accuracy. The developed method has been made available for evaluation purposes as a stand-alone application.
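
As a rough illustration of a predictive distribution obtained by averaging over decision trees, the following minimal sketch treats a handful of one-split trees as if they were posterior samples. Everything in it is an assumption made for illustration: the feature names ("age", "iss"), the thresholds, and the leaf probabilities are invented, and the abstract does not say how the authors sample or represent their trees.

```python
from statistics import mean

def make_stump(feature, threshold, p_below, p_above):
    """Return a one-split tree: survival probability depends on one threshold."""
    def predict(patient):
        return p_below if patient[feature] < threshold else p_above
    return predict

# Hypothetical "posterior sample" of trees. In a real Bayesian analysis these
# would be drawn by a sampler; all numbers here are made up.
sampled_trees = [
    make_stump("iss", 25, 0.95, 0.60),
    make_stump("iss", 30, 0.93, 0.50),
    make_stump("age", 55, 0.90, 0.70),
    make_stump("iss", 20, 0.97, 0.65),
    make_stump("age", 60, 0.88, 0.72),
]

def predictive_summary(patient, trees):
    """Summarize the spread of survival probabilities across sampled trees."""
    probs = [tree(patient) for tree in trees]
    return mean(probs), min(probs), max(probs)

patient = {"age": 40, "iss": 28}  # hypothetical patient record
avg, low, high = predictive_summary(patient, sampled_trees)
print(f"mean survival probability {avg:.3f}, range [{low:.2f}, {high:.2f}]")
```

The point of the sketch is that the spread across trees, not just the average, is what a single fitted regression model does not supply.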

Crossref

University of Bedfordshire Repository

CVXR: An R Package for Disciplined Convex Optimization

Author: Boyd Stephen
Fu Anqi
Narasimhan Balasubramanian
Publication date: 29/06/2020

CVXR is an R package that provides an object-oriented modeling language for convex optimization, similar to CVX, CVXPY, YALMIP, and Convex.jl. It allows the user to formulate convex optimization problems in a natural mathematical syntax rather than the restrictive form required by most solvers. The user specifies an objective and a set of constraints by combining constants, variables, and parameters using a library of functions with known mathematical properties. CVXR then applies signed disciplined convex programming (DCP) to verify the problem's convexity. Once verified, the problem is converted into standard conic form using graph implementations and passed to a cone solver such as ECOS or SCS. We demonstrate CVXR's modeling framework with several applications. Comment: 34 pages, 9 figures
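
The idea of verifying convexity by composition rules can be sketched with a toy curvature checker. This is not CVXR's implementation or API: the class names are hypothetical, the example is written in Python rather than R purely for illustration, and it ignores the sign information that signed DCP also tracks.

```python
from dataclasses import dataclass

CONSTANT, AFFINE, CONVEX, CONCAVE, UNKNOWN = (
    "constant", "affine", "convex", "concave", "unknown")

@dataclass(frozen=True)
class Var:
    name: str
    def curvature(self):
        return AFFINE

@dataclass(frozen=True)
class Const:
    value: float
    def curvature(self):
        return CONSTANT

@dataclass(frozen=True)
class Add:
    left: object
    right: object
    def curvature(self):
        a, b = self.left.curvature(), self.right.curvature()
        if UNKNOWN in (a, b):
            return UNKNOWN
        # Constants and affine terms never change the other term's curvature.
        if a in (CONSTANT, AFFINE) and b in (CONSTANT, AFFINE):
            return AFFINE if AFFINE in (a, b) else CONSTANT
        if a in (CONSTANT, AFFINE):
            return b
        if b in (CONSTANT, AFFINE):
            return a
        return a if a == b else UNKNOWN  # convex + concave cannot be certified

@dataclass(frozen=True)
class Neg:
    arg: object
    def curvature(self):
        c = self.arg.curvature()
        return {CONVEX: CONCAVE, CONCAVE: CONVEX}.get(c, c)

@dataclass(frozen=True)
class Square:
    arg: object
    def curvature(self):
        # Simplified rule: only certify the square of an affine argument.
        c = self.arg.curvature()
        return CONSTANT if c == CONSTANT else CONVEX if c == AFFINE else UNKNOWN

@dataclass(frozen=True)
class Sqrt:
    arg: object
    def curvature(self):
        # Concave and nondecreasing: preserves concavity of its argument.
        c = self.arg.curvature()
        if c == CONSTANT:
            return CONSTANT
        return CONCAVE if c in (AFFINE, CONCAVE) else UNKNOWN

def is_dcp_minimize(objective):
    """A minimization objective passes if it is certified convex."""
    return objective.curvature() in (CONSTANT, AFFINE, CONVEX)

x = Var("x")
least_squares = Add(Square(Add(x, Neg(Const(1.0)))), Square(x))
hidden_convex = Sqrt(Add(Const(1.0), Square(x)))  # convex, but not certifiable

print(least_squares.curvature(), is_dcp_minimize(least_squares))
print(hidden_convex.curvature(), is_dcp_minimize(hidden_convex))
```

The second expression, sqrt(1 + x^2), is convex yet fails the check, which shows that rule-based verification is sufficient for convexity but not necessary.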

arXiv.org e-Print Archive

Journal of Statistical Software

Efficient posterior sampling for high-dimensional imbalanced logistic regression

Author: Dunson David
Lu Jianfeng
Sachs Matthias
Sen Deborshee
Publication date: 14/11/2019

High-dimensional data are routinely collected in many areas. We are particularly interested in Bayesian classification models in which one or more variables are imbalanced. Current Markov chain Monte Carlo algorithms for posterior computation are inefficient as $n$ and/or $p$ increase due to worsening time per step and mixing rates. One strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize piece-wise deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications. Comment: 4 figures
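
To see why importance weighting can help with imbalanced data, the sketch below compares two unbiased one-draw estimators of a full-data logistic regression gradient: one that picks a data point uniformly and one that picks it with probability proportional to the size of its gradient. It is not the piece-wise deterministic sampler from the paper. The synthetic data, the scalar coefficient `beta`, and the choice of weights are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_point_gradients(xs, ys, beta):
    """Log-likelihood gradient contribution of each observation (scalar beta)."""
    return [(y - sigmoid(beta * x)) * x for x, y in zip(xs, ys)]

def estimator_moments(grads, probs):
    """Exact mean and variance of the estimator g_I / p_I with I drawn from probs."""
    mean = sum(p * (g / p) for g, p in zip(grads, probs))
    second_moment = sum(p * (g / p) ** 2 for g, p in zip(grads, probs))
    return mean, second_moment - mean ** 2

# Synthetic imbalanced data: 95 negatives with small features, 5 positives
# with large features (all values are made up).
xs = [0.1 + 0.01 * i for i in range(95)] + [2.0 + 0.5 * j for j in range(5)]
ys = [0] * 95 + [1] * 5
beta = -1.0

grads = per_point_gradients(xs, ys, beta)
full_gradient = sum(grads)
n = len(grads)

uniform_probs = [1.0 / n] * n
total_abs = sum(abs(g) for g in grads)
weighted_probs = [abs(g) / total_abs for g in grads]  # assumes no zero gradients

mean_u, var_u = estimator_moments(grads, uniform_probs)
mean_w, var_w = estimator_moments(grads, weighted_probs)

print(f"full gradient      {full_gradient:.4f}")
print(f"uniform draw:      mean {mean_u:.4f}, variance {var_u:.4f}")
print(f"weighted draw:     mean {mean_w:.4f}, variance {var_w:.4f}")
```

Both estimators have the full gradient as their expectation, but the weighted one has lower variance because the few large contributions from the rare class are sampled more often.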

arXiv.org e-Print Archive

University of Birmingham Research Portal

PubMed Central

Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

Author: Braverman Mark
Garg Ankit
Ma Tengyu
Nguyen Huy L.
Woodruff David P.
Publication date: 09/05/2016

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points from a $d$-dimensional Gaussian distribution with unknown mean $\theta$, which is promised to be $k$-sparse. The machines communicate by message passing and aim to estimate the mean $\theta$. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least $\Omega(\min\{n,d\}m)$, where $n$ is the number of observations that each machine receives and $d$ is the ambient dimension. These lower bounds improve upon [Sha14,SD'14] by allowing a multi-round iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a distributed data processing inequality, a generalization of the usual data processing inequalities, which might be of independent interest and useful for other problems. Comment: To appear at STOC 2016. Fixed typos in theorem 4.5 and incorporated reviewers' suggestions

arXiv.org e-Print Archive

Princeton University Open Access Repository

Implicit Langevin Algorithms for Sampling From Log-concave Densities

Author: Hodgkinson Liam
Roosta Fred
Salomone Robert
Publication date: 01/07/2021

For sampling from a log-concave density, we study implicit integrators resulting from $\theta$-method discretization of the overdamped Langevin diffusion stochastic differential equation. Theoretical and algorithmic properties of the resulting sampling methods for $\theta \in [0,1]$ and a range of step sizes are established. Our results generalize and extend prior works in several directions. In particular, for $\theta \ge 1/2$, we prove geometric ergodicity and stability of the resulting methods for all step sizes. We show that obtaining subsequent samples amounts to solving a strongly-convex optimization problem, which is readily achievable using one of numerous existing methods. Numerical examples supporting our theoretical analysis are also presented.
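
The stability claim can be checked by hand on a standard normal target, where the implicit step has a closed form. The sketch below assumes the update x_new = x - h * grad_f(theta * x_new + (1 - theta) * x) + sqrt(2h) * xi with f(x) = x^2 / 2. The abstract does not spell out the scheme, so this form and all function names are assumptions. For a general log-concave target the implicit equation would have to be solved numerically.

```python
import math
import random
import statistics

def contraction_factor(h, theta):
    """Multiplier a in x_new = a * x + noise for the N(0, 1) target."""
    return (1.0 - h * (1.0 - theta)) / (1.0 + h * theta)

def theta_step(x, h, theta, rng):
    """One theta-method step; the implicit equation is linear, so solve it directly."""
    xi = rng.gauss(0.0, 1.0)
    noise = math.sqrt(2.0 * h) * xi / (1.0 + h * theta)
    return contraction_factor(h, theta) * x + noise

def stationary_variance(h, theta):
    """Variance of the chain's stationary law, or infinity if the chain is unstable."""
    a = contraction_factor(h, theta)
    if abs(a) >= 1.0:
        return math.inf
    noise_var = 2.0 * h / (1.0 + h * theta) ** 2
    return noise_var / (1.0 - a * a)

def sample_chain(n, h, theta, seed=0):
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = theta_step(x, h, theta, rng)
        out.append(x)
    return out

h = 10.0  # deliberately large step size
for theta in (0.0, 0.5, 1.0):
    print(theta, contraction_factor(h, theta), stationary_variance(h, theta))

empirical_variance = statistics.pvariance(sample_chain(20000, h, 0.5, seed=1))
print("empirical variance at theta = 0.5:", empirical_variance)
```

With h = 10 the explicit scheme (theta = 0) has contraction factor -9 and diverges, while theta = 1/2 and theta = 1 remain stable. On this particular target theta = 1/2 also reproduces the unit variance exactly, whereas theta = 1 is stable but underestimates it.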

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive