18 research outputs found

    Bayesian inference on high-dimensional multivariate binary responses

    No full text
    It has become increasingly common to collect high-dimensional binary response data; for example, with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inferences. However, algorithms for fitting such models face issues in scaling up to high dimensions due to the intractability of the likelihood, involving an integral over a multivariate normal distribution having no analytic form. Although a variety of algorithms have been proposed to approximate this intractable integral, these approaches are difficult to implement and/or inaccurate in high dimensions. Our main focus is in accommodating high-dimensional binary response data with a small-to-moderate number of covariates. We propose a two-stage approach for inference on model parameters while taking care of uncertainty propagation between the stages. We use the special structure of latent Gaussian models to reduce the highly expensive computation involved in joint parameter estimation to focus inference on marginal distributions of model parameters. This essentially makes the method embarrassingly parallel for both stages. We illustrate performance in simulations and applications to joint species distribution modeling in ecology.

    Modeling Recurrent Failures on Large Directed Networks

    No full text
    Many lifeline infrastructure systems consist of thousands of components configured in a complex directed network. Disruption of the infrastructure constitutes a recurrent failure process over a directed network. Statistical inference for such network recurrence data is challenging because of the large number of nodes with irregular connections among them. Motivated by 16 years of Scottish Water operation records, we propose a network Gamma-Poisson Autoregressive NHPP (GPAN) model for recurrent failure data from large-scale directed physical networks. The model consists of two layers: the temporal layer applies a Non-Homogeneous Poisson Process (NHPP) with node-specific frailties, and the spatial layer uses a well-orchestrated gamma-Poisson autoregressive scheme to establish correlations among the frailties. Under the network-GPAN model, we develop a sum-product algorithm to compute the marginal distribution for each frailty conditional on the recurrence data. The marginal conditional frailty distributions are useful for predicting future failures based on historical data. In addition, the ability to rapidly compute these marginal distributions allows adoption of an EM type algorithm for estimation. Through a Bethe approximation, the output from the sum-product algorithm is used to compute maximum log-likelihood estimates. Applying the methods to the Scottish Water network, we demonstrate utility in aiding operation management and risk assessment of the water utility. Supplementary materials for this article are available online including a standardized description of the materials available for reproducing the work.

    AUCs from ROC curves from the logistic regression models.

    No full text
    The outcome of interest in this model was death within 30 days of the surgical procedure.

    The dependence between the risk factor variables, captured via correlation using R² values, is shown in the left image.

    No full text
    The right image shows the dependence using mutual information measured in bits. The mutual information estimates were obtained using k-nearest neighbors type estimators (see the appendix) with k = 20.

    Estimated densities for the five risk factors in the gapminder data are shown along the diagonal.

    No full text
    Below the diagonal, pairwise scatterplots of the data including linear regression (black lines) with 95% confidence intervals (shaded grey regions) are provided. Above the diagonal, the R² values, indicating the percent of variation accounted for by the linear relationships between the data, are provided. Source: multiple sources for the individual variables aggregated by www.gapminder.org.

    Two examples of patient HR and MAP data are shown above.

    No full text
    (A) The patient’s HR remains roughly constant throughout the procedure. The variability in MAP does not have a clear association with HR, which is consistent with a small I(HR, MAP) = 2.1 bits. (B) In this case, HR and MAP appear much more tightly coupled and exhibit similar dynamics throughout the procedure, which is reflected in a large I(HR, MAP) = 4 bits. The correlation coefficients of HR and MAP, shown in the figures, were small and nearly equal, which may wrongly suggest that HR and MAP were similarly dependent in these two cases.
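The contrast drawn in this caption, a small correlation coefficient alongside a large mutual information, can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the data are simulated (not patient HR or MAP records), the function names are hypothetical, and the estimator is a simple histogram plug-in rather than the k-nearest neighbors type estimator the authors describe.

```python
import math
import random
from collections import Counter


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bin_index(v, lo, hi, bins):
    """Map v in [lo, hi] to an equal-width bin index in 0..bins-1."""
    if hi == lo:
        return 0
    return min(int((v - lo) / (hi - lo) * bins), bins - 1)


def mutual_information_bits(xs, ys, bins=10):
    """Histogram (plug-in) estimate of I(X; Y) in bits.

    Assumption: equal-width binning stands in for the kNN-type
    estimator mentioned in the source; it is cruder and biased upward.
    """
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    joint = Counter(
        (bin_index(x, x_lo, x_hi, bins), bin_index(y, y_lo, y_hi, bins))
        for x, y in zip(xs, ys)
    )
    px, py = Counter(), Counter()
    for (i, j), c in joint.items():
        px[i] += c
        py[j] += c
    # Sum over occupied cells of p(i, j) * log2(p(i, j) / (p(i) * p(j))).
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


# Synthetic example: y depends strongly but non-linearly on x.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
y_dep = [xi ** 2 + rng.gauss(0.0, 0.05) for xi in x]   # dependent on x
y_ind = [rng.uniform(-1.0, 1.0) for _ in x]            # independent of x

r_dep = pearson_r(x, y_dep)
mi_dep = mutual_information_bits(x, y_dep)
mi_ind = mutual_information_bits(x, y_ind)
print(f"r = {r_dep:.3f}, I(dependent) = {mi_dep:.2f} bits, "
      f"I(independent) = {mi_ind:.2f} bits")
```

In this sketch the correlation of the dependent pair stays close to zero because the relationship is symmetric about x = 0, while the mutual information estimate is clearly larger than for the independent pair. The bin count of 10 is an arbitrary choice, and the estimates change with it.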

    The mean and 95% confidence intervals of the hazard ratio associated with each of the linear indicators in the CPH.

    No full text
    The outcome of interest in this study was the time to death, right-censored for deaths occurring more than 365 days after the surgical procedure.

    Summaries of categorical covariates pertaining to ASA status, emergent or nonemergent operation, and risk of surgical procedure.

    No full text
    ASA status is a numerical score of overall patient health on a scale from one to six, where one denotes healthy patients and five denotes critically ill patients. Cases with an ASA code of five were excluded as there were too few such cases for reliable analysis.