18 research outputs found

    Graphical Modeling for High Dimensional Data

    With advances in science and information technologies, many scientific fields are able to meet the challenges of managing and analyzing high-dimensional data. A so-called "large p, small n" problem arises when the number of experimental units, n, is equal to or smaller than the number of features, p. A methodology based on probability and graph theory, termed graphical models, is applied to study the structure of such high-dimensional data and to draw inferences from it.

    Zero-Inflated Models for RNA-Seq Count Data

    One of the main objectives of many biological studies is to explore differential gene expression profiles between samples. Genes are referred to as differentially expressed (DE) if the read counts change systematically across treatments or conditions. Poisson and negative binomial (NB) regressions are widely used methods for non-over-dispersed (NOD) and over-dispersed (OD) count data, respectively. However, in the presence of an excessive number of zeros, these methods need adjustments. In this paper, we consider a zero-inflated Poisson mixed effects model (ZIPMM) and a zero-inflated negative binomial mixed effects model (ZINBMM) to address excessive zero counts in NOD and OD RNA-seq data, respectively, in the presence of random effects. We apply these methods to both simulated and real RNA-seq datasets. The ZIPMM and ZINBMM perform better on both simulated and real datasets.
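    The zero-inflation idea in this abstract can be illustrated with a minimal sketch of the plain zero-inflated Poisson (ZIP) distribution. This is not the authors' ZIPMM: it leaves out the random effects and regression covariates, and the names `zip_pmf`, `zip_mean_var`, `lam`, and `pi_zero` are chosen here for illustration only. It shows how mixing in a share of structural zeros raises the probability of a zero count above what a Poisson model allows.

```python
import math

def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k) with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: with probability pi_zero the count is a
    structural zero, otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base

def zip_mean_var(lam, pi_zero):
    """Mean and variance of the ZIP distribution."""
    mean = (1.0 - pi_zero) * lam
    var = (1.0 - pi_zero) * lam * (1.0 + pi_zero * lam)
    return mean, var

# Hypothetical parameter values, chosen only for illustration.
lam, pi_zero = 2.0, 0.3
p_zero_poisson = poisson_pmf(0, lam)
p_zero_zip = zip_pmf(0, lam, pi_zero)
mean, var = zip_mean_var(lam, pi_zero)
total = sum(zip_pmf(k, lam, pi_zero) for k in range(60))
print(f"P(0): Poisson {p_zero_poisson:.4f}, ZIP {p_zero_zip:.4f}")
print(f"ZIP mean {mean:.2f}, variance {var:.2f}")
```

    With these illustrative values the ZIP variance exceeds its mean, so zero inflation by itself already produces over-dispersion relative to a Poisson model.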

    Nonparametric Bayesian Multiple Comparisons for Dependence Parameter in Bivariate Exponential Populations

    A nonparametric Bayesian multiple comparisons problem (MCP) for dependence parameters in I bivariate exponential populations is studied. A simple method for pairwise comparisons of these parameters is also suggested. The methodology of Gopalan and Berry (1998) is extended using Dirichlet process priors, applied in the form of a baseline prior and likelihood combination to provide the comparisons. Because analytic evaluation is intractable, the posterior probabilities of all possible hypotheses are computed by Markov chain Monte Carlo, specifically Gibbs sampling. The MCP procedure for the dependence parameters of bivariate exponential populations is illustrated with a numerical example.

    Statistical Modeling of the Number of Deaths of Children in Bangladesh

    Efforts to reduce the number of child deaths in developing countries through health care programs focus more on the prevention and control of diseases than on determining the underlying risk factors/predictors and addressing them through proper interventions. This study aims to identify socioeconomic and demographic predictors of the number of child deaths among women aged 12-49, using the Bangladesh Health and Demographic Survey (BDHS) administered in 2011. The number of child deaths in a family is a non-negative count response variable. The average number of child deaths is found to be 28 per 100 women, with a variance of 44 per 100 women. Because the variance exceeds the mean, the data are over-dispersed, and a Poisson regression model is not a proper choice for predicting the mean response from the BDHS data. To address over-dispersion, we fit a Negative Binomial Regression (NBR), a Zero-Inflated Negative Binomial Regression (ZINBR), and a Hurdle Regression (HR) model. Among these models, ZINBR fits the data best. From the best-fitting model, we identify respondent's age, respondent's age at first birth, gap between first birth and marriage, number of family members, region, religion, respondent's education, husband's education, incidence of twins, source of water, and wealth index as significant predictors of the number of child deaths in a family. Identifying the risk factors for child deaths is an important public health issue and should be carried out correctly to support much-needed interventions.
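    The over-dispersion argument in this abstract can be checked with a short calculation. The sketch below assumes that the quoted figures translate to a mean of 0.28 and a variance of 0.44 deaths per woman; that reading, the helper names, and the method-of-moments formula for the NB2 parameterization (variance = mean + alpha * mean^2) are illustrative assumptions, not the authors' fitting procedure.

```python
# Figures quoted in the abstract, read here as per-woman values (an assumption).
mean_deaths = 28 / 100      # 0.28 deaths per woman
variance_deaths = 44 / 100  # 0.44

def dispersion_ratio(mean, variance):
    """Variance-to-mean ratio; a Poisson model implies a ratio of 1."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return variance / mean

def nb2_alpha(mean, variance):
    """Method-of-moments dispersion for NB2, where Var = mu + alpha * mu**2."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return (variance - mean) / mean ** 2

ratio = dispersion_ratio(mean_deaths, variance_deaths)
alpha = nb2_alpha(mean_deaths, variance_deaths)
print(f"dispersion ratio = {ratio:.2f}")          # above 1 signals over-dispersion
print(f"moment estimate of alpha = {alpha:.2f}")  # 0 would recover the Poisson case
```

    A ratio well above 1 is the descriptive signal that motivates moving from Poisson regression to the negative binomial family; a formal comparison of NBR, ZINBR, and HR would still rest on fitted models, as in the study.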

    Statistical Learning Methods to Predict Activity Intensity from Body-Worn Accelerometers

    Physical activity, especially when performed at moderate or vigorous intensity, has short- and long-term health benefits, but measurement of free-living physical activity is challenging. Accelerometers are popular tools to assess physical activity, although accuracy of conventional accelerometer analysis methods is suboptimal. This study developed and tested statistical learning models for assessing activity intensity from body-worn accelerometers. Twenty-eight adults performed 10-21 activities of daily living in two visits while wearing four accelerometers (right hip, right ankle, both wrists). Accelerometer placement is of crucial practical concern and this paper addresses this issue. Boosting, bagging, random forest and decision tree models were created for each accelerometer and for two-, three-, and four-accelerometer combinations to predict activity intensity. Research staff observations of activity intensity served as the criterion. Point estimates of error for the ankle accelerometer were 2.2-4.7 percentage points lower than other single-accelerometer placements, and the left wrist-ankle combination had errors 0.8-5.8 percentage points lower than other two-accelerometer combinations. Decision trees had poorer accuracy than the other models. Using an accelerometer worn on the lower limb, by itself or in combination with an upper-limb accelerometer, appears to offer optimal accuracy for activity intensity measurement.