
    Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations

    We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter (Formula presented.). This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45-54, 2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (Biometrika 95(1):169-186, 2008). Our general algorithm is implemented as efficient open source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (A note on posterior sampling from Dirichlet mixture models, 2008) and implemented by Yau et al. (Journal of the Royal Statistical Society, Series B (Statistical Methodology) 73:37-57, 2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature in large data sets and investigate sensitivity to initialisation. We additionally consider the challenges when an additional layer of hierarchy is added such that joint inference is to be made on (Formula presented.). We introduce a new label-switching move and compute the marginal partition posterior to help to surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al. Biostatistics 11(3):484-498, 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples. © 2014 The Author(s).
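
    As a rough illustration of the stick-breaking representation mentioned in the abstract above, the following minimal sketch draws truncated mixture weights for a Dirichlet process. It is not the slice or retrospective sampler the paper describes: the fixed truncation level, the function name `stick_breaking_weights`, and the symbol `alpha` for the concentration parameter are all assumptions made here for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    alpha      -- concentration parameter (> 0); the name is an assumption
    truncation -- number of sticks kept (a simplification, not the paper's method)
    rng        -- a random.Random instance, so the draw is reproducible
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)  # break fraction V_k ~ Beta(1, alpha)
        weights.append(v * remaining)    # weight of component k
        remaining *= 1.0 - v             # shrink the leftover stick
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)
print(len(weights), sum(weights) + leftover)
```

    Smaller values of `alpha` tend to concentrate mass on the first few components, while larger values spread it over many. The `leftover` value is the mass ignored by the truncation.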

    The solar simulation test of the ITALSAT thermal structural model

    The ITALSAT structural/thermal model (STM) was submitted to a solar simulation test in order to verify the spacecraft thermal design and the thermal mathematical model that will be used to predict the on-orbit temperatures. The STM was representative of the flight model in terms of configuration, structures, appendages and thermal hardware; dissipating dummy units were used to simulate the electronic units. The test consisted of three main phases: on station (beginning of life), on station (end of life), and transfer orbit. Preliminary results indicate that the test performance was satisfactory. The measured spacecraft temperatures were up to 15 degrees higher than the predicted ones. This calls for a careful correlation analysis in order to obtain reliable flight temperature predictions.

    PReMiuM: An R package for profile regression mixture models using Dirichlet processes

    PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous responses, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label-switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection.

    A space-time multivariate Bayesian model to analyse road traffic accidents by severity

    The paper investigates the dependences between levels of severity of road traffic accidents, accounting at the same time for spatial and temporal correlations. The study analyses road traffic accidents data at ward level in England over the period 2005–2013. We include in our model multivariate spatially structured and unstructured effects to capture the dependences between severities, within a Bayesian hierarchical formulation. We also include a temporal component to capture the time effects and we carry out an extensive model comparison. The results show important associations in both spatially structured and unstructured effects between severities, and a downward temporal trend is observed for low and high levels of severity. Maps of posterior accident rates indicate elevated risk within big cities for accidents of low severity and in suburban areas in the north and on the southern coast of England for accidents of high severity. The posterior probability of extreme rates is used to suggest the presence of hot spots in a public health perspective. Areti Boulieri acknowledges support from the National Institute for Health Research and the Medical Research Council Doctoral Training Partnership. Marta Blangiardo acknowledges support from the National Institute for Health Research and the Medical Research Council–Public Health England Centre for Environment and Health. Silvia Liverani acknowledges support from the Leverhulme Trust (grant ECF-2011-576).

    Beyond Conjugacy for Chain Event Graph Model Selection

    Chain event graphs are a family of probabilistic graphical models that generalise Bayesian networks and have been successfully applied to a wide range of domains. Unlike Bayesian networks, these models can encode context-specific conditional independencies as well as asymmetric developments within the evolution of a process. More recently, new model classes belonging to the chain event graph family have been developed for modelling time-to-event data to study the temporal dynamics of a process. However, existing Bayesian model selection algorithms for chain event graphs and their variants rely on all parameters having conjugate priors. This is unrealistic for many real-world applications. In this paper, we propose a mixture modelling approach to model selection in chain event graphs that does not rely on conjugacy. Moreover, we show that this methodology is more amenable to being robustly scaled than the existing model selection algorithms used for this family. We demonstrate our techniques on simulated datasets.

    Bayesian modelling for spatially misaligned health areal data: a multiple membership approach

    Diabetes prevalence is on the rise in the UK, and for public health strategy, estimation of relative disease risk and subsequent mapping is important. We consider an application to London data on diabetes prevalence and mortality. To improve the estimation of relative risks, we analyse prevalence and mortality data jointly so that strength is borrowed across the two outcomes. The available data involve two spatial frameworks: areas (middle level super output areas, MSOAs) and general practices (GPs) recruiting patients from several areas. This raises a spatial misalignment issue that we deal with by employing the multiple membership principle. Specifically, we translate area spatial effects to explain GP practice prevalence according to proportions of GP populations resident in different areas. A sparse implementation in Stan of both the MCAR and GMCAR allows the comparison of these bivariate priors as well as exploring the different implications for the mapping patterns for both outcomes. The necessary causal precedence of diabetes prevalence over mortality allows a specific conditionality assumption in the GMCAR, not always present in the context of disease mapping.

    Virtual mechanical product disassembly sequences based on disassembly order graphs and time measurement units

    Recently, the approach of defining the total life cycle assessment (LCA) and the end of life (EoL) in the early design phases has become ever more promising. The literature reports many advantages in terms of cost and time savings and a smoother organization of the whole design process. Design for disassembly (DfD) offers the possibility of reducing the time and cost of disassembling a product and accounts for the reuse of parts and the dismantling of parts, joints, and materials. The sequence of disassembly is the ordered way to extract parts from an assembly and is a focal item in DfD because it can deeply influence times and operations. In this paper, some disassembly sequences are evaluated, and among them, two methods for defining an optimal sequence are provided and tested on a case study of a mechanical assembly. A further sequence of disassembly is provided by the authors based on experience and personal knowledge. All three are analyzed by the disassembly order graph (DOG) approach and compared. The operations evaluated have been converted into time using time measurement units (TMUs). As a result, the best sequence has been highlighted in order to define a structured and efficient disassembly.

    Who lives in overcrowded households in north-east London? Cross-sectional study of linked electronic health records and Energy Performance Certificate register data.

    Objectives Household overcrowding is associated with adverse health outcomes, including increased risk of infectious diseases, mental health problems, and poor educational attainment. We investigated inequalities in overcrowding in an urban, ethnically diverse, and disadvantaged London population by pseudonymously linking electronic health records (EHR) to Energy Performance Certificates (EPC) data. Approach We used pseudonymised Unique Property Reference Numbers to link EHRs for 1,066,156 currently registered patients from 321,318 households in north-east London to EPC data. We measured household occupancy and derived the bedroom standard overcrowding definition (number of rooms relative to occupants’ sex and ages) to estimate overcrowding prevalence. We examined associations with: household composition (adults only, single adult+children, ≥2 working-age adults+children, ≥1 retirement-age adults+children, three-generational household); ethnic background (White, South Asian, Black, Mixed, Other, missing); and Index of Multiple Deprivation (IMD) quintile. We used multivariable logistic regression to estimate the adjusted odds ratios (aOR) and 95% Confidence Intervals (CI) of overcrowding. Results Overall, 243,793 (22.9%) people were overcrowded. People living in households with children, or three-generational households were more likely (aOR [95% CI] 3.79 [3.74 - 3.84]; 6.53 [6.41 - 6.66] respectively), and single adults or retirement age adults with children less likely (0.36 [0.35 - 0.38]; 0.36 [0.23 - 0.57] respectively), to be overcrowded. Overcrowding was more likely among people from Asian or Black ethnic backgrounds (1.24 [1.22 - 1.25] and 1.17 [1.15 - 1.19] respectively). There was a dose-response relationship between IMD quintile and overcrowding: OR 0.20 [0.20 - 0.21] in the least deprived compared to most deprived quintile.
    Conclusion One in five people in north-east London live in overcrowded households with marked inequalities by ethnicity, household generational composition, and deprivation. Up-to-date estimates of household overcrowding can be derived from linked housing and health records and used to evaluate the impact of economic policies on health and housing inequalities.