
    Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations

    We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model with concentration parameter α. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45-54, 2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (Biometrika 95(1):169-186, 2008). Our general algorithm is implemented as efficient, open-source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (A note on posterior sampling from Dirichlet mixture models, 2008) and implemented by Yau et al. (Journal of the Royal Statistical Society, Series B (Statistical Methodology) 73:37-57, 2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature in large data sets and investigate sensitivity to initialisation. We additionally consider the challenges when an additional layer of hierarchy is added such that joint inference is to be made on α. We introduce a new label-switching move and compute the marginal partition posterior to help surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al. Biostatistics 11(3):484-498, 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples.
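
    For reference, the standard stick-breaking (Sethuraman) construction of a Dirichlet process mixture with concentration parameter α, written here in generic notation rather than the paper's own, is:

        % Stick-breaking construction of a DP mixture with concentration parameter alpha
        \begin{align*}
          V_k &\sim \mathrm{Beta}(1, \alpha), \qquad
          \pi_k = V_k \prod_{j<k} (1 - V_j), \qquad k = 1, 2, \dots \\
          \theta_k &\sim G_0, \qquad
          y_i \mid \{\pi_k, \theta_k\} \;\sim\; \sum_{k=1}^{\infty} \pi_k \, f(y_i \mid \theta_k).
        \end{align*}

    Slice and retrospective samplers differ in how they handle the infinite sum: the slice approach introduces auxiliary uniform variables so that only finitely many components need to be instantiated at each sweep.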

    PReMiuM: An R package for profile regression mixture models using Dirichlet processes

    PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous responses, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label-switching moves are implemented, along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may be of interest to determine which covariates actively drive the mixture components; this is implemented in the package as variable selection.
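
    PReMiuM itself is an R package, so the sketch below is only a rough Python analogue of Dirichlet process mixture clustering, using scikit-learn's truncated variational approximation; the synthetic data, truncation level and concentration value are illustrative and not drawn from the package.

        # Not PReMiuM itself; an analogous Dirichlet-process mixture clustering
        # fit using scikit-learn's truncated variational approximation.
        import numpy as np
        from sklearn.mixture import BayesianGaussianMixture

        rng = np.random.default_rng(0)
        # Synthetic continuous covariates from three well-separated clusters
        X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
                       for c in (-3.0, 0.0, 3.0)])

        dpm = BayesianGaussianMixture(
            n_components=10,                                   # truncation level
            weight_concentration_prior_type="dirichlet_process",
            weight_concentration_prior=1.0,                    # DP concentration (alpha)
            max_iter=500,
            random_state=0,
        )
        labels = dpm.fit_predict(X)
        print("occupied components:", np.unique(labels).size)
        print("mixture weights:", np.round(dpm.weights_, 3))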

    Automatic Induction of Neural Network Decision Tree Algorithms

    This work presents an approach to the automatic induction of non-greedy decision trees constructed from a neural network architecture. This construction can be used to transfer weights when growing or pruning a decision tree, allowing non-greedy decision tree algorithms to automatically learn and adapt to the ideal architecture. In this work, we examine the underpinning ideas within ensemble modelling and Bayesian model averaging which allow our neural network to asymptotically approach the ideal architecture through weight transfer. Experimental results demonstrate that this approach improves models over a fixed set of hyperparameters for both decision tree and decision forest models. Comment: This is a pre-print of a contribution "Chapman Siu, Automatic Induction of Neural Network Decision Tree Algorithms." To appear in Computing Conference 2019 Proceedings. Advances in Intelligent Systems and Computing. Implementation: https://github.com/chappers/automatic-induction-neural-decision-tre
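
    One common way to express a decision tree with neural-network primitives, in the spirit described above though not necessarily the paper's exact construction, is a soft tree in which each internal node is a sigmoid gate and the prediction is a routing-probability-weighted sum of leaf values:

        # A minimal sketch of a "soft" decision tree built from neural-network
        # primitives; illustrative only, not the cited paper's construction.
        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def soft_tree_predict(x, W, b, leaves):
            """Depth-2 soft tree: 3 internal nodes (root, left, right), 4 leaves."""
            g_root  = sigmoid(W[0] @ x + b[0])   # P(go right at root)
            g_left  = sigmoid(W[1] @ x + b[1])   # P(go right at left child)
            g_right = sigmoid(W[2] @ x + b[2])   # P(go right at right child)
            # Path probabilities to the four leaves (LL, LR, RL, RR)
            p = np.array([
                (1 - g_root) * (1 - g_left),
                (1 - g_root) * g_left,
                g_root * (1 - g_right),
                g_root * g_right,
            ])
            return p @ leaves                    # expected leaf value

        rng = np.random.default_rng(1)
        x = rng.normal(size=5)
        W = rng.normal(size=(3, 5))              # one weight vector per internal node
        b = np.zeros(3)
        leaves = np.array([0.0, 1.0, 2.0, 3.0])  # scalar leaf predictions
        print(soft_tree_predict(x, W, b, leaves))

    Because the gates are differentiable, such a tree can be trained end to end, and its weights can be carried over when nodes are added or removed, which is the kind of weight transfer the abstract refers to for growing and pruning.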

    On boosting kernel regression

    In this paper we propose a simple multistep regression smoother which is constructed in an iterative manner, by learning the Nadaraya-Watson estimator with L2 boosting. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast and the variance diverges exponentially slowly. The first boosting step is analysed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
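
    A minimal sketch of the idea, assuming a Gaussian kernel and an illustrative bandwidth and step count: L2 boosting repeatedly refits the Nadaraya-Watson smoother to the current residuals and adds the result back in.

        # L2 boosting of a Nadaraya-Watson smoother: fit, then repeatedly
        # refit the smoother to the residuals and add the correction back.
        import numpy as np

        def nw_smooth(x_train, y, x_eval, h):
            """Nadaraya-Watson estimate with a Gaussian kernel and bandwidth h."""
            d = (x_eval[:, None] - x_train[None, :]) / h
            w = np.exp(-0.5 * d**2)
            return (w @ y) / w.sum(axis=1)

        def l2_boost_nw(x, y, h, n_steps):
            fit = nw_smooth(x, y, x, h)
            for _ in range(n_steps):
                fit = fit + nw_smooth(x, y - fit, x, h)   # refit residuals
            return fit

        rng = np.random.default_rng(2)
        x = np.sort(rng.uniform(0, 1, 200))
        y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
        print(np.round(l2_boost_nw(x, y, h=0.05, n_steps=3)[:5], 3))

    With a linear smoother matrix S, B such boosting steps give the fit [I - (I - S)^{B+1}] y, which is the algebra behind the bias shrinking quickly while the variance grows only slowly with B.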

    Vessel noise affects beaked whale behavior: results of a dedicated acoustic response study

    © The Author(s), 2012. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS ONE 7 (2012): e42535, doi:10.1371/journal.pone.0042535.

    Some beaked whale species are susceptible to the detrimental effects of anthropogenic noise. Most studies have concentrated on the effects of military sonar, but other forms of acoustic disturbance (e.g. shipping noise) may disrupt behavior. An experiment involving the exposure of target whale groups to intense vessel-generated noise tested how these exposures influenced the foraging behavior of Blainville’s beaked whales (Mesoplodon densirostris) in the Tongue of the Ocean (Bahamas). A military array of bottom-mounted hydrophones was used to measure the response based upon changes in the spatial and temporal pattern of vocalizations. The archived acoustic data were used to compute metrics of the echolocation-based foraging behavior for 16 targeted groups, 10 groups further away on the range, and 26 non-exposed groups. The duration of foraging bouts was not significantly affected by the exposure. Changes in the hydrophone over which the group was most frequently detected occurred as the animals moved around within a foraging bout, and their number was significantly lower the closer the whales were to the sound source. Non-exposed groups also had significantly more changes in the primary hydrophone than exposed groups, irrespective of distance. Our results suggest that broadband ship noise caused a significant change in beaked whale behavior up to at least 5.2 kilometers away from the vessel. The observed change could potentially correspond to a restriction in the movement of groups, a period of more directional travel, a reduction in the number of individuals clicking within the group, or a response to changes in prey movement.

    The research reported here was financially supported by the United States (U.S.) Office of Naval Research (www.onr.navy.mil) grants N00014-07-10988, N00014-07-11023, N00014-08-10990; the U.S. Strategic Environmental Research and Development Program (www.serdp.org) grant SI-1539; the Environmental Readiness Division of the U.S. Navy (http://www.navy.mil/local/n45/); the U.S. Chief of Naval Operations Submarine Warfare Division (Undersea Surveillance); the U.S. National Oceanic and Atmospheric Administration (National Marine Fisheries Service, Office of Science and Technology) (http://www.st.nmfs.noaa.gov/); the U.S. National Oceanic and Atmospheric Administration Ocean Acoustics Program (http://www.nmfs.noaa.gov/pr/acoustics/); and the Joint Industry Program on Sound and Marine Life of the International Association of Oil and Gas Producers (www.soundandmarinelife.org).

    Kernel density classification and boosting: an L2 analysis

    Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study that examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is “boosting”, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
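
    As a minimal sketch of plain kernel density classification (not the boosted variant proposed in the paper): estimate one density per class and assign each point to the class with the larger prior-weighted density. The bandwidth rule and sample sizes below are illustrative.

        # Kernel density classification: one KDE per class, predict the class
        # whose prior-weighted density is larger at each new point.
        import numpy as np
        from scipy.stats import gaussian_kde

        rng = np.random.default_rng(3)
        x0 = rng.normal(loc=-1.0, scale=1.0, size=150)   # class 0 training sample
        x1 = rng.normal(loc=+1.0, scale=1.0, size=100)   # class 1 training sample

        kde0, kde1 = gaussian_kde(x0), gaussian_kde(x1)  # default bandwidth rule
        prior0 = x0.size / (x0.size + x1.size)
        prior1 = 1.0 - prior0

        x_new = np.array([-2.0, 0.0, 2.0])
        pred = (prior1 * kde1(x_new) > prior0 * kde0(x_new)).astype(int)
        print(pred)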

    A semi-parametric approach to estimate risk functions associated with multi-dimensional exposure profiles: application to smoking and lung cancer

    A common characteristic of environmental epidemiology is the multi-dimensional aspect of exposure patterns, frequently reduced to a cumulative exposure for simplicity of analysis. By adopting a flexible Bayesian clustering approach, we explore the risk function linking exposure history to disease. This approach is applied here to study the relationship between different smoking characteristics and lung cancer in the framework of a population-based case-control study.
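
    A schematic of the profile regression setup for a binary outcome with categorical exposure covariates, with notation chosen here for illustration (z_i is the cluster allocation, θ_c a cluster-level log-odds, β fixed effects of confounders):

        % Schematic profile regression model: DP clustering links covariate
        % profiles and disease risk through a shared allocation variable z_i.
        \begin{align*}
          \operatorname{logit} P(y_i = 1 \mid z_i, \beta) &= \theta_{z_i} + \beta^{\top} w_i, \\
          P(x_{ij} = v \mid z_i = c) &= \phi_{cjv}, \qquad j = 1, \dots, J, \\
          P(z_i = c) &= \pi_c, \qquad \pi \sim \text{stick-breaking}(\alpha).
        \end{align*}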