487 research outputs found
Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations
We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter α. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45-54, 2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (Biometrika 95(1):169-186, 2008). Our general algorithm is implemented as efficient open-source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (A note on posterior sampling from Dirichlet mixture models, 2008) and implemented by Yau et al. (Journal of the Royal Statistical Society, Series B (Statistical Methodology) 73:37-57, 2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature in large data sets and investigate sensitivity to initialisation. We additionally consider the challenges when an additional layer of hierarchy is added such that joint inference is to be made on α. We introduce a new label-switching move and compute the marginal partition posterior to help to surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al. Biostatistics 11(3):484-498, 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples.
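For reference, the stick-breaking representation of the Dirichlet process mixture that such samplers target can be written, in standard notation with concentration parameter $\alpha$ and base measure $G_0$ (background material, not part of the abstract):

  $V_k \sim \mathrm{Beta}(1, \alpha), \qquad \pi_k = V_k \prod_{j<k} (1 - V_j), \qquad k = 1, 2, \ldots$
  $\theta_k \sim G_0, \qquad \Pr(z_i = k \mid \pi) = \pi_k, \qquad y_i \mid z_i = k \sim F(\cdot \mid \theta_k)$

The slice-sampling and label-switching machinery discussed in the paper operates on the allocations $z_i$ and the weights $\pi_k$ of this representation.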
PReMiuM: An R package for profile regression mixture models using Dirichlet processes
PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous responses, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label-switching moves are implemented, along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may be of interest to determine which covariates actively drive the mixture components; this is implemented in the package as variable selection.
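In outline, and hedging on the exact parameterisation (the package documentation gives the full model), profile regression couples the covariate and response models through a shared allocation $z_i$ drawn from the stick-breaking weights above:

  $x_{ij} \mid z_i = c \sim \phi_{cj} \quad$ (cluster-specific covariate profile)
  $g\{\mathrm{E}(y_i \mid z_i = c, w_i)\} = \theta_c + \beta^\top w_i \quad$ (cluster-specific response level, with optional fixed effects $w_i$)

so that clusters are informed jointly by the covariate profiles and the response.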
Automatic Induction of Neural Network Decision Tree Algorithms
This work presents an approach to the automatic induction of non-greedy decision trees constructed from a neural network architecture. The construction can be used to transfer weights when growing or pruning a decision tree, allowing non-greedy decision tree algorithms to learn and adapt towards the ideal architecture automatically. We examine the underpinning ideas from ensemble modelling and Bayesian model averaging which allow our neural network to asymptotically approach the ideal architecture through weight transfer. Experimental results demonstrate that this approach improves on decision tree and decision forest models with a fixed set of hyperparameters. Comment: This is a pre-print of a contribution "Chapman Siu, Automatic Induction of Neural Network Decision Tree Algorithms." To appear in Computing Conference 2019 Proceedings, Advances in Intelligent Systems and Computing. Implementation: https://github.com/chappers/automatic-induction-neural-decision-tre
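The abstract leaves the network-to-tree construction abstract; the linked repository holds the authors' implementation. As a generic, hypothetical sketch of the underlying idea only, namely a decision node expressed as a differentiable unit whose weights could be transferred when the tree is grown or pruned, a single "soft" split can be written as below; all names and hyperparameters are illustrative and not taken from the paper.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SoftSplit:
    """One soft (probabilistic) decision node with two leaf predictions.

    The gating weights play the role of a learned split; because the unit is
    differentiable, the same weights could seed a deeper tree when it is grown.
    """
    def __init__(self, n_features, rng):
        self.w = rng.normal(scale=0.1, size=n_features)  # split direction
        self.b = 0.0                                      # split offset
        self.leaf = np.array([0.0, 0.0])                  # leaf values (regression)

    def predict(self, X):
        p_right = sigmoid(X @ self.w + self.b)            # probability of routing right
        return (1 - p_right) * self.leaf[0] + p_right * self.leaf[1]

    def sgd_step(self, X, y, lr=0.1):
        # One gradient step on squared error for both the gate and the leaves.
        p = sigmoid(X @ self.w + self.b)
        pred = (1 - p) * self.leaf[0] + p * self.leaf[1]
        err = pred - y
        gate_grad = err * (self.leaf[1] - self.leaf[0]) * p * (1 - p)
        self.w -= lr * X.T @ gate_grad / len(y)
        self.b -= lr * gate_grad.mean()
        self.leaf[0] -= lr * np.mean(err * (1 - p))
        self.leaf[1] -= lr * np.mean(err * p)

# Toy usage: the soft node learns a hard axis-aligned split.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0.5, 1.0, -1.0)
node = SoftSplit(n_features=3, rng=rng)
for _ in range(500):
    node.sgd_step(X, y)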
On boosting kernel regression
In this paper we propose a simple multistep regression smoother which is constructed in an iterative manner, by learning the Nadaraya-Watson estimator with L2 boosting. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast and the variance diverges exponentially slowly. The first boosting step is analysed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
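A minimal numerical sketch of the smoother described here, i.e. the Nadaraya-Watson estimator applied repeatedly to its own residuals (L2 boosting), assuming a Gaussian kernel and an illustrative bandwidth; this is not the authors' code.

import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimator with a Gaussian kernel and bandwidth h."""
    d = (x_eval[:, None] - x_train[None, :]) / h   # pairwise scaled distances
    w = np.exp(-0.5 * d**2)                        # kernel weights
    return (w @ y_train) / w.sum(axis=1)

def l2_boosted_nw(x_train, y_train, x_eval, h, n_steps=5):
    """L2 boosting: repeatedly smooth the current residuals and accumulate the fits."""
    fit_train = np.zeros_like(y_train, dtype=float)
    fit_eval = np.zeros(len(x_eval))
    for _ in range(n_steps):
        resid = y_train - fit_train
        fit_train += nadaraya_watson(x_train, resid, x_train, h)
        fit_eval += nadaraya_watson(x_train, resid, x_eval, h)
    return fit_eval

# Toy usage: recover a sine curve from noisy observations.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(0, 2 * np.pi, 100)
smooth = l2_boosted_nw(x, y, grid, h=0.5, n_steps=5)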
Vessel noise affects beaked whale behavior: results of a dedicated acoustic response study
The definitive version was published in PLoS ONE 7 (2012): e42535, doi:10.1371/journal.pone.0042535. Some beaked whale species are susceptible to the detrimental effects of anthropogenic noise. Most studies have concentrated on the effects of military sonar, but other forms of acoustic disturbance (e.g. shipping noise) may disrupt behavior. An experiment involving the exposure of target whale groups to intense vessel-generated noise tested how these exposures influenced the foraging behavior of Blainville’s beaked whales (Mesoplodon densirostris) in the Tongue of the Ocean (Bahamas). A military array of bottom-mounted hydrophones was used to measure the response based upon changes in the spatial and temporal pattern of vocalizations. The archived acoustic data were used to compute metrics of the echolocation-based foraging behavior for 16 targeted groups, 10 groups further away on the range, and 26 non-exposed groups. The duration of foraging bouts was not significantly affected by the exposure. Changes in the hydrophone over which the group was most frequently detected occurred as the animals moved around within a foraging bout, and their number was significantly lower the closer the whales were to the sound source. Non-exposed groups also had significantly more changes in the primary hydrophone than exposed groups, irrespective of distance. Our results suggested that broadband ship noise caused a significant change in beaked whale behavior up to at least 5.2 kilometers away from the vessel. The observed change could potentially correspond to a restriction in the movement of groups, a period of more directional travel, a reduction in the number of individuals clicking within the group, or a response to changes in prey movement. The research reported here was financially supported by the United States (U.S.) Office of Naval Research (www.onr.navy.mil) grants N00014-07-10988, N00014-07-11023, and N00014-08-10990; the U.S. Strategic Environmental Research and Development Program (www.serdp.org) grant SI-1539; the Environmental Readiness Division of the U.S. Navy (http://www.navy.mil/local/n45/); the U.S. Chief of Naval Operations Submarine Warfare Division (Undersea Surveillance); the U.S. National Oceanic and Atmospheric Administration (National Marine Fisheries Service, Office of Science and Technology) (http://www.st.nmfs.noaa.gov/); the U.S. National Oceanic and Atmospheric Administration Ocean Acoustics Program (http://www.nmfs.noaa.gov/pr/acoustics/); and the Joint Industry Program on Sound and Marine Life of the International Association of Oil and Gas Producers (www.soundandmarinelife.org).
Kernel density classification and boosting: an L2 analysis
Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is “boosting”, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
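The baseline method referred to here is the standard kernel density classifier: estimate each class density separately and assign a point to the class with the larger prior-weighted density. A minimal sketch of that baseline, using SciPy's gaussian_kde with its default bandwidth (the boosting step itself is not reproduced here, and all names and data are illustrative):

import numpy as np
from scipy.stats import gaussian_kde

def kde_classify(x0, x1, x_new, prior0=None):
    """Classify points by comparing prior-weighted kernel density estimates
    fitted separately to each class."""
    if prior0 is None:
        prior0 = len(x0) / (len(x0) + len(x1))   # empirical class prior
    prior1 = 1.0 - prior0
    f0 = gaussian_kde(x0)                        # class-0 density estimate
    f1 = gaussian_kde(x1)                        # class-1 density estimate
    # Assign the class whose weighted density is larger at each new point.
    return (prior1 * f1(x_new) > prior0 * f0(x_new)).astype(int)

# Toy usage with two overlapping one-dimensional classes.
rng = np.random.default_rng(1)
x0 = rng.normal(-1.0, 1.0, 150)
x1 = rng.normal(1.5, 1.2, 100)
labels = kde_classify(x0, x1, np.array([-2.0, 0.0, 2.0]))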
A semi-parametric approach to estimate risk functions associated with multi-dimensional exposure profiles: application to smoking and lung cancer
A common characteristic of environmental epidemiology is the multi-dimensional aspect of exposure patterns, frequently reduced to a cumulative exposure for simplicity of analysis. By adopting a flexible Bayesian clustering approach, we explore the risk function linking exposure history to disease. This approach is applied here to study the relationship between different smoking characteristics and lung cancer in the framework of a population-based case-control study.
Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer
- …