487 research outputs found
Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations
We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter α. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45-54, 2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (Biometrika 95(1):169-186, 2008). Our general algorithm is implemented as efficient open-source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (A note on posterior sampling from Dirichlet mixture models, 2008) and implemented by Yau et al. (Journal of the Royal Statistical Society, Series B (Statistical Methodology) 73:37-57, 2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature in large data sets and investigate sensitivity to initialisation. We additionally consider the challenges when an additional layer of hierarchy is added such that joint inference is to be made on α. We introduce a new label-switching move and compute the marginal partition posterior to help to surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al. Biostatistics 11(3):484-498, 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples.
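For reference, the stick-breaking representation of the Dirichlet process mixture that such samplers target can be written, in standard notation with concentration parameter $\alpha$ and base measure $G_0$ (background material, not part of the abstract):

  $V_k \sim \mathrm{Beta}(1, \alpha), \qquad \pi_k = V_k \prod_{j<k} (1 - V_j), \qquad k = 1, 2, \ldots$
  $\theta_k \sim G_0, \qquad \Pr(z_i = k \mid \pi) = \pi_k, \qquad y_i \mid z_i = k \sim F(\cdot \mid \theta_k)$

The slice-sampling and label-switching machinery discussed in the paper operates on the allocations $z_i$ and the weights $\pi_k$ of this representation.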
PReMiuM: An R package for profile regression mixture models using Dirichlet processes
PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, nonparametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous responses, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label-switching moves are implemented, along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may be of interest to determine which covariates actively drive the mixture components; this is implemented in the package as variable selection.
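In outline, and hedging on the exact parameterisation (the package documentation gives the full model), profile regression couples the covariate and response models through a shared allocation $z_i$ drawn from the stick-breaking weights above:

  $x_{ij} \mid z_i = c \sim \phi_{cj} \quad$ (cluster-specific covariate profile)
  $g\{\mathrm{E}(y_i \mid z_i = c, w_i)\} = \theta_c + \beta^\top w_i \quad$ (cluster-specific response level, with optional fixed effects $w_i$)

so that clusters are informed jointly by the covariate profiles and the response.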
Automatic Induction of Neural Network Decision Tree Algorithms
This work presents an approach to the automatic induction of non-greedy decision trees constructed from a neural network architecture. The construction can be used to transfer weights when growing or pruning a decision tree, allowing non-greedy decision tree algorithms to learn and adapt towards the ideal architecture automatically. We examine the underpinning ideas from ensemble modelling and Bayesian model averaging which allow our neural network to asymptotically approach the ideal architecture through weight transfer. Experimental results demonstrate that this approach improves on decision tree and decision forest models with a fixed set of hyperparameters. Comment: This is a pre-print of a contribution "Chapman Siu, Automatic Induction of Neural Network Decision Tree Algorithms." To appear in Computing Conference 2019 Proceedings, Advances in Intelligent Systems and Computing. Implementation: https://github.com/chappers/automatic-induction-neural-decision-tre
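The abstract leaves the network-to-tree construction abstract; the linked repository holds the authors' implementation. As a generic, hypothetical sketch of the underlying idea only, namely a decision node expressed as a differentiable unit whose weights could be transferred when the tree is grown or pruned, a single "soft" split can be written as below; all names and hyperparameters are illustrative and not taken from the paper.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SoftSplit:
    """One soft (probabilistic) decision node with two leaf predictions.

    The gating weights play the role of a learned split; because the unit is
    differentiable, the same weights could seed a deeper tree when it is grown.
    """
    def __init__(self, n_features, rng):
        self.w = rng.normal(scale=0.1, size=n_features)  # split direction
        self.b = 0.0                                      # split offset
        self.leaf = np.array([0.0, 0.0])                  # leaf values (regression)

    def predict(self, X):
        p_right = sigmoid(X @ self.w + self.b)            # probability of routing right
        return (1 - p_right) * self.leaf[0] + p_right * self.leaf[1]

    def sgd_step(self, X, y, lr=0.1):
        # One gradient step on squared error for both the gate and the leaves.
        p = sigmoid(X @ self.w + self.b)
        pred = (1 - p) * self.leaf[0] + p * self.leaf[1]
        err = pred - y
        gate_grad = err * (self.leaf[1] - self.leaf[0]) * p * (1 - p)
        self.w -= lr * X.T @ gate_grad / len(y)
        self.b -= lr * gate_grad.mean()
        self.leaf[0] -= lr * np.mean(err * (1 - p))
        self.leaf[1] -= lr * np.mean(err * p)

# Toy usage: the soft node learns a hard axis-aligned split.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0.5, 1.0, -1.0)
node = SoftSplit(n_features=3, rng=rng)
for _ in range(500):
    node.sgd_step(X, y)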
On boosting kernel regression
In this paper we propose a simple multistep regression smoother which is constructed in an iterative manner, by learning the Nadaraya-Watson estimator with L2 boosting. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast and the variance diverges exponentially slowly. The first boosting step is analysed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
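A minimal numerical sketch of the smoother described here, i.e. the Nadaraya-Watson estimator applied repeatedly to its own residuals (L2 boosting), assuming a Gaussian kernel and an illustrative bandwidth; this is not the authors' code.

import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimator with a Gaussian kernel and bandwidth h."""
    d = (x_eval[:, None] - x_train[None, :]) / h   # pairwise scaled distances
    w = np.exp(-0.5 * d**2)                        # kernel weights
    return (w @ y_train) / w.sum(axis=1)

def l2_boosted_nw(x_train, y_train, x_eval, h, n_steps=5):
    """L2 boosting: repeatedly smooth the current residuals and accumulate the fits."""
    fit_train = np.zeros_like(y_train, dtype=float)
    fit_eval = np.zeros(len(x_eval))
    for _ in range(n_steps):
        resid = y_train - fit_train
        fit_train += nadaraya_watson(x_train, resid, x_train, h)
        fit_eval += nadaraya_watson(x_train, resid, x_eval, h)
    return fit_eval

# Toy usage: recover a sine curve from noisy observations.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(0, 2 * np.pi, 100)
smooth = l2_boosted_nw(x, y, grid, h=0.5, n_steps=5)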
Vessel noise affects beaked whale behavior: results of a dedicated acoustic response study
The definitive version was published in PLoS ONE 7 (2012): e42535, doi:10.1371/journal.pone.0042535. Some beaked whale species are susceptible to the detrimental effects of anthropogenic noise. Most studies have concentrated on the effects of military sonar, but other forms of acoustic disturbance (e.g. shipping noise) may disrupt behavior. An experiment involving the exposure of target whale groups to intense vessel-generated noise tested how these exposures influenced the foraging behavior of Blainville’s beaked whales (Mesoplodon densirostris) in the Tongue of the Ocean (Bahamas). A military array of bottom-mounted hydrophones was used to measure the response based upon changes in the spatial and temporal pattern of vocalizations. The archived acoustic data were used to compute metrics of the echolocation-based foraging behavior for 16 targeted groups, 10 groups further away on the range, and 26 non-exposed groups. The duration of foraging bouts was not significantly affected by the exposure. Changes in the hydrophone over which the group was most frequently detected occurred as the animals moved around within a foraging bout, and their number was significantly lower the closer the whales were to the sound source. Non-exposed groups also had significantly more changes in the primary hydrophone than exposed groups, irrespective of distance. Our results suggested that broadband ship noise caused a significant change in beaked whale behavior up to at least 5.2 kilometers away from the vessel. The observed change could potentially correspond to a restriction in the movement of groups, a period of more directional travel, a reduction in the number of individuals clicking within the group, or a response to changes in prey movement. The research reported here was financially supported by the United States (U.S.) Office of Naval Research (www.onr.navy.mil) grants N00014-07-10988, N00014-07-11023, and N00014-08-10990; the U.S. Strategic Environmental Research and Development Program (www.serdp.org) grant SI-1539; the Environmental Readiness Division of the U.S. Navy (http://www.navy.mil/local/n45/); the U.S. Chief of Naval Operations Submarine Warfare Division (Undersea Surveillance); the U.S. National Oceanic and Atmospheric Administration (National Marine Fisheries Service, Office of Science and Technology) (http://www.st.nmfs.noaa.gov/); the U.S. National Oceanic and Atmospheric Administration Ocean Acoustics Program (http://www.nmfs.noaa.gov/pr/acoustics/); and the Joint Industry Program on Sound and Marine Life of the International Association of Oil and Gas Producers (www.soundandmarinelife.org).
Kernel density classification and boosting: an L2 analysis
Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is “boosting”, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
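The baseline method referred to here is the standard kernel density classifier: estimate each class density separately and assign a point to the class with the larger prior-weighted density. A minimal sketch of that baseline, using SciPy's gaussian_kde with its default bandwidth (the boosting step itself is not reproduced here, and all names and data are illustrative):

import numpy as np
from scipy.stats import gaussian_kde

def kde_classify(x0, x1, x_new, prior0=None):
    """Classify points by comparing prior-weighted kernel density estimates
    fitted separately to each class."""
    if prior0 is None:
        prior0 = len(x0) / (len(x0) + len(x1))   # empirical class prior
    prior1 = 1.0 - prior0
    f0 = gaussian_kde(x0)                        # class-0 density estimate
    f1 = gaussian_kde(x1)                        # class-1 density estimate
    # Assign the class whose weighted density is larger at each new point.
    return (prior1 * f1(x_new) > prior0 * f0(x_new)).astype(int)

# Toy usage with two overlapping one-dimensional classes.
rng = np.random.default_rng(1)
x0 = rng.normal(-1.0, 1.0, 150)
x1 = rng.normal(1.5, 1.2, 100)
labels = kde_classify(x0, x1, np.array([-2.0, 0.0, 2.0]))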
A semi-parametric approach to estimate risk functions associated with multi-dimensional exposure profiles: application to smoking and lung cancer
A common characteristic of environmental epidemiology is the multi-dimensional aspect of exposure patterns, frequently reduced to a cumulative exposure for simplicity of analysis. By adopting a flexible Bayesian clustering approach, we explore the risk function linking exposure history to disease. This approach is applied here to study the relationship between different smoking characteristics and lung cancer in the framework of a population-based case-control study.
Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer
- …