Classification using distance nearest neighbours
This paper proposes a new probabilistic classification algorithm using a
Markov random field approach. The joint distribution of class labels is
explicitly modelled using the distances between feature vectors. Intuitively, a
class label should depend more on class labels which are closer in the feature
space, than those which are further away. Our approach builds on previous work
by Holmes and Adams (2002, 2003) and Cucala et al. (2008). Our work shares many
of the advantages of these approaches in providing a probabilistic basis for
the statistical inference. In comparison to previous work, we present a more
efficient computational algorithm to overcome the intractability of the Markov
random field model. The results of our algorithm are encouraging in comparison
to the k-nearest neighbour algorithm.

Comment: 12 pages, 2 figures. To appear in Statistics and Computing.
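As a point of reference, the k-nearest neighbour classifier that the paper benchmarks against can be sketched in a few lines. This is a generic distance-weighted variant (Euclidean distance and inverse-distance weights are illustrative choices), not the authors' Markov random field model, which places a joint distribution over all class labels rather than voting independently per point.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Distance-weighted k-nearest-neighbour vote for a single query point x.

    Baseline illustration only: closer training points receive larger
    weights, echoing the intuition that nearby labels matter more.
    """
    d = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances
    idx = np.argsort(d)[:k]                      # the k closest training points
    w = 1.0 / (d[idx] + 1e-12)                   # inverse-distance weights
    classes = np.unique(y_train)
    scores = {c: w[y_train[idx] == c].sum() for c in classes}
    return max(scores, key=scores.get)

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.05]), k=3))  # 0
```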
Fast calibrated additive quantile regression
We propose a novel framework for fitting additive quantile regression models,
which provides well calibrated inference about the conditional quantiles and
fast automatic estimation of the smoothing parameters, for model structures as
diverse as those usable with distributional GAMs, while maintaining equivalent
numerical efficiency and stability. The proposed methods are at once
statistically rigorous and computationally efficient, because they are based on
the general belief updating framework of Bissiri et al. (2016) to loss based
inference, but compute by adapting the stable fitting methods of Wood et al.
(2016). We show how the pinball loss is statistically suboptimal relative to a
novel smooth generalisation, which also gives access to fast estimation
methods. Further, we provide a novel calibration method for efficiently
selecting the 'learning rate' balancing the loss with the smoothing priors
during inference, thereby obtaining reliable quantile uncertainty estimates.
Our work was motivated by a probabilistic electricity load forecasting
application, used here to demonstrate the proposed approach. The methods
described here are implemented by the qgam R package, available on the
Comprehensive R Archive Network (CRAN).
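The pinball (check) loss at the heart of quantile regression is easy to state. The sketch below, in Python rather than the package's R, shows the standard non-smooth loss whose kink at zero residual motivates the smooth generalisation the authors develop; the smooth generalisation itself is not reproduced here.

```python
import numpy as np

def pinball_loss(y, mu, tau):
    """Mean pinball (check) loss of fitted quantile mu at level tau.

    Non-differentiable at y == mu, which is the statistical and
    computational difficulty the paper's smooth generalisation removes.
    """
    r = y - mu
    return np.mean(np.maximum(tau * r, (tau - 1.0) * r))

y = np.array([1.0, 2.0, 3.0, 4.0])
# At tau = 0.5 the loss is half the mean absolute deviation
print(pinball_loss(y, 2.5, 0.5))  # 0.5
```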
Geo-additive models of Childhood Undernutrition in three Sub-Saharan African Countries
We investigate the geographical and socioeconomic determinants of childhood undernutrition in Malawi, Tanzania and Zambia, three neighboring countries in Southern Africa, using the 1992 Demographic and Health Surveys. We estimate models of undernutrition jointly for the three countries to explore regional patterns of undernutrition that transcend boundaries, while allowing for country-specific interactions. We use semiparametric models to flexibly model the effects of selected socioeconomic covariates and spatial effects. Our spatial analysis is based on a flexible geo-additive model using the district as the geographic unit of analysis, which allows us to separate smooth structured spatial effects from random effects. Inference is fully Bayesian and uses recent Markov chain Monte Carlo techniques. While the socioeconomic determinants generally confirm what is known in the literature, we find distinct residual spatial patterns that are not explained by the socioeconomic determinants. In particular, there appears to be a belt running from Southern Tanzania to Northeastern Zambia which exhibits much worse undernutrition, even after controlling for socioeconomic effects. These effects do transcend borders between the countries, but to a varying degree. These findings have important implications for targeting policy, as well as for the search for left-out variables that might account for these residual spatial patterns.
A comparison of block and semi-parametric bootstrap methods for variance estimation in spatial statistics
Efron (1979) introduced the bootstrap method for independent data, but it cannot easily be applied to spatial data because of their dependence. For spatial data that are correlated through their locations in the underlying space, the moving block bootstrap method is usually used to estimate the precision of estimators. The precision of moving block bootstrap estimators depends on the block size, which is difficult to select, and the method also tends to underestimate the variance. In this paper, we first use the semi-parametric bootstrap to estimate the precision of estimators in spatial data analysis; the semi-parametric bootstrap exploits an estimate of the spatial correlation structure. We then compare the semi-parametric bootstrap with the moving block bootstrap for variance estimation in a simulation study. Finally, we apply the semi-parametric bootstrap to analyze the coal-ash data.
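The moving block bootstrap discussed above can be sketched as follows. This is a minimal illustration for the variance of a sample mean on a simulated AR(1) series; the block length, series length, and replication count are arbitrary illustrative choices, and the paper's semi-parametric procedure is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def moving_block_bootstrap(x, block_len, n_boot=1000):
    """Moving block bootstrap estimate of the variance of the sample mean.

    Overlapping blocks of length block_len are resampled with replacement
    and concatenated, preserving short-range dependence within each block.
    """
    n = len(x)
    blocks = np.array([x[i:i + block_len] for i in range(n - block_len + 1)])
    k = int(np.ceil(n / block_len))               # blocks per resample
    means = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(blocks), size=k)
        sample = np.concatenate(blocks[idx])[:n]  # trim to original length
        means[b] = sample.mean()
    return means.var(ddof=1)

# AR(1) series with positive correlation: a naive iid bootstrap would
# understate the variance of the mean even more severely
x = np.empty(500)
x[0] = 0.0
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.normal()
print(moving_block_bootstrap(x, block_len=20))
```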
Twelve (not so) angry men: jurors work better in small groups. Lorraine Hope and Bridget Waller propose a simple modification to jury deliberations
Twelve-person juries are often regarded as one of the cornerstones of democracy. In the UK, the right to a trial by jury is considered an important feature of the criminal justice system. Indeed, it has been rated as more important than a number of other rights, including the right to protest against the government, the right not to be detained for an extended period without charge, and the right to free speech in public (Roberts and Hough, 2009). The public also trusts juries comprising randomly selected ordinary people and relies on the contribution of 12 individuals to eliminate bias and prejudice from the decision-making process.
Classifying shape of internal pores within AlSi10Mg alloy manufactured by laser powder bed fusion using 3D X-ray micro computed tomography : influence of processing parameters and heat treatment
The authors gratefully acknowledge the support provided by the EPSRC (grant EP/R021694/1). The authors also wish to thank Rosie Bird at the University of Aberdeen for assisting with Avizo. Peer reviewed. Postprint.
Fast stable direct fitting and smoothness selection for Generalized Additive Models
Existing computationally efficient methods for penalized likelihood GAM
fitting employ iterative smoothness selection on working linear models (or
working mixed models). Such schemes fail to converge for a non-negligible
proportion of models, with failure being particularly frequent in the presence
of concurvity. If smoothness selection is performed by optimizing `whole model'
criteria these problems disappear, but until now attempts to do this have
employed finite difference based optimization schemes which are computationally
inefficient, and can suffer from false convergence. This paper develops the
first computationally efficient method for direct GAM smoothness selection. It
is highly stable, but by careful structuring achieves a computational
efficiency that leads, in simulations, to lower mean computation times than the
schemes based on working-model smoothness selection. The method also offers a
reliable way of fitting generalized additive mixed models.
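The idea of selecting smoothness by optimising a `whole model' criterion can be illustrated with a toy penalized smoother. The Gaussian radial basis, second-difference penalty, and grid search over the smoothing parameter below are illustrative assumptions; they stand in for, and are much cruder than, the stable direct optimisation the paper develops.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_smooth(x, y, n_basis=10, lambdas=np.logspace(-6, 4, 50)):
    """Penalized smoother with whole-model GCV smoothness selection.

    For each candidate lambda the full model is refitted and the GCV
    score evaluated; the lambda minimising GCV is kept.
    """
    knots = np.linspace(x.min(), x.max(), n_basis)
    X = np.exp(-0.5 * ((x[:, None] - knots[None, :]) / 0.15) ** 2)  # basis
    D = np.diff(np.eye(n_basis), n=2, axis=0)
    S = D.T @ D                                       # wiggliness penalty
    n = len(y)
    best_gcv, best_fit = np.inf, None
    for lam in lambdas:
        A = X @ np.linalg.solve(X.T @ X + lam * S, X.T)  # influence matrix
        fhat = A @ y
        edf = np.trace(A)                             # effective deg. of freedom
        gcv = n * np.sum((y - fhat) ** 2) / (n - edf) ** 2
        if gcv < best_gcv:
            best_gcv, best_fit = gcv, fhat
    return best_gcv, best_fit

x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=100)
gcv, fhat = fit_smooth(x, y)
```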
Cleaning sky survey databases using Hough Transform and Renewal String approaches
Large astronomical databases obtained from sky surveys such as the
SuperCOSMOS Sky Survey (SSS) invariably suffer from spurious records coming
from artefactual effects of the telescope, satellites and junk objects in orbit
around earth and physical defects on the photographic plate or CCD. Though
relatively small in number these spurious records present a significant problem
in many situations where they can become a large proportion of the records
potentially of interest to a given astronomer. Accurate and robust techniques
are needed for locating and flagging such spurious objects, and we are
undertaking a programme investigating the use of machine learning techniques in
this context. In this paper we focus on the four most common causes of unwanted
records in the SSS: satellite or aeroplane tracks, scratches, fibres and other
linear phenomena introduced to the plate, circular halos around bright stars
due to internal reflections within the telescope and diffraction spikes near to
bright stars. Appropriate techniques are developed for the detection of each of
these. The methods are applied to the SSS data to develop a dataset of spurious
object detections, along with confidence measures, which can allow these
unwanted data to be removed from consideration. These methods are general and
can be adapted to other astronomical survey data.

Comment: Accepted for MNRAS. 17 pages, latex2e, uses mn2e.bst, mn2e.cls, md706.bbl, shortbold.sty (all included). All figures included here as low resolution jpegs. A version of this paper including the figures can be downloaded from http://www.anc.ed.ac.uk/~amos/publications.html and more details on this project can be found at http://www.anc.ed.ac.uk/~amos/sattrackres.htm
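The Hough transform used above to detect linear phenomena such as satellite tracks and scratches can be sketched as a vote over (theta, rho) line parameters. The grid resolutions and the toy point set below are illustrative choices, not the paper's tuned detector.

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=200, extent=100.0):
    """Accumulate votes in (theta, rho) space for lines through the points.

    Each point (x, y) lies on every line rho = x*cos(theta) + y*sin(theta);
    collinear sets of points therefore pile their votes into one
    accumulator cell, producing a peak that flags the line.
    """
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.clip(((rho + extent) / (2 * extent) * n_rho).astype(int),
                       0, n_rho - 1)
        acc[np.arange(n_theta), bins] += 1
    return acc, thetas

# 20 points on the line y = x plus a few scattered outliers
pts = [(i, i) for i in range(20)] + [(3, 15), (12, 2), (7, 19)]
acc, thetas = hough_lines(pts)
i, j = np.unravel_index(acc.argmax(), acc.shape)
# Peak angle is near 135 degrees: the normal direction of the line y = x
print(np.degrees(thetas[i]))
```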