Sequential Quantiles via Hermite Series Density Estimation
Sequential quantile estimation refers to incorporating observations into
quantile estimates in an incremental fashion, thus furnishing an online estimate
of one or more quantiles at any given point in time. Sequential quantile
estimation is also known as online quantile estimation. This area is relevant
to the analysis of data streams and to the one-pass analysis of massive data
sets. Applications include network traffic and latency analysis, real-time
fraud detection and high-frequency trading. We introduce new techniques for
online quantile estimation based on Hermite series estimators in the settings
of static quantile estimation and dynamic quantile estimation. In the static
quantile estimation setting we apply the existing Gauss-Hermite expansion in a
novel manner. In particular, we exploit the fact that Gauss-Hermite
coefficients can be updated in a sequential manner. To treat dynamic quantile
estimation we introduce a novel expansion with an exponentially weighted
estimator for the Gauss-Hermite coefficients which we term the Exponentially
Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing
sequential quantile estimation algorithms in that they allow arbitrary
quantiles (as opposed to pre-specified quantiles) to be estimated at any point
in time. In doing so, we provide a solution to online distribution function and
online quantile function estimation on data streams. In particular, we derive an
analytical expression for the CDF and prove consistency results for the CDF
under certain conditions. In addition, we analyse the associated quantile
estimator. Simulation studies and tests on real data reveal the Gauss-Hermite
based algorithms to be competitive with a leading existing algorithm.

Comment: 43 pages, 9 figures. Improved version incorporating referee comments,
as appears in the Electronic Journal of Statistics.
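To make the sequential updating concrete: the Gauss-Hermite coefficients are
expectations a_k = E[h_k(X)] of the orthonormal Hermite functions, so they admit
a running-mean update in the static setting and an exponentially weighted update
in the dynamic (EWGH) setting. The Python sketch below illustrates the idea
only; the truncation order K, the grid-based CDF inversion and all names are our
own illustrative choices rather than the paper's implementation, and real data
would typically be standardised first.

    import numpy as np

    def hermite_functions(x, K):
        # Orthonormal Hermite functions h_0..h_K at x, via the stable recurrence
        # h_{k+1}(x) = sqrt(2/(k+1)) x h_k(x) - sqrt(k/(k+1)) h_{k-1}(x).
        x = np.atleast_1d(np.asarray(x, dtype=float))
        h = np.empty((K + 1, x.size))
        h[0] = np.pi ** -0.25 * np.exp(-0.5 * x ** 2)
        if K >= 1:
            h[1] = np.sqrt(2.0) * x * h[0]
        for k in range(1, K):
            h[k + 1] = (np.sqrt(2.0 / (k + 1)) * x * h[k]
                        - np.sqrt(k / (k + 1.0)) * h[k - 1])
        return h

    class OnlineHermiteQuantiles:
        # Arbitrary quantiles at any time from sequentially updated coefficients.
        def __init__(self, K=30, lam=None):
            self.K, self.lam, self.n = K, lam, 0
            self.a = np.zeros(K + 1)  # estimates of a_k = E[h_k(X)]

        def update(self, x):
            self.n += 1
            hk = hermite_functions(x, self.K)[:, 0]
            if self.lam is None:            # static setting: running mean
                self.a += (hk - self.a) / self.n
            else:                           # dynamic setting: exponential weighting
                self.a = (1 - self.lam) * self.a + self.lam * hk

        def quantile(self, p):
            # Invert the estimated CDF on a grid; the truncated series can dip
            # below zero, so clip and renormalise.
            grid = np.linspace(-10, 10, 4001)
            dens = np.clip(self.a @ hermite_functions(grid, self.K), 0.0, None)
            cdf = np.cumsum(dens) * (grid[1] - grid[0])
            return np.interp(p, cdf / cdf[-1], grid)

For example, est = OnlineHermiteQuantiles(lam=0.01) tracks a slowly drifting
stream, with est.quantile(0.99) available after every update.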
A statistical investigation into the properties and dynamics of biological populations experiencing environmental variability
Student Number: 9908888R - MSc research report - School of Statistics and Actuarial Science - Faculty of Science

Much research has been devoted towards the understanding of population behaviour.
Such understanding has often been furthered through the development of theoretical
population models. This research report explores a variety of population models and
their implications.
The implications of the various models are explored using both analytical results and
simulations. Specific aspects of population behaviour studied include gross fluctuation
characteristics and extinction probabilities for a population.
This research report starts with an overview of Deterministic Models. This is followed
by a study of Birth and Death Processes, Branching Processes and Models that
incorporate environmental variability. Finally, we study the maximum likelihood
approach to population parameter estimation. The more notable theoretical results
derived include: the development of models that incorporate the population’s history;
models that incorporate discontinuous environmental changes; and the development of
a means of parameter estimation for a Stochastic Differential Equation.
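The closing topic, maximum likelihood estimation for a stochastic differential
equation, can be illustrated with the standard Euler-Maruyama pseudo-likelihood.
The Python sketch below is our own illustration rather than the report's method,
and assumes a hypothetical logistic-growth diffusion
dN = r N (1 - N/K) dt + \sigma N dW observed at a fixed time step dt:

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_lik(params, N, dt):
        # Gaussian transition densities from the Euler-Maruyama discretisation:
        # N_{t+dt} | N_t ~ Normal(N_t + r N_t (1 - N_t/K) dt, (sigma N_t)^2 dt).
        # Assumes strictly positive observed population sizes N.
        r, K, sigma = params
        if K <= 0 or sigma <= 0:
            return np.inf
        mu = N[:-1] + r * N[:-1] * (1.0 - N[:-1] / K) * dt
        sd = sigma * N[:-1] * np.sqrt(dt)
        z = (N[1:] - mu) / sd
        return np.sum(np.log(sd) + 0.5 * z ** 2)

    # e.g. fit = minimize(neg_log_lik, x0=[0.5, 100.0, 0.2],
    #                     args=(N_obs, 0.1), method="Nelder-Mead")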
Nonparametric Transient Classification using Adaptive Wavelets
Classifying transients based on multi-band light curves is a challenging but
crucial problem in the era of GAIA and LSST, since the sheer volume of
transients will make spectroscopic classification unfeasible. Here we present a
nonparametric classifier that uses the transient's light curve measurements to
predict its class given training data. It implements two novel components: the
first is the use of the BAGIDIS wavelet methodology - a characterization of
functional data using hierarchical wavelet coefficients. The second novelty is
the introduction of a ranked probability classifier on the wavelet coefficients
that handles both the heteroscedasticity of the data and the
potential non-representativity of the training set. The ranked classifier is
simple and quick to implement while a major advantage of the BAGIDIS wavelets
is that they are translation invariant, hence they do not need the light curves
to be aligned to extract features. Further, BAGIDIS is nonparametric so it can
be used for blind searches for new objects. We demonstrate the effectiveness of
our ranked wavelet classifier against the well-tested Supernova Photometric
Classification Challenge dataset in which the challenge is to correctly
classify light curves as Type Ia or non-Ia supernovae. We train our ranked
probability classifier on the spectroscopically-confirmed subsample (which is
not representative) and show that it gives good results for all supernovae with
observed light curve timespans greater than 100 days (roughly 55% of the
dataset). For such data, we obtain a Ia efficiency of 80.5% and a purity of
82.4% yielding a highly competitive score of 0.49 whilst implementing a truly
"model-blind" approach to supernova classification. Consequently this approach
may be particularly suitable for the classification of astronomical transients
in the era of large synoptic sky surveys.

Comment: 14 pages, 8 figures. Published in MNRAS.
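To make the BAGIDIS idea concrete: each light curve is reduced to a ranked
sequence of (breakpoint, detail coefficient) pairs from an unbalanced Haar
expansion, and curves are compared rank by rank in that plane. The sketch below
is a simplified illustration of this flavour of feature extraction and
semidistance, not the BAGIDIS package or the paper's ranked probability
classifier; the greedy split rule and the weights are our own simplifications.

    import numpy as np

    def unbalanced_haar_features(y):
        # Greedy top-down unbalanced Haar transform: repeatedly split each
        # segment at the breakpoint with the largest |detail| coefficient,
        # then rank all (breakpoint, detail) pairs by decreasing |detail|.
        y = np.asarray(y, dtype=float)
        feats, stack = [], [(0, len(y))]
        while stack:
            s, e = stack.pop()
            if e - s < 2:
                continue
            best = None
            for b in range(s + 1, e):
                nl, nr = b - s, e - b
                # orthonormal unbalanced Haar coefficient for a split at b
                d = (np.sqrt(nr / (nl * (nl + nr))) * y[s:b].sum()
                     - np.sqrt(nl / (nr * (nl + nr))) * y[b:e].sum())
                if best is None or abs(d) > abs(best[1]):
                    best = (b, d)
            feats.append(best)
            stack += [(s, best[0]), (best[0], e)]
        feats.sort(key=lambda t: -abs(t[1]))
        return np.array(feats, dtype=float)  # shape (len(y) - 1, 2)

    def semidistance(f1, f2, weights):
        # Weighted rank-by-rank comparison in the (breakpoint, detail) plane.
        k = min(len(f1), len(f2), len(weights))
        return float(np.sum(weights[:k] * np.linalg.norm(f1[:k] - f2[:k], axis=1)))

A nearest-neighbour vote under this semidistance would be a crude stand-in for
the paper's ranked probability classifier.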
Towards the Future of Supernova Cosmology
For future surveys, spectroscopic follow-up for all supernovae will be
extremely difficult. However, one can use light curve fitters to obtain the
probability that an object is a Type Ia. One may consider applying a
probability cut to the data, but we show that the resulting non-Ia
contamination can lead to biases in the estimation of cosmological parameters.
A different method, which allows the use of the full dataset and results in
unbiased cosmological parameter estimation, is Bayesian Estimation Applied to
Multiple Species (BEAMS). BEAMS is a Bayesian approach to the problem which
includes the uncertainty in the types in the evaluation of the posterior. Here
we outline the theory of BEAMS and demonstrate its effectiveness using both
simulated datasets and SDSS-II data. We also show that it is possible to use
BEAMS if the data are correlated, by introducing a numerical marginalisation
over the types of the objects. This is largely a pedagogical introduction to
BEAMS with references to the main BEAMS papers.

Comment: Replaced under married name Lochner (formerly Knights). 3 pages, 2
figures. To appear in the Proceedings of the 13th Marcel Grossmann Meeting
(MG13), Stockholm, Sweden, 1-7 July 2012.
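For reference, the BEAMS posterior for uncorrelated data takes a compact,
commonly stated form (notation ours). With P_i the light-curve-derived
probability that supernova i is a Type Ia,

    P(\theta|D) \propto P(\theta) \prod_{i=1}^{N} [ P_i L_i^{Ia}(\theta) + (1 - P_i) L_i^{nonIa}(\theta) ],

where L_i^{Ia} and L_i^{nonIa} are the likelihoods of datum i under the Ia and
contaminant models respectively. Expanding this product over the two possible
types of every object is exactly what becomes a 2^N-term sum once systematic
uncertainties correlate the objects, the situation treated in the next paper.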
Extending BEAMS to incorporate correlated systematic uncertainties
New supernova surveys such as the Dark Energy Survey, Pan-STARRS and the LSST
will produce an unprecedented number of photometric supernova candidates, most
with no spectroscopic data. Avoiding biases in cosmological parameters due to
the resulting inevitable contamination from non-Ia supernovae can be achieved
with the BEAMS formalism, allowing for fully photometric supernova cosmology
studies. Here we extend BEAMS to deal with the case in which the supernovae are
correlated by systematic uncertainties. The analytical form of the full BEAMS
posterior requires evaluating 2^N terms, where N is the number of supernova
candidates. This `exponential catastrophe' is computationally infeasible even
for N of order 100. We circumvent the exponential catastrophe by marginalising
numerically instead of analytically over the possible supernova types: we
augment the cosmological parameters with nuisance parameters describing the
covariance matrix and the types of all the supernovae, \tau_i, that we include
in our MCMC analysis. We show that this method deals well even with large,
unknown systematic uncertainties without a major increase in computational
time, whereas ignoring the correlations can lead to significant biases and
incorrect credible contours. We then compare the numerical marginalisation
technique with a perturbative expansion of the posterior based on the insight
that future surveys will have exquisite light curves and hence the probability
that a given candidate is a Type Ia will be close to unity or zero, for most
objects. Although this perturbative approach changes computation of the
posterior from a 2^N problem into an N^2 or N^3 one, we show that it leads to
biases in general through a small number of misclassifications, implying that
numerical marginalisation is superior.

Comment: Resubmitted under married name Lochner (formerly Knights). Version 3:
major changes, including a large-scale analysis with thousands of MCMC
chains. Matches version published in JCAP. 23 pages, 8 figures.
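A toy version of the numerical marginalisation may help: instead of summing over
2^N type assignments analytically, the chain carries the types \tau_i as
discrete nuisance parameters, updated one at a time alongside the cosmological
parameters. The sketch below uses a deliberately simple stand-in model (one
location parameter theta, Ia residuals correlated through a covariance C, non-Ia
as a broad offset Gaussian); it is our illustration of the augmentation idea,
not the paper's likelihood or code.

    import numpy as np
    from scipy.stats import multivariate_normal, norm

    def log_post(theta, tau, y, C, p):
        # Joint log-posterior of theta and the type vector tau (1 = Ia).
        ia = tau.astype(bool)
        lp = np.sum(np.log(np.where(ia, p, 1.0 - p)))   # type priors p_i
        if ia.any():                                    # correlated Ia block
            lp += multivariate_normal.logpdf(
                y[ia], mean=np.full(ia.sum(), theta), cov=C[np.ix_(ia, ia)])
        if (~ia).any():                                 # broad contaminant model
            lp += norm.logpdf(y[~ia], loc=theta + 2.0, scale=2.0).sum()
        return lp

    def run_chain(y, C, p, n_steps=20000, seed=0):
        rng = np.random.default_rng(seed)
        theta, tau = 0.0, (p > 0.5).astype(int)
        thetas = np.empty(n_steps)
        for t in range(n_steps):
            # Metropolis update of theta given the current types
            prop = theta + 0.1 * rng.standard_normal()
            if (np.log(rng.uniform())
                    < log_post(prop, tau, y, C, p) - log_post(theta, tau, y, C, p)):
                theta = prop
            # Metropolis-within-Gibbs flip of one randomly chosen type
            i = rng.integers(len(y))
            alt = tau.copy()
            alt[i] = 1 - alt[i]
            if (np.log(rng.uniform())
                    < log_post(theta, alt, y, C, p) - log_post(theta, tau, y, C, p)):
                tau = alt
            thetas[t] = theta
        return thetas, tau

Each step costs a likelihood evaluation rather than a 2^N sum, which is the
point of the augmentation.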
Photometric Supernova Cosmology with BEAMS and SDSS-II
Supernova cosmology without spectroscopic confirmation is an exciting new
frontier which we address here with the Bayesian Estimation Applied to Multiple
Species (BEAMS) algorithm and the full three years of data from the Sloan
Digital Sky Survey II Supernova Survey (SDSS-II SN). BEAMS is a Bayesian
framework for using data from multiple species in statistical inference when
one has the probability that each data point belongs to a given species,
corresponding in this context to different types of supernovae with their
probabilities derived from their multi-band lightcurves. We run the BEAMS
algorithm on both Gaussian and more realistic SNANA simulations with of order
10^4 supernovae, testing the algorithm against various pitfalls one might
expect in the new and somewhat uncharted territory of photometric supernova
cosmology. We compare the performance of BEAMS to that of both mock
spectroscopic surveys and photometric samples which have been cut using typical
selection criteria. The latter are typically either biased due to contamination
or have significantly larger contours in the cosmological parameters due to
small datasets. We then apply BEAMS to the 792 SDSS-II photometric supernovae
with host spectroscopic redshifts. In this case, BEAMS reduces the area of the
(\Omega_m,\Omega_\Lambda) contours by a factor of three relative to the case
where only spectroscopically confirmed data are used (297 supernovae). In the
case of flatness, the constraints obtained on the matter density applying BEAMS
to the photometric SDSS-II data are \Omega_m(BEAMS)=0.194\pm0.07. This
illustrates the potential power of BEAMS for future large photometric supernova
surveys such as LSST.

Comment: 25 pages, 15 figures, submitted to ApJ.
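In code, the per-object mixture that BEAMS evaluates (see the posterior written
out above) is only a few lines. The sketch below assumes a flat \Lambda CDM
distance modulus and a hypothetical offset-Gaussian contaminant population; it
is illustrative only and unrelated to the actual SDSS-II analysis pipeline.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    C_KMS, H0 = 299792.458, 70.0  # assumed Hubble constant, km/s/Mpc

    def mu_model(om, z_arr):
        # Flat LambdaCDM distance modulus: mu = 5 log10(d_L / 10 pc).
        mus = []
        for z in z_arr:
            I, _ = quad(lambda zz: 1.0 / np.sqrt(om * (1 + zz) ** 3 + 1.0 - om),
                        0.0, z)
            d_l = (1 + z) * (C_KMS / H0) * I       # luminosity distance in Mpc
            mus.append(5.0 * np.log10(d_l) + 25.0)
        return np.array(mus)

    def beams_loglike(om, z, mu_obs, mu_err, p_ia):
        # Each SN contributes p_i L_Ia + (1 - p_i) L_nonIa, so no hard
        # probability cut on type is needed.
        mu = mu_model(om, z)
        l_ia = norm.pdf(mu_obs, loc=mu, scale=mu_err)
        l_non = norm.pdf(mu_obs, loc=mu + 2.0,     # hypothetical contaminants:
                         scale=np.sqrt(mu_err ** 2 + 1.5 ** 2))  # offset, wider
        return np.sum(np.log(p_ia * l_ia + (1.0 - p_ia) * l_non))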
What does it mean to be affiliated with care?: Delphi consensus on the definition of unaffiliation and specialist in sickle cell disease
Accruing evidence reveals best practices for how to help individuals living with Sickle Cell Disease (SCD); yet, the implementation of these evidence-based practices in healthcare settings is lacking. The Sickle Cell Disease Implementation Consortium (SCDIC) is a national consortium that uses implementation science to identify and address barriers to care in SCD. The SCDIC seeks to understand how and why patients become unaffiliated from care and to determine strategies to identify and connect patients to care. A challenge, however, is the lack of an agreed-upon definition of what it means to be unaffiliated and what it means to be an SCD expert provider. In this study, we conducted a Delphi process to obtain expert consensus on what it means to be an unaffiliated patient with SCD and to define an SCD specialist, as no standard definition is available. Twenty-eight SCD experts participated in three rounds of questions. Consensus was defined as 80% or more of respondents agreeing. Experts reached consensus that an individual with SCD who is unaffiliated from care is someone who has not been seen by a sickle cell specialist in at least a year. A sickle cell specialist was defined as someone with knowledge and experience in SCD. Having knowledge means being knowledgeable of the 2014 NIH guidelines "Evidence-Based Management of SCD", trained in hydroxyurea management and transfusions, trained in screening for organ damage in SCD, trained in pain management and in SCD emergencies, and aware of psychosocial and cognitive issues in SCD. Experiences expected of an SCD specialist include working with SCD patients, being mentored by an SCD specialist, regularly attending SCD conferences, and obtaining continuing medical education on SCD every two years. The results have strong implications for future research, practice, and policy related to SCD by helping to lay a foundation for a new area of research (e.g., to identify subpopulations of unaffiliation and targeted interventions) and policies that support reaffiliation and increase accessibility to quality care.
Results from the Supernova Photometric Classification Challenge
We report results from the Supernova Photometric Classification Challenge
(SNPCC), a publicly released mix of simulated supernovae (SNe), with types (Ia,
Ibc, and II) selected in proportion to their expected rate. The simulation was
realized in the griz filters of the Dark Energy Survey (DES) with realistic
observing conditions (sky noise, point-spread function and atmospheric
transparency) based on years of recorded conditions at the DES site.
Simulations of non-Ia type SNe are based on spectroscopically confirmed light
curves that include unpublished non-Ia samples donated from the Carnegie
Supernova Project (CSP), the Supernova Legacy Survey (SNLS), and the Sloan
Digital Sky Survey-II (SDSS-II). A spectroscopically confirmed subset was
provided for training. We challenged scientists to run their classification
algorithms and report a type and photo-z for each SN. Participants from 10
groups contributed 13 entries for the sample that included a host-galaxy
photo-z for each SN, and 9 entries for the sample that had no redshift
information. Several different classification strategies resulted in similar
performance, and for all entries the performance was significantly better for
the training subset than for the unconfirmed sample. For the spectroscopically
unconfirmed subset, the entry with the highest average figure of merit for
classifying SNe~Ia has an efficiency of 0.96 and an SN~Ia purity of 0.79. As a
public resource for the future development of photometric SN classification and
photo-z estimators, we have released updated simulations with improvements
based on our experience from the SNPCC, added samples corresponding to the
Large Synoptic Survey Telescope (LSST) and the SDSS, and provided the answer
keys so that developers can evaluate their own analysis.

Comment: accepted by PASP.
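For context, the figure of merit used in the challenge combines efficiency with
a purity term that penalises false positives. In its commonly quoted form (with
false-positive weight W = 3),

    C_{FoM-Ia} = \frac{N_{Ia}^{true}}{N_{Ia}^{total}} \times \frac{N_{Ia}^{true}}{N_{Ia}^{true} + W N_{Ia}^{false}},

i.e. efficiency multiplied by a pseudo-purity in which each non-Ia misclassified
as a Ia costs W times a true Ia. This is consistent with the wavelet classifier
figures quoted earlier: 0.805 x 1/(1 + 3 x 0.176/0.824) is approximately 0.49.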