56 research outputs found

    Asymptotics and optimal bandwidth selection for highest density region estimation

    We study kernel estimation of highest-density regions (HDR). Our main contributions are two-fold. First, we derive a uniform-in-bandwidth asymptotic approximation to a risk that is appropriate for HDR estimation. This approximation is then used to derive a bandwidth selection rule for HDR estimation possessing attractive asymptotic properties. We also present the results of numerical studies that illustrate the benefits of our theory and methodology. Comment: Published at http://dx.doi.org/10.1214/09-AOS766 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Maximum likelihood estimation of a multivariate log-concave density

    Density estimation is a fundamental statistical problem. Many methods are either sensitive to model misspecification (parametric models) or difficult to calibrate, especially for multivariate data (nonparametric smoothing methods). We propose an alternative approach using maximum likelihood under a qualitative assumption on the shape of the density, specifically log-concavity. The class of log-concave densities includes many common parametric families and has desirable properties. For univariate data, these estimators are relatively well understood, and are gaining in popularity in theory and practice. We discuss extensions for multivariate data, which require different techniques. After establishing existence and uniqueness of the log-concave maximum likelihood estimator for multivariate data, we see that a reformulation allows us to compute it using standard convex optimization techniques. Unlike kernel density estimation, or other nonparametric smoothing methods, this is a fully automatic procedure, and no additional tuning parameters are required. Since the assumption of log-concavity is non-trivial, we introduce a method for assessing the suitability of this shape constraint and apply it to several simulated datasets and one real dataset. Density estimation is often one stage in a more complicated statistical procedure. With this in mind, we show how the estimator may be used for plug-in estimation of statistical functionals. A second important extension is the use of log-concave components in mixture models. We illustrate how we may use an EM-style algorithm to fit mixture models where the number of components is known. Applications to visualization and classification are presented. In the latter case, improvement over a Gaussian mixture model is demonstrated. Performance for density estimation is evaluated in two ways. 
    Firstly, we consider Hellinger convergence (the usual metric of theoretical convergence results for nonparametric maximum likelihood estimators). We prove consistency with respect to this metric and heuristically discuss rates of convergence and model misspecification, supported by empirical investigation. Secondly, we use the mean integrated squared error to demonstrate favourable performance compared with kernel density estimates using a variety of bandwidth selectors, including sophisticated adaptive methods. Throughout, we emphasise the development of stable numerical procedures able to handle the additional complexity of multivariate data

    Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility

    Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis. This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, that allow us to propose MCAR tests that are consistent against all detectable alternatives. We define an incompatibility index as a natural measure of ease of detectability, establish its key properties, and show how it can be computed exactly in some cases and bounded in others. Moreover, we prove that our tests can attain the minimax separation rate according to this measure, up to logarithmic factors. Our methodology does not require any complete cases to be effective, and is available in the R package MCARtest

    The Search for Supernova-produced Radionuclides in Terrestrial Deep-sea Archives

    An enhanced concentration of 60Fe was found in a deep-ocean crust in 2004, in a layer corresponding to an age of ~2 Myr. Confirming this signal in terrestrial archives as supernova-induced, and detecting other supernova-produced radionuclides, is of great interest. We have identified two suitable marine sediment cores from the South Australian Basin and estimated the intensity of a possible signal of the supernova-produced radionuclides 26Al, 53Mn, 60Fe and the pure r-process element 244Pu in these cores. Finding these radionuclides in a sediment core might make it possible to improve the time resolution of the signal and thus to link it to a supernova event in the solar vicinity ~2 Myr ago. Furthermore, it would give insight into nucleosynthesis scenarios in massive stars, condensation into dust grains, and transport mechanisms from the supernova shell into the solar system

    Ensemble of a subset of kNN classifiers

    Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks, built in two steps. Firstly, we choose classifiers based upon their individual performance using the out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets with their original and some added non-informative features for the evaluation of our method. The results are compared with usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines
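
    The two-step selection described in this abstract can be illustrated roughly as follows. This is only a minimal sketch, not the authors' ESkNN implementation: the abstract does not say how candidate classifiers are generated or how "collective performance" decides membership, so the random feature subsets, the majority vote, the keep-only-if-accuracy-improves rule, and all function and parameter names (`fit_esknn`, `make_data`, `n_models`, `subset_size`, `keep`) are assumptions made for illustration.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training points, with squared
    Euclidean distance computed on the given feature subset only."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def ensemble_predict(models, train_X, train_y, X, k=3):
    """Each model is a feature subset; the ensemble takes a majority vote."""
    preds = []
    for x in X:
        votes = Counter(knn_predict(train_X, train_y, x, f, k) for f in models)
        preds.append(votes.most_common(1)[0][0])
    return preds

def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def fit_esknn(train, test, valid, n_models=20, subset_size=2, keep=10, k=3, seed=0):
    (tr_X, tr_y), (te_X, te_y), (va_X, va_y) = train, test, valid
    rng = random.Random(seed)
    n_features = len(tr_X[0])
    # Assumption: candidate kNN models differ by a random feature subset.
    candidates = [
        tuple(sorted(rng.sample(range(n_features), subset_size)))
        for _ in range(n_models)
    ]
    # Step 1: rank candidates by individual out-of-sample accuracy, keep the top ones.
    ranked = sorted(
        candidates,
        key=lambda f: accuracy(ensemble_predict([f], tr_X, tr_y, te_X, k), te_y),
        reverse=True,
    )[:keep]
    # Step 2: start from the best model and add the others one at a time,
    # keeping a model only if validation accuracy of the ensemble improves
    # (assumed acceptance rule).
    selected = [ranked[0]]
    best = accuracy(ensemble_predict(selected, tr_X, tr_y, va_X, k), va_y)
    for f in ranked[1:]:
        trial = selected + [f]
        acc = accuracy(ensemble_predict(trial, tr_X, tr_y, va_X, k), va_y)
        if acc > best:
            selected, best = trial, acc
    return selected, best

def make_data(n, rng, n_noise=4):
    """Toy data (hypothetical): two informative features plus pure-noise features."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [2.0 * label + rng.gauss(0, 1) for _ in range(2)]
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

if __name__ == "__main__":
    rng = random.Random(1)
    train, test, valid = make_data(60, rng), make_data(40, rng), make_data(40, rng)
    selected, best = fit_esknn(train, test, valid)
    print("selected feature subsets:", selected)
    print("validation accuracy:", best)
```

    Using three separate splits keeps the data that ranks individual models apart from the data that decides ensemble membership, which is one plausible reading of the "out-of-sample" and "validation" sets mentioned above.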

    Screening of healthcare workers for SARS-CoV-2 highlights the role of asymptomatic carriage in COVID-19 transmission.

    Significant differences exist in the availability of healthcare worker (HCW) SARS-CoV-2 testing between countries, and existing programmes focus on screening symptomatic rather than asymptomatic staff. Over a 3 week period (April 2020), 1032 asymptomatic HCWs were screened for SARS-CoV-2 in a large UK teaching hospital. Symptomatic staff and symptomatic household contacts were additionally tested. Real-time RT-PCR was used to detect viral RNA from a throat+nose self-swab. 3% of HCWs in the asymptomatic screening group tested positive for SARS-CoV-2. 17/30 (57%) were truly asymptomatic/pauci-symptomatic. 12/30 (40%) had experienced symptoms compatible with coronavirus disease 2019 (COVID-19) >7 days prior to testing, most self-isolating, returning well. Clusters of HCW infection were discovered on two independent wards. Viral genome sequencing showed that the majority of HCWs had the dominant lineage B∙1. Our data demonstrates the utility of comprehensive screening of HCWs with minimal or no symptoms. This approach will be critical for protecting patients and hospital staff.
    This work was supported by the Wellcome Trust Senior Research Fellowships 108070/Z/15/Z to MPW, 215515/Z/19/Z to SGB and 207498/Z/17/Z to IGG; Collaborative award 206298/B/17/Z to IGG; Principal Research Fellowship 210688/Z/18/Z to PJL; Investigator Award 200871/Z/16/Z to KGCS; Addenbrooke’s Charitable Trust (to MPW, SGB, IGG and PJL); the Medical Research Council (CSF MR/P008801/1 to NJM); NHS Blood and Transfusion (WPA15-02 to NJM); National Institute for Health Research (Cambridge Biomedical Research Centre at CUHNFT), to JRB, MET, AC and GD, Academy of Medical Sciences and the Health Foundation (Clinician Scientist Fellowship to MET), Engineering and Physical Sciences Research Council (EP/P031447/1 and EP/N031938/1 to RS), Cancer Research UK (PRECISION Grand Challenge C38317/A24043 award to JY). Components of this work were supported by the COVID-19 Genomics UK Consortium (COG-UK), which is supported by funding from the Medical Research Council (MRC) part of UK Research & Innovation (UKRI), the National Institute of Health Research (NIHR) and Genome Research Limited, operating as the Wellcome Sanger Institut

    Age and date for early arrival of the Acheulian in Europe (Barranc de la Boella, la Canonja, Spain)

    The first arrivals of hominin populations into Eurasia during the Early Pleistocene are currently considered to have occurred as short and poorly dated biological dispersions. Questions as to the tempo and mode of these early prehistoric settlements have given rise to debates concerning the taxonomic significance of the lithic assemblages, as trace fossils, and the geographical distribution of the technological traditions found in the Lower Palaeolithic record. Here, we report on the Barranc de la Boella site which has yielded a lithic assemblage dating to ~1 million years ago that includes large cutting tools (LCT). We argue that distinct technological traditions coexisted in the Iberian archaeological repertoires of the late Early Pleistocene age in a similar way to the earliest sub-Saharan African artefact assemblages. These differences between stone tool assemblages may be attributed to the different chronologies of hominin dispersal events. The archaeological record of Barranc de la Boella completes the geographical distribution of LCT assemblages across southern Eurasia during the EMPT (Early-Middle Pleistocene Transition, circa 942 to 641 kyr). Up to now, chronology of the earliest European LCT assemblages is based on the abundant Palaeolithic record found in terrace river sequences which have been dated to the end of the EMPT and later. However, the findings at Barranc de la Boella suggest that early LCT lithic assemblages appeared in the SW of Europe during earlier hominin dispersal episodes before the definitive colonization of temperate Eurasia took place.
    The research at Barranc de la Boella has been carried out with the financial support of the Spanish Ministerio de Economía y Competitividad (CGL2012-36682; CGL2012-38358, CGL2012-38434-C03-03 and CGL2010-15326; MICINN project HAR2009-7223/HIST), Generalitat de Catalunya, AGAUR agence (projects 2014SGR-901; 2014SGR-899; 2009SGR-324, 2009PBR-0033 and 2009SGR-188) and Junta de Castilla y León BU1004A09. Financial support for Barranc de la Boella field work and archaeological excavations is provided by the Ajuntament de la Canonja and Departament de Cultura (Servei d’Arqueologia i Paleontologia) de la Generalitat de Catalunya. A. Carrancho’s research was funded by the International Excellence Programme, Reinforcement subprogramme of the Spanish Ministry of Education. I. Lozano-Fernández acknowledges the pre-doctoral grant from the Fundación Atapuerca. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript

    Screening of healthcare workers for SARS-CoV-2 highlights the role of asymptomatic carriage in COVID-19 transmission

    Significant differences exist in the availability of healthcare worker (HCW) SARS-CoV-2 testing between countries, and existing programmes focus on screening symptomatic rather than asymptomatic staff. Over a 3-week period (April 2020), 1,032 asymptomatic HCWs were screened for SARS-CoV-2 in a large UK teaching hospital. Symptomatic staff and symptomatic household contacts were additionally tested. Real-time RT-PCR was used to detect viral RNA from a throat+nose self-swab. 3% of HCWs in the asymptomatic screening group tested positive for SARS-CoV-2. 17/30 (57%) were truly asymptomatic/pauci-symptomatic. 12/30 (40%) had experienced symptoms compatible with coronavirus disease 2019 (COVID-19) >7 days prior to testing, most self-isolating, returning well. Clusters of HCW infection were discovered on two independent wards. Viral genome sequencing showed that the majority of HCWs had the dominant lineage B·1. Our data demonstrates the utility of comprehensive screening of HCWs with minimal or no symptoms. This approach will be critical for protecting patients and hospital staff

    Simultaneous confidence regions for multivariate bioequivalence

    Demonstrating bioequivalence of several pharmacokinetic (PK) parameters, such as AUC and Cmax, that are calculated from the same biological sample measurements is in fact a multivariate problem, even though this is neglected by most practitioners and regulatory bodies, who typically settle for separate univariate analyses. We believe, however, that a truly multivariate evaluation of all PK measures simultaneously is clearly more adequate. In this paper, we review methods to construct joint confidence regions around multivariate normal means and investigate their usefulness in simultaneous bioequivalence problems via simulation. Some of them work well for idealised scenarios but break down when faced with real-data challenges such as unknown variance and correlation among the PK parameters. We study the shapes of the confidence regions resulting from different methods, discuss how marginal simultaneous confidence intervals for the individual PK measures can be derived, and illustrate the application to data from a trial on ticlopidine hydrochloride. An R package is available