
    Nonparametrically consistent depth-based classifiers

    We introduce a class of depth-based classification procedures that are of a nearest-neighbor nature. Depth, after symmetrization, indeed provides the center-outward ordering that is necessary and sufficient to define nearest neighbors. Like all their depth-based competitors, the resulting classifiers are affine-invariant, hence in particular insensitive to unit changes. Unlike those competitors, however, they achieve Bayes consistency under virtually any absolutely continuous distributions - a concept we call nonparametric consistency, to stress the difference with the stronger universal consistency of the standard kNN classifiers. We investigate the finite-sample performance of the proposed classifiers through simulations and show that they outperform affine-invariant nearest-neighbor classifiers obtained through an obvious standardization construction. We illustrate the practical value of our classifiers on two real-data examples. Finally, we briefly discuss possible uses of our depth-based neighbors in other inference problems.
    Comment: Published at http://dx.doi.org/10.3150/13-BEJ561 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
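
    As a point of reference (not the paper's depth-based procedure), the "obvious standardization construction" mentioned above can be sketched in a few lines of base R: whiten the data with the sample covariance and run ordinary kNN in the whitened space. The toy data, the choice k = 5, and all variable names are illustrative assumptions.

```r
## Hedged sketch of the standardization baseline (not the depth-based classifier):
## whitening with the sample covariance makes Euclidean kNN affine-invariant.
library(class)                                # provides knn()

set.seed(1)
X_train <- rbind(matrix(rnorm(100, 0), 50, 2),
                 matrix(rnorm(100, 2), 50, 2))
y_train <- factor(rep(c("A", "B"), each = 50))
X_test  <- matrix(rnorm(20, 1), 10, 2)

R <- chol(cov(X_train))                       # cov(X_train) = t(R) %*% R
Z_train <- X_train %*% solve(R)               # whitened training data
Z_test  <- X_test  %*% solve(R)               # same linear map applied to test points

pred <- knn(Z_train, Z_test, y_train, k = 5)  # standard kNN on the whitened data
print(pred)
```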

    Depth and Depth-Based Classification with R Package ddalpha

    Following the seminal idea of Tukey (1975), data depth is a function that measures how close an arbitrary point of the space lies to an implicitly defined center of a data cloud. Having undergone theoretical and computational development, it is now employed in numerous applications, with classification being the most popular one. The R package ddalpha is software that aims to combine the user's experience with recent achievements in the area of data depth and depth-based classification. ddalpha provides exact and approximate computation of the most reasonable and widely applied notions of data depth. These can further be used in the depth-based multivariate and functional classifiers implemented in the package, with the DDα-procedure as the main focus. The package is expandable with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also serve to provide insights into the geometry of the data and the quality of pattern recognition.
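
    The following is a hedged usage sketch of the package; the function and argument names (depth.Mahalanobis, ddalpha.train, ddalpha.classify, depth =, separator =) are recalled from the package documentation and may differ across versions, and the toy data are assumptions.

```r
## Hedged ddalpha usage sketch -- check the names below against the installed
## version; the formula interface is assumed here (older versions expect a
## data matrix whose last column holds the class label).
library(ddalpha)

set.seed(1)
train <- data.frame(x1 = c(rnorm(50, 0), rnorm(50, 2)),
                    x2 = c(rnorm(50, 0), rnorm(50, 2)),
                    class = factor(rep(c("A", "B"), each = 50)))

## Depth of a few points with respect to the class "A" cloud
depth.Mahalanobis(as.matrix(train[1:5, 1:2]),
                  as.matrix(train[train$class == "A", 1:2]))

## DD-alpha classification: train on the DD-plot, then classify new objects
fit  <- ddalpha.train(class ~ ., data = train,
                      depth = "zonoid", separator = "alpha")
pred <- ddalpha.classify(fit, data.frame(x1 = c(0.1, 2.1), x2 = c(0.2, 1.9)))
```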

    Effective Visualizations of the Uncertainty in Hurricane Forecasts

    The track forecast cone developed by the U.S. National Hurricane Center is the representation most widely adopted by the general public, the news media, and government officials to convey the forecasts and their underlying uncertainties. However, research has experimentally shown that it has limitations that lead to misconceptions about the uncertainty it conveys. Most importantly, the area covered by the cone tends to be misinterpreted as the region affected by the hurricane. In addition, the cone summarizes forecasts for the next three days into a single representation and thus makes it difficult for viewers to accurately determine crucial time-specific information. To address these limitations, this research develops novel alternative visualizations. It begins by developing a technique that generates and smoothly interpolates robust statistics from ensembles of hurricane predictions, creating visualizations that inherently include the spatial uncertainty by displaying three levels of positional storm-strike risk at a specific point in time. To address the misconception about the area covered by the cone, this research develops time-specific visualizations depicting spatial information based on a sampling technique that selects a small, representative subset from an ensemble of points; it also allows depiction of such important storm characteristics as size and intensity. Further, this research generalizes the representative sampling framework to process ensembles of forecast tracks, selecting a subset of tracks that accurately preserves the original distributions of available storm characteristics while keeping appropriately defined spatial separations. This framework supports an additional hurricane visualization that portrays prediction uncertainties implicitly by directly showing the members of the subset without visual clutter. We collaborated on cognitive studies suggesting that these visualizations enhance viewers' ability to understand the forecasts because they are potentially interpreted more like uncertainty distributions. In addition to benefiting the field of hurricane forecasting, this research potentially benefits the broader visualization community. For instance, the representative sampling framework for processing 2D points developed here can be applied to enhance standard scatter plots and density plots by reducing the sizes of data sets. Further, as the idea of direct ensemble displays can be extended to more general numerical simulations, it has potential impact on a wide range of ensemble visualizations.
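
    For illustration only (the abstract does not spell out the sampling criterion, so this is not the authors' algorithm, and the function farthest_point_sample is hypothetical): a simple way to pick a small, spatially well-separated subset of a 2D ensemble is greedy farthest-point sampling.

```r
## Hedged illustration (not the authors' sampling criterion): greedy selection
## of k representative points from a 2D ensemble, keeping chosen points spread out.
farthest_point_sample <- function(pts, k) {
  n <- nrow(pts)
  chosen <- sample.int(n, 1)                     # arbitrary starting member
  d_min  <- sqrt(rowSums((pts - matrix(pts[chosen, ], n, 2, byrow = TRUE))^2))
  while (length(chosen) < k) {
    nxt    <- which.max(d_min)                   # point farthest from the current subset
    chosen <- c(chosen, nxt)
    d_new  <- sqrt(rowSums((pts - matrix(pts[nxt, ], n, 2, byrow = TRUE))^2))
    d_min  <- pmin(d_min, d_new)                 # distance to nearest chosen point
  }
  pts[chosen, , drop = FALSE]
}

set.seed(1)
ensemble <- cbind(rnorm(500), rnorm(500))        # toy ensemble of storm positions
subset20 <- farthest_point_sample(ensemble, 20)
plot(ensemble, col = "grey"); points(subset20, pch = 19)
```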

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds such as the unit circle, torus, sphere, and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments are discussed.
    Comment: 61 pages.
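
    As a one-line reminder of why such dedicated methods are needed (an illustration, not taken from the paper): the arithmetic mean of angles that straddle 0 and 2π points in the wrong direction, whereas the circular mean computed from the resultant vector does not.

```r
## Minimal illustration: arithmetic vs. circular mean of angles clustered
## around the direction 0 (some slightly above 0, some slightly below 2*pi).
theta <- c(0.1, 0.2, 6.1, 6.2)                  # radians

mean(theta)                                     # 3.15 -- points the opposite way
atan2(mean(sin(theta)), mean(cos(theta)))       # about 0.01 -- the circular mean
```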

    Statistical Process Monitoring of Isolated and Persistent Defects in Complex Geometrical Shapes

    Traditional Statistical Process Control methodologies face several challenges when monitoring defects in complex geometries, such as those of products obtained via Additive Manufacturing techniques. Many approaches cannot be applied in these settings due to the high dimensionality of the data and the lack of parametric and distributional assumptions on the object shapes. Motivated by a case study involving the monitoring of egg-shaped trabecular structures, we investigate two recently proposed methodologies for detecting deviations from the nominal in-control (IC) model caused by excess or lack of material. Our study focuses on the detection of both isolated large changes in the geometric structure and persistent small deviations. We compare the approach of Scimone et al. (2022) with that of Zhao and del Castillo (2021) for monitoring defects in a small Phase I sample of 3D-printed objects. While the former control chart is able to detect large defects, the latter allows the detection of nonconforming objects with persistent small defects. Furthermore, we address the fundamental issue of selecting the number of eigenvalues to be monitored in Zhao and del Castillo's method by proposing a dimensionality reduction technique based on kernel principal components. This approach is shown to provide good detection capability even when a large number of eigenvalues is considered. By leveraging the sensitivity of the two monitoring schemes to different magnitudes of nonconformity, we also propose a novel joint monitoring scheme capable of identifying both types of defects in the considered case study. Computer code in R and Matlab that implements these methods and replicates the results is available as part of the supplementary material.
    Comment: 39 pages, 5 figures, 3 tables.
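
    A minimal base-R sketch of the kernel-principal-component idea (not the paper's implementation): build an RBF kernel matrix over the Phase I sample, double-center it, eigendecompose, and keep the smallest number of components explaining a chosen share of the variability. The kernel parameter, the 90% threshold, and the toy sample are assumptions.

```r
## Hedged kernel PCA sketch (not the paper's code): RBF kernel, double-centering,
## eigendecomposition; retain the leading components.
set.seed(1)
X     <- matrix(rnorm(200), 50, 4)          # toy Phase I sample: 50 objects, 4 features
sigma <- 0.5                                # RBF kernel parameter (assumed)

D2 <- as.matrix(dist(X))^2                  # squared Euclidean distances
K  <- exp(-sigma * D2)                      # RBF kernel matrix
n  <- nrow(K)
H  <- diag(n) - matrix(1 / n, n, n)         # centering matrix
Kc <- H %*% K %*% H                         # double-centered kernel

eig  <- eigen(Kc, symmetric = TRUE)
lam  <- pmax(eig$values, 0)
expl <- cumsum(lam) / sum(lam)
k    <- which(expl >= 0.90)[1]              # smallest k explaining 90% (assumed threshold)
scores <- eig$vectors[, 1:k] %*% diag(sqrt(lam[1:k]), nrow = k)  # kernel PC scores
```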

    Depth- and Potential-Based Supervised Learning

    The task of supervised learning is to define a data-based rule by which new objects are assigned to one of the classes. For this, a training data set is used that contains objects with known class membership. In this thesis, two procedures for supervised classification are introduced.

    The first procedure is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class's prior probability. The method transforms the data to a potential-potential (pot-pot) plot, where each data point is mapped to a vector of potentials, similarly to the DD-plot. Separation of the classes, as well as classification of new data points, is performed on this plot; thus the bias in kernel density estimates due to insufficiently adapted multivariate kernels is compensated by a flexible classifier on the pot-pot plot. The proposed method has been implemented in the R package ddalpha, software that combines the user's experience with recent theoretical and computational achievements in the area of data depth and depth-based classification. It implements various depth functions and classifiers for multivariate and functional data under one roof, and is expandable with user-defined custom depth methods and separators.

    The second classification procedure focuses on the centers of the classes and is based on data depth. The classifier adds a depth term to the objective function of the Bayes classifier, so that the cost of misclassifying a point depends not only on its class membership but also on its centrality within that class. Classification of more central points is enforced while outliers are downweighted. The proposed objective function may also be used to evaluate the performance of other classifiers in place of the usual average misclassification rate.

    The thesis also contains a new algorithm for the exact calculation of the Oja median. It modifies the algorithm of Ronkainen, Oja and Orponen (2003) by employing bounded regions that contain the median. The new algorithm is faster and has lower complexity than its predecessor, and has been implemented as part of the R package OjaNP.
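
    A minimal base-R sketch of the potential idea described above (illustrative, not the thesis code): each class's potential is its prior probability times a Gaussian kernel density estimate, and every observation is mapped to its vector of potentials, giving the pot-pot plot. The bandwidth and toy data are assumptions.

```r
## Hedged sketch (not the thesis implementation): class potentials as
## prior probability times a Gaussian kernel density estimate, mapped to
## a pot-pot plot.  Bandwidth h and the toy data are illustrative assumptions.
set.seed(1)
XA <- matrix(rnorm(100, 0), 50, 2)             # class A sample
XB <- matrix(rnorm(100, 2), 50, 2)             # class B sample
X  <- rbind(XA, XB)
h  <- 0.7                                      # kernel bandwidth (assumed)

potential <- function(x, sample, prior, h) {
  d2 <- rowSums((sample - matrix(x, nrow(sample), 2, byrow = TRUE))^2)
  prior * mean(exp(-d2 / (2 * h^2)) / (2 * pi * h^2))   # 2D Gaussian kernel
}

## pot-pot plot: each observation mapped to (potential for A, potential for B)
pot <- t(apply(X, 1, function(x)
  c(A = potential(x, XA, prior = 0.5, h = h),
    B = potential(x, XB, prior = 0.5, h = h))))
plot(pot, col = rep(c("blue", "red"), each = 50),
     xlab = "potential w.r.t. class A", ylab = "potential w.r.t. class B")
```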

    Prediction of 1p/19q Codeletion Status in Diffuse Glioma Patients Using Preoperative Multiparametric Magnetic Resonance Imaging

    A complete codeletion of chromosome 1p/19q is strongly correlated with better overall survival of diffuse glioma patients, hence determining the codeletion status early in the course of a patient's disease would be valuable in that patient's care. Current practice requires a surgical biopsy to assess the codeletion status, which exposes patients to risks and is limited in its accuracy by sampling variations. To overcome these limitations, we utilized four conventional magnetic resonance imaging sequences to predict the 1p/19q status. We extracted three sets of image-derived features, namely texture-based, topology-based, and convolutional neural network (CNN)-based, and analyzed each feature set's prediction performance. The topology-based model (AUC = 0.855 +/- 0.079) performed significantly better than the texture-based model (AUC = 0.707 +/- 0.118) and comparably to the CNN-based model (AUC = 0.787 +/- 0.195). However, none of the models performed better than a baseline model built with only clinical variables, namely age, gender, and Karnofsky Performance Score (AUC = 0.703 +/- 0.256). In summary, predicting 1p/19q chromosome codeletion status via MRI scan analysis can be a viable non-invasive assessment tool at an early stage of gliomas and in follow-ups, although further investigation is needed to improve model performance.