26,978 research outputs found

    Detection of Sparse Positive Dependence

    Full text link
    In a bivariate setting, we consider the problem of detecting a sparse contamination or mixture component, where the effect manifests itself as a positive dependence between the variables, which are otherwise independent in the main component. We first look at this problem in the context of a normal mixture model. In essence, the situation reduces to a univariate setting where the effect is a decrease in variance. In particular, a higher criticism test based on the pairwise differences is shown to achieve the detection boundary defined by the (oracle) likelihood ratio test. We then turn to a Gaussian copula model where the marginal distributions are unknown. Standard invariance considerations lead us to consider rank tests. In fact, a higher criticism test based on the pairwise rank differences achieves the detection boundary in the normal mixture model, although not in the very sparse regime. We do not know of any rank test that has any power in that regime

    Recent advances in directional statistics

    Get PDF
    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed.Comment: 61 page

    Bayesian nonparametric dependent model for partially replicated data: the influence of fuel spills on species diversity

    Get PDF
    We introduce a dependent Bayesian nonparametric model for the probabilistic modeling of membership of subgroups in a community based on partially replicated data. The focus here is on species-by-site data, i.e. community data where observations at different sites are classified in distinct species. Our aim is to study the impact of additional covariates, for instance environmental variables, on the data structure, and in particular on the community diversity. To that purpose, we introduce dependence a priori across the covariates, and show that it improves posterior inference. We use a dependent version of the Griffiths-Engen-McCloskey distribution defined via the stick-breaking construction. This distribution is obtained by transforming a Gaussian process whose covariance function controls the desired dependence. The resulting posterior distribution is sampled by Markov chain Monte Carlo. We illustrate the application of our model to a soil microbial dataset acquired across a hydrocarbon contamination gradient at the site of a fuel spill in Antarctica. This method allows for inference on a number of quantities of interest in ecotoxicology, such as diversity or effective concentrations, and is broadly applicable to the general problem of communities response to environmental variables.Comment: Main Paper: 22 pages, 6 figures. Supplementary Material: 11 pages, 1 figur

    Detecting Variability in Massive Astronomical Time-Series Data I: application of an infinite Gaussian mixture model

    Full text link
    We present a new framework to detect various types of variable objects within massive astronomical time-series data. Assuming that the dominant population of objects is non-variable, we find outliers from this population by using a non-parametric Bayesian clustering algorithm based on an infinite GaussianMixtureModel (GMM) and the Dirichlet Process. The algorithm extracts information from a given dataset, which is described by six variability indices. The GMM uses those variability indices to recover clusters that are described by six-dimensional multivariate Gaussian distributions, allowing our approach to consider the sampling pattern of time-series data, systematic biases, the number of data points for each light curve, and photometric quality. Using the Northern Sky Variability Survey data, we test our approach and prove that the infinite GMM is useful at detecting variable objects, while providing statistical inference estimation that suppresses false detection. The proposed approach will be effective in the exploration of future surveys such as GAIA, Pan-Starrs, and LSST, which will produce massive time-series data.Comment: accepted for publication in MNRA

    Distributed Nonparametric Sequential Spectrum Sensing under Electromagnetic Interference

    Full text link
    A nonparametric distributed sequential algorithm for quick detection of spectral holes in a Cognitive Radio set up is proposed. Two or more local nodes make decisions and inform the fusion centre (FC) over a reporting Multiple Access Channel (MAC), which then makes the final decision. The local nodes use energy detection and the FC uses mean detection in the presence of fading, heavy-tailed electromagnetic interference (EMI) and outliers. The statistics of the primary signal, channel gain or the EMI is not known. Different nonparametric sequential algorithms are compared to choose appropriate algorithms to be used at the local nodes and the FC. Modification of a recently developed random walk test is selected for the local nodes for energy detection as well as at the fusion centre for mean detection. It is shown via simulations and analysis that the nonparametric distributed algorithm developed performs well in the presence of fading, EMI and is robust to outliers. The algorithm is iterative in nature making the computation and storage requirements minimal.Comment: 8 pages; 6 figures; Version 2 has the proofs for the theorems. Version 3 contains a new section on approximation analysi

    Profile control charts based on nonparametric LL-1 regression methods

    Full text link
    Classical statistical process control often relies on univariate characteristics. In many contemporary applications, however, the quality of products must be characterized by some functional relation between a response variable and its explanatory variables. Monitoring such functional profiles has been a rapidly growing field due to increasing demands. This paper develops a novel nonparametric LL-1 location-scale model to screen the shapes of profiles. The model is built on three basic elements: location shifts, local shape distortions, and overall shape deviations, which are quantified by three individual metrics. The proposed approach is applied to the previously analyzed vertical density profile data, leading to some interesting insights.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS501 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
    corecore