Detection of Sparse Positive Dependence
In a bivariate setting, we consider the problem of detecting a sparse
contamination or mixture component, where the effect manifests itself as a
positive dependence between the variables, which are otherwise independent in
the main component. We first look at this problem in the context of a normal
mixture model. In essence, the situation reduces to a univariate setting where
the effect is a decrease in variance. In particular, a higher criticism test
based on the pairwise differences is shown to achieve the detection boundary
defined by the (oracle) likelihood ratio test. We then turn to a Gaussian
copula model where the marginal distributions are unknown. Standard invariance
considerations lead us to consider rank tests. In fact, a higher criticism test
based on the pairwise rank differences achieves the detection boundary in the
normal mixture model, although not in the very sparse regime. We do not know of
any rank test that has any power in that regime.
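A minimal sketch of the normal-mixture case follows. It assumes standard normal marginals and uses the HC+ form of Donoho and Jin's higher criticism statistic, so the paper's exact normalization may differ. Each pair is reduced to the scaled difference (X - Y)/√2. Under independence this difference is N(0, 1), and under correlation ρ its variance shrinks to 1 - ρ, so an excess of unusually small differences is the evidence of dependence. The function names, the `alpha0` cutoff and the simulation helper are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def hc_pairwise_differences(x, y, alpha0=0.5):
    """HC+ statistic for an excess of small scaled differences (X - Y)/sqrt(2)."""
    n = len(x)
    z = NormalDist()  # standard normal
    # Under independence with N(0, 1) marginals, (X - Y)/sqrt(2) ~ N(0, 1);
    # correlation rho shrinks its variance to 1 - rho.
    d = [(xi - yi) / math.sqrt(2.0) for xi, yi in zip(x, y)]
    # Small |d| is the evidence, so use the lower-tail p-value P(|Z| <= |d|).
    pvals = sorted(2.0 * z.cdf(abs(di)) - 1.0 for di in d)
    hc = 0.0
    for i, p in enumerate(pvals[: int(alpha0 * n)], start=1):
        # HC+ keeps only p-values >= 1/n, which stabilizes the smallest order statistics.
        if 1.0 / n <= p < 1.0:
            hc = max(hc, math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p)))
    return hc


def simulate_pairs(n, n_dependent, noise=0.01, seed=0):
    """Hypothetical test data: independent N(0, 1) pairs, the first n_dependent nearly equal."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for i in range(n_dependent):
        y[i] = x[i] + noise * rng.gauss(0.0, 1.0)
    return x, y
```

For example, `hc_pairwise_differences(*simulate_pairs(2000, 100, seed=1))` should be far larger than the same call with `n_dependent=0`, because 5% of the pairs then have almost zero difference.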
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments are discussed.
Comment: 61 pages
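Why Euclidean methods break down on the circle can be seen with the standard mean direction and mean resultant length. These are textbook quantities, not a contribution of this review. The arithmetic mean of 350° and 10° is 180°, which points the wrong way, while averaging the corresponding unit vectors gives 0°. The function name is illustrative.

```python
import math


def circular_mean(angles_deg):
    """Mean direction in [0, 360) degrees and mean resultant length of a set of angles."""
    # Average the unit vectors (cos a, sin a) instead of the raw angles.
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / len(angles_deg)
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    # Length of the average vector: 1 for identical angles, near 0 for spread-out ones.
    r_bar = math.hypot(c, s)
    return mean_dir, r_bar
```

`circular_mean([350, 10])` gives a mean direction of (numerically) 0° and a mean resultant length of cos 10° ≈ 0.985.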
Bayesian nonparametric dependent model for partially replicated data: the influence of fuel spills on species diversity
We introduce a dependent Bayesian nonparametric model for the probabilistic
modeling of membership of subgroups in a community based on partially
replicated data. The focus here is on species-by-site data, i.e. community data
where observations at different sites are classified into distinct species. Our
aim is to study the impact of additional covariates, for instance environmental
variables, on the data structure, and in particular on the community diversity.
To that end, we introduce dependence a priori across the covariates, and
show that it improves posterior inference. We use a dependent version of the
Griffiths-Engen-McCloskey distribution defined via the stick-breaking
construction. This distribution is obtained by transforming a Gaussian process
whose covariance function controls the desired dependence. The resulting
posterior distribution is sampled by Markov chain Monte Carlo. We illustrate
the application of our model to a soil microbial dataset acquired across a
hydrocarbon contamination gradient at the site of a fuel spill in Antarctica.
This method allows for inference on a number of quantities of interest in
ecotoxicology, such as diversity or effective concentrations, and is broadly
applicable to the general problem of how communities respond to environmental
variables.
Comment: Main paper: 22 pages, 6 figures. Supplementary material: 11 pages, 1
figure
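The following is a minimal sketch of one way to obtain dependent stick-breaking weights by transforming a Gaussian process. It assumes a squared-exponential covariance, a fixed truncation at `n_sticks` sticks, and the probability-integral transform V = 1 - (1 - Φ(Z))^(1/α), which gives each stick-breaking proportion a Beta(1, α) marginal, so that at every covariate value the weights follow a truncated GEM(α) distribution. Covariate values that are close together share correlated sticks and therefore similar weights. All names and parameters (`length_scale`, `n_sticks`, the helper functions) are hypothetical, and neither the paper's exact construction nor its MCMC sampler is reproduced.

```python
import math
import random
from statistics import NormalDist


def sq_exp_cov(xs, length_scale=1.0, jitter=1e-8):
    """Squared-exponential covariance matrix (assumed kernel) with a small diagonal jitter."""
    return [[math.exp(-0.5 * ((a - b) / length_scale) ** 2) + (jitter if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]


def cholesky(k):
    """Lower-triangular L with L L^T = k (plain Cholesky for a small SPD matrix)."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(k[i][i] - s, 1e-12))
            else:
                low[i][j] = (k[i][j] - s) / low[j][j]
    return low


def dependent_gem_weights(xs, alpha=1.0, n_sticks=50, length_scale=1.0, seed=0):
    """Truncated stick-breaking weights at each covariate value, sharing GP-driven sticks."""
    rng = random.Random(seed)
    chol = cholesky(sq_exp_cov(xs, length_scale))
    phi = NormalDist().cdf
    weights = [[] for _ in xs]
    remaining = [1.0] * len(xs)  # unbroken stick length at each covariate value
    for _ in range(n_sticks):
        z = [rng.gauss(0.0, 1.0) for _ in xs]
        # One GP draw per stick: correlated, approximately N(0, 1) values across covariates.
        gp = [sum(chol[i][m] * z[m] for m in range(i + 1)) for i in range(len(xs))]
        for i, g in enumerate(gp):
            # Probability-integral transform to a Beta(1, alpha) proportion.
            v = 1.0 - (1.0 - phi(g)) ** (1.0 / alpha)
            weights[i].append(remaining[i] * v)
            remaining[i] *= 1.0 - v
    return weights
```

Each row of the result holds the weights at one covariate value. The weights are nonnegative and, with enough sticks, sum to nearly 1.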
Detecting Variability in Massive Astronomical Time-Series Data I: application of an infinite Gaussian mixture model
We present a new framework to detect various types of variable objects within
massive astronomical time-series data. Assuming that the dominant population of
objects is non-variable, we find outliers from this population by using a
non-parametric Bayesian clustering algorithm based on an infinite
Gaussian mixture model (GMM) and the Dirichlet process. The algorithm extracts
information from a given dataset, which is described by six variability
indices. The GMM uses those variability indices to recover clusters that are
described by six-dimensional multivariate Gaussian distributions, allowing our
approach to consider the sampling pattern of time-series data, systematic
biases, the number of data points for each light curve, and photometric
quality. Using the Northern Sky Variability Survey data, we test our approach
and show that the infinite GMM is useful for detecting variable objects, while
providing statistical inference that suppresses false detections. The
proposed approach will be effective in the exploration of future surveys such
as Gaia, Pan-STARRS, and LSST, which will produce massive time-series data.
Comment: accepted for publication in MNRAS
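A rough analogue of the outlier-detection idea can be sketched with scikit-learn's `BayesianGaussianMixture`. Its `weight_concentration_prior_type="dirichlet_process"` option fits a truncated Dirichlet process mixture by variational inference, which may differ from the paper's own inference scheme. The sketch assumes an array with one row per object and one column per variability index. Objects assigned to components whose weight falls below a hypothetical `min_weight` threshold are treated as variable-object candidates, and the large components together stand in for the dominant non-variable population. The synthetic data in the usage lines is made up for illustration, and the sketch requires NumPy and scikit-learn.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def flag_variable_candidates(indices, max_components=10, min_weight=0.05, seed=0):
    """Return row numbers of objects that fall outside the dominant population.

    `indices` is an (n_objects, n_features) array, e.g. six variability indices per
    light curve. Objects assigned to a component whose weight is below `min_weight`
    (a hypothetical threshold) are flagged as variable candidates.
    """
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,  # truncation level of the Dirichlet process
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",  # one full covariance matrix per cluster
        max_iter=500,
        random_state=seed,
    )
    labels = dpgmm.fit(indices).predict(indices)
    # Large components together stand in for the dominant, non-variable population.
    return np.flatnonzero(dpgmm.weights_[labels] < min_weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quiet = rng.normal(0.0, 1.0, size=(500, 6))  # synthetic "non-variable" objects
    odd = rng.normal(8.0, 1.0, size=(10, 6))     # synthetic outlying objects
    print(flag_variable_candidates(np.vstack([quiet, odd])))
```

Treating every small component as "outlying" rather than picking a single largest cluster keeps the sketch usable when the variational fit splits the dominant population across several components.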
Distributed Nonparametric Sequential Spectrum Sensing under Electromagnetic Interference
A nonparametric distributed sequential algorithm for quick detection of
spectral holes in a cognitive radio setup is proposed. Two or more local nodes
make decisions and inform the fusion centre (FC) over a reporting Multiple
Access Channel (MAC), which then makes the final decision. The local nodes use
energy detection and the FC uses mean detection in the presence of fading,
heavy-tailed electromagnetic interference (EMI) and outliers. The statistics of
the primary signal, channel gain and EMI are not known. Different
nonparametric sequential algorithms are compared to choose appropriate
algorithms to be used at the local nodes and the FC. A modification of a recently
developed random walk test is selected both at the local nodes for energy detection
and at the fusion centre for mean detection. It is shown via simulations
and analysis that the developed nonparametric distributed algorithm performs
well in the presence of fading and EMI and is robust to outliers. The algorithm is
iterative in nature, which keeps the computation and storage requirements minimal.
Comment: 8 pages; 6 figures; Version 2 has the proofs for the theorems.
Version 3 contains a new section on approximation analysis
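The abstract does not spell out the random walk test, so the following is only a generic sketch of a nonparametric sequential random-walk test, not the paper's modified version. Each energy sample moves the walk by its excess over a reference level `eta`. The increment is clipped to ±`clip`, an assumed guard against heavy-tailed EMI and outliers. The test stops at the first crossing of `upper` (channel occupied, H1) or `-lower` (spectral hole, H0). All parameter names and the clipping rule are hypothetical.

```python
def random_walk_test(samples, eta, clip, upper, lower):
    """Generic sequential random-walk test with clipped increments.

    Returns ("H1", k) when the walk first reaches `upper`, ("H0", k) when it
    first reaches `-lower`, or (None, n) if neither boundary is hit.
    """
    w = 0.0  # the only state kept between samples
    for k, y in enumerate(samples, start=1):
        # Clipping bounds each increment, so a single heavy-tailed EMI spike or
        # outlier can move the walk by at most `clip`.
        w += max(-clip, min(clip, y - eta))
        if w >= upper:
            return "H1", k  # energy persistently above eta: primary user present
        if w <= -lower:
            return "H0", k  # energy persistently below eta: spectral hole
    return None, len(samples)
```

Each step updates a single running sum, so memory and per-sample cost are constant, which matches the iterative character described in the abstract. For instance, one huge spike followed by low-energy samples, `random_walk_test([1e6] + [0.0] * 100, 1.0, 2.0, 10.0, 10.0)`, still ends in `("H0", 13)` because the spike contributes at most `clip`.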
Profile control charts based on nonparametric L-1 regression methods
Classical statistical process control often relies on univariate
characteristics. In many contemporary applications, however, the quality of
products must be characterized by some functional relation between a response
variable and its explanatory variables. Monitoring such functional profiles has
been a rapidly growing field due to increasing demands. This paper develops a
novel nonparametric L-1 location-scale model to screen the shapes of
profiles. The model is built on three basic elements: location shifts, local
shape distortions, and overall shape deviations, which are quantified by three
individual metrics. The proposed approach is applied to the previously analyzed
vertical density profile data, leading to some interesting insights.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS501 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
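Least-absolute-deviation (L-1) fitting can be illustrated across in-control profiles. The pointwise median minimizes the sum of absolute deviations, and the median absolute deviation (MAD) gives a matching robust scale. The three summary numbers below only loosely mirror the location shift, local shape distortion and overall shape deviation described above. They are hypothetical metrics chosen for illustration, not the paper's definitions, and all profiles are assumed to be observed on a common grid.

```python
from statistics import median


def l1_reference(profiles):
    """Pointwise median (L-1 location) and MAD (L-1 scale) across in-control profiles."""
    columns = list(zip(*profiles))  # one tuple per grid point
    loc = [median(col) for col in columns]
    scale = [median(abs(v - m) for v in col) for col, m in zip(columns, loc)]
    return loc, scale


def profile_metrics(new_profile, loc, scale, eps=1e-12):
    """Hypothetical (shift, local, overall) summaries of a new profile vs the reference."""
    r = [(y - m) / max(s, eps) for y, m, s in zip(new_profile, loc, scale)]
    shift = median(r)  # robust location shift of the whole profile
    centred = [ri - shift for ri in r]
    local = max(abs(c) for c in centred)  # largest pointwise distortion
    overall = sum(abs(c) for c in centred) / len(centred)  # average shape deviation
    return shift, local, overall
```

With reference profiles `[[0, 1, 2, 3], [0.5, 1.5, 2.5, 3.5], [1, 2, 3, 4]]`, a uniformly raised profile `[2.5, 3.5, 4.5, 5.5]` yields `(4.0, 0.0, 0.0)` (pure location shift), while `[0.5, 1.5, 4.5, 3.5]` yields `(0.0, 4.0, 1.0)` (a single local distortion).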