Adaptive lasso and Dantzig selector for spatial point processes intensity estimation
The lasso and the Dantzig selector are standard procedures that perform variable
selection and estimation simultaneously. This paper is concerned with extending
these procedures to spatial point process intensity estimation. We propose
adaptive versions of these procedures, develop efficient computational
methodologies and derive asymptotic results for a large class of spatial point
processes under an original setting where the number of parameters, i.e. the
number of spatial covariates considered, increases with the expected number of
data points. Both procedures are compared theoretically, in a simulation study,
and in a real data example. Comment: 30 pages
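The adaptive lasso idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the observation window has been discretized into cells of equal area, that the intensity is log-linear in the covariates, and that a plain proximal gradient solver with backtracking is acceptable. All names (`prox_grad`, `thresholds`, `lam`) and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise proximal operator of a weighted l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def neg_loglik(beta, Z, y, area):
    """Negative Poisson log-likelihood per cell, up to an additive constant."""
    eta = Z @ beta
    return np.mean(area * np.exp(eta) - y * eta)

def gradient(beta, Z, y, area):
    """Gradient of neg_loglik with respect to beta."""
    eta = Z @ beta
    return Z.T @ (area * np.exp(eta) - y) / len(y)

def prox_grad(Z, y, area, thresholds, n_iter=2000, tol=1e-9):
    """Proximal gradient descent; thresholds[j] is lambda * weight_j (0 = unpenalized)."""
    beta = np.zeros(Z.shape[1])
    step = 1.0
    for _ in range(n_iter):
        f, g = neg_loglik(beta, Z, y, area), gradient(beta, Z, y, area)
        while True:  # backtracking line search on the smooth part
            new = soft_threshold(beta - step * g, step * thresholds)
            d = new - beta
            if neg_loglik(new, Z, y, area) <= f + g @ d + d @ d / (2 * step):
                break
            step *= 0.5
        beta = new
        if np.max(np.abs(d)) < tol:  # iterates have stopped moving
            break
    return beta

# Simulated cell counts: intercept plus six covariates, only two of them active.
rng = np.random.default_rng(0)
n_cells, area = 400, 0.25
Z = np.column_stack([np.ones(n_cells), rng.standard_normal((n_cells, 6))])
beta_true = np.array([2.0, 1.0, -0.8, 0.0, 0.0, 0.0, 0.0])
y = rng.poisson(area * np.exp(Z @ beta_true))

# Step 1: unpenalized initial estimate. Step 2: adaptive weights 1 / |initial estimate|.
beta_init = prox_grad(Z, y, area, np.zeros(Z.shape[1]))
lam = 0.1                      # tuning parameter, chosen by hand for this sketch
thresholds = lam / np.abs(beta_init)
thresholds[0] = 0.0            # the intercept is not penalized
beta_hat = prox_grad(Z, y, area, thresholds)
print(np.round(beta_hat, 3))
```

The adaptive weights penalize coefficients with small initial estimates heavily and large ones lightly, which is what lets the inactive covariates be set exactly to zero while the active ones are only mildly shrunk.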
Nonconcave penalized composite conditional likelihood estimation of sparse Ising models
The Ising model is a useful tool for studying complex interactions within a
system. The estimation of such a model, however, is rather challenging,
especially in the presence of high-dimensional parameters. In this work, we
propose efficient procedures for learning a sparse Ising model based on a
penalized composite conditional likelihood with nonconcave penalties.
Nonconcave penalized likelihood estimation has received a lot of attention in
recent years. However, such an approach is computationally prohibitive under
high-dimensional Ising models. To overcome such difficulties, we extend the
methodology and theory of nonconcave penalized likelihood to penalized
composite conditional likelihood estimation. The proposed method can be
efficiently implemented by taking advantage of coordinate-ascent and
minorization--maximization principles. Asymptotic oracle properties of the
proposed method are established with NP-dimensionality. Optimality of the
computed local solution is discussed. We demonstrate its finite sample
performance via simulation studies and further illustrate our proposal by
studying the Human Immunodeficiency Virus type 1 protease structure based on
data from the Stanford HIV drug resistance database. Our statistical learning
results match the known biological findings very well, although no prior
biological information is used in the data analysis procedure. Comment:
Published at http://dx.doi.org/10.1214/12-AOS1017 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
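As a rough illustration of the objective described above, the sketch below evaluates a penalized composite conditional log-likelihood for an Ising model. Several choices are assumptions rather than details from the abstract: spins are coded as -1/+1, there are no main effects, the interaction matrix is symmetric with zero diagonal, and SCAD is used as the nonconcave penalty. The function names are hypothetical and no optimizer is included.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), evaluated elementwise at |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    small = lam * t
    mid = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    large = lam**2 * (a + 1) / 2
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))

def composite_cond_loglik(theta, X):
    """Average over samples of sum_j log P(x_j | x_{-j}).

    theta: symmetric (K, K) interaction matrix with zero diagonal.
    X: (n, K) array with entries in {-1, +1}.
    """
    eta = X @ theta  # eta[i, j] = sum_{k != j} theta[j, k] * X[i, k]
    # log P(x_j | rest) = x_j * eta_j - log(exp(eta_j) + exp(-eta_j))
    return np.mean(np.sum(X * eta - np.logaddexp(eta, -eta), axis=1))

def penalized_objective(theta, X, lam):
    """Composite conditional log-likelihood minus SCAD penalty on each edge."""
    iu = np.triu_indices_from(theta, k=1)  # each pair (j, k) penalized once
    return composite_cond_loglik(theta, X) - np.sum(scad_penalty(theta[iu], lam))

# Toy data: independent random spins, so the empty graph is the truth.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(200, 4))
theta0 = np.zeros((4, 4))
obj0 = penalized_objective(theta0, X, lam=0.1)

theta1 = theta0.copy()
theta1[0, 1] = theta1[1, 0] = 0.5  # add one spurious edge
obj1 = penalized_objective(theta1, X, lam=0.1)
print(obj0, obj1)
```

Each conditional distribution is a logistic-type model in the neighbouring spins, so the objective avoids the intractable normalizing constant of the full Ising likelihood; that is the computational motivation for the composite approach.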
A nonparametric penalized likelihood approach to density estimation of space–time point patterns
In this work, we consider space-time point processes and study their continuous space-time evolution. We propose an innovative nonparametric methodology to estimate the unknown space-time density of the point pattern, or, equivalently, to estimate the intensity of an inhomogeneous space-time Poisson point process. The presented approach combines maximum likelihood estimation with roughness penalties, based on differential operators, defined over the spatial and temporal domains of interest. We first establish some important theoretical properties of the considered estimator, including its consistency. We then develop an efficient and flexible estimation procedure that leverages advanced numerical and computational techniques. Thanks to a discretization based on finite elements in space and B-splines in time, the proposed method can effectively capture complex multi-modal and strongly anisotropic spatio-temporal point patterns; moreover, these point patterns may be observed over planar or curved domains with non-trivial geometries, due to geographic constraints, such as coastal regions with complicated shorelines, or curved regions with complex orography. In addition to providing estimates, the method includes appropriate uncertainty quantification tools. We thoroughly validate the proposed method by means of simulation studies and applications to real-world data. The obtained results highlight significant advantages over state-of-the-art competing approaches.
Information criteria for inhomogeneous spatial point processes
The theoretical foundation for a number of model selection criteria is
established in the context of inhomogeneous point processes and under various
asymptotic settings: infill, increasing domain, and combinations of these. For
inhomogeneous Poisson processes we consider the Akaike information criterion and
the Bayesian information criterion, and in particular we identify the point
process analogue of sample size needed for the Bayesian information criterion.
Considering general inhomogeneous point processes we derive new composite
likelihood and composite Bayesian information criteria for selecting a
regression model for the intensity function. The proposed model selection
criteria are evaluated using simulations of Poisson processes and cluster point
processes. Comment: 6 figures
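For orientation, the Poisson case can be written out as follows. This is a sketch under stated assumptions, not a reproduction of the paper's results: the intensity is taken to be log-linear in a covariate vector z(u), X is the point pattern observed in the window W, p is the number of parameters, and n^* is a placeholder for the point process analogue of sample size, whose specific form the abstract does not state.

```latex
\begin{align}
  \ell(\beta) &= \sum_{u \in X \cap W} \log \lambda(u;\beta)
                 - \int_W \lambda(u;\beta)\,\mathrm{d}u,
  \qquad \lambda(u;\beta) = \exp\{\beta^\top z(u)\}, \\
  \mathrm{AIC} &= -2\,\ell(\hat\beta) + 2p, \\
  \mathrm{BIC} &= -2\,\ell(\hat\beta) + p \log n^{*}.
\end{align}
```

For general (non-Poisson) point processes the same expression for the log-likelihood is used as a composite likelihood, and the penalty terms then need to be adjusted, which is the role of the composite criteria mentioned in the abstract.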
Statistical Inference for Structured Spatial and Temporal Point Data
The availability of large-scale spatial and temporal data has fueled increasing interest in statistical modelling and analysis. With recent developments in data collection and storage techniques, observations can span an extremely vast range or comprise an enormous number of cases. As a result, observations tend to exhibit heterogeneous properties. This research was therefore conducted to explain the variability in spatial or temporal data while accounting for the correlation among observations.
We first considered the intensity estimation problem for large spatial point patterns on complex domains in R^2 (e.g., domains with irregular boundaries, sharp concavities, and/or interior holes due to geographic constraints) and on linear networks, where many existing spatial point process models suffer from "leakage" and from computational problems. We proposed an efficient algorithm to estimate the spatially varying intensity function and to study the varying relationship between intensity and explanatory variables on complex domains. The method is built upon a graph regularization technique and hence can be flexibly applied to point patterns on complex domains such as regions with irregular boundaries and holes, or linear networks. An efficient proximal gradient optimization algorithm is proposed to handle large spatial point patterns. Numerical studies were conducted to illustrate the performance of the method. In addition, we applied the method to study and visualize the intensity patterns of accidents on the Western Australia road network, and the spatial variations in the effects of income, lighting conditions, and population density on homicide occurrences in Toronto.
In addition, spatial inhomogeneity occurs in various scenarios, especially for data spread over a very large area. We further established a spatially adaptive sampling design approach based on an estimate of the spatially varying underlying contamination distribution. This part of the research was motivated by data on arsenic exposure through drinking water, collected from private wells across the state of Iowa. From a public and environmental health management perspective, it is critical to allocate limited resources to establish an effective arsenic sampling and testing plan for health risk mitigation. We proposed a statistical regularization method to automatically detect spatial clusters of the underlying contamination risk from the currently available private well arsenic testing data in Iowa, USA. This approach allows us to develop a sampling design method that is adaptive to the changes in contamination risk across the identified clusters.
Finally, we looked into clustering problems in structured temporal point data. Clustering event sequences from heterogeneous point processes is a challenging task, especially when event sequences are repeatedly observed and associated with multiple event types. To solve this problem, we proposed an efficient model-based clustering framework based on a novel multivariate mixture of functional point processes (MFPP). The proposed model generates event sequences from a multi-level log-Gaussian Cox process, which makes it possible to uncover complex inner patterns among sequences by imposing multiple latent random effects. We proved the identifiability of our mixture model and developed an effective semi-parametric Exponential-Solution (ES) algorithm for the proposed model. The effectiveness of the proposed framework is demonstrated through simulation studies and real data analyses.