Multivariate modality inference using Gaussian kernel
The number of modes (also known as the modality) of a kernel density estimator (KDE) has drawn considerable interest and is important in practice. In this paper, we develop an inference framework for the modality of a KDE in the multivariate setting using a Gaussian kernel. We apply the modal clustering method proposed by [1] for mode hunting. A test statistic and its asymptotic distribution are derived to assess the significance of each mode. The inference procedure is applied to both simulated and real data sets.
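The paper's test statistic is not reproduced here, but the basic object it studies can be sketched directly: count the local maxima of a Gaussian KDE on a grid. The one-dimensional helper below is an illustrative stand-in, not the authors' multivariate inference procedure.

```python
import numpy as np
from scipy.stats import gaussian_kde

def count_modes_1d(data, bandwidth=None, grid_size=512):
    """Count local maxima of a Gaussian KDE evaluated on a grid.

    A crude stand-in for formal modality inference: no significance
    test is applied to the detected modes.
    """
    kde = gaussian_kde(data, bw_method=bandwidth)
    grid = np.linspace(data.min() - 1.0, data.max() + 1.0, grid_size)
    dens = kde(grid)
    # Interior grid points strictly larger than both neighbours are modes.
    is_mode = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return int(is_mode.sum())

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
n_modes = count_modes_1d(sample)  # well-separated mixture: expect 2
```

The significance question the paper addresses is exactly what this sketch ignores: small sample wiggles can create spurious local maxima, which is why a test statistic for each mode is needed.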
Functional principal component analysis of spatially correlated data
This paper focuses on the analysis of spatially correlated functional data. We propose a parametric model for spatial correlation, and the between-curve correlation is modeled by correlating the functional principal component scores of the functional data. Additionally, in the sparse-observation framework, we propose a novel approach of spatial principal analysis by conditional expectation to explicitly estimate spatial correlations and reconstruct individual curves. Assuming spatial stationarity, empirical spatial correlations are calculated as the ratio of the eigenvalues of the smoothed covariance surface Cov(X_i(s), X_i(t)) and the cross-covariance surface Cov(X_i(s), X_j(t)) at locations indexed by i and j. An anisotropic Matérn spatial correlation model is then fitted to the empirical correlations. Finally, principal component scores are estimated to reconstruct the sparsely observed curves. This framework naturally accommodates arbitrary covariance structures, but there is an enormous reduction in computation if one can assume the separability of the temporal and spatial components. We demonstrate the consistency of our estimates and propose hypothesis tests to examine the separability as well as the isotropy of the spatial correlation. Using simulation studies, we show that these methods have clear advantages over existing methods for curve reconstruction and estimation of model parameters.
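The model-fitting step above can be illustrated in miniature. The sketch below fits an isotropic Matérn correlation function to hypothetical empirical correlations by least squares; the paper's anisotropic version additionally rotates and scales the distance, which is omitted here, and the distances and correlation values are made up for illustration.

```python
import numpy as np
from scipy.special import gamma, kv
from scipy.optimize import curve_fit

def matern_corr(d, phi, nu):
    """Isotropic Matérn correlation with range phi and smoothness nu.
    (The paper fits an anisotropic version; this is a simplification.)"""
    d = np.asarray(d, dtype=float)
    out = np.ones_like(d)
    pos = d > 0
    z = np.sqrt(2 * nu) * d[pos] / phi
    out[pos] = (2 ** (1 - nu) / gamma(nu)) * z ** nu * kv(nu, z)
    return out

# Hypothetical empirical correlations at a few inter-site distances.
dists = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
emp = matern_corr(dists, phi=1.5, nu=0.5) + 0.01   # pretend estimation noise
(phi_hat, nu_hat), _ = curve_fit(
    matern_corr, dists, emp, p0=[1.0, 1.0],
    bounds=([1e-3, 1e-3], [10.0, 5.0]))
```

With real data, `emp` would come from the eigenvalue ratios of the smoothed covariance and cross-covariance surfaces described in the abstract.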
The topography of multivariate normal mixtures
Multivariate normal mixtures provide a flexible method of fitting
high-dimensional data. It is shown that their topography, in the sense of their
key features as a density, can be analyzed rigorously in lower dimensions by
use of a ridgeline manifold that contains all critical points, as well as the
ridges of the density. A plot of the elevations on the ridgeline shows the key
features of the mixed density. In addition, by use of the ridgeline, we uncover
a function that determines the number of modes of the mixed density when there
are two components being mixed. A followup analysis then gives a curvature
function that can be used to prove a set of modality theorems.
Published at http://dx.doi.org/10.1214/009053605000000417 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
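For two mixed components, the ridgeline has a closed form: x*(α) = [(1−α)Σ₁⁻¹ + αΣ₂⁻¹]⁻¹ [(1−α)Σ₁⁻¹μ₁ + αΣ₂⁻¹μ₂] for α in [0, 1]. The sketch below evaluates the mixture density along this curve (the "elevation plot" of the abstract) and counts its maxima; the specific means, covariances, and weight are illustrative choices, not from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ridgeline(alpha, mu1, mu2, S1, S2):
    """Ridgeline point x*(alpha) for a two-component normal mixture."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    A = (1 - alpha) * P1 + alpha * P2
    b = (1 - alpha) * (P1 @ mu1) + alpha * (P2 @ mu2)
    return np.linalg.solve(A, b)

# Illustrative two-component mixture in 2-D, equal spherical covariances.
mu1, mu2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
S1 = S2 = np.eye(2)
w = 0.5
alphas = np.linspace(0, 1, 401)
path = np.array([ridgeline(a, mu1, mu2, S1, S2) for a in alphas])
elev = (w * multivariate_normal.pdf(path, mu1, S1)
        + (1 - w) * multivariate_normal.pdf(path, mu2, S2))
# Count maxima of the elevation, including the two endpoints of [0, 1].
interior = (elev[1:-1] > elev[:-2]) & (elev[1:-1] > elev[2:])
n_modes = int(interior.sum() + (elev[0] > elev[1]) + (elev[-1] > elev[-2]))
```

Because all critical points of the mixture lie on the ridgeline, the maxima of this one-dimensional elevation function locate the modes of the full 2-D density, which is the dimension reduction the abstract describes.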
Functional factor analysis for periodic remote sensing data
We present a new approach to factor rotation for functional data. This is
achieved by rotating the functional principal components toward a predefined
space of periodic functions designed to decompose the total variation into
components that are nearly-periodic and nearly-aperiodic with a predefined
period. We show that the factor rotation can be obtained by calculation of
canonical correlations between appropriate spaces which make the methodology
computationally efficient. Moreover, we demonstrate that our proposed rotations
provide stable and interpretable results in the presence of highly complex
covariance. This work is motivated by the goal of finding interpretable sources
of variability in gridded time series of vegetation index measurements obtained
from remote sensing, and we demonstrate our methodology through an application
of factor rotation of these data.
Published at http://dx.doi.org/10.1214/11-AOAS518 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
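The computational device named in the abstract, canonical correlations between subspaces, can be sketched with a small linear-algebra example. The correlations are the singular values of Q_A^T Q_B, where Q_A and Q_B are orthonormal bases for the two spaces (the cosines of the principal angles). The bases below are made up for illustration; the authors' full rotation procedure is not reproduced.

```python
import numpy as np

def canonical_correlations(A, B):
    """Canonical correlations between the column spans of A and B,
    computed as singular values of Q_A^T Q_B."""
    QA, _ = np.linalg.qr(A)
    QB, _ = np.linalg.qr(B)
    return np.linalg.svd(QA.T @ QB, compute_uv=False)

# Hypothetical setup: 3 estimated principal components vs a small
# Fourier basis with a predefined period of 64 time steps.
t = np.arange(128.0)
fourier = np.column_stack([np.sin(2 * np.pi * t / 64),
                           np.cos(2 * np.pi * t / 64)])
pcs = np.column_stack([np.sin(2 * np.pi * t / 64),     # purely periodic
                       t - t.mean(),                   # aperiodic trend
                       np.cos(2 * np.pi * t / 64) + 0.1 * (t - t.mean()) / 64])
rho = canonical_correlations(pcs, fourier)
# A correlation near 1 signals a PC direction lying in the periodic space.
```

The idea matching the abstract: directions of high canonical correlation with the periodic space carry the nearly-periodic variation, and the orthogonal remainder carries the nearly-aperiodic variation.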
Quadratic distances on probabilities: A unified foundation
This work builds a unified framework for the study of quadratic form distance
measures as they are used in assessing the goodness of fit of models. Many
important procedures have this structure, but the theory for these methods is
dispersed and incomplete. Central to the statistical analysis of these
distances is the spectral decomposition of the kernel that generates the
distance. We show how this determines the limiting distribution of natural
goodness-of-fit tests. Additionally, we develop a new notion, the spectral
degrees of freedom of the test, based on this decomposition. The degrees of
freedom are easy to compute and estimate, and can be used as a guide in the
construction of useful procedures in this class.
Published at http://dx.doi.org/10.1214/009053607000000956 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
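To make the spectral idea concrete, the sketch below eigendecomposes a Gaussian kernel matrix and computes a common effective-degrees-of-freedom summary of its spectrum, (Σλ)²/Σλ². This particular formula is an assumption for illustration; the paper's exact definition of spectral degrees of freedom should be taken from the text.

```python
import numpy as np

def spectral_dof(K):
    """Effective degrees of freedom of a kernel matrix from its spectrum,
    using the summary (sum of eigenvalues)^2 / (sum of squared eigenvalues).
    Illustrative only; not necessarily the paper's exact definition."""
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]            # keep the numerically nonzero spectrum
    return lam.sum() ** 2 / (lam ** 2).sum()

# Gaussian kernel on 50 points: a narrower bandwidth gives a flatter
# spectrum and hence more effective degrees of freedom.
x = np.linspace(0, 1, 50)
K_narrow = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.05 ** 2))
K_wide = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.5 ** 2))
dof_narrow, dof_wide = spectral_dof(K_narrow), spectral_dof(K_wide)
```

This illustrates why such a quantity is a useful guide: it summarizes in one number how many spectral components of the kernel meaningfully contribute to the distance, and hence to the limiting distribution of the test.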
Signaling local non-credibility in an automatic segmentation pipeline
The advancing technology for automatic segmentation of medical images should be accompanied by techniques to inform the user of the local credibility of results. To the extent that this technology produces clinically acceptable segmentations for a significant fraction of cases, there is a risk that the clinician will assume every result is acceptable. In the less frequent case where segmentation fails, we are concerned that unless the user is alerted by the computer, she would still put the result to clinical use. By alerting the user to the location of a likely segmentation failure, we allow her to apply limited validation and editing resources where they are most needed. We propose an automated method to signal suspected non-credible regions of the segmentation, triggered by statistical outliers of the local image match function. We apply this test to m-rep segmentations of the bladder and prostate in CT images using a local image match computed by PCA on regional intensity quantile functions. We validate these results by correlating the non-credible regions with regions whose surface distance to a reference segmentation exceeds 5.5 mm for the bladder; a 6 mm surface distance was used to validate the prostate results. Varying the outlier threshold level produced a receiver operating characteristic curve with area under the curve of 0.89 for the bladder and 0.92 for the prostate. Based on this preliminary result, our method has been able to predict local segmentation failures and shows potential for validation in an automatic segmentation pipeline.
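The triggering mechanism, flagging regions whose local match score is a statistical outlier, can be sketched with a simple standardized-score rule. The scoring rule and data below are simplified assumptions; the paper's match function comes from PCA on regional intensity quantile functions.

```python
import numpy as np

def flag_non_credible(match_scores, threshold=2.5):
    """Flag regions whose local image-match score is a statistical outlier.

    `match_scores` holds one value per region (e.g. a distance of the
    regional intensity profile from its PCA model, larger = worse match).
    A simple z-score rule stands in for the paper's outlier test.
    """
    z = (match_scores - match_scores.mean()) / match_scores.std()
    return z > threshold            # True = suspect (non-credible) region

# Simulated scores: 100 regions, one with a gross local failure.
rng = np.random.default_rng(1)
scores = rng.normal(1.0, 0.2, 100)
scores[7] = 3.0                     # a simulated segmentation failure
flags = flag_non_credible(scores, threshold=2.5)
```

Sweeping `threshold` over a range and comparing the flags to a surface-distance ground truth is what produces the ROC curves reported in the abstract.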
Analysis of PET Imaging for Tumor Delineation
The primary goal of this research is to build a statistical framework for automated PET image analysis that is closer to human perception. Although manual interpretation of a PET image is more accurate and reproducible than thresholding-based semiautomatic segmentation methods, human contouring has large interobserver and intraobserver variation and is, moreover, extremely time-consuming. Further, it is harder for humans to analyze more than two dimensions at a time, and harder still when multiple modalities are involved. Moreover, analyzing a series of images quickly becomes an onerous job for a single human. The new statistical framework is designed to mimic human perception for tumour delineation and marry it with all the advantages of an analytic method in a modern computing environment.
A computational framework to emulate the human perspective in flow cytometric data analysis
Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, while designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations could be modeled reliably with pre-specified distributions may not hold true in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation.
Results: To address this, we developed a new framework, flowScape, for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape creates a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis for both traversing the landscape and isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrate different applications of our framework to flow data analysis and show its superiority over other analytical methods.
Conclusions: The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics.
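The first step, grouping events by the density mode they concentrate around, can be sketched in one dimension with a Gaussian mean-shift ascent: each event climbs to a mode, and events sharing a mode share a cluster. This is a minimal stand-in; flowScape's dense/sparse landscape map, hierarchy, and ridgeline analysis are far richer, and the bandwidth and data here are illustrative.

```python
import numpy as np

def modal_labels_1d(x, bandwidth=0.3, steps=200):
    """Assign each event to the density mode reached by mean-shift ascent.
    A one-dimensional sketch of mode-based (modal) clustering."""
    pts = x.astype(float).copy()
    for _ in range(steps):
        # Gaussian mean-shift update: move each point toward higher density.
        w = np.exp(-(pts[:, None] - x[None, :]) ** 2 / (2 * bandwidth ** 2))
        pts = (w * x[None, :]).sum(axis=1) / w.sum(axis=1)
    # Events that converged to the same point share a mode label.
    modes = np.round(pts, 2)
    _, labels = np.unique(modes, return_inverse=True)
    return labels

# Two simulated event populations with arbitrary (here, normal) shape.
rng = np.random.default_rng(2)
events = np.concatenate([rng.normal(0, 0.3, 300), rng.normal(3, 0.3, 300)])
labels = modal_labels_1d(events)
```

Note that nothing here assumes a parametric population shape, which mirrors the abstract's objection to methods that pre-specify distributions.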
A New Framework for Distance-based Functional Clustering
We develop a new framework for clustering functional data based on a distance matrix, similar to the use of spectral clustering for multivariate data. First, we smooth the raw observations using appropriate smoothing techniques with the desired smoothness, through a penalized fit. The next step is to create an optimal distance matrix from either the smoothed curves or their available derivatives; the choice of distance matrix depends on the nature of the data. Finally, we create and implement the spectral clustering algorithm. We applied our newly developed approach, Functional Spectral Clustering (FSC), to sets of simulated and real data. Our proposed method showed better accuracy rates than existing methods.
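The distance-matrix-to-spectral-clustering pipeline can be sketched as follows. The smoothing and derivative-based distances of the paper are replaced here by a plain Euclidean distance between pre-smoothed curves, and the affinity bandwidth, test curves, and use of standard normalized spectral clustering are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster_curves(curves, k=2, sigma=1.0):
    """Cluster curves (rows) from a pairwise L2 distance matrix via
    normalized spectral clustering. A sketch of the FSC pipeline."""
    d = np.linalg.norm(curves[:, None, :] - curves[None, :, :], axis=2)
    W = np.exp(-d ** 2 / (2 * sigma ** 2))           # affinity matrix
    Dinv = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = Dinv @ W @ Dinv                              # normalized affinity
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -k:]                                 # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    _, labels = kmeans2(U, k, minit='++', seed=0)    # cluster the embedding
    return labels

# Two illustrative groups of noisy curves on a common grid.
t = np.linspace(0, 1, 30)
rng = np.random.default_rng(3)
group1 = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(10, 30))
group2 = np.cos(2 * np.pi * t) + 0.1 * rng.normal(size=(10, 30))
labels = spectral_cluster_curves(np.vstack([group1, group2]), k=2, sigma=2.0)
```

In the full method, the rows of `curves` would be penalized-fit smooths (or their derivatives) rather than raw observations, which is where the framework's flexibility comes from.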
