Multivariate modality inference using Gaussian kernel
The number of modes (also known as the modality) of a kernel density estimator (KDE) has drawn considerable interest and is important in practice. In this paper, we develop an inference framework for the modality of a KDE in the multivariate setting using a Gaussian kernel. We apply the modal clustering method proposed by [1] for mode hunting. A test statistic and its asymptotic distribution are derived to assess the significance of each mode. The inference procedure is applied to both simulated and real data sets.
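The paper's test statistic is not reproduced here, but the basic object it studies can be sketched directly: count the local maxima of a Gaussian KDE on a grid. The one-dimensional helper below is an illustrative stand-in, not the authors' multivariate inference procedure.

```python
import numpy as np
from scipy.stats import gaussian_kde

def count_modes_1d(data, bandwidth=None, grid_size=512):
    """Count local maxima of a Gaussian KDE evaluated on a grid.

    A crude stand-in for formal modality inference: no significance
    test is applied to the detected modes.
    """
    kde = gaussian_kde(data, bw_method=bandwidth)
    grid = np.linspace(data.min() - 1.0, data.max() + 1.0, grid_size)
    dens = kde(grid)
    # Interior grid points strictly larger than both neighbours are modes.
    is_mode = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return int(is_mode.sum())

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
n_modes = count_modes_1d(sample)  # well-separated mixture: expect 2
```

The significance question the paper addresses is exactly what this sketch ignores: small sample wiggles can create spurious local maxima, which is why a test statistic for each mode is needed.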
Functional principal component analysis of spatially correlated data
This paper focuses on the analysis of spatially correlated functional data. We propose a parametric model for spatial correlation, and the between-curve correlation is modeled by correlating the functional principal component scores of the functional data. Additionally, in the sparse-observation framework, we propose a novel approach of spatial principal analysis by conditional expectation to explicitly estimate spatial correlations and reconstruct individual curves. Assuming spatial stationarity, empirical spatial correlations are calculated as the ratio of the eigenvalues of the smoothed covariance surface Cov(X_i(s), X_i(t)) and the cross-covariance surface Cov(X_i(s), X_j(t)) at locations indexed by i and j. An anisotropic Matérn spatial correlation model is then fitted to the empirical correlations. Finally, principal component scores are estimated to reconstruct the sparsely observed curves. This framework naturally accommodates arbitrary covariance structures, but there is an enormous reduction in computation if one can assume the separability of the temporal and spatial components. We demonstrate the consistency of our estimates and propose hypothesis tests to examine the separability as well as the isotropy of the spatial correlation. Using simulation studies, we show that these methods have clear advantages over existing methods for curve reconstruction and estimation of model parameters.
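The model-fitting step above can be illustrated in miniature. The sketch below fits an isotropic Matérn correlation function to hypothetical empirical correlations by least squares; the paper's anisotropic version additionally rotates and scales the distance, which is omitted here, and the distances and correlation values are made up for illustration.

```python
import numpy as np
from scipy.special import gamma, kv
from scipy.optimize import curve_fit

def matern_corr(d, phi, nu):
    """Isotropic Matérn correlation with range phi and smoothness nu.
    (The paper fits an anisotropic version; this is a simplification.)"""
    d = np.asarray(d, dtype=float)
    out = np.ones_like(d)
    pos = d > 0
    z = np.sqrt(2 * nu) * d[pos] / phi
    out[pos] = (2 ** (1 - nu) / gamma(nu)) * z ** nu * kv(nu, z)
    return out

# Hypothetical empirical correlations at a few inter-site distances.
dists = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
emp = matern_corr(dists, phi=1.5, nu=0.5) + 0.01   # pretend estimation noise
(phi_hat, nu_hat), _ = curve_fit(
    matern_corr, dists, emp, p0=[1.0, 1.0],
    bounds=([1e-3, 1e-3], [10.0, 5.0]))
```

With real data, `emp` would come from the eigenvalue ratios of the smoothed covariance and cross-covariance surfaces described in the abstract.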
The topography of multivariate normal mixtures
Multivariate normal mixtures provide a flexible method of fitting
high-dimensional data. It is shown that their topography, in the sense of their
key features as a density, can be analyzed rigorously in lower dimensions by
use of a ridgeline manifold that contains all critical points, as well as the
ridges of the density. A plot of the elevations on the ridgeline shows the key
features of the mixed density. In addition, by use of the ridgeline, we uncover
a function that determines the number of modes of the mixed density when there
are two components being mixed. A followup analysis then gives a curvature
function that can be used to prove a set of modality theorems.
Published at http://dx.doi.org/10.1214/009053605000000417 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
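For two mixed components, the ridgeline has a closed form: x*(α) = [(1−α)Σ₁⁻¹ + αΣ₂⁻¹]⁻¹ [(1−α)Σ₁⁻¹μ₁ + αΣ₂⁻¹μ₂] for α in [0, 1]. The sketch below evaluates the mixture density along this curve (the "elevation plot" of the abstract) and counts its maxima; the specific means, covariances, and weight are illustrative choices, not from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ridgeline(alpha, mu1, mu2, S1, S2):
    """Ridgeline point x*(alpha) for a two-component normal mixture."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    A = (1 - alpha) * P1 + alpha * P2
    b = (1 - alpha) * (P1 @ mu1) + alpha * (P2 @ mu2)
    return np.linalg.solve(A, b)

# Illustrative two-component mixture in 2-D, equal spherical covariances.
mu1, mu2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
S1 = S2 = np.eye(2)
w = 0.5
alphas = np.linspace(0, 1, 401)
path = np.array([ridgeline(a, mu1, mu2, S1, S2) for a in alphas])
elev = (w * multivariate_normal.pdf(path, mu1, S1)
        + (1 - w) * multivariate_normal.pdf(path, mu2, S2))
# Count maxima of the elevation, including the two endpoints of [0, 1].
interior = (elev[1:-1] > elev[:-2]) & (elev[1:-1] > elev[2:])
n_modes = int(interior.sum() + (elev[0] > elev[1]) + (elev[-1] > elev[-2]))
```

Because all critical points of the mixture lie on the ridgeline, the maxima of this one-dimensional elevation function locate the modes of the full 2-D density, which is the dimension reduction the abstract describes.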
Functional factor analysis for periodic remote sensing data
We present a new approach to factor rotation for functional data. This is
achieved by rotating the functional principal components toward a predefined
space of periodic functions designed to decompose the total variation into
components that are nearly-periodic and nearly-aperiodic with a predefined
period. We show that the factor rotation can be obtained by calculation of
canonical correlations between appropriate spaces which make the methodology
computationally efficient. Moreover, we demonstrate that our proposed rotations
provide stable and interpretable results in the presence of highly complex
covariance. This work is motivated by the goal of finding interpretable sources
of variability in gridded time series of vegetation index measurements obtained
from remote sensing, and we demonstrate our methodology through an application
of factor rotation of these data.
Published at http://dx.doi.org/10.1214/11-AOAS518 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
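The computational device named in the abstract, canonical correlations between subspaces, can be sketched with a small linear-algebra example. The correlations are the singular values of Q_A^T Q_B, where Q_A and Q_B are orthonormal bases for the two spaces (the cosines of the principal angles). The bases below are made up for illustration; the authors' full rotation procedure is not reproduced.

```python
import numpy as np

def canonical_correlations(A, B):
    """Canonical correlations between the column spans of A and B,
    computed as singular values of Q_A^T Q_B."""
    QA, _ = np.linalg.qr(A)
    QB, _ = np.linalg.qr(B)
    return np.linalg.svd(QA.T @ QB, compute_uv=False)

# Hypothetical setup: 3 estimated principal components vs a small
# Fourier basis with a predefined period of 64 time steps.
t = np.arange(128.0)
fourier = np.column_stack([np.sin(2 * np.pi * t / 64),
                           np.cos(2 * np.pi * t / 64)])
pcs = np.column_stack([np.sin(2 * np.pi * t / 64),     # purely periodic
                       t - t.mean(),                   # aperiodic trend
                       np.cos(2 * np.pi * t / 64) + 0.1 * (t - t.mean()) / 64])
rho = canonical_correlations(pcs, fourier)
# A correlation near 1 signals a PC direction lying in the periodic space.
```

The idea matching the abstract: directions of high canonical correlation with the periodic space carry the nearly-periodic variation, and the orthogonal remainder carries the nearly-aperiodic variation.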
Quadratic distances on probabilities: A unified foundation
This work builds a unified framework for the study of quadratic form distance
measures as they are used in assessing the goodness of fit of models. Many
important procedures have this structure, but the theory for these methods is
dispersed and incomplete. Central to the statistical analysis of these
distances is the spectral decomposition of the kernel that generates the
distance. We show how this determines the limiting distribution of natural
goodness-of-fit tests. Additionally, we develop a new notion, the spectral
degrees of freedom of the test, based on this decomposition. The degrees of
freedom are easy to compute and estimate, and can be used as a guide in the
construction of useful procedures in this class.
Published at http://dx.doi.org/10.1214/009053607000000956 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
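To make the spectral idea concrete, the sketch below eigendecomposes a Gaussian kernel matrix and computes a common effective-degrees-of-freedom summary of its spectrum, (Σλ)²/Σλ². This particular formula is an assumption for illustration; the paper's exact definition of spectral degrees of freedom should be taken from the text.

```python
import numpy as np

def spectral_dof(K):
    """Effective degrees of freedom of a kernel matrix from its spectrum,
    using the summary (sum of eigenvalues)^2 / (sum of squared eigenvalues).
    Illustrative only; not necessarily the paper's exact definition."""
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]            # keep the numerically nonzero spectrum
    return lam.sum() ** 2 / (lam ** 2).sum()

# Gaussian kernel on 50 points: a narrower bandwidth gives a flatter
# spectrum and hence more effective degrees of freedom.
x = np.linspace(0, 1, 50)
K_narrow = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.05 ** 2))
K_wide = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.5 ** 2))
dof_narrow, dof_wide = spectral_dof(K_narrow), spectral_dof(K_wide)
```

This illustrates why such a quantity is a useful guide: it summarizes in one number how many spectral components of the kernel meaningfully contribute to the distance, and hence to the limiting distribution of the test.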
Signaling local non-credibility in an automatic segmentation pipeline
The advancing technology for automatic segmentation of medical images should be accompanied by techniques to inform the user of the local credibility of results. To the extent that this technology produces clinically acceptable segmentations for a significant fraction of cases, there is a risk that the clinician will assume every result is acceptable. In the less frequent case where segmentation fails, we are concerned that unless the user is alerted by the computer, she would still put the result to clinical use. By alerting the user to the location of a likely segmentation failure, we allow her to apply limited validation and editing resources where they are most needed. We propose an automated method to signal suspected non-credible regions of the segmentation, triggered by statistical outliers of the local image match function. We apply this test to m-rep segmentations of the bladder and prostate in CT images using a local image match computed by PCA on regional intensity quantile functions. We validate these results by correlating the non-credible regions with regions whose surface distance to a reference segmentation exceeds 5.5 mm for the bladder; a 6 mm surface distance was used to validate the prostate results. Varying the outlier threshold level produced a receiver operating characteristic curve with area under the curve of 0.89 for the bladder and 0.92 for the prostate. Based on this preliminary result, our method has been able to predict local segmentation failures and shows potential for validation in an automatic segmentation pipeline.
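The triggering mechanism, flagging regions whose local match score is a statistical outlier, can be sketched with a simple standardized-score rule. The scoring rule and data below are simplified assumptions; the paper's match function comes from PCA on regional intensity quantile functions.

```python
import numpy as np

def flag_non_credible(match_scores, threshold=2.5):
    """Flag regions whose local image-match score is a statistical outlier.

    `match_scores` holds one value per region (e.g. a distance of the
    regional intensity profile from its PCA model, larger = worse match).
    A simple z-score rule stands in for the paper's outlier test.
    """
    z = (match_scores - match_scores.mean()) / match_scores.std()
    return z > threshold            # True = suspect (non-credible) region

# Simulated scores: 100 regions, one with a gross local failure.
rng = np.random.default_rng(1)
scores = rng.normal(1.0, 0.2, 100)
scores[7] = 3.0                     # a simulated segmentation failure
flags = flag_non_credible(scores, threshold=2.5)
```

Sweeping `threshold` over a range and comparing the flags to a surface-distance ground truth is what produces the ROC curves reported in the abstract.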
Analysis of PET Imaging for Tumor Delineation
The primary goal of this research is to build a statistical framework for automated PET image analysis that is closer to human perception. Although manual interpretation of a PET image is more accurate and reproducible than thresholding-based semiautomatic segmentation methods, human contouring has large interobserver and intraobserver variation and is, moreover, extremely time-consuming. Further, it is harder for humans to analyze more than two dimensions at a time, and harder still when multiple modalities are involved. Moreover, analyzing a series of images quickly becomes an onerous job for a single human. The new statistical framework is designed to mimic human perception for tumour delineation and marry it with all the advantages of an analytic method in a modern computing environment.
A computational framework to emulate the human perspective in flow cytometric data analysis
Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, while designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations could be modeled reliably with pre-specified distributions may not hold true in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation.
Results: To address this, we developed a new framework, flowScape, for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape creates a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis for both traversing the landscape and isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrate different applications of our framework to flow data analysis and show its superiority over other analytical methods.
Conclusions: The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics.
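The first step, grouping events by the density mode they concentrate around, can be sketched in one dimension with a Gaussian mean-shift ascent: each event climbs to a mode, and events sharing a mode share a cluster. This is a minimal stand-in; flowScape's dense/sparse landscape map, hierarchy, and ridgeline analysis are far richer, and the bandwidth and data here are illustrative.

```python
import numpy as np

def modal_labels_1d(x, bandwidth=0.3, steps=200):
    """Assign each event to the density mode reached by mean-shift ascent.
    A one-dimensional sketch of mode-based (modal) clustering."""
    pts = x.astype(float).copy()
    for _ in range(steps):
        # Gaussian mean-shift update: move each point toward higher density.
        w = np.exp(-(pts[:, None] - x[None, :]) ** 2 / (2 * bandwidth ** 2))
        pts = (w * x[None, :]).sum(axis=1) / w.sum(axis=1)
    # Events that converged to the same point share a mode label.
    modes = np.round(pts, 2)
    _, labels = np.unique(modes, return_inverse=True)
    return labels

# Two simulated event populations with arbitrary (here, normal) shape.
rng = np.random.default_rng(2)
events = np.concatenate([rng.normal(0, 0.3, 300), rng.normal(3, 0.3, 300)])
labels = modal_labels_1d(events)
```

Note that nothing here assumes a parametric population shape, which mirrors the abstract's objection to methods that pre-specify distributions.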
A New Framework for Distance-based Functional Clustering
We develop a new framework for clustering functional data based on a distance matrix, similar to the use of spectral clustering for multivariate data. First, we smooth the raw observations using appropriate smoothing techniques with the desired smoothness, through a penalized fit. The next step is to create an optimal distance matrix from either the smoothed curves or their available derivatives; the choice of distance matrix depends on the nature of the data. Finally, we create and implement the spectral clustering algorithm. We applied our newly developed approach, Functional Spectral Clustering (FSC), to sets of simulated and real data. Our proposed method showed better accuracy rates than existing methods.
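The distance-matrix-to-spectral-clustering pipeline can be sketched as follows. The smoothing and derivative-based distances of the paper are replaced here by a plain Euclidean distance between pre-smoothed curves, and the affinity bandwidth, test curves, and use of standard normalized spectral clustering are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster_curves(curves, k=2, sigma=1.0):
    """Cluster curves (rows) from a pairwise L2 distance matrix via
    normalized spectral clustering. A sketch of the FSC pipeline."""
    d = np.linalg.norm(curves[:, None, :] - curves[None, :, :], axis=2)
    W = np.exp(-d ** 2 / (2 * sigma ** 2))           # affinity matrix
    Dinv = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = Dinv @ W @ Dinv                              # normalized affinity
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -k:]                                 # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    _, labels = kmeans2(U, k, minit='++', seed=0)    # cluster the embedding
    return labels

# Two illustrative groups of noisy curves on a common grid.
t = np.linspace(0, 1, 30)
rng = np.random.default_rng(3)
group1 = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(10, 30))
group2 = np.cos(2 * np.pi * t) + 0.1 * rng.normal(size=(10, 30))
labels = spectral_cluster_curves(np.vstack([group1, group2]), k=2, sigma=2.0)
```

In the full method, the rows of `curves` would be penalized-fit smooths (or their derivatives) rather than raw observations, which is where the framework's flexibility comes from.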
