A nonparametric two-sample hypothesis testing problem for random dot product graphs
We consider the problem of testing whether two finite-dimensional random dot
product graphs have generating latent positions that are independently drawn
from the same distribution, or distributions that are related via scaling or
projection. We propose a test statistic that is a kernel-based function of the
adjacency spectral embedding for each graph. We obtain a limiting distribution
for our test statistic under the null and we show that our test procedure is
consistent across a broad range of alternatives.
Comment: 24 pages, 1 figure
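As a rough illustration of the pipeline the abstract describes (embed each graph spectrally, then compare the embeddings with a kernel), the sketch below may help. It is not the authors' procedure: the Gaussian kernel, the `bandwidth` value, the biased MMD-type statistic, and the sign-fixing step are all assumptions made for this example, and it does not handle the general orthogonal non-identifiability of the embedding or compute any limiting distribution.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed a symmetric adjacency matrix into R^d using its top-d
    eigenpairs (by absolute eigenvalue), scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    emb = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))
    # Assumption: resolve only the per-column sign ambiguity by making
    # each column sum non-negative; rotations are NOT resolved here.
    signs = np.where(emb.sum(axis=0) < 0, -1.0, 1.0)
    return emb * signs

def gaussian_kernel(X, Y, bandwidth=0.5):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_statistic(X, Y, bandwidth=0.5):
    """Biased (V-statistic) estimate of the squared maximum mean
    discrepancy between two point clouds; hypothetical stand-in for
    the kernel-based test statistic."""
    return (gaussian_kernel(X, X, bandwidth).mean()
            + gaussian_kernel(Y, Y, bandwidth).mean()
            - 2.0 * gaussian_kernel(X, Y, bandwidth).mean())

def sample_rdpg(latent, rng):
    """Sample an undirected, loop-free graph whose edge probabilities
    are the dot products of the rows of `latent`."""
    P = latent @ latent.T
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper + upper.T).astype(float)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0.2, 0.8, size=(200, 1))  # latent positions, graph 1
    Y = rng.uniform(0.2, 0.8, size=(200, 1))  # same distribution, graph 2
    emb1 = adjacency_spectral_embedding(sample_rdpg(X, rng), d=1)
    emb2 = adjacency_spectral_embedding(sample_rdpg(Y, rng), d=1)
    print(mmd_statistic(emb1, emb2))
```

In practice a rejection threshold would have to come from the null distribution, for instance via the limiting result mentioned in the abstract or a resampling scheme; neither is implemented here.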
Matching and Inference for Multiple Correlated Data Sets
Given multiple correlated data sets, an important question is how to use them jointly to benefit subsequent statistical inference. This setting is increasingly realistic as more related data sets are collected, for example images and their descriptions, articles in multiple languages, or actors in multiple social networks. Real data are also often multivariate or high-dimensional, so dimension reduction is necessary before any inference.
In this dissertation, I consider three dimension reduction and matching methods: principal component analysis followed by Procrustes matching, canonical correlation analysis, and nonlinear matching using shortest-path distance and joint neighborhood. I investigate their theoretical properties and their impact on subsequent inference using the Procrustes fitting error, classification error, and hypothesis testing, respectively.
The main conclusion of this dissertation is that, given a particular inference task for multiple correlated data sets, joint matching and projection may significantly improve inference performance compared to separate projection or omitting modalities. Numerical experiments on simulated and real data illustrate the theorems and the methodology.
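To make the first of the three methods concrete, here is a minimal sketch of principal component analysis followed by Procrustes matching, with the Procrustes fitting error as the output. The function names, the restriction to an orthogonal transformation (no scaling or translation), and the use of the Frobenius norm are assumptions for this illustration, not details taken from the dissertation.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto their top-d principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d] * S[:d]

def procrustes_match(X, Y):
    """Find the orthogonal matrix Q minimizing ||X Q - Y||_F and
    return it together with the resulting fitting error."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    Q = U @ Vt
    return Q, np.linalg.norm(X @ Q - Y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical "modalities": the second is a rotated, noisy
    # copy of the first, so matched rows describe the same object.
    X = rng.normal(size=(100, 10))
    R, _ = np.linalg.qr(rng.normal(size=(10, 10)))
    Y = X @ R + 0.01 * rng.normal(size=(100, 10))

    # Separate projection of each data set, then joint matching.
    Xd, Yd = pca_project(X, 3), pca_project(Y, 3)
    Q, err = procrustes_match(Xd, Yd)
    print("Procrustes fitting error:", err)
```

A small fitting error suggests the two low-dimensional representations are close to an orthogonal transformation of each other, which is the sense in which the matched data sets can then be used jointly.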