27 research outputs found
On inference for modularity statistics in structured networks
This paper revisits the classical concept of network modularity and its
spectral relaxations used throughout graph data analysis. We formulate and
study several modularity statistic variants for which we establish asymptotic
distributional results in the large-network limit for networks exhibiting nodal
community structure. Our work facilitates testing for network differences and
can be used in conjunction with existing theoretical guarantees for stochastic
blockmodel random graphs. Our results are enabled by recent advances in the
study of low-rank truncations of large network adjacency matrices. We provide
confirmatory simulation studies and real data analysis pertaining to the
network neuroscience study of psychosis, specifically schizophrenia.
Collectively, this paper contributes to the limited existing literature to date
on statistical inference for modularity-based network analysis. Supplemental
materials for this article are available online.
Comment: 47 pages with supplemen
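For orientation, the classical modularity that this abstract revisits can be sketched in a few lines. The block below is a minimal illustration of the standard Newman-Girvan definition, Q = (1/2m) * sum over node pairs in the same community of (A_ij - k_i k_j / 2m). It is not the statistic variants or inference procedure developed in the paper. The function name `modularity` and the toy graph (two triangles joined by one edge) are assumptions made for this example.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity of a labelled, undirected, unweighted graph.

    adj    : symmetric 0/1 adjacency matrix as a list of lists (no self-loops)
    labels : community label for each node, in node order
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the number of edges
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed edge minus its expectation under the degree-preserving null model
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = 1
    adj[v][u] = 1

q_two_blocks = modularity(adj, [0, 0, 0, 1, 1, 1])
q_one_block = modularity(adj, [0, 0, 0, 0, 0, 0])
```

Assigning every node to a single community gives modularity zero by construction, so a positive value for the two-triangle split indicates more within-community edges than the null model expects.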
Statistical Analysis and Spectral Methods for Signal-Plus-Noise Matrix Models
The singular value matrix decomposition plays a ubiquitous role in statistics and related fields. Myriad applications including clustering, classification, and dimensionality reduction involve studying and understanding the geometric structure of singular values and singular vectors.
Chapter 2 of this dissertation presents an initial analysis of local (e.g., entrywise) singular vector (resp., eigenvector) perturbations for signal-plus-noise matrix models. We obtain both deterministic and probabilistic upper bounds on singular vector perturbations that complement and, in certain settings, improve upon classical, well-established benchmark bounds in the literature. We then apply our tools and methods of analysis to problems involving (spike) principal subspace estimation for high-dimensional covariance matrices and network models exhibiting community structure. Subsequently, Chapter 3 obtains precise local eigenvector estimation results under stronger assumptions involving signal strength, probabilistic concentration, and homogeneity. We provide in silico simulation examples to illustrate our theoretical bounds and distributional limit theory. Chapter 4 transitions to the investigation of singular value (resp., eigenvalue) perturbations, still in the signal-plus-noise matrix model framework. There, we leverage our results to better understand hypothesis testing and change-point detection in statistical random graph analysis. Chapter 5 builds upon recent joint analysis of singular (resp., eigen) values and vectors to investigate the asymptotic relationship between spectral embedding performance and underlying network structure for stochastic block model graphs.
Comparing Machine Learning and Logistic Regression Methods for Predicting Hypertension Using a Combination of Gene Expression and Next-Generation Sequencing Data
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply two versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare their performance to that of logistic regression. Performance was not radically different across the three methods, although the linear support vector machine tended to show small gains in predictive ability relative to the radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, predictive ability showed a statistically significant decrease for both the radial support vector machine and logistic regression. The linear support vector machine was more robust to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
First Encounters with Option Pricing and Return Simulation
We provide a tractable introduction to option pricing models and examine how the complex analysis concept of branch-cutting influences financial mathematics. The Black-Scholes model is introduced to motivate our discussion of the Heston stochastic volatility model, a model that dominates industry and option pricing literature in financial mathematics. We focus on developing mathematical intuition as a tool for stimulating further undergraduate interest and research in financial mathematics. We provide code in R and Mathematica for applications.
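As a point of reference for the Black-Scholes model named in this abstract, the sketch below evaluates the standard closed-form price of a European call on a non-dividend-paying asset, C = S N(d1) - K exp(-rT) N(d2). It is written in Python rather than the R and Mathematica code the authors mention, and it is not taken from their work. The function names `norm_cdf` and `black_scholes_call` and the sample parameter values are assumptions chosen for illustration. The Heston model is not covered here.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, maturity):
    """Black-Scholes price of a European call with no dividends.

    spot     : current asset price S
    strike   : strike price K
    rate     : continuously compounded risk-free rate r
    sigma    : constant volatility (annualised)
    maturity : time to expiry T in years
    """
    if spot <= 0 or strike <= 0 or sigma <= 0 or maturity <= 0:
        raise ValueError("spot, strike, sigma and maturity must be positive")
    vol_sqrt_t = sigma * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical inputs: at-the-money call, 5% rate, 20% volatility, one year to expiry.
call_price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05, sigma=0.2, maturity=1.0)
```

The constant `sigma` is the key simplification here. Stochastic volatility models such as Heston replace it with a random process, which is why their pricing formulas are typically evaluated through characteristic functions.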