
    On inference for modularity statistics in structured networks

    This paper revisits the classical concept of network modularity and its spectral relaxations used throughout graph data analysis. We formulate and study several modularity statistic variants for which we establish asymptotic distributional results in the large-network limit for networks exhibiting nodal community structure. Our work facilitates testing for network differences and can be used in conjunction with existing theoretical guarantees for stochastic blockmodel random graphs. Our results are enabled by recent advances in the study of low-rank truncations of large network adjacency matrices. We provide confirmatory simulation studies and real data analysis pertaining to the network neuroscience study of psychosis, specifically schizophrenia. Collectively, this paper contributes to the limited existing literature on statistical inference for modularity-based network analysis. Supplemental materials for this article are available online. Comment: 47 pages with supplemen
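    The abstract does not spell out its modularity statistic variants. Purely as a point of reference, the following Python sketch computes the classical Newman-Girvan modularity Q and its best-known spectral relaxation: split the nodes by the sign of the leading eigenvector of the modularity matrix B = A - kk^T/(2m). This is not the paper's method; the function names, the shifted power iteration, and the two-triangle example graph are illustrative assumptions.

```python
import random


def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    adj: symmetric 0/1 adjacency matrix (list of lists, no self-loops).
    labels: community label of each node.
    """
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the number of edges
    n = len(adj)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


def modularity_matrix(adj):
    """B = A - k k^T / (2m); Q is the sum of B_ij over same-community pairs, divided by 2m."""
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    return [[a_ij - k_i * k_j / two_m for a_ij, k_j in zip(row, degrees)]
            for row, k_i in zip(adj, degrees)]


def leading_eigenpair(mat, iters=500, seed=0):
    """Most positive eigenpair of a symmetric matrix by shifted power iteration.

    Adding the max absolute row sum (a Gershgorin bound on the spectral radius)
    to the diagonal makes every eigenvalue non-negative, so the iteration
    converges to the most positive eigenvalue rather than the largest in magnitude.
    """
    n = len(mat)
    shift = max(sum(abs(x) for x in row) for row in mat)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(m * x for m, x in zip(row, v)) + shift * v_i
             for row, v_i in zip(mat, v)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient with the unshifted matrix recovers the eigenvalue.
    mv = [sum(m * x for m, x in zip(row, v)) for row in mat]
    return sum(a * b for a, b in zip(v, mv)), v


def spectral_split(adj):
    """Relaxed modularity maximization: split nodes by the sign of B's leading eigenvector."""
    eigval, vec = leading_eigenpair(modularity_matrix(adj))
    return eigval, [0 if x >= 0 else 1 for x in vec]


# Example: two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

eigval, labels = spectral_split(adj)
print(labels, round(eigval, 4), round(modularity(adj, labels), 4))
```

    For this graph the sketch should recover the two triangles, with leading eigenvalue of B equal to √3 ≈ 1.7321 and Q = 5/14 ≈ 0.3571. Dense lists keep the example dependency-free; for real networks a sparse eigensolver would be the usual choice.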

    Statistical Analysis and Spectral Methods for Signal-Plus-Noise Matrix Models

    The singular value decomposition plays a ubiquitous role in statistics and related fields. Myriad applications, including clustering, classification, and dimensionality reduction, involve studying and understanding the geometric structure of singular values and singular vectors. Chapter 2 of this dissertation presents an initial analysis of local (e.g., entrywise) singular vector (resp., eigenvector) perturbations for signal-plus-noise matrix models. We obtain both deterministic and probabilistic upper bounds on singular vector perturbations that complement and, in certain settings, improve upon classical, well-established benchmark bounds in the literature. We then apply our tools and methods of analysis to problems involving (spiked) principal subspace estimation for high-dimensional covariance matrices and network models exhibiting community structure. Subsequently, Chapter 3 obtains precise local eigenvector estimation results under stronger assumptions involving signal strength, probabilistic concentration, and homogeneity. We provide in silico simulation examples to illustrate our theoretical bounds and distributional limit theory. Chapter 4 turns to singular value (resp., eigenvalue) perturbations, still within the signal-plus-noise matrix model framework. There, our results are leveraged to better understand hypothesis testing and change-point detection in statistical random graph analysis. Chapter 5 builds upon recent joint analysis of singular (resp., eigen) values and vectors to investigate the asymptotic relationship between spectral embedding performance and underlying network structure for stochastic block model graphs.
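    To make the distinction between entrywise and Euclidean (ℓ2) eigenvector perturbation concrete, here is a small Python simulation. It is not one of the dissertation's bounds. It assumes a hypothetical rank-one symmetric model X = λuu^T + E, where u is the normalized all-ones vector and E has symmetric Gaussian entries; the values n = 200, λ = 100, σ = 1 and all function names are illustrative choices.

```python
import math
import random


def make_signal_plus_noise(n, lam, sigma, seed=0):
    """Return (u, X) where X = lam * u u^T + E.

    u is the normalized all-ones vector (a hypothetical, fully delocalized
    signal direction) and E is symmetric with i.i.d. N(0, sigma^2) entries
    on and above the diagonal.
    """
    rng = random.Random(seed)
    u = [1.0 / math.sqrt(n)] * n
    x = [[lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i, n):
            e = rng.gauss(0.0, sigma)
            x[i][j] += e
            if j != i:
                x[j][i] += e  # mirror to keep E symmetric
    return u, x


def top_eigenvector(mat, iters=100, seed=1):
    """Unit eigenvector for the largest-magnitude eigenvalue, by power iteration.

    Here the signal eigenvalue (about lam) dominates the noise spectrum
    (roughly within +/- 2 * sigma * sqrt(n)), so this recovers the signal direction.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(mat))]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in mat]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v


def perturbation_errors(u, u_hat):
    """Return (l2 error, max entrywise error) after resolving the sign ambiguity."""
    if sum(a * b for a, b in zip(u, u_hat)) < 0:
        u_hat = [-a for a in u_hat]
    diff = [a - b for a, b in zip(u_hat, u)]
    return math.sqrt(sum(d * d for d in diff)), max(abs(d) for d in diff)


u, x = make_signal_plus_noise(n=200, lam=100.0, sigma=1.0)
l2, linf = perturbation_errors(u, top_eigenvector(x))
print(f"l2 error {l2:.3f}, max entrywise error {linf:.3f}, |u_i| = {u[0]:.3f}")
```

    With a delocalized signal vector, the largest entrywise error is typically a small fraction of the ℓ2 error, since the perturbation spreads across all n coordinates. Local (ℓ∞ or 2→∞) perturbation bounds aim to capture exactly this gap, which a plain ℓ2 bound such as Davis-Kahan cannot see.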

    Comparing Machine Learning and Logistic Regression Methods for Predicting Hypertension Using a Combination of Gene Expression and Next-Generation Sequencing Data

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk from genotypes so that it can incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare their performance to that of logistic regression. Performance did not differ radically across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to the radial support vector machine and logistic regression. Importantly, as the number of genes in the models increased, predictive ability showed a statistically significant decrease for both the radial support vector machine and logistic regression, even when the added genes contained causal rare variants. The linear support vector machine was more robust to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.

    First Encounters with Option Pricing and Return Simulation

    We provide a tractable introduction to option pricing models and examine how the complex analysis concept of branch cuts influences financial mathematics. The Black-Scholes model is introduced to motivate our discussion of the Heston stochastic volatility model, a model that dominates industry practice and the option pricing literature in financial mathematics. We focus on developing mathematical intuition as a tool for stimulating further undergraduate interest and research in financial mathematics. We provide code in R and Mathematica for applications.
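    The paper's own code is in R and Mathematica and is not reproduced here. As a rough Python illustration of the Black-Scholes starting point, the sketch below prices a European call in closed form and checks it against a Monte Carlo simulation of log-normal (geometric Brownian motion) returns under the risk-neutral measure. The Heston model and its branch-cut subtleties are not implemented; the parameter values and function names are assumptions.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s, k, t, r, sigma):
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def monte_carlo_call(s, k, t, r, sigma, n_paths=100_000, seed=42):
    """Discounted average call payoff over simulated terminal prices.

    Under the risk-neutral measure, log S_T - log S_0 is normal with mean
    (r - sigma^2 / 2) t and standard deviation sigma * sqrt(t).
    """
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        s_t = s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)
    return math.exp(-r * t) * total / n_paths


# At-the-money call: S0 = K = 100, one year, 5% rate, 20% volatility.
bs = black_scholes_call(100.0, 100.0, 1.0, 0.05, 0.2)
mc = monte_carlo_call(100.0, 100.0, 1.0, 0.05, 0.2)
print(f"closed form {bs:.4f}, Monte Carlo {mc:.4f}")
```

    For these inputs the closed-form price is about 10.4506, and the simulated estimate should agree to within a few standard errors (roughly 0.05 at 100,000 paths). Swapping the constant volatility for a stochastic variance process is where a Heston-style model would enter.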