269 research outputs found

    Optimal Methods for Using Posterior Probabilities in Association Testing

    Objective: The use of haplotypes to impute the genotypes of unmeasured single nucleotide variants continues to rise in popularity. Simulation results suggest that the use of the dosage as a one-dimensional summary statistic of imputation posterior probabilities may be optimal both in terms of statistical power and computational efficiency; however, little theoretical understanding is available to explain and unify these simulation results. In our analysis, we provide a theoretical foundation for the use of the dosage as a one-dimensional summary statistic of genotype posterior probabilities from any technology. Methods: We analytically evaluate the dosage, mode and the more general set of all one-dimensional summary statistics of two-dimensional (three posterior probabilities that must sum to 1) genotype posterior probability vectors. Results: We prove that the dosage is an optimal one-dimensional summary statistic under a typical linear disease model and is robust to violations of this model. Simulation results confirm our theoretical findings. Conclusions: Our analysis provides a strong theoretical basis for the use of the dosage as a one-dimensional summary statistic of genotype posterior probability vectors in related tests of genetic association across a wide variety of genetic disease models
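As a minimal illustration of the two summary statistics named in this abstract, the sketch below computes the dosage (the expected count of the coded allele) and the mode (the most likely genotype) from one posterior probability vector. The function names and the example vector are illustrative assumptions, not taken from the paper.

```python
def dosage(posterior):
    """Expected coded-allele count from (P(g=0), P(g=1), P(g=2))."""
    p0, p1, p2 = posterior
    if abs(p0 + p1 + p2 - 1.0) > 1e-8:
        raise ValueError("posterior probabilities must sum to 1")
    return p1 + 2.0 * p2


def mode(posterior):
    """Most likely genotype (0, 1, or 2); ties resolve to the smaller genotype."""
    return max(range(3), key=lambda g: posterior[g])


# Hypothetical posterior for one individual at one imputed variant.
example = (0.10, 0.60, 0.30)
print("dosage:", dosage(example))
print("mode:", mode(example))
```

The dosage keeps information about uncertainty that the mode discards: two individuals with the same most likely genotype can have quite different dosages.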

    Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel

    Kernel-based feature selection is an important tool in nonparametric statistics. Despite many practical applications of kernel-based feature selection, there is little statistical theory available to support the method. A core challenge is that the objective functions of the optimization problems used to define kernel-based feature selection are nonconvex. The literature has only studied the statistical properties of the global optima, which is a mismatch, given that the gradient-based algorithms available for nonconvex optimization are only able to guarantee convergence to local minima. Studying the full landscape associated with kernel-based methods, we show that feature selection objectives using the Laplace kernel (and other $\ell_1$ kernels) come with statistical guarantees that other kernels, including the ubiquitous Gaussian kernel (or other $\ell_2$ kernels), do not possess. Based on a sharp characterization of the gradient of the objective function, we show that $\ell_1$ kernels eliminate unfavorable stationary points that appear when using an $\ell_2$ kernel. Armed with this insight, we establish statistical guarantees for $\ell_1$ kernel-based feature selection which do not require reaching the global minima. In particular, we establish model-selection consistency of $\ell_1$-kernel-based feature selection in recovering main effects and hierarchical interactions in the nonparametric setting with $n \sim \log p$ samples. Comment: 33 pages main text
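To make the contrast between the two kernel families concrete, here is a small sketch of a feature-weighted Laplace kernel and a feature-weighted Gaussian kernel. The exact parametrization used in the paper is not given in the abstract, so the weighted forms below (a nonnegative weight `beta[j]` per feature) are assumptions chosen for illustration.

```python
import math


def laplace_kernel(x, y, beta):
    """Assumed l1 form: exp(-sum_j beta_j * |x_j - y_j|)."""
    return math.exp(-sum(b * abs(a - c) for a, c, b in zip(x, y, beta)))


def gaussian_kernel(x, y, beta):
    """Assumed l2 form: exp(-sum_j beta_j * (x_j - y_j)**2)."""
    return math.exp(-sum(b * (a - c) ** 2 for a, c, b in zip(x, y, beta)))


# A zero weight removes a feature from the kernel entirely, which is how
# a weight vector can encode feature selection in this sketch.
x, y = [0.0, 0.0], [1.0, 5.0]
beta = [1.0, 0.0]  # hypothetical weights: keep feature 0, drop feature 1
print(laplace_kernel(x, y, beta))
print(gaussian_kernel(x, y, beta))
```

In a setup like this, feature selection amounts to optimizing over `beta`; the abstract's claim concerns the stationary points of that optimization, which this sketch does not attempt to reproduce.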

    Kernel Learning in Ridge Regression "Automatically" Yields Exact Low Rank Solution

    We consider kernels of the form $(x,x') \mapsto \phi(\|x-x'\|^2_\Sigma)$ parametrized by $\Sigma$. For such kernels, we study a variant of the kernel ridge regression problem which simultaneously optimizes the prediction function and the parameter $\Sigma$ of the reproducing kernel Hilbert space. The eigenspace of the $\Sigma$ learned from this kernel ridge regression problem can inform us which directions in covariate space are important for prediction. Assuming that the covariates have nonzero explanatory power for the response only through a low dimensional subspace (central mean subspace), we find that the global minimizer of the finite sample kernel learning objective is also low rank with high probability. More precisely, the rank of the minimizing $\Sigma$ is with high probability bounded by the dimension of the central mean subspace. This phenomenon is interesting because the low rankness property is achieved without using any explicit regularization of $\Sigma$, e.g., nuclear norm penalization. Our theory makes correspondence between the observed phenomenon and the notion of low rank set identifiability from the optimization literature. The low rankness property of the finite sample solutions exists because the population kernel learning objective grows "sharply" when moving away from its minimizers in any direction perpendicular to the central mean subspace. Comment: Add code links and correct a figur
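The kernel family in this abstract can be sketched as follows, reading the squared Sigma-norm as the quadratic form (x - x')^T Sigma (x - x') for a symmetric positive semidefinite Sigma. The choice phi(z) = exp(-z) and the rank-one example matrix are assumptions for illustration; the abstract does not fix either.

```python
import math


def sq_dist_sigma(x, y, sigma):
    """(x - y)^T Sigma (x - y), with Sigma given as a list of rows."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * sigma[i][j] * d[j] for i in range(n) for j in range(n))


def kernel(x, y, sigma, phi=lambda z: math.exp(-z)):
    """phi applied to the squared Sigma-distance (phi is an assumed choice)."""
    return phi(sq_dist_sigma(x, y, sigma))


# Hypothetical rank-one Sigma = u u^T with u = (1, 0): only the first
# coordinate influences the kernel.
sigma = [[1.0, 0.0], [0.0, 0.0]]
print(kernel([0.0, 0.0], [0.0, 3.0], sigma))  # differ only along the ignored direction
print(kernel([0.0, 0.0], [2.0, 0.0], sigma))  # differ along the retained direction
```

A low-rank Sigma makes the kernel blind to every direction in its null space, which is why the rank of the learned Sigma can be read as the number of directions the fitted model actually uses.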

    Geometric Framework for Evaluating Rare Variant Tests of Association

    The wave of next-generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers
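The geometric picture described here can be illustrated with a short sketch: per-site rare allele counts in cases and in controls are treated as two vectors, and the quantities of interest are their lengths and the angle between them. The counts below are made up, and the sketch only computes the geometric quantities; it does not implement any specific test from the paper.

```python
import math


def length(v):
    """Euclidean length of a count vector."""
    return math.sqrt(sum(a * a for a in v))


def angle(u, v):
    """Angle in radians between two nonzero count vectors."""
    cos = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, cos)))


# Hypothetical rare allele counts at four variant sites.
cases = [3, 0, 2, 1]
controls = [1, 1, 0, 1]
print("case length:", length(cases))
print("control length:", length(controls))
print("angle (radians):", angle(cases, controls))
```

In this picture, a difference in lengths reflects a different overall burden of rare alleles, while a nonzero angle reflects a different distribution of those alleles across sites.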

    Powerful Method for Including Genotype Uncertainty in Tests of Hardy-Weinberg Equilibrium

    The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotype, sequencing and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg Equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes
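For background, the sketch below shows the classical chi-square test statistic for Hardy-Weinberg equilibrium computed from hard genotype calls. This is the textbook baseline that ignores genotype uncertainty, not the calibrated method proposed in the abstract, whose details are not given here. The function name and example counts are illustrative assumptions.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with the
    counts expected under Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or p >= 1.0:
        raise ValueError("variant is monomorphic; the statistic is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: the first sample matches HWE proportions exactly,
# the second has no heterozygotes at all.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(100, 0, 100))
```

Under the null hypothesis the statistic is conventionally compared with a chi-square distribution with one degree of freedom.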

    Rorc restrains the potency of ST2+ regulatory T cells in ameliorating intestinal graft-versus-host disease

    Soluble stimulation-2 (ST2) is increased during graft-versus-host disease (GVHD), while Tregs that express ST2 prevent GVHD through unknown mechanisms. Transplantation of Foxp3- T cells and Tregs that were collected and sorted from different Foxp3 reporter mice indicated that in mice that developed GVHD, ST2+ Tregs were thymus derived and predominantly localized to the intestine. ST2-/- Treg transplantation was associated with reduced total intestinal Treg frequency and activation. ST2-/- versus WT intestinal Treg transcriptomes showed decreased Treg functional markers and, reciprocally, increased Rorc expression. Rorc-/- T cell transplantation enhanced the frequency and function of intestinal ST2+ Tregs and reduced GVHD through decreased gut-infiltrating soluble ST2-producing type 1 and increased IL-4/IL-10-producing type 2 T cells. Cotransfer of ST2+ Tregs sorted from Rorc-/- mice with WT CD25-depleted T cells decreased GVHD severity and mortality, increased intestinal ST2+KLRG1+ Tregs, and decreased type 1 T cells after transplantation, indicating an intrinsic mechanism. Ex vivo IL-33-stimulated Tregs (TregIL-33) expressed higher amphiregulin and displayed better immunosuppression, and adoptive transfer prevented GVHD better than control Tregs or TregIL-33 cultured with IL-23/IL-17. Amphiregulin blockade by neutralizing antibody in vivo abolished the protective effect of TregIL-33. Our data show that inverse expression of ST2 and RORγt in intestinal Tregs determines GVHD and that TregIL-33 has potential as a cellular therapy avenue for preventing GVHD.

    Hierarchical accompanying and inhibiting patterns on the spatial arrangement of taxis' local hotspots

    Due to the large volume of recording, the complete spontaneity, and the flexible pick-up and drop-off locations, taxi data portrays a realistic and detailed picture of urban space use to a certain extent. The spatial arrangement of pick-up and drop-off hotspots reflects the organizational space, which has received attention in urban structure studies. Previous studies mainly explore the hotspots at a large scale by visual analysis or some simple indexes, where the hotspots usually cover the entire central business district, train stations, or dense residential areas, reaching a radius of hundreds or even thousands of meters. However, the spatial arrangement patterns of small-scale hotspots, reflecting the specific popular pick-up and drop-off locations, have not received much attention. Using two taxi trajectory datasets in Wuhan and Beijing, China, this study quantitatively explores the spatial arrangement of fine-grained pick-up and drop-off local hotspots with different levels of popularity, where the sizes are adaptively set as 90 m × 90 m in Wuhan and 105 m × 105 m in Beijing according to the local hotspot identification method. Results show that popular hotspots tend to be surrounded by less popular hotspots, but the existence of less popular hotspots is inhibited in regions with a large number of popular hotspots. We use the terms hierarchical accompanying and inhibiting patterns for these two spatial configurations. Finally, to uncover the underlying mechanism, a KNN-based model is proposed to reproduce the spatial distribution of other less popular hotspots according to the most popular ones. These findings help decision-makers construct reasonable urban minimum units for precise traffic and disease control, as well as plan a more humane spatial arrangement of points of interest.