Optimal Methods for Using Posterior Probabilities in Association Testing

Author: Liu Keli
Luedtke Alexander
Tintle Nathan L.
Publication venue: Digital Collections @ Dordt
Publication date: 01/01/2013

Objective: The use of haplotypes to impute the genotypes of unmeasured single nucleotide variants continues to rise in popularity. Simulation results suggest that the use of the dosage as a one-dimensional summary statistic of imputation posterior probabilities may be optimal both in terms of statistical power and computational efficiency; however, little theoretical understanding is available to explain and unify these simulation results. In our analysis, we provide a theoretical foundation for the use of the dosage as a one-dimensional summary statistic of genotype posterior probabilities from any technology. Methods: We analytically evaluate the dosage, mode and the more general set of all one-dimensional summary statistics of two-dimensional (three posterior probabilities that must sum to 1) genotype posterior probability vectors. Results: We prove that the dosage is an optimal one-dimensional summary statistic under a typical linear disease model and is robust to violations of this model. Simulation results confirm our theoretical findings. Conclusions: Our analysis provides a strong theoretical basis for the use of the dosage as a one-dimensional summary statistic of genotype posterior probability vectors in related tests of genetic association across a wide variety of genetic disease models
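As a minimal sketch of the two summary statistics compared in the abstract above, assume each posterior vector is ordered as (P(0 copies), P(1 copy), P(2 copies)) of the coded allele. The dosage is then the expected allele count and the mode is the most probable genotype. The function names and the example probabilities are illustrative and are not taken from the paper.

```python
def dosage(post):
    """Expected allele count: 0*P(0) + 1*P(1) + 2*P(2)."""
    p0, p1, p2 = post
    return 0 * p0 + 1 * p1 + 2 * p2


def mode(post):
    """Most probable genotype (0, 1, or 2 copies)."""
    return max(range(3), key=lambda g: post[g])


# Hypothetical posterior vectors for three individuals; each sums to 1.
posteriors = [
    (0.90, 0.09, 0.01),
    (0.10, 0.60, 0.30),
    (0.05, 0.25, 0.70),
]

dosages = [dosage(p) for p in posteriors]
modes = [mode(p) for p in posteriors]
```

The dosage keeps the uncertainty in the posterior (the second individual gets 1.2 rather than a hard call of 1), whereas the mode discards it.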

Crossref

Dordt College

PubMed Central

Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel

Author: Jordan Michael I.
Liu Keli
Ruan Feng
Publication date: 17/06/2021

Kernel-based feature selection is an important tool in nonparametric statistics. Despite many practical applications of kernel-based feature selection, there is little statistical theory available to support the method. A core challenge is that the objective functions of the optimization problems used to define kernel-based feature selection are nonconvex. The literature has only studied the statistical properties of the global optima, which is a mismatch, given that the gradient-based algorithms available for nonconvex optimization are only able to guarantee convergence to local minima. Studying the full landscape associated with kernel-based methods, we show that feature selection objectives using the Laplace kernel (and other $\ell_1$ kernels) come with statistical guarantees that other kernels, including the ubiquitous Gaussian kernel (or other $\ell_2$ kernels) do not possess. Based on a sharp characterization of the gradient of the objective function, we show that $\ell_1$ kernels eliminate unfavorable stationary points that appear when using an $\ell_2$ kernel. Armed with this insight, we establish statistical guarantees for $\ell_1$ kernel-based feature selection which do not require reaching the global minima. In particular, we establish model-selection consistency of $\ell_1$-kernel-based feature selection in recovering main effects and hierarchical interactions in the nonparametric setting with $n \sim \log p$ samples.

Comment: 33 pages main text
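For orientation, the two kernel families contrasted in the abstract above can be sketched with their conventional textbook forms: a Laplace kernel built on the $\ell_1$ distance and a Gaussian kernel built on the squared $\ell_2$ distance. The bandwidth parameter `sigma` and the exact scaling are assumptions made here; the paper's own parametrization (including any feature weights) may differ.

```python
import math


def laplace_kernel(x, y, sigma=1.0):
    """exp(-||x - y||_1 / sigma): depends on the l1 distance."""
    l1 = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-l1 / sigma)


def gaussian_kernel(x, y, sigma=1.0):
    """exp(-||x - y||_2^2 / (2 sigma^2)): depends on the squared l2 distance."""
    sq_l2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_l2 / (2.0 * sigma ** 2))


x, y = [0.0, 0.0], [1.0, 2.0]
k_lap = laplace_kernel(x, y)   # l1 distance is 3, so this is exp(-3)
k_gau = gaussian_kernel(x, y)  # squared l2 distance is 5, so this is exp(-2.5)
```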

arXiv.org e-Print Archive

Kernel Learning in Ridge Regression "Automatically" Yields Exact Low Rank Solution

Author: Chen Yunlu
Li Yang
Liu Keli
Ruan Feng
Publication date: 27/11/2023

We consider kernels of the form $(x,x') \mapsto \phi(\|x-x'\|^2_\Sigma)$ parametrized by $\Sigma$. For such kernels, we study a variant of the kernel ridge regression problem which simultaneously optimizes the prediction function and the parameter $\Sigma$ of the reproducing kernel Hilbert space. The eigenspace of the $\Sigma$ learned from this kernel ridge regression problem can inform us which directions in covariate space are important for prediction. Assuming that the covariates have nonzero explanatory power for the response only through a low dimensional subspace (central mean subspace), we find that the global minimizer of the finite sample kernel learning objective is also low rank with high probability. More precisely, the rank of the minimizing $\Sigma$ is with high probability bounded by the dimension of the central mean subspace. This phenomenon is interesting because the low rankness property is achieved without using any explicit regularization of $\Sigma$, e.g., nuclear norm penalization. Our theory makes correspondence between the observed phenomenon and the notion of low rank set identifiability from the optimization literature. The low rankness property of the finite sample solutions exists because the population kernel learning objective grows "sharply" when moving away from its minimizers in any direction perpendicular to the central mean subspace.

Comment: Add code links and correct a figur
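To make the kernel form in the abstract above concrete, here is a small sketch that assumes $\|v\|^2_\Sigma = v^\top \Sigma v$ and picks $\phi(z) = e^{-z}$ purely for illustration; the paper may use a different $\phi$. With a rank-one $\Sigma$, the kernel ignores any difference along the direction that $\Sigma$ annihilates, which is the sense in which the learned $\Sigma$ indicates the directions that matter for prediction.

```python
import math


def sq_norm_sigma(v, sigma):
    """Quadratic form v^T Sigma v for a square matrix given as nested lists."""
    n = len(v)
    return sum(v[i] * sigma[i][j] * v[j] for i in range(n) for j in range(n))


def kernel(x, y, sigma, phi=lambda z: math.exp(-z)):
    """phi(||x - y||_Sigma^2); phi defaults to exp(-z) as an assumed example."""
    diff = [a - b for a, b in zip(x, y)]
    return phi(sq_norm_sigma(diff, sigma))


# Hypothetical rank-one Sigma: only the first coordinate contributes.
sigma_low_rank = [[1.0, 0.0],
                  [0.0, 0.0]]

k_same = kernel([0.0, 0.0], [0.0, 5.0], sigma_low_rank)  # differs only in the ignored direction
k_diff = kernel([0.0, 0.0], [1.0, 5.0], sigma_low_rank)  # differs by 1 in the retained direction
```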

arXiv.org e-Print Archive

Geometric Framework for Evaluating Rare Variant Tests of Association

Author: Fast Shannon
Liu Keli
Tintle Nathan L.
Zawistowski Matthew
Publication venue: Digital Collections @ Dordt
Publication date: 01/05/2013

The wave of next-generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers
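The geometric picture in the abstract above can be illustrated with a short sketch: rare allele counts per variant site form one vector for cases and one for controls, and the two vectors are compared by length and by angle. The counts below are invented for illustration, and the code computes only the raw geometric quantities, not any of the paper's test statistics.

```python
import math


def length(v):
    """Euclidean length of a count vector."""
    return math.sqrt(sum(c * c for c in v))


def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (length(u) * length(v))


# Hypothetical rare allele counts at three variant sites.
case_counts = [3, 4, 0]
control_counts = [0, 4, 3]

len_case = length(case_counts)
len_control = length(control_counts)
cos_angle = cosine(case_counts, control_counts)
angle_deg = math.degrees(math.acos(cos_angle))
```

In this made-up example the two vectors have equal length but point in different directions, so a comparison based only on lengths would see no difference while one that also uses the angle would.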

Dordt College

Powerful Method for Including Genotype Uncertainty in Tests of Hardy-Weinberg Equilibrium

Author: Beck Andrew
Liu Keli
Luedtke Alexander
Tintle Nathan L.
Publication venue: Digital Collections @ Dordt
Publication date: 01/01/2016

The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotype, sequencing and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg Equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes

Crossref

Dordt College

PubMed Central

Comment: A Fruitful Resolution to Simpson’s Paradox via Multiresolution Inference

Author: Liu Keli
Meng Xiao-Li
Publication venue: 'Informa UK Limited'
Publication date: 07/08/2015

Simpson’s Paradox is really a Simple Paradox if one at all. Peeling away the paradox is as easy (or hard) as avoiding a comparison of apples and oranges, a concept requiring no mention of causality. We show how the commonly adopted notation has committed the gross-ery mistake of tagging unlike fruit with alike labels. Hence, the “fruitful” question to ask is not “Do we condition on the third variable?” but rather “Are two fruits, which appear similar, actually similar at their core?” We introduce the concept of intrinsic similarity to escape this bind. The notion of “core” depends on how deep one looks; the multiresolution inference framework provides a natural way to define intrinsic similarity at the resolution appropriate for the treatment. To harvest the fruits of this insight, we will need to estimate intrinsic similarity, which often results in an indirect conditioning on the “third variable.” A ripening estimation theory shows that the standard treatment comparisons, unconditional or conditional on the third variable, are low hanging fruit but often rotten. We pose assumptions to pluck away higher-resolution (more conditional) comparisons; the multiresolution framework allows us to rigorously assess the price of these assumptions against the resulting yield. One such assessment gives us Simpson’s Warning: less conditioning is most likely to lead to serious bias when Simpson’s Paradox appears. Statistic
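For readers unfamiliar with the paradox itself, the reversal can be reproduced with a few lines of arithmetic. The counts below are entirely made up: the treated arm has the higher success rate within each stratum of the third variable, yet the lower rate once the strata are pooled. This sketch shows only the classical paradox, not the multiresolution framework the comment proposes.

```python
def rate(successes, total):
    return successes / total


# Hypothetical (successes, total) per arm within two strata of a third variable.
strata = {
    "stratum_1": {"treated": (8, 10), "control": (70, 100)},
    "stratum_2": {"treated": (30, 100), "control": (2, 10)},
}

# Treated-minus-control difference in success rate inside each stratum.
within = {
    name: rate(*arms["treated"]) - rate(*arms["control"])
    for name, arms in strata.items()
}


def pooled(arm):
    """Success rate for one arm after ignoring the stratum label."""
    successes = sum(arms[arm][0] for arms in strata.values())
    total = sum(arms[arm][1] for arms in strata.values())
    return rate(successes, total)


pooled_diff = pooled("treated") - pooled("control")
```

The within-stratum differences are both positive while the pooled difference is negative, because the two arms are distributed very unevenly across the strata.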

Harvard University - DASH

Rorc restrains the potency of ST2+ regulatory T cells in ameliorating intestinal graft-versus-host disease

Author: Blazar Bruce R.
Griesenauer Brad
Hippen Keli L.
Liu Hong
Loschi Michael
Paczesny Sophie
Ramadan Abdulraouf
Reichenbach Dawn K.
Yang Jinfeng
Zhang Jilu
Publication venue: 'American Society for Clinical Investigation'
Publication date: 07/03/2019

Soluble stimulation-2 (ST2) is increased during graft-versus-host disease (GVHD), while Tregs that express ST2 prevent GVHD through unknown mechanisms. Transplantation of Foxp3- T cells and Tregs that were collected and sorted from different Foxp3 reporter mice indicated that in mice that developed GVHD, ST2+ Tregs were thymus derived and predominantly localized to the intestine. ST2-/- Treg transplantation was associated with reduced total intestinal Treg frequency and activation. ST2-/- versus WT intestinal Treg transcriptomes showed decreased Treg functional markers and, reciprocally, increased Rorc expression. Rorc-/- T cell transplantation enhanced the frequency and function of intestinal ST2+ Tregs and reduced GVHD through decreased gut-infiltrating soluble ST2-producing type 1 and increased IL-4/IL-10-producing type 2 T cells. Cotransfer of ST2+ Tregs sorted from Rorc-/- mice with WT CD25-depleted T cells decreased GVHD severity and mortality, increased intestinal ST2+KLRG1+ Tregs, and decreased type 1 T cells after transplantation, indicating an intrinsic mechanism. Ex vivo IL-33-stimulated Tregs (TregIL-33) expressed higher amphiregulin and displayed better immunosuppression, and adoptive transfer prevented GVHD better than control Tregs or TregIL-33 cultured with IL-23/IL-17. Amphiregulin blockade by neutralizing antibody in vivo abolished the protective effect of TregIL-33. Our data show that inverse expression of ST2 and RORγt in intestinal Tregs determines GVHD and that TregIL-33 has potential as a cellular therapy avenue for preventing GVHD

IUPUIScholarWorks

Hierarchical accompanying and inhibiting patterns on the spatial arrangement of taxis' local hotspots

Author: Chen Xiao-Jian
Huang Zhou
Liu Yu
Wang Keli
Xiao Changjiang
Zhang Weiyu
Publication date: 18/10/2023

Due to the large volume of recording, the complete spontaneity, and the flexible pick-up and drop-off locations, taxi data portrays a realistic and detailed picture of urban space use to a certain extent. The spatial arrangement of pick-up and drop-off hotspots reflects the organizational space, which has received attention in urban structure studies. Previous studies mainly explore the hotspots at a large scale by visual analysis or some simple indexes, where the hotspots usually cover the entire central business district, train stations, or dense residential areas, reaching a radius of hundreds or even thousands of meters. However, the spatial arrangement patterns of small-scale hotspots, reflecting the specific popular pick-up and drop-off locations, have not received much attention. Using two taxi trajectory datasets in Wuhan and Beijing, China, this study quantitatively explores the spatial arrangement of fine-grained pick-up and drop-off local hotspots with different levels of popularity, where the sizes are adaptively set as 90m*90m in Wuhan and 105m*105m in Beijing according to the local hotspot identification method. Results show that popular hotspots tend to be surrounded by less popular hotspots, but the existence of less popular hotspots is inhibited in regions with a large number of popular hotspots. We use the terms hierarchical accompanying and inhibiting patterns for these two spatial configurations. Finally, to uncover the underlying mechanism, a KNN-based model is proposed to reproduce the spatial distribution of other less popular hotspots according to the most popular ones. These findings help decision-makers construct reasonable urban minimum units for precise traffic and disease control, as well as plan a more humane spatial arrangement of points of interest

arXiv.org e-Print Archive