207 research outputs found

Deciding the dimension of effective dimension reduction space for functional and high-dimensional data

Author: Hsing Tailen
Li Yehua
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 11/11/2010
In this paper, we consider regression models with a Hilbert-space-valued predictor and a scalar response, where the response depends on the predictor only through a finite number of projections. The linear subspace spanned by these projections is called the effective dimension reduction (EDR) space. To determine the dimensionality of the EDR space, we focus on the leading principal component scores of the predictor, and propose two sequential $\chi^2$ testing procedures under the assumption that the predictor has an elliptically contoured distribution. We further extend these procedures and introduce a test that simultaneously takes into account a large number of principal component scores. The proposed procedures are supported by theory, validated by simulation studies, and illustrated by a real-data example. Our methods and theory are applicable to functional data and high-dimensional multivariate data. Comment: Published at http://dx.doi.org/10.1214/10-AOS816 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
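
A sequential testing procedure for a dimension typically tests "dimension <= k" for k = 0, 1, 2, ... and stops at the first hypothesis that is not rejected. The sketch below illustrates only that generic stopping rule. It is not the authors' procedure: the function name `choose_dimension`, the list of p-values, and the significance level are assumptions made for illustration, and the test statistics that would produce the p-values are not reproduced here.

```python
from typing import Sequence


def choose_dimension(p_values: Sequence[float], alpha: float = 0.05) -> int:
    """Generic sequential stopping rule (hypothetical helper).

    p_values[k] is assumed to be the p-value of a test of
    H0: dimension <= k. The estimate is the smallest k whose
    null hypothesis is not rejected at level alpha.
    """
    for k, p in enumerate(p_values):
        if p > alpha:  # first non-rejection: stop and report k
            return k
    # Every hypothesis was rejected, so the dimension exceeds
    # the largest k that was tested.
    return len(p_values)
```

For example, p-values of 0.001, 0.01 and 0.4 for k = 0, 1, 2 would lead to rejecting k = 0 and k = 1 and stopping at k = 2.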

arXiv.org e-Print Archive

Crossref

Joint modeling of longitudinal drug using pattern and time to first relapse in cocaine dependence treatment data

Author: Guan Yongtao
Li Yehua
Ye Jun
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 21/08/2015
An important endpoint variable in a cocaine rehabilitation study is the time to first relapse of a patient after the treatment. We propose a joint modeling approach based on functional data analysis to study the relationship between the baseline longitudinal cocaine-use pattern and the interval censored time to first relapse. For the baseline cocaine-use pattern, we consider both self-reported cocaine-use amount trajectories and dichotomized use trajectories. Variations within the generalized longitudinal trajectories are modeled through a latent Gaussian process, which is characterized by a few leading functional principal components. The association between the baseline longitudinal trajectories and the time to first relapse is built upon the latent principal component scores. The mean and the eigenfunctions of the latent Gaussian process as well as the hazard function of time to first relapse are modeled nonparametrically using penalized splines, and the parameters in the joint model are estimated by a Monte Carlo EM algorithm based on Metropolis-Hastings steps. An Akaike information criterion (AIC) based on effective degrees of freedom is proposed to choose the tuning parameters, and a modified empirical information is proposed to estimate the variance-covariance matrix of the estimators. Comment: Published at http://dx.doi.org/10.1214/15-AOAS852 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

University of Miami: Scholarship Miami

Topics in functional data analysis with biological applications

Author: Li Yehua
Publication date: 02/06/2009
Functional data analysis (FDA) is an active field of statistics, in which the primary subjects in the study are curves. My dissertation consists of two innovative applications of functional data analysis in biology. The data that motivated the research broadened the scope of FDA and demanded new methodology. I develop new nonparametric estimation methods, and I focus on developing large sample theories for the proposed estimators. The first project is motivated by a colon carcinogenesis study, the goal of which is to study the function of a protein (p27) in colon cancer development. In this study, a number of colonic crypts (units) were sampled from each rat (subject) at random locations along the colon, and then repeated measurements on the protein expression level were made on each cell (subunit) within the selected crypts. In this problem, measurements within each crypt can be viewed as a function, since the measurements can be indexed by the cell locations. The functions from the same subject are spatially correlated along the colon, and my goal is to estimate this correlation function using nonparametric methods. We use this data set as a motivation and propose a kernel estimator of the correlation function in a more general framework. We develop a pointwise asymptotic normal distribution for the proposed estimator when the number of subjects is fixed and the number of units within each subject goes to infinity. Based on the asymptotic theory, we propose a weighted block bootstrapping method for making inferences about the correlation function, where the weights account for the inhomogeneity of the distribution of the unit locations. Simulation studies are also provided to illustrate the numerical performance of the proposed method. My second project is on lipoprotein profile data, where the goal is to use lipoprotein profile curves to predict the cholesterol level in human blood.
Again, motivated by the data, we consider a more general problem: the functional linear model (Ramsay and Silverman, 1997) with a functional predictor and a scalar response. There is literature developing different methods for this model; however, there is little theory to support the methods. Therefore, we focus more on the theoretical properties of this model. There is other contemporary theoretical work on methods based on Principal Component Regression. Our work is different in the sense that we base our method on a roughness penalty approach and consider the more realistic scenario in which the functional predictor is observed only at discrete points. To reduce the difficulty of the theoretical derivations, we restrict the functions with a periodic boundary condition and develop an asymptotic convergence rate for this problem in Chapter III. A more general result based on splines is a topic for future research, which I discuss in Chapter IV

Texas A&M Repository

Restructuring industrial districts, scaling up regional development: a study of the Wenzhou Model, China

Author: Li Wangming
Wei Yehua
Publication venue: University of Utah
Publication date: 24/09/2007
Working Paper. The Wenzhou Municipality in Zhejiang Province is spearheading China's marketization and development of private enterprises. Its successful development trajectory, centered on family-owned small businesses embedded in thick local institutions, resembles Marshallian industrial districts (MIDs). However, with China's changing institutional environment and intensifying competition, Wenzhou has been facing challenges. Since the late 1980s, Wenzhou has gone through two major rounds of restructuring (from family enterprises to shareholding cooperatives to shareholding enterprises) that have included four major types of strategic response: institutional change, technological upgrading, industrial diversification, and spatial restructuring. Firms in Wenzhou have gone through localization and delocalization, and locational choices reflect the dual destinations of globalizing cities and interior cities. The formation of new firms and clusters has been accompanied by mergers, acquisitions, and the emergence of multiregional enterprises (MREs), some of which have relocated their headquarters and specialized functions to metropolitan areas, especially Shanghai and Hangzhou. More recently, Wenzhou's growth has slowed, leading some to question the sustainability of the Wenzhou model. We argue that Wenzhou's development is in danger of regional lock-ins--relational, intergenerational, and structural. Wenzhou's experience challenges the orthodox concept of MIDs and calls for "scaling up" regional development

The University of Utah: J. Willard Marriott Digital Library

Selenocysteine insertion directed by the 3′-UTR SECIS element in Escherichia coli

Author: Gladyshev Vadim N.
Li Yehua
Su Dan
Publication venue: Oxford University Press
Publication date: 29/04/2005
Co-translational insertion of selenocysteine (Sec) into proteins in response to UGA codons is directed by selenocysteine insertion sequence (SECIS) elements. In known bacterial selenoprotein genes, SECIS elements are located in the coding regions immediately downstream of UGA codons. Here, we report that a distant SECIS element can also function in Sec insertion in bacteria provided that it is spatially close to the UGA codon. We expressed a mammalian phospholipid hydroperoxide glutathione peroxidase in Escherichia coli from a construct in which a natural E. coli SECIS element was located in the 3′-untranslated region (3′-UTR) and adjacent to a sequence complementary to the region downstream of the Sec UGA codon. Although the major readthrough event at the UGA codon was insertion of tryptophan, Sec was also incorporated, and its insertion was dependent on the functional SECIS element in the UTR, the base-pairing potential of the SECIS flanking region and the Sec UGA codon. These data have important implications for the evolution of SECIS elements and the development of a system for heterologous expression of selenoproteins, and show that, in addition to the primary sequence arrangement between UGA codons and SECIS elements, their proximity within the tertiary structure can support Sec insertion in bacteria

Crossref

PubMed Central

Nonparametric estimation of correlation functions in longitudinal and spatial data, with application to colon carcinogenesis experiments

Author: Carroll Raymond J.
Hong Meeyoung
Li Yehua
Lupton Joanne R.
Turner Nancy D.
Wang Naisyin
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2007
In longitudinal and spatial studies, observations often demonstrate strong correlations that are stationary in time or distance lags, and the times or locations at which these data are sampled may not be homogeneous. We propose a nonparametric estimator of the correlation function in such data, using kernel methods. We develop a pointwise asymptotic normal distribution for the proposed estimator, when the number of subjects is fixed and the number of vectors or functions within each subject goes to infinity. Based on the asymptotic theory, we propose a weighted block bootstrapping method for making inferences about the correlation function, where the weights account for the inhomogeneity of the distribution of the times or locations. The method is applied to a data set from a colon carcinogenesis study, in which colonic crypts were sampled from a colon segment from each of the 12 rats in the experiment and the expression level of p27, an important cell cycle protein, was then measured for each cell within the sampled crypts. A simulation study is also provided to illustrate the numerical performance of the proposed method. Comment: Published at http://dx.doi.org/10.1214/009053607000000082 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
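
To make the idea of kernel smoothing over lags concrete, the sketch below estimates a stationary correlation at a given lag by weighting products of centered observations according to how close their time separation is to that lag. It is a simplified illustration under stated assumptions, not the estimator proposed in the paper: it handles a single series of scalar observations, uses an Epanechnikov kernel, and the names `epanechnikov` and `kernel_correlation` are hypothetical.

```python
from typing import Sequence


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel; zero outside (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_correlation(
    times: Sequence[float],
    values: Sequence[float],
    lag: float,
    bandwidth: float,
) -> float:
    """Kernel-weighted estimate of the correlation at a given lag.

    Assumes a single stationary series observed at (possibly
    irregular) times. Pairs whose separation |t_i - t_j| is close
    to `lag` receive the largest weights.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n

    weighted_products = 0.0
    total_weight = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip pairing an observation with itself
            separation = abs(times[i] - times[j])
            w = epanechnikov((separation - lag) / bandwidth)
            weighted_products += w * (values[i] - mean) * (values[j] - mean)
            total_weight += w

    if total_weight == 0.0 or variance == 0.0:
        raise ValueError("no usable pairs near this lag, or zero variance")
    # Smoothed covariance at the lag, scaled by the sample variance.
    return weighted_products / (total_weight * variance)
```

The bandwidth controls how many pairs contribute: a smaller value uses only pairs whose separation is very close to the requested lag, at the cost of a noisier estimate.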

arXiv.org e-Print Archive

Crossref

Texas A&M Repository

Bias-correction and Test for Mark-point Dependence with Replicated Marked Point Processes

Author: Guan Yongtao
Li Yehua
Xu Ganggang
Zhang Jingfei
Publication date: 25/07/2022
Mark-point dependence plays a critical role in research problems that can be fitted into the general framework of marked point processes. In this work, we focus on adjusting for mark-point dependence when estimating the mean and covariance functions of the mark process, given independent replicates of the marked point process. We assume that the mark process is a Gaussian process and the point process is a log-Gaussian Cox process, where the mark-point dependence is generated through the dependence between two latent Gaussian processes. Under this framework, naive local linear estimators ignoring the mark-point dependence can be severely biased. We show that this bias can be corrected using a local linear estimator of the cross-covariance function and establish uniform convergence rates of the bias-corrected estimators. Furthermore, we propose a test statistic based on local linear estimators for mark-point independence, which is shown to converge to an asymptotic normal distribution at a parametric $\sqrt{n}$-convergence rate. Model diagnostics tools are developed for key model assumptions and a robust functional permutation test is proposed for a more general class of mark-point processes. The effectiveness of the proposed methods is demonstrated using extensive simulations and applications to two real data examples

arXiv.org e-Print Archive

University of Miami: Scholarship Miami

Unified empirical likelihood ratio tests for functional concurrent linear models and the phase transition from sparse to dense functional data

Author: Cui Yuehua
Li Yehua
Wang Honglang
Zhong Ping-Shou
Publication venue: 'Wiley'
Publication date: 01/03/2018
We consider the problem of testing functional constraints in a class of functional concurrent linear models where both the predictors and the response are functional data measured at discrete time points. We propose test procedures based on the empirical likelihood with bias‐corrected estimating equations to conduct both pointwise and simultaneous inferences. The asymptotic distributions of the test statistics are derived under the null and local alternative hypotheses, where sparse and dense functional data are considered in a unified framework. We find a phase transition in the asymptotic null distributions and the orders of detectable alternatives from sparse to dense functional data. Specifically, the tests proposed can detect alternatives of √n‐order when the number of repeated measurements per curve is of an order larger than $n^{\eta_0}$ with n being the number of curves. The transition points $\eta_0$ for pointwise and simultaneous tests are different and both are smaller than the transition point in the estimation problem. Simulation studies and real data analyses are conducted to demonstrate the methods proposed

IUPUIScholarWorks