207 research outputs found
Deciding the dimension of effective dimension reduction space for functional and high-dimensional data
In this paper, we consider regression models with a Hilbert-space-valued
predictor and a scalar response, where the response depends on the predictor
only through a finite number of projections. The linear subspace spanned by
these projections is called the effective dimension reduction (EDR) space. To
determine the dimensionality of the EDR space, we focus on the leading
principal component scores of the predictor, and propose two sequential
testing procedures under the assumption that the predictor has an
elliptically contoured distribution. We further extend these procedures and
introduce a test that simultaneously takes into account a large number of
principal component scores. The proposed procedures are supported by theory,
validated by simulation studies, and illustrated by a real-data example. Our
methods and theory are applicable to functional data and high-dimensional
multivariate data. Comment: Published at http://dx.doi.org/10.1214/10-AOS816 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
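The leading principal component scores that the abstract above builds on can be illustrated with a minimal sketch. It assumes the curves are observed on a common grid and stored as rows of a matrix; the function name `pc_scores` and this discretized setup are illustrative assumptions, not the authors' procedure, and the sequential tests themselves are not reproduced here.

```python
import numpy as np

def pc_scores(X, k):
    """Return the first k principal component scores and eigenvalues.

    X is an (n, m) array: n curves observed on a common grid of m points
    (an assumption made for this sketch).
    """
    Xc = X - X.mean(axis=0)                      # center each grid point
    _, svals, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                       # project onto leading directions
    eigvals = svals[:k] ** 2 / (X.shape[0] - 1)  # sample covariance eigenvalues
    return scores, eigvals
```

Each column of `scores` has sample variance equal to the matching entry of `eigvals`, so the eigenvalues indicate how much variation each retained score carries.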
Joint modeling of longitudinal drug using pattern and time to first relapse in cocaine dependence treatment data
An important endpoint variable in a cocaine rehabilitation study is the time
to first relapse of a patient after the treatment. We propose a joint modeling
approach based on functional data analysis to study the relationship between
the baseline longitudinal cocaine-use pattern and the interval censored time to
first relapse. For the baseline cocaine-use pattern, we consider both
self-reported cocaine-use amount trajectories and dichotomized use
trajectories. Variations within the generalized longitudinal trajectories are
modeled through a latent Gaussian process, which is characterized by a few
leading functional principal components. The association between the baseline
longitudinal trajectories and the time to first relapse is built upon the
latent principal component scores. The mean and the eigenfunctions of the
latent Gaussian process as well as the hazard function of time to first relapse
are modeled nonparametrically using penalized splines, and the parameters in
the joint model are estimated by a Monte Carlo EM algorithm based on
Metropolis-Hastings steps. An Akaike information criterion (AIC) based on
effective degrees of freedom is proposed to choose the tuning parameters, and a
modified empirical information is proposed to estimate the variance-covariance
matrix of the estimators. Comment: Published at http://dx.doi.org/10.1214/15-AOAS852 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Topics in functional data analysis with biological applications
Functional data analysis (FDA) is an active field of statistics, in which the primary subjects
in the study are curves. My dissertation consists of two innovative applications of
functional data analysis in biology. The data that motivated the research broadened the
scope of FDA and demanded new methodology. I develop new nonparametric methods to
make various estimations, and I focus on developing large sample theories for the proposed
estimators.
The first project is motivated by a colon carcinogenesis study, the goal of which is to
study the function of a protein (p27) in colon cancer development. In this study, a number
of colonic crypts (units) were sampled from each rat (subject) at random locations along
the colon, and then repeated measurements on the protein expression level were made on
each cell (subunit) within the selected crypts. In this problem, measurements within each
crypt can be viewed as a function, since the measurements can be indexed by the cell
locations. The functions from the same subject are spatially correlated along the colon,
and my goal is to estimate this correlation function using nonparametric methods. We use
this data set as motivation and propose a kernel estimator of the correlation function
in a more general framework. We develop a pointwise asymptotic normal distribution
for the proposed estimator when the number of subjects is fixed and the number of units within each subject goes to infinity. Based on the asymptotic theory, we propose a weighted
block bootstrapping method for making inferences about the correlation function, where the
weights account for the inhomogeneity of the distribution of the unit locations. Simulation
studies are also provided to illustrate the numerical performance of the proposed method.
My second project concerns lipoprotein profile data, where the goal is to use lipoprotein
profile curves to predict the cholesterol level in human blood. Again, motivated by the data,
we consider a more general problem: the functional linear model (Ramsay and Silverman,
1997) with a functional predictor and a scalar response. There is literature developing different
methods for this model; however, there is little theory to support the methods. Therefore,
we focus more on the theoretical properties of this model. There is other contemporary
theoretical work on methods based on principal component regression. Our work differs
in that we base our method on a roughness penalty approach and consider the
more realistic scenario in which the functional predictor is observed only at discrete points. To
simplify the theoretical derivations, we restrict the functions to satisfy a periodic
boundary condition and develop an asymptotic convergence rate for this problem in Chapter
III. A more general result based on splines is a topic for future research, which I
discuss in Chapter IV.
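A roughness-penalty fit of a functional linear model with a discretely observed predictor can be sketched as follows. This is only an illustration under stated assumptions: the integral is replaced by a Riemann sum on an equally spaced grid, the penalty is a plain second-difference penalty (not the periodic formulation mentioned in the abstract), and the names `penalized_flm` and `lam` are hypothetical.

```python
import numpy as np

def penalized_flm(X, y, t, lam):
    """Estimate beta(t) in y_i = integral of x_i(t) * beta(t) dt on a grid.

    X   : (n, m) predictor curves observed on the equally spaced grid t
    y   : (n,) scalar responses
    lam : roughness penalty weight (tuning parameter, assumed given)
    """
    dt = t[1] - t[0]
    B = X * dt                           # Riemann-sum version of the integral
    m = X.shape[1]
    D = np.diff(np.eye(m), n=2, axis=0)  # (m-2, m) second-difference operator
    A = B.T @ B + lam * (D.T @ D)        # penalized normal equations
    return np.linalg.solve(A, B.T @ y)
```

Larger values of `lam` shrink the second differences of the estimate toward zero, trading fidelity to the data for smoothness.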
Restructuring industrial districts, scaling up regional development: a study of the Wenzhou Model, China
Working Paper. The Wenzhou Municipality in Zhejiang Province is spearheading China's marketization and development of private enterprises. Its successful development trajectory, centered on family-owned small businesses embedded in thick local institutions, resembles Marshallian industrial districts (MIDs). However, with China's changing institutional environment and intensifying competition, Wenzhou has been facing challenges. Since the late 1980s, Wenzhou has gone through two major rounds of restructuring (from family enterprises to shareholding cooperatives to shareholding enterprises) that have included four major types of strategic response: institutional change, technological upgrading, industrial diversification, and spatial restructuring. Firms in Wenzhou have gone through localization and delocalization, and locational choices reflect the dual destinations of globalizing cities and interior cities. The formation of new firms and clusters has been accompanied by mergers, acquisitions, and the emergence of multiregional enterprises (MREs), some of which have relocated their headquarters and specialized functions to metropolitan areas, especially Shanghai and Hangzhou. More recently, Wenzhou's growth has slowed, leading some to question the sustainability of the Wenzhou model. We argue that Wenzhou's development is in danger of regional lock-ins--relational, intergenerational, and structural. Wenzhou's experience challenges the orthodox concept of MIDs and calls for "scaling up" regional development.
Selenocysteine insertion directed by the 3′-UTR SECIS element in Escherichia coli
Co-translational insertion of selenocysteine (Sec) into proteins in response to UGA codons is directed by selenocysteine insertion sequence (SECIS) elements. In known bacterial selenoprotein genes, SECIS elements are located in the coding regions immediately downstream of UGA codons. Here, we report that a distant SECIS element can also function in Sec insertion in bacteria provided that it is spatially close to the UGA codon. We expressed a mammalian phospholipid hydroperoxide glutathione peroxidase in Escherichia coli from a construct in which a natural E. coli SECIS element was located in the 3′-untranslated region (3′-UTR) and adjacent to a sequence complementary to the region downstream of the Sec UGA codon. Although the major readthrough event at the UGA codon was insertion of tryptophan, Sec was also incorporated, and its insertion was dependent on the functional SECIS element in the UTR, the base-pairing potential of the SECIS flanking region and the Sec UGA codon. These data have important implications for the evolution of SECIS elements and the development of a system for heterologous expression of selenoproteins, and show that in addition to the primary sequence arrangement between UGA codons and SECIS elements, their proximity within the tertiary structure can support Sec insertion in bacteria.
Nonparametric estimation of correlation functions in longitudinal and spatial data, with application to colon carcinogenesis experiments
In longitudinal and spatial studies, observations often demonstrate strong
correlations that are stationary in time or distance lags, and the times or
locations at which these data are sampled may not be homogeneous. We propose a
nonparametric estimator of the correlation function in such data, using kernel
methods. We develop a pointwise asymptotic normal distribution for the proposed
estimator, when the number of subjects is fixed and the number of vectors or
functions within each subject goes to infinity. Based on the asymptotic theory,
we propose a weighted block bootstrapping method for making inferences about
the correlation function, where the weights account for the inhomogeneity of
the distribution of the times or locations. The method is applied to a data set
from a colon carcinogenesis study, in which colonic crypts were sampled from a
piece of colon segment from each of the 12 rats in the experiment and the
expression level of p27, an important cell cycle protein, was then measured for
each cell within the sampled crypts. A simulation study is also provided to
illustrate the numerical performance of the proposed method. Comment: Published at http://dx.doi.org/10.1214/009053607000000082 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
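The idea of a kernel estimator of a stationary correlation function from irregularly spaced locations can be sketched as below. This is a simplified single-subject illustration, not the estimator studied in the paper: it assumes a Gaussian kernel, a common mean and variance, and a user-supplied bandwidth `h`, and it omits the weighted block bootstrap. The function name `kernel_correlation` is hypothetical.

```python
import numpy as np

def kernel_correlation(s, y, lag, h):
    """Kernel-smoothed estimate of the correlation at a given lag.

    s   : (n,) sampling times or locations (need not be equally spaced)
    y   : (n,) observations at those locations
    lag : lag at which the correlation is estimated
    h   : kernel bandwidth (assumed chosen by the user)
    """
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    r = y - y.mean()                         # centered observations
    i, j = np.triu_indices(len(s), k=1)      # all pairs with i < j
    d = np.abs(s[i] - s[j])                  # observed pairwise lags
    w = np.exp(-0.5 * ((d - lag) / h) ** 2)  # Gaussian weights around the lag
    cov = np.sum(w * r[i] * r[j]) / np.sum(w)
    return cov / r.var()
```

Pairs whose separation is close to `lag` dominate the weighted average, which is why uneven sampling of the locations matters for inference.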
Bias-correction and Test for Mark-point Dependence with Replicated Marked Point Processes
Mark-point dependence plays a critical role in research problems that can be
fitted into the general framework of marked point processes. In this work, we
focus on adjusting for mark-point dependence when estimating the mean and
covariance functions of the mark process, given independent replicates of the
marked point process. We assume that the mark process is a Gaussian process and
the point process is a log-Gaussian Cox process, where the mark-point
dependence is generated through the dependence between two latent Gaussian
processes. Under this framework, naive local linear estimators ignoring the
mark-point dependence can be severely biased. We show that this bias can be
corrected using a local linear estimator of the cross-covariance function and
establish uniform convergence rates of the bias-corrected estimators.
Furthermore, we propose a test statistic based on local linear estimators for
mark-point independence, which is shown to converge to an asymptotic normal
distribution at a parametric √n-convergence rate. Model diagnostic
tools are developed for key model assumptions and a robust functional
permutation test is proposed for a more general class of mark-point processes.
The effectiveness of the proposed methods is demonstrated using extensive
simulations and applications to two real data examples
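For readers unfamiliar with local linear estimators, a generic one-dimensional version is sketched below. It is the naive smoother of a mean function, not the bias-corrected estimator proposed in the abstract above; the Gaussian kernel, the bandwidth `h`, and the name `local_linear` are assumptions made for illustration.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = x - x0                                # distances to the target point
    w = np.exp(-0.5 * (u / h) ** 2)           # kernel weights
    X = np.column_stack([np.ones_like(u), u])  # local intercept and slope
    WX = X * w[:, None]
    coef = np.linalg.solve(X.T @ WX, WX.T @ y)  # weighted least squares
    return coef[0]                            # intercept = fitted value at x0
```

A useful sanity check is that this smoother reproduces exactly linear data for any bandwidth, since the local model is itself linear.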
Unified empirical likelihood ratio tests for functional concurrent linear models and the phase transition from sparse to dense functional data
We consider the problem of testing functional constraints in a class of functional concurrent linear models where both the predictors and the response are functional data measured at discrete time points. We propose test procedures based on the empirical likelihood with bias‐corrected estimating equations to conduct both pointwise and simultaneous inferences. The asymptotic distributions of the test statistics are derived under the null and local alternative hypotheses, where sparse and dense functional data are considered in a unified framework. We find a phase transition in the asymptotic null distributions and the orders of detectable alternatives from sparse to dense functional data. Specifically, the tests proposed can detect alternatives of √n‐order when the number of repeated measurements per curve is of an order larger than n^{η_0}, with n being the number of curves. The transition points η_0 for pointwise and simultaneous tests are different and both are smaller than the transition point in the estimation problem. Simulation studies and real data analyses are conducted to demonstrate the methods proposed.