Local partial-likelihood estimation for lifetime data
This paper considers a proportional hazards model, which allows one to
examine the extent to which covariates interact nonlinearly with an exposure
variable, for analysis of lifetime data. A local partial-likelihood technique
is proposed to estimate nonlinear interactions. Asymptotic normality of the
proposed estimator is established. The baseline hazard function, the bias and
the variance of the local likelihood estimator are consistently estimated. In
addition, a one-step local partial-likelihood estimator is presented to
facilitate the computation of the proposed procedure and is demonstrated to be
as efficient as the fully iterated local partial-likelihood estimator.
Furthermore, a penalized local likelihood estimator is proposed to select
important risk variables in the model. Numerical examples are used to
illustrate the effectiveness of the proposed procedures. Comment: Published at http://dx.doi.org/10.1214/009053605000000796 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
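As a point of reference for the abstract above, the ordinary (global) Cox log partial likelihood can be sketched as follows. This is not the paper's local estimator, which would additionally apply kernel weights around a chosen value of the exposure variable; the function name, the single scalar covariate, the toy data, and the no-ties assumption are all illustrative choices.

```python
import math

def log_partial_likelihood(beta, times, events, covariate):
    """Cox log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for i, (t_i, is_event) in enumerate(zip(times, events)):
        if not is_event:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(
            math.exp(beta * covariate[j])
            for j, t_j in enumerate(times)
            if t_j >= t_i
        )
        total += beta * covariate[i] - math.log(risk_sum)
    return total

# Hypothetical toy data: follow-up times, event indicators (1 = failure), covariate.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariate = [0.4, -1.0, 1.3, 0.2]

ll_at_zero = log_partial_likelihood(0.0, times, events, covariate)
```

At beta = 0 every subject has the same relative risk, so the value reduces to minus the sum of the log risk-set sizes at the observed event times.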
Factor-guided functional PCA for high-dimensional functional data
The literature on high-dimensional functional data focuses on either the
dependence over time or the correlation among functional variables. In this
paper, we propose a factor-guided functional principal component analysis
(FaFPCA) method to consider both temporal dependence and correlation of
variables so that the extracted features are as sufficient as possible. In
particular, we use a factor process to consider the correlation among
high-dimensional functional variables and then apply functional principal
component analysis (FPCA) to the factor processes to address the dependence
over time. Furthermore, to solve the computational problem arising from
triple-infinite dimensions, we construct moment equations to
estimate the loadings, scores and eigenfunctions in closed form without rotation.
Theoretically, we establish the asymptotic properties of the proposed
estimator. Extensive simulation studies demonstrate that our proposed method
outperforms other competitors in terms of accuracy and computational cost. The
proposed method is applied to analyze the Alzheimer's Disease Neuroimaging
Initiative (ADNI) dataset, resulting in higher prediction accuracy and 41
important ROIs that are associated with Alzheimer's disease, 23 of which have
been confirmed by the literature. Comment: 34 pages, 5 figures, 3 table
A Semi-Parametric Two-Part Mixed-Effects Heteroscedastic Transformation Model for Correlated Right-Skewed Semi-Continuous Data
In longitudinal or hierarchical studies, we often encounter a semi-continuous variable that takes a single value with a certain probability and follows a continuous, skewed distribution over the remaining values. In this paper, we propose a new semi-parametric two-part mixed-effects transformation model to fit correlated skewed semi-continuous data. In our model, we allow the transformation to be non-parametric. Fitting the proposed model faces computational challenges due to intractable numerical integrations. We derive the estimates for the parameter and the transformation function based on an approximate likelihood, which has high-order accuracy but a lower computational burden. We also propose an estimator for the expected value of the semi-continuous outcome on the original scale. Finally, we apply the proposed methods to a clinical study of the effect of a collaborative care treatment for late-life depression on health care costs.
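The two-part idea behind the original-scale mean can be illustrated with a much simpler setting than the one in the abstract: no covariates, no random effects, and a fixed log transformation in place of the non-parametric one. The sketch below writes E[Y] = P(Y > 0) * E[Y | Y > 0] and back-transforms the positive part with Duan's smearing factor, which is a swapped-in technique and not the paper's estimator. The function name and the cost values are hypothetical.

```python
import math

def two_part_mean(outcomes):
    """Estimate E[Y] as P(Y > 0) * E[Y | Y > 0] with a log-scale positive part."""
    positives = [y for y in outcomes if y > 0]
    prob_positive = len(positives) / len(outcomes)  # part 1: chance of a nonzero value
    logs = [math.log(y) for y in positives]
    mu = sum(logs) / len(logs)  # part 2: intercept-only model on the log scale
    # Smearing factor: average of exponentiated log-scale residuals.
    smear = sum(math.exp(v - mu) for v in logs) / len(logs)
    mean_positive = math.exp(mu) * smear  # back-transformed mean of positive values
    return prob_positive * mean_positive

# Hypothetical cost data with a point mass at zero and a right-skewed positive part.
costs = [0.0, 0.0, 0.0, 120.0, 340.0, 95.0, 2100.0, 0.0, 560.0, 0.0]
estimate = two_part_mean(costs)
```

With an intercept-only positive part, the smearing estimate reproduces the sample mean of the positive values, so the result matches the overall sample mean; the decomposition becomes informative once each part has its own covariates.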
Semi-Parametric Maximum Likelihood Estimates for ROC Curves of Continuous-Scale Tests
In this paper, we propose a new semi-parametric maximum likelihood (ML) estimate of an ROC curve that satisfies the property of invariance of the ROC curve and is easy to compute. We show that our new estimator is [Formula: see text]-consistent and has an asymptotically normal distribution. Our extensive simulation studies show that the proposed method is efficient, robust, and simple to compute. Finally, we illustrate the application of the proposed estimator to a real data set.
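The invariance property mentioned above (the ROC curve is unchanged by any strictly increasing transformation of the test result) can be checked numerically. The sketch below uses the plain empirical ROC curve, not the semi-parametric ML estimator of the paper; the function name and test values are hypothetical.

```python
import math

def empirical_roc(healthy, diseased):
    """Return (FPR, TPR) points, with a threshold at every observed test value."""
    thresholds = sorted(set(healthy) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        # A result is called positive when it is at least the threshold c.
        fpr = sum(x >= c for x in healthy) / len(healthy)
        tpr = sum(y >= c for y in diseased) / len(diseased)
        points.append((fpr, tpr))
    return points

# Hypothetical continuous-scale test results.
healthy = [0.5, 1.2, 1.9, 2.4, 3.1]
diseased = [1.5, 2.8, 3.3, 4.0, 4.7]

roc_raw = empirical_roc(healthy, diseased)
# Apply a strictly increasing transformation (log) to both groups.
roc_log = empirical_roc([math.log(x) for x in healthy],
                        [math.log(y) for y in diseased])
```

Because the logarithm preserves the ordering of all results, both calls produce the same list of points.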
Deep regression learning with optimal loss function
In this paper, we develop a novel efficient and robust nonparametric
regression estimator within a feedforward neural network framework. The
proposed estimator has several interesting characteristics. First, the loss
function is built upon an estimated maximum likelihood function, which integrates
the information from the observed data as well as the information from the data
structure. Consequently, the resulting estimator has desirable optimality
properties, such as efficiency. Second, unlike traditional maximum
likelihood estimation (MLE), the proposed method avoids specifying the
distribution and is therefore flexible enough to handle any kind of
distribution, such as heavy-tailed, multimodal or heterogeneous distributions.
Third, the proposed loss function relies on probabilities rather than direct
observations as in least squares, which contributes to the robustness of the
proposed estimator. Finally, the proposed loss function involves only the
nonparametric regression function. This
enables a direct application of existing packages, simplifying the computation
and programming. We establish the large sample property of the proposed
estimator in terms of its excess risk and minimax near-optimal rate. The
theoretical results demonstrate that the proposed estimator is equivalent to
the true MLE in which the density function is known. Our simulation studies
show that the proposed estimator outperforms the existing methods in terms of
prediction accuracy, efficiency and robustness. Particularly, it is comparable
to the true MLE, and even gets better as the sample size increases. This
implies that the adaptive and data-driven loss function from the estimated
density may offer an additional avenue for capturing valuable information. We
further apply the proposed method to four real data examples, resulting in
significantly reduced out-of-sample prediction errors compared to existing
methods.
Evaluation of transplant benefits with the U.S. Scientific Registry of Transplant Recipients by semiparametric regression of mean residual life
Kidney transplantation is the most effective renal replacement therapy for
end-stage renal disease patients. Given the severe shortage of kidney supplies
and the need for clinically effective transplantation, patients' life
expectancy post-transplantation is used to prioritize patients for
transplantation; however, severe comorbidity conditions and old age are the
most dominant factors that negatively impact post-transplantation life
expectancy, effectively precluding sick or old patients from receiving
transplants. It would be crucial to design objective measures to quantify the
transplantation benefit by comparing the mean residual life with and without a
transplant, after adjusting for comorbidity and demographic conditions. To
address this urgent need, we propose a new class of semiparametric
covariate-dependent mean residual life models. Our method estimates covariate
effects semiparametrically efficiently and the mean residual life function
nonparametrically, enabling us to predict the residual life increment potential
for any given patient. Our method potentially leads to a fairer system that
prioritizes patients who would have the largest residual life gains. Our
analysis of the kidney transplant data from the U.S. Scientific Registry of
Transplant Recipients also suggests that a single index of covariates summarizes
well the impacts of multiple covariates, which may facilitate interpretations
of each covariate's effect. Our subgroup analysis further disclosed
inequalities in survival gains across groups defined by race, gender and
insurance type (reflecting socioeconomic status). Comment: 68 pages, 13 figures. arXiv admin note: text overlap with
arXiv:2011.0406
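The quantity compared in the abstract above, the mean residual life m(t) = E[T - t | T > t], has a simple empirical version when lifetimes are fully observed. The sketch below ignores censoring and covariates, both of which the paper's semiparametric model handles; the function name and the lifetimes are hypothetical.

```python
def mean_residual_life(lifetimes, t):
    """Average remaining lifetime among subjects who survive beyond time t."""
    remaining = [x - t for x in lifetimes if x > t]
    if not remaining:
        raise ValueError("no subject survives beyond t")
    return sum(remaining) / len(remaining)

# Hypothetical fully observed lifetimes (years) for two groups.
without_transplant = [1.0, 2.0, 4.0, 5.0]
with_transplant = [2.0, 6.0, 9.0, 11.0]

# Difference in expected remaining life at t = 3 years.
gain_at_3 = (mean_residual_life(with_transplant, 3.0)
             - mean_residual_life(without_transplant, 3.0))
```

The difference between the two group-specific values at a given time is one way to express a residual life gain, under the stated simplifications.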
Inferences in Censored Cost Regression Models with Empirical Likelihood
In many studies of health economics, we are interested in the expected total cost over a certain period for a patient with given characteristics. Problems can arise if cost estimation models do not account for distributional aspects of costs. Two such problems are 1) the skewed nature of the data and 2) censored observations. In this paper we propose an empirical likelihood (EL) method for constructing a confidence region for the vector of regression parameters and a confidence interval for the expected total cost of a patient with the given covariates. We show that this new method has good theoretical properties and compare its finite-sample properties with those of the existing method. Our simulation results demonstrate that the new EL-based method performs as well as the existing method when cost data are not highly skewed, and outperforms the existing method when cost data are highly skewed. Finally, we illustrate the application of our method to a real data set.
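To make the empirical likelihood idea concrete, the sketch below computes the EL ratio statistic for a simple mean with complete (uncensored) data. It is not the censored cost regression method of the abstract; the function name, the bisection solver, and the data are illustrative assumptions.

```python
import math

def el_statistic(data, mu):
    """-2 log empirical likelihood ratio for the hypothesis that the mean is mu."""
    z = [x - mu for x in data]
    if min(z) >= 0 or max(z) <= 0:
        raise ValueError("mu must lie strictly inside the range of the data")
    # The multiplier lam solves sum(z_i / (1 + lam * z_i)) = 0; the left-hand
    # side is strictly decreasing in lam on the interval below.
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(200):  # plain bisection on the multiplier
        mid = (lo + hi) / 2.0
        score = sum(zi / (1.0 + mid * zi) for zi in z)
        if score > 0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

# Hypothetical right-skewed sample with mean 4.0.
data = [1.0, 2.0, 4.0, 9.0]
stat_at_mean = el_statistic(data, 4.0)
stat_off_mean = el_statistic(data, 3.0)
```

The statistic is zero at the sample mean and grows as the hypothesized mean moves away from it. A confidence interval collects the values of mu whose statistic falls below a chi-square quantile with one degree of freedom.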
Two-directional simultaneous inference for high-dimensional models
This paper proposes a general two-directional simultaneous inference (TOSI)
framework for high-dimensional models with a manifest variable or latent
variable structure, for example, high-dimensional mean models, high-dimensional
sparse regression models, and high-dimensional latent factor models. TOSI
performs simultaneous inference on a set of parameters from two directions: one
to test whether the parameters assumed to be zero are indeed zero, and one to test
whether any zeros exist in the set of parameters assumed to be nonzero. As a result, we can
exactly identify whether the parameters are zeros, thereby keeping the data
structure fully and parsimoniously expressed. We theoretically prove that the
proposed TOSI method asymptotically controls the Type I error at the
prespecified significance level and that the testing power converges to one.
Simulations are conducted to examine the performance of the proposed method in
finite sample situations and two real datasets are analyzed. The results show
that the TOSI method is more predictive and has more interpretable estimators
than existing methods.
Selection of Latent Variables for Multiple Mixed-Outcome Models
Latent variable models have been widely used for modeling the dependence structure of multiple-outcome data. As the formulation of a latent variable model is often unknown a priori, misspecification could distort the dependence structure and lead to unreliable model inference. Moreover, the multiple outcomes are often of varying types (e.g., continuous and ordinal), which presents analytical challenges. In this article, we present a class of general latent variable models that can accommodate mixed types of outcomes, and further propose a novel selection approach that simultaneously selects latent variables and estimates model parameters. We show that the proposed estimators are consistent, asymptotically normal, and have the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of a dataset collected in the World Values Survey (WVS), a global research project that explores people's values and beliefs and what social and personal characteristics might influence them.