47 research outputs found

    Local partial-likelihood estimation for lifetime data

    This paper considers a proportional hazards model, which allows one to examine the extent to which covariates interact nonlinearly with an exposure variable, for analysis of lifetime data. A local partial-likelihood technique is proposed to estimate nonlinear interactions. Asymptotic normality of the proposed estimator is established. The baseline hazard function, the bias and the variance of the local likelihood estimator are consistently estimated. In addition, a one-step local partial-likelihood estimator is presented to facilitate the computation of the proposed procedure and is demonstrated to be as efficient as the fully iterated local partial-likelihood estimator. Furthermore, a penalized local likelihood estimator is proposed to select important risk variables in the model. Numerical examples are used to illustrate the effectiveness of the proposed procedures. Comment: Published at http://dx.doi.org/10.1214/009053605000000796 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Factor-guided functional PCA for high-dimensional functional data

    The literature on high-dimensional functional data focuses on either the dependence over time or the correlation among functional variables. In this paper, we propose a factor-guided functional principal component analysis (FaFPCA) method that accounts for both temporal dependence and correlation among variables so that the extracted features are as sufficient as possible. In particular, we use a factor process to capture the correlation among high-dimensional functional variables and then apply functional principal component analysis (FPCA) to the factor processes to address the dependence over time. Furthermore, to solve the computational problem arising from triple-infinite dimensions, we construct moment equations to estimate the loadings, scores and eigenfunctions in closed form without rotation. Theoretically, we establish the asymptotic properties of the proposed estimator. Extensive simulation studies demonstrate that our proposed method outperforms its competitors in terms of accuracy and computational cost. The proposed method is applied to analyze the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, resulting in higher prediction accuracy and 41 important ROIs that are associated with Alzheimer's disease, 23 of which have been confirmed by the literature. Comment: 34 pages, 5 figures, 3 tables

    A Semi-Parametric Two-Part Mixed-Effects Heteroscedastic Transformation Model for Correlated Right-Skewed Semi-Continuous Data

    In longitudinal or hierarchical structure studies, we often encounter a semi-continuous variable that takes a single value with a certain probability and follows a continuous, skewed distribution over the remaining values. In this paper, we propose a new semi-parametric two-part mixed-effects transformation model to fit correlated skewed semi-continuous data. In our model, we allow the transformation to be non-parametric. Fitting the proposed model faces computational challenges due to intractable numerical integrations. We derive the estimates for the parameter and the transformation function based on an approximate likelihood, which has high-order accuracy but a lower computational burden. We also propose an estimator for the expected value of the semi-continuous outcome on the original scale. Finally, we apply the proposed methods to a clinical study of the effect of a collaborative care treatment for late-life depression on health care costs.

    Semi-Parametric Maximum Likelihood Estimates for ROC Curves of Continuous-Scale Tests

    In this paper, we propose a new semi-parametric maximum likelihood (ML) estimate of an ROC curve that satisfies the property of invariance of the ROC curve and is easy to compute. We show that our new estimator is [Formula: see text]-consistent and has an asymptotically normal distribution. Our extensive simulation studies show that the proposed method is efficient, robust, and simple to compute. Finally, we illustrate the application of the proposed estimator on a real data set.
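
    The invariance property mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "invariance" refers to the standard fact that an ROC curve is unchanged when the same strictly increasing transformation is applied to all test results. The code uses the plain empirical ROC curve, not the semi-parametric ML estimator proposed in the paper, and the simulated scores, the function names, and the choice of exp as the transformation are all hypothetical.

```python
import math
import random


def empirical_roc(diseased, healthy):
    """Empirical ROC points (FPR, TPR), one per observed threshold."""
    thresholds = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        fpr = sum(h >= c for h in healthy) / len(healthy)
        tpr = sum(d >= c for d in diseased) / len(diseased)
        points.append((fpr, tpr))
    return points


def empirical_auc(diseased, healthy):
    """Area under the empirical ROC curve (Mann-Whitney form, ties count 1/2)."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


# Simulated continuous-scale test results (assumed distributions, illustration only).
rng = random.Random(0)
healthy = [rng.gauss(0.0, 1.0) for _ in range(200)]
diseased = [rng.gauss(1.0, 1.5) for _ in range(200)]

# A strictly increasing transformation of the test scale (hypothetical choice).
g = math.exp
healthy_t = [g(x) for x in healthy]
diseased_t = [g(x) for x in diseased]

roc_raw = empirical_roc(diseased, healthy)
roc_transformed = empirical_roc(diseased_t, healthy_t)

# The transformation preserves the ordering of the scores, so the ROC points
# and the AUC computed on either scale coincide.
print("ROC curves identical:", roc_raw == roc_transformed)
print("AUC identical:", empirical_auc(diseased, healthy) == empirical_auc(diseased_t, healthy_t))
```

    Because the empirical ROC curve depends on the scores only through their ranks, it has this property automatically; an estimator that first fits a parametric distribution to the raw scores generally does not.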

    Deep regression learning with optimal loss function

    In this paper, we develop a novel, efficient and robust nonparametric regression estimator within a feedforward neural network framework. The proposed estimator has several interesting characteristics. First, the loss function is built upon an estimated maximum likelihood function, which integrates the information from the observed data as well as the information from the data structure. Consequently, the resulting estimator has desirable optimality properties, such as efficiency. Second, unlike traditional maximum likelihood estimation (MLE), the proposed method avoids specifying the distribution and is hence flexible enough to handle any kind of distribution, such as heavy-tailed, multimodal or heterogeneous distributions. Third, the proposed loss function relies on probabilities rather than direct observations as in least squares, which contributes to the robustness of the proposed estimator. Finally, the proposed loss function involves the nonparametric regression function only. This enables a direct application of existing packages, simplifying the computation and programming. We establish the large-sample property of the proposed estimator in terms of its excess risk and minimax near-optimal rate. The theoretical results demonstrate that the proposed estimator is equivalent to the true MLE in which the density function is known. Our simulation studies show that the proposed estimator outperforms the existing methods in terms of prediction accuracy, efficiency and robustness. In particular, it is comparable to the true MLE, and even gets better as the sample size increases. This implies that the adaptive and data-driven loss function from the estimated density may offer an additional avenue for capturing valuable information. We further apply the proposed method to four real data examples, resulting in significantly reduced out-of-sample prediction errors compared to existing methods.

    Evaluation of transplant benefits with the U.S. Scientific Registry of Transplant Recipients by semiparametric regression of mean residual life

    Kidney transplantation is the most effective renal replacement therapy for end-stage renal disease patients. Given the severe shortage of kidney supplies and the need for clinically effective transplantation, a patient's post-transplantation life expectancy is used to prioritize patients for transplantation; however, severe comorbidity conditions and old age are the most dominant factors that negatively impact post-transplantation life expectancy, effectively precluding sick or old patients from receiving transplants. It would be crucial to design objective measures to quantify the transplantation benefit by comparing the mean residual life with and without a transplant, after adjusting for comorbidity and demographic conditions. To address this urgent need, we propose a new class of semiparametric covariate-dependent mean residual life models. Our method estimates covariate effects semiparametrically efficiently and the mean residual life function nonparametrically, enabling us to predict the potential residual life increment for any given patient. Our method potentially leads to a fairer system that prioritizes patients who would have the largest residual life gains. Our analysis of the kidney transplant data from the U.S. Scientific Registry of Transplant Recipients also suggests that a single index of covariates summarizes well the impacts of multiple covariates, which may facilitate interpretation of each covariate's effect. Our subgroup analysis further disclosed inequalities in survival gains across groups defined by race, gender and insurance type (reflecting socioeconomic status). Comment: 68 pages, 13 figures. arXiv admin note: text overlap with arXiv:2011.0406

    Inferences in Censored Cost Regression Models with Empirical Likelihood

    In many studies of health economics, we are interested in the expected total cost over a certain period for a patient with given characteristics. Problems can arise if cost estimation models do not account for distributional aspects of costs. Two such problems are 1) the skewed nature of the data and 2) censored observations. In this paper we propose an empirical likelihood (EL) method for constructing a confidence region for the vector of regression parameters and a confidence interval for the expected total cost of a patient with the given covariates. We show that this new method has good theoretical properties and compare its finite-sample properties with those of the existing method. Our simulation results demonstrate that the new EL-based method performs as well as the existing method when cost data are only mildly skewed, and outperforms the existing method when cost data are highly skewed. Finally, we illustrate the application of our method on a real data set.

    Two-directional simultaneous inference for high-dimensional models

    This paper proposes a general two-directional simultaneous inference (TOSI) framework for high-dimensional models with a manifest variable or latent variable structure, for example, high-dimensional mean models, high-dimensional sparse regression models, and high-dimensional latent factor models. TOSI performs simultaneous inference on a set of parameters from two directions: one tests whether the parameters assumed to be zero are indeed zero, and the other tests whether any zeros exist in the set of parameters assumed to be nonzero. As a result, we can exactly identify whether the parameters are zeros, thereby keeping the data structure fully and parsimoniously expressed. We theoretically prove that the proposed TOSI method asymptotically controls the Type I error at the prespecified significance level and that the testing power converges to one. Simulations are conducted to examine the performance of the proposed method in finite-sample situations, and two real datasets are analyzed. The results show that the TOSI method is more predictive and has more interpretable estimators than existing methods.

    Selection of Latent Variables for Multiple Mixed-Outcome Models

    Latent variable models have been widely used for modeling the dependence structure of multiple-outcome data. As the formulation of a latent variable model is often unknown a priori, misspecification could distort the dependence structure and lead to unreliable model inference. Moreover, the multiple outcomes are often of varying types (e.g., continuous and ordinal), which presents analytical challenges. In this article, we present a class of general latent variable models that can accommodate mixed types of outcomes, and further propose a novel selection approach that simultaneously selects latent variables and estimates model parameters. We show that the proposed estimators are consistent, asymptotically normal, and have the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of a dataset collected in the World Values Survey (WVS), a global research project that explores people's values and beliefs and what social and personal characteristics might influence them.