
    Foul or Fair?

    This paper gives a short overview of Monte Carlo studies on the usefulness of Heckman's (1976, 1979) two-step estimator for estimating a selection model. It shows that exploratory work to check for collinearity problems is strongly recommended before deciding on which estimator to apply. In the absence of collinearity problems, the full-information maximum likelihood estimator is preferable to the limited-information two-step method of Heckman, although the latter also gives reasonable results. If, however, collinearity problems prevail, subsample OLS (or the Two-Part Model) is the most robust amongst the simple-to-calculate estimators.
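
    The two-step procedure the survey evaluates is straightforward to sketch: a probit fit of the selection equation, followed by OLS on the selected subsample with the inverse Mills ratio added as a regressor. Below is a minimal illustrative sketch in Python, not code from the paper; the arrays y, X, Z, and selected are hypothetical placeholders.

    ```python
    # Illustrative Heckman-style two-step estimator (sketch, not the paper's code).
    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import norm

    def heckman_two_step(y, X, Z, selected):
        """y: outcome (used only where selected == 1); X: outcome regressors;
        Z: selection regressors; selected: 0/1 selection indicator."""
        # Step 1: probit for the selection equation on the full sample.
        Z_c = sm.add_constant(Z)
        probit = sm.Probit(selected, Z_c).fit(disp=0)
        index = Z_c @ probit.params
        # Inverse Mills ratio from the estimated selection index.
        mills = norm.pdf(index) / norm.cdf(index)
        # Step 2: OLS of y on X plus the Mills ratio, selected subsample only.
        sel = np.asarray(selected, dtype=bool)
        X_aug = sm.add_constant(np.column_stack([X[sel], mills[sel]]))
        ols = sm.OLS(y[sel], X_aug).fit()
        return probit, ols
    ```

    When Z largely coincides with X, the Mills ratio is nearly a linear function of the outcome regressors, which is exactly the collinearity problem the survey warns about before recommending an estimator.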

    Inverse probability weighted estimation for general missing data problems

    I study inverse probability weighted M-estimation under a general missing data scheme. The cases covered that do not previously appear in the literature include M-estimation with missing data due to a censored survival time, propensity score estimation of the average treatment effect for linear exponential family quasi-log-likelihood functions, and variable probability sampling with observed retainment frequencies. I extend an important result known to hold in special cases: estimating the selection probabilities is generally more efficient than if the known selection probabilities could be used in estimation. For the treatment effect case, the setup allows for a simple characterization of a “double robustness” result due to Scharfstein, Rotnitzky, and Robins (1999): given appropriate choices for the conditional mean function and quasi-log-likelihood function, only one of the conditional mean or selection probability needs to be correctly specified in order to consistently estimate the average treatment effect.
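
    For the treatment effect case described in the abstract, the simplest inverse probability weighted estimator reweights outcomes by estimated propensity scores. The following is a minimal illustrative sketch, not the paper's general M-estimation framework; the arrays y, d, and X are hypothetical placeholders.

    ```python
    # Illustrative inverse probability weighted ATE estimator (sketch).
    import numpy as np
    import statsmodels.api as sm

    def ipw_ate(y, d, X):
        """y: outcome; d: 0/1 treatment indicator; X: covariates."""
        # Estimate the propensity score P(d = 1 | X) by logit.
        X_c = sm.add_constant(X)
        logit = sm.Logit(d, X_c).fit(disp=0)
        p_hat = np.asarray(logit.predict(X_c))
        # Weight treated and control outcomes by the inverse of the
        # estimated selection probabilities and average the contrast.
        return np.mean(d * y / p_hat - (1 - d) * y / (1 - p_hat))
    ```

    Note that the estimator plugs in the estimated p_hat rather than any known selection probabilities; the efficiency result extended in the paper concerns precisely this comparison.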

    Density and Hazard Rate Estimation for Censored and α-mixing Data Using Gamma Kernels

    In this paper we consider nonparametric estimation of the density and hazard rate function for right-censored α-mixing survival time data using kernel smoothing techniques. Since survival times are positive with potentially a high concentration at zero, one has to take into account the bias problems that arise when the functions are estimated in the boundary region. In this paper, gamma kernel estimators of the density and the hazard rate function are proposed. The estimators use adaptive weights depending on the point at which we estimate the function, and they are robust to the boundary bias problem. For both estimators, the mean squared error properties, including the rate of convergence, the almost sure consistency and the asymptotic normality are investigated. The results of a simulation demonstrate the excellent performance of the proposed estimators.
    Keywords: gamma kernel, Kaplan-Meier, density and hazard function, mean integrated squared error, consistency, asymptotic normality.
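
    The gamma kernel idea replaces a fixed symmetric kernel with a gamma density whose shape depends on the evaluation point, so the kernel's support stays on [0, ∞) and the boundary bias near zero disappears. A minimal illustrative sketch for the uncensored i.i.d. case is given below; the bandwidth b and evaluation grid are hypothetical choices, and the paper additionally handles censoring (via Kaplan-Meier type weights) and mixing dependence.

    ```python
    # Illustrative Chen-style gamma kernel density estimator on positive data (sketch).
    import numpy as np
    from scipy.stats import gamma

    def gamma_kernel_density(data, grid, b):
        """data: positive observations; grid: evaluation points; b: bandwidth."""
        est = np.empty_like(grid, dtype=float)
        for j, x in enumerate(grid):
            # Gamma kernel with shape x/b + 1 and scale b, evaluated at the data;
            # the kernel adapts to the evaluation point instead of being shifted.
            est[j] = gamma.pdf(data, a=x / b + 1.0, scale=b).mean()
        return est
    ```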

    The Likelihood of Mixed Hitting Times

    We present a method for computing the likelihood of a mixed hitting-time model that specifies durations as the first time a latent Lévy process crosses a heterogeneous threshold. This likelihood is not generally known in closed form, but its Laplace transform is. Our approach to its computation relies on numerical methods for inverting Laplace transforms that exploit special properties of the first passage times of Lévy processes. We use our method to implement a maximum likelihood estimator of the mixed hitting-time model in MATLAB. We illustrate the application of this estimator with an analysis of Kennan's (1985) strike data.
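
    The computational core is numerical inversion of a known Laplace transform. As a generic illustration only (the paper uses inversion methods tailored to first passage times of Lévy processes, and its implementation is in MATLAB), here is a Gaver-Stehfest inversion sketch in Python, where F is any user-supplied transform.

    ```python
    # Illustrative Gaver-Stehfest numerical Laplace transform inversion (sketch).
    import math

    def gaver_stehfest(F, t, N=12):
        """Approximate the inverse Laplace transform of F at time t > 0 (N even)."""
        half = N // 2
        total = 0.0
        for k in range(1, N + 1):
            # Stehfest coefficient V_k.
            v = 0.0
            for j in range((k + 1) // 2, min(k, half) + 1):
                v += (j ** half * math.factorial(2 * j)
                      / (math.factorial(half - j) * math.factorial(j)
                         * math.factorial(j - 1) * math.factorial(k - j)
                         * math.factorial(2 * j - k)))
            v *= (-1) ** (k + half)
            total += v * F(k * math.log(2.0) / t)
        return total * math.log(2.0) / t

    # Example: F(s) = 1/(s + 1) inverts to exp(-t); at t = 1 this returns ~0.3679.
    # print(gaver_stehfest(lambda s: 1.0 / (s + 1.0), 1.0))
    ```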

    Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies

    Lung cancer is among the most common cancers in the United States, in terms of both incidence and mortality. In 2009, it is estimated that more than 150,000 deaths will result from lung cancer alone. Genetic information is an extremely valuable data source in characterizing the personal nature of cancer. Over the past several years, investigators have conducted numerous association studies in which intensive genetic data are collected on relatively few patients compared to the number of gene predictors, with one scientific goal being to identify genetic features associated with cancer recurrence or survival. In this note, we propose high-dimensional survival analysis through a new application of boosting, a powerful tool in machine learning. Our approach is based on an accelerated lifetime model and minimizing the sum of pairwise differences in residuals. We apply our method to a recent microarray study of lung adenocarcinoma and find that our ensemble is composed of 19 genes, while a proportional hazards (PH) ensemble is composed of nine genes, a proper subset of the 19-gene panel. In one of our simulation scenarios, we demonstrate that PH boosting in a misspecified model tends to underfit and ignore moderately-sized covariate effects, on average. Diagnostic analyses suggest that the PH assumption is not satisfied in the microarray data and may explain, in part, the discrepancy in the sets of active coefficients. Our simulation studies and comparative data analyses demonstrate how statistical learning by PH models alone is insufficient.
    Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/), http://dx.doi.org/10.1214/10-AOAS426, by the Institute of Mathematical Statistics (http://www.imstat.org).
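
    The objective being boosted is a sum of pairwise differences of accelerated-lifetime residuals, in the spirit of a Gehan-type rank loss. A minimal illustrative sketch of one common form of such a loss for a candidate coefficient vector is shown below; the arrays X, log_t, delta, and beta are hypothetical placeholders, and the paper minimizes this kind of criterion by boosting over many gene predictors rather than by direct evaluation.

    ```python
    # Illustrative Gehan-type pairwise-difference loss for an AFT fit (sketch).
    import numpy as np

    def gehan_loss(beta, X, log_t, delta):
        """X: covariates; log_t: log event/censoring times; delta: 1 if event observed."""
        resid = log_t - X @ beta                    # accelerated-lifetime residuals
        diff = resid[:, None] - resid[None, :]      # pairwise differences e_i - e_j
        # Only pairs anchored at an uncensored observation contribute: penalize
        # an observed residual falling below another residual.
        return np.sum(delta[:, None] * np.maximum(-diff, 0.0))
    ```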