
    Binscatter Regressions

    We introduce the \texttt{Stata} (and \texttt{R}) package \textsf{Binsreg}, which implements the binscatter methods developed in \citet*{Cattaneo-Crump-Farrell-Feng_2019_Binscatter}. The package includes the commands \texttt{binsreg}, \texttt{binsregtest}, and \texttt{binsregselect}. The first command (\texttt{binsreg}) implements binscatter for the regression function and its derivatives, offering several procedures for point estimation, confidence intervals, and confidence bands, with particular focus on constructing binned scatter plots. The second command (\texttt{binsregtest}) implements hypothesis testing procedures for parametric specification and for nonparametric shape restrictions of the unknown regression function. Finally, the third command (\texttt{binsregselect}) implements data-driven selectors for the number of bins, using either quantile-spaced or evenly-spaced binning/partitioning. All the commands allow for covariate adjustment, smoothness restrictions, weighting, and clustering, among other features. A companion \texttt{R} package with the same capabilities is also available.
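
As a rough illustration of what a binned scatter plot computes, the sketch below forms quantile-spaced bins of a regressor and averages the outcome within each bin. It is a minimal, canonical binscatter written in Python with NumPy, not the \textsf{Binsreg} interface: the function name `binscatter_means` and its `n_bins` argument are hypothetical, and none of the package's features (covariate adjustment, confidence bands, data-driven bin selection) are reproduced.

```python
import numpy as np

def binscatter_means(x, y, n_bins=10):
    """Return per-bin means of x and y using quantile-spaced bins.

    Hypothetical helper for illustration only; it assumes x is continuous
    enough that no bin ends up empty.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Bin edges at equally spaced quantiles of x (quantile-spaced binning).
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # Interior edges map each observation to a bin index in 0..n_bins-1.
    idx = np.searchsorted(edges[1:-1], x, side="right")
    x_means = np.array([x[idx == b].mean() for b in range(n_bins)])
    y_means = np.array([y[idx == b].mean() for b in range(n_bins)])
    return x_means, y_means

# Simulated data: a linear relationship with noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)
y = 2.0 * x + rng.normal(scale=0.1, size=1000)

x_means, y_means = binscatter_means(x, y, n_bins=10)
# Slope of a line through the binned means, for a quick sanity check.
slope = np.polyfit(x_means, y_means, 1)[0]
```

Plotting `y_means` against `x_means` gives the familiar binned scatter plot; evenly-spaced binning would replace the quantile edges with `np.linspace(x.min(), x.max(), n_bins + 1)`.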

    Regression of ranked responses when raw responses are censored

    We discuss semiparametric regression when only the ranks of responses are observed. The model is $Y_i = F(\mathbf{x}_i'\boldsymbol{\beta}_0 + \varepsilon_i)$, where $Y_i$ is the unobserved response, $F$ is a monotone increasing function, $\mathbf{x}_i$ is a known $p$-vector of covariates, $\boldsymbol{\beta}_0$ is an unknown $p$-vector of interest, and $\varepsilon_i$ is an error term independent of $\mathbf{x}_i$. We observe $\{(\mathbf{x}_i, R_n(Y_i)) : i = 1, \ldots, n\}$, where $R_n$ is the ordinal rank function. We explore a novel estimator under Gaussian assumptions. We discuss the literature, apply the method to an Alzheimer's disease biomarker, conduct simulation studies, and prove consistency and asymptotic normality. Comment: 33 pages, 6 figures

    Lecture notes on ridge regression

    The linear regression model cannot be fitted to high-dimensional data, as the high-dimensionality brings about empirical non-identifiability. Penalized regression overcomes this non-identifiability by augmenting the loss function with a penalty (i.e. a function of the regression coefficients). The ridge penalty is the sum of squared regression coefficients, giving rise to ridge regression. Here many aspects of ridge regression are reviewed, e.g. moments, mean squared error, its equivalence to constrained estimation, and its relation to Bayesian regression. Finally, its behaviour and use are illustrated in simulation and on omics data. Subsequently, ridge regression is generalized to allow for a more general penalty. The ridge penalization framework is then translated to logistic regression and its properties are shown to carry over. To contrast ridge penalized estimation, the final chapter introduces its lasso counterpart.
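
The non-identifiability and its ridge remedy can be sketched numerically. The example below is a minimal illustration in Python with NumPy, not code from the lecture notes: it uses the standard closed-form ridge estimate, the solution of $(X'X + \lambda I)\beta = X'y$, on simulated data with more covariates than samples. The helper name `ridge_fit` and the simulated design are assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: argmin ||y - X b||^2 + lam * ||b||^2.

    Hypothetical helper for illustration; lam must be positive when
    X'X is singular.
    """
    p = X.shape[1]
    # Solve the penalized normal equations (X'X + lam I) b = X'y.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# High-dimensional toy data: n = 10 samples, p = 50 covariates.
rng = np.random.default_rng(0)
n, p = 10, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# rank(X) is at most n < p, so the p x p matrix X'X is singular and the
# unpenalized least-squares estimate is not unique.
rank_X = np.linalg.matrix_rank(X)

beta_small = ridge_fit(X, y, lam=1.0)    # light shrinkage
beta_large = ridge_fit(X, y, lam=100.0)  # heavy shrinkage
```

A larger penalty shrinks the estimate further towards zero, so `beta_large` has a smaller Euclidean norm than `beta_small`.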

    Fitting Linear Mixed-Effects Models using lme4

    Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer. Comment: 51 pages, including R code, and an appendix

    Analysis of fatigue, fatigue-crack propagation, and fracture data

    Analytical methods have been developed for consolidation of fatigue, fatigue-crack propagation, and fracture data for use in design of metallic aerospace structural components. To evaluate these methods, a comprehensive file of data on 2024 and 7075 aluminums, Ti-6Al-4V, and 300M and D6Ac steels was established. Data were obtained from both published literature and unpublished reports furnished by aerospace companies. Fatigue and fatigue-crack-propagation analyses were restricted to information obtained from constant-amplitude load or strain cycling of specimens in air at room temperature. Fracture toughness data were from tests of center-cracked tension panels, part-through crack specimens, and compact-tension specimens.

    Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance

    In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables according to the definition given in the framework of Symbolic Data Analysis and the parameters of the model are estimated using the classic Least Squares method. An appropriate metric is introduced in order to measure the error between the observed and the predicted distributions. In particular, the Wasserstein distance is proposed. Some properties of such a metric are exploited to predict the response variable as a direct linear combination of other independent histogram variables. Measures of goodness of fit are discussed. An application on real data corroborates the proposed method.
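
To make the error metric concrete, the sketch below computes a 2-Wasserstein distance between two one-dimensional empirical distributions, using the fact that in one dimension it equals the $L_2$ distance between quantile functions. This is an assumption-laden illustration in Python with NumPy: the abstract does not state the order of the distance, the paper works with histogram variables instead of raw samples, and the helper name `wasserstein2_empirical` is hypothetical.

```python
import numpy as np

def wasserstein2_empirical(a, b):
    """2-Wasserstein distance between two equal-size 1D samples.

    For equal sample sizes the empirical quantile functions are step
    functions on the same grid, so the distance reduces to the root mean
    squared difference of the sorted samples. Illustration only.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
sample = rng.normal(size=500)
shifted = sample + 3.0  # same shape, location moved by 3

d_shift = wasserstein2_empirical(sample, shifted)  # pure location shift
d_self = wasserstein2_empirical(sample, sample)    # identical distributions
```

A pure location shift by a constant yields a distance equal to that constant, which is one reason the metric is convenient for comparing an observed distribution with a predicted one.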