Binscatter Regressions

Author: Cattaneo Matias D.
Crump Richard K.
Farrell Max H.
Feng Yingjie
Publication date: 25/02/2019

We introduce the \texttt{Stata} (and \texttt{R}) package \textsf{Binsreg}, which implements the binscatter methods developed in \citet*{Cattaneo-Crump-Farrell-Feng_2019_Binscatter}. The package includes the commands \texttt{binsreg}, \texttt{binsregtest}, and \texttt{binsregselect}. The first command (\texttt{binsreg}) implements binscatter for the regression function and its derivatives, offering several procedures for point estimation, confidence intervals, and confidence bands, with particular focus on constructing binned scatter plots. The second command (\texttt{binsregtest}) implements hypothesis testing procedures for parametric specification and for nonparametric shape restrictions of the unknown regression function. Finally, the third command (\texttt{binsregselect}) implements data-driven selectors for the number of bins used in binscatter, with either quantile-spaced or evenly-spaced binning/partitioning. All the commands allow for covariate adjustment, smoothness restrictions, weighting and clustering, among other features. A companion \texttt{R} package with the same capabilities is also available.
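To make the idea of a binned scatter plot concrete, here is a minimal Python sketch of the simplest (piecewise-constant) binscatter with quantile-spaced bins. It is not the Binsreg package or its API: the function name `binscatter` and the toy data are assumptions made for illustration, and no covariate adjustment, inference, or bin selection is attempted.

```python
from statistics import mean


def binscatter(x, y, n_bins):
    """Return one (mean x, mean y) point per quantile-spaced bin.

    Hypothetical helper: observations are sorted by x and split into
    n_bins groups of nearly equal size, which mimics quantile spacing.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:  # skip empty bins when n_bins exceeds n
            points.append((mean(p[0] for p in chunk),
                           mean(p[1] for p in chunk)))
    return points


# Toy data on an exact line, so every bin mean lies on that line.
x = [i / 10 for i in range(100)]
y = [2 * xi + 1 for xi in x]
points = binscatter(x, y, 5)
```

Plotting `points` instead of the raw pairs gives the binned scatter plot; the package described above adds derivatives, confidence bands, and data-driven choices of the number of bins on top of this basic construction.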

arXiv.org e-Print Archive

Regression of ranked responses when raw responses are censored

Author: Abramson Ian
Donohue Michael C.
Gamst Anthony C.
Rissman Robert A.
Publication date: 24/02/2016

We discuss semiparametric regression when only the ranks of responses are observed. The model is $Y_i = F(\mathbf{x}_i'{\boldsymbol\beta}_0 + \varepsilon_i)$, where $Y_i$ is the unobserved response, $F$ is a monotone increasing function, $\mathbf{x}_i$ is a known $p$-vector of covariates, ${\boldsymbol\beta}_0$ is an unknown $p$-vector of interest, and $\varepsilon_i$ is an error term independent of $\mathbf{x}_i$. We observe $\{(\mathbf{x}_i, R_n(Y_i)) : i = 1,\ldots,n\}$, where $R_n$ is the ordinal rank function. We explore a novel estimator under Gaussian assumptions. We discuss the literature, apply the method to an Alzheimer's disease biomarker, conduct simulation studies, and prove consistency and asymptotic normality. Comment: 33 pages, 6 figures
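The observation scheme in this abstract can be illustrated with a short simulation. The sketch below only generates data from the stated model and computes ordinal ranks; it does not implement the authors' estimator. The choice of `F` as the exponential function, the coefficient values, and the helper name `ranks` are assumptions made for illustration.

```python
import math
import random


def ranks(values):
    """Ordinal ranks 1..n of the values (ties are assumed absent)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


random.seed(0)
n = 50
beta0 = [1.0, -0.5]  # illustrative coefficient vector (p = 2)
X = [[random.gauss(0, 1) for _ in beta0] for _ in range(n)]

# Latent index x_i' beta0 + eps_i with Gaussian errors.
latent = [sum(b * xj for b, xj in zip(beta0, xi)) + random.gauss(0, 1)
          for xi in X]

# Unobserved response Y_i = F(latent_i); F = exp is one monotone choice.
Y = [math.exp(z) for z in latent]

# Only the covariates X and the ranks R are available to the analyst.
R = ranks(Y)
```

Because `F` is monotone increasing, the ranks of `Y` coincide with the ranks of the latent index, which is why the ranks carry information about the coefficient vector even though `F` is left unspecified.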

arXiv.org e-Print Archive

eScholarship - University of California

Lecture notes on ridge regression

Author: van Wieringen Wessel N.
Publication date: 02/08/2020

The linear regression model cannot be fitted to high-dimensional data, as the high-dimensionality brings about empirical non-identifiability. Penalized regression overcomes this non-identifiability by augmentation of the loss function by a penalty (i.e. a function of regression coefficients). The ridge penalty is the sum of squared regression coefficients, giving rise to ridge regression. Here many aspects of ridge regression are reviewed, e.g. moments, mean squared error, its equivalence to constrained estimation, and its relation to Bayesian regression. Finally, its behaviour and use are illustrated in simulation and on omics data. Subsequently, ridge regression is generalized to allow for a more general penalty. The ridge penalization framework is then translated to logistic regression and its properties are shown to carry over. To contrast ridge penalized estimation, the final chapter introduces its lasso counterpart.
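As a small illustration of the abstract's first three sentences, the following sketch computes the standard closed-form ridge estimate, which solves $(X'X + \lambda I)\beta = X'y$, on a toy design with more covariates than observations. The helper names `solve` and `ridge_fit` and the toy numbers are assumptions for illustration and are not taken from the lecture notes.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(XtX, Xty)


# n = 2 observations, p = 3 covariates: X'X is singular, so ordinary
# least squares has no unique solution, but the penalized system does.
X = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
y = [1.0, 2.0]
beta_small = ridge_fit(X, y, 0.1)
beta_large = ridge_fit(X, y, 10.0)
```

Increasing the penalty parameter shrinks the estimate towards zero, so `beta_large` has a smaller sum of squares than `beta_small`.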

arXiv.org e-Print Archive

Fitting Linear Mixed-Effects Models using lme4

Author: Bates Douglas
Bolker Ben
Mächler Martin
Walker Steve
Publication date: 23/06/2014

Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer. Comment: 51 pages, including R code, and an appendix

arXiv.org e-Print Archive

Repository for Publications and Research Data

Directory of Open Access Journals

Journal of Statistical Software

Analysis of fatigue, fatigue-crack propagation, and fracture data

Author: Davies K. B.
Feddersen C. E.
Jaske C. E.
Rice R. C.

Analytical methods have been developed for consolidation of fatigue, fatigue-crack propagation, and fracture data for use in design of metallic aerospace structural components. To evaluate these methods, a comprehensive file of data on 2024 and 7075 aluminums, Ti-6Al-4V, and 300M and D6Ac steels was established. Data were obtained from both published literature and unpublished reports furnished by aerospace companies. Fatigue and fatigue-crack-propagation analyses were restricted to information obtained from constant-amplitude load or strain cycling of specimens in air at room temperature. Fracture toughness data were from tests of center-cracked tension panels, part-through crack specimens, and compact-tension specimens.

NASA Technical Reports Server

Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance

Author: A Irpino
Antonio Irpino
B Efron
CL Lawson
CL Mallows
E Diday
EAL Neto
G Dall’Aglio
H Bock
J Arroyo
L Billard
L Kantorovich
L Wasserstein
M Noirhomme-Fraiture
P Bertrand
P Bickel
R Tibshirani
Rosanna Verde
WG Gilchrist
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/07/2012

In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables according to the definition given in the framework of Symbolic Data Analysis, and the parameters of the model are estimated using the classic Least Squares method. An appropriate metric is introduced in order to measure the error between the observed and the predicted distributions. In particular, the Wasserstein distance is proposed. Some properties of this metric are exploited to predict the response variable as a direct linear combination of other independent histogram variables. Measures of goodness of fit are discussed. An application on real data corroborates the proposed method.
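To show how a Wasserstein distance between two histograms can be evaluated, here is a minimal sketch that assumes the squared-error ($L_2$) variant, for which the distance between one-dimensional distributions equals the $L_2$ distance between their quantile functions. The representation of a histogram as a pair `(edges, weights)`, the uniform-within-bin assumption, the midpoint-rule integration, and all function names are assumptions for illustration, not the paper's implementation.

```python
from bisect import bisect_left
from itertools import accumulate
from math import sqrt


def hist_quantile(edges, weights, t):
    """Quantile at level t (0 < t < 1) of a histogram.

    Mass is assumed uniform within each bin and every weight positive.
    """
    total = sum(weights)
    cum = list(accumulate(w / total for w in weights))
    i = min(bisect_left(cum, t), len(weights) - 1)
    lo = cum[i - 1] if i > 0 else 0.0
    frac = (t - lo) / (cum[i] - lo)
    return edges[i] + frac * (edges[i + 1] - edges[i])


def wasserstein2(h1, h2, m=1000):
    """Approximate L2 Wasserstein distance by the midpoint rule on (0, 1)."""
    total = 0.0
    for k in range(m):
        t = (k + 0.5) / m
        d = hist_quantile(h1[0], h1[1], t) - hist_quantile(h2[0], h2[1], t)
        total += d * d
    return sqrt(total / m)


# A histogram and a copy of it shifted two units to the right.
h = ([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 1.0])
h_shifted = ([2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 1.0])
dist = wasserstein2(h, h_shifted)
```

For a pure shift the two quantile functions differ by a constant, so the distance equals the size of the shift; this gives a quick sanity check of the sketch.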

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"