Binscatter Regressions
We introduce the \texttt{Stata} (and \texttt{R}) package \textsf{Binsreg},
which implements the binscatter methods developed in
\citet*{Cattaneo-Crump-Farrell-Feng_2019_Binscatter}. The package includes the
commands \texttt{binsreg}, \texttt{binsregtest}, and \texttt{binsregselect}.
The first command (\texttt{binsreg}) implements binscatter for the regression
function and its derivatives, offering several procedures for point estimation,
confidence intervals, and confidence bands, with a particular focus on
constructing binned scatter plots. The second command (\texttt{binsregtest})
implements hypothesis testing procedures for parametric specification and for
nonparametric shape restrictions of the unknown regression function. Finally,
the third command (\texttt{binsregselect}) implements data-driven selectors for
the number of bins in binscatter, using either quantile-spaced or
evenly-spaced binning/partitioning. All the commands allow for covariate
adjustment, smoothness restrictions, weighting and clustering, among other
features. A companion \texttt{R} package with the same capabilities is also
available.
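To make the idea of quantile-spaced binning concrete, here is a minimal sketch of the most basic binned scatter plot computation: the sample is cut into equal-count bins along the covariate, and each bin is summarized by its mean. This is not the \textsf{Binsreg} interface; the function name `binscatter_means`, the simulated data, and the choice of 20 bins are all assumptions made for illustration, and the sketch omits covariate adjustment, smoothness restrictions, and inference.

```python
import math
import random
from statistics import mean


def binscatter_means(x, y, nbins=10):
    """Hypothetical helper: (mean x, mean y) for each quantile-spaced bin."""
    # Sort observations by x, then cut the sorted sample into nbins groups
    # of (nearly) equal size, which amounts to quantile-spaced binning.
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for b in range(nbins):
        lo = b * n // nbins
        hi = (b + 1) * n // nbins
        chunk = pairs[lo:hi]
        if chunk:  # a bin can be empty only when n < nbins
            points.append((mean(p[0] for p in chunk),
                           mean(p[1] for p in chunk)))
    return points


# Simulated data (an assumption for the example): y = sin(2*pi*x) + noise.
random.seed(0)
x = [random.random() for _ in range(2000)]
y = [math.sin(2.0 * math.pi * xi) + random.gauss(0.0, 0.3) for xi in x]

points = binscatter_means(x, y, nbins=20)
for xm, ym in points[:3]:
    print(f"bin mean x = {xm:.3f}, bin mean y = {ym:.3f}")
```

Plotting the returned pairs gives the familiar binned scatter plot. The number of bins is fixed by hand here, whereas the abstract describes a data-driven selector for it.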
Regression of ranked responses when raw responses are censored
We discuss semiparametric regression when only the ranks of responses are
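observed. The model is $Y_i = F(\mathbf{x}_i'\beta_0 + \varepsilon_i)$, where $Y_i$ is the unobserved response, $F$ is a monotone
increasing function, $\mathbf{x}_i$ is a known $p$-vector of covariates,
$\beta_0$ is an unknown $p$-vector of interest, and
$\varepsilon_i$ is an error term independent of $\mathbf{x}_i$. We observe
$\{(\mathbf{x}_i, R_n(Y_i)) : i = 1, \ldots, n\}$, where $R_n$ is the ordinal
rank function. We explore a novel estimator under Gaussian assumptions. We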
discuss the literature, apply the method to an Alzheimer's disease biomarker,
conduct simulation studies, and prove consistency and asymptotic normality. Comment: 33 pages, 6 figures
Lecture notes on ridge regression
The linear regression model cannot be fitted to high-dimensional data, as the
high-dimensionality brings about empirical non-identifiability. Penalized
regression overcomes this non-identifiability by augmentation of the loss
function by a penalty (i.e. a function of regression coefficients). The ridge
penalty is the sum of squared regression coefficients, giving rise to ridge
regression. Here many aspects of ridge regression are reviewed, e.g. moments,
mean squared error, its equivalence to constrained estimation, and its relation
to Bayesian regression. Finally, its behaviour and use are illustrated in
simulation and on omics data. Subsequently, ridge regression is generalized to
allow for a more general penalty. The ridge penalization framework is then
translated to logistic regression and its properties are shown to carry over.
As a contrast to ridge penalized estimation, the final chapter introduces its lasso
counterpart.
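For reference, the penalized loss and its minimizer described in this abstract can be written out in standard textbook form. The notation is an assumption, since the abstract fixes none: $Y$ is the $n$-vector of responses, $X$ the $n \times p$ design matrix, and $\lambda > 0$ the penalty parameter.

```latex
\begin{align*}
  % Ridge loss: residual sum of squares plus the sum of squared coefficients.
  \mathcal{L}_{\mathrm{ridge}}(\beta; \lambda)
    &= \| Y - X\beta \|_2^2 + \lambda \| \beta \|_2^2, \\
  % Its unique minimizer.
  \hat{\beta}(\lambda)
    &= (X^\top X + \lambda I_p)^{-1} X^\top Y.
\end{align*}
```

Because $X^\top X + \lambda I_p$ is positive definite for every $\lambda > 0$, the estimator is well defined even when $p > n$, which is the non-identifiability the penalty is said to overcome.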
Fitting Linear Mixed-Effects Models using lme4
Maximum likelihood or restricted maximum likelihood (REML) estimates of the
parameters in linear mixed-effects models can be determined using the lmer
function in the lme4 package for R. As for most model-fitting functions in R,
the model is described in an lmer call by a formula, in this case including
both fixed- and random-effects terms. The formula and data together determine a
numerical representation of the model from which the profiled deviance or the
profiled REML criterion can be evaluated as a function of some of the model
parameters. The appropriate criterion is optimized, using one of the
constrained optimization functions in R, to provide the parameter estimates. We
describe the structure of the model, the steps in evaluating the profiled
deviance or REML criterion, and the structure of classes or types that
represents such a model. Sufficient detail is included to allow specialization
of these structures by users who wish to write functions to fit specialized
linear mixed models, such as models incorporating pedigrees or smoothing
splines, that are not easily expressible in the formula language used by lmer. Comment: 51 pages, including R code, and an appendix
Analysis of fatigue, fatigue-crack propagation, and fracture data
Analytical methods have been developed for consolidation of fatigue, fatigue-crack propagation, and fracture data for use in design of metallic aerospace structural components. To evaluate these methods, a comprehensive file of data on 2024 and 7075 aluminums, Ti-6Al-4V, and 300M and D6AC steels was established. Data were obtained from both published literature and unpublished reports furnished by aerospace companies. Fatigue and fatigue-crack-propagation analyses were restricted to information obtained from constant-amplitude load or strain cycling of specimens in air at room temperature. Fracture toughness data were from tests of center-cracked tension panels, part-through crack specimens, and compact-tension specimens.
Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data.
The observed variables are histogram variables according to the definition
given in the framework of Symbolic Data Analysis and the parameters of the
model are estimated using the classic Least Squares method. An appropriate
metric is introduced in order to measure the error between the observed and the
predicted distributions. In particular, the Wasserstein distance is proposed.
Some properties of this metric are exploited to predict the response variable
as a direct linear combination of other independent histogram variables. Measures
of goodness of fit are discussed. An application on real data corroborates the
proposed method.
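As a rough illustration of the error metric named in this abstract, the sketch below computes the squared 2-Wasserstein distance between two one-dimensional empirical distributions. It is a simplification and not the paper's estimator: the paper works with histogram variables, whereas this sketch assumes two raw samples of equal size, for which the distance reduces to the mean squared difference of the sorted values. The function name `wasserstein2_squared` is hypothetical.

```python
def wasserstein2_squared(a, b):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes samples of equal size")
    if not a:
        raise ValueError("samples must be non-empty")
    # In one dimension the optimal coupling matches order statistics,
    # so the distance compares the two empirical quantile functions.
    sa, sb = sorted(a), sorted(b)
    return sum((u - v) ** 2 for u, v in zip(sa, sb)) / len(sa)


# Shifting a sample by a constant c moves every quantile by c,
# so the squared distance is c**2 regardless of the input order.
print(wasserstein2_squared([2, 0, 1], [5, 3, 4]))  # 9.0
```

A least squares fit in this metric would minimize the sum of such squared distances between observed and predicted distributions, which is the role the abstract assigns to the Wasserstein distance.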