Truthful Linear Regression
We consider the problem of fitting a linear model to data held by individuals
who are concerned about their privacy. Incentivizing most players to truthfully
report their data to the analyst constrains our design to mechanisms that
provide a privacy guarantee to the participants; we use differential privacy to
model individuals' privacy losses. This immediately poses a problem, as
differentially private computation of a linear model necessarily produces a
biased estimate, and existing approaches to designing mechanisms that elicit data
from privacy-sensitive individuals do not generalize well to biased estimators.
We overcome this challenge through an appropriate design of the computation and
payment scheme.
Comment: To appear in Proceedings of the 28th Annual Conference on Learning Theory (COLT 2015).
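The paper's payment and truthfulness machinery is its own contribution, but the bias the abstract refers to can be made concrete. Below is a minimal sketch of output-perturbation differential privacy for ridge regression; the sensitivity bound and the Laplace noise scale here are illustrative assumptions, not the paper's mechanism.

```python
import numpy as np

def dp_linear_regression(X, y, epsilon, reg=1.0, rng=None):
    """Output-perturbation sketch: fit ridge regression, then add
    Laplace noise calibrated to a hypothetical sensitivity bound.
    Illustrates why a private estimate is biased/noisy; it is NOT
    the paper's truthful payment mechanism."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Ridge solution; regularization keeps the map well-conditioned.
    beta = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
    # Hypothetical L1-sensitivity bound; a real mechanism would derive
    # this from explicit bounds on the data and the regularization.
    sensitivity = 2.0 / (n * reg)
    noise = rng.laplace(scale=sensitivity * d / epsilon, size=d)
    return beta + noise

# Usage: the noise (and the ridge shrinkage) both bias the estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500)
print(dp_linear_regression(X, y, epsilon=1.0, rng=rng))
```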
Bayesian Linear Regression
The paper is concerned with Bayesian analysis under prior-data conflict, i.e. the situation in which the observed data are rather unexpected under the prior (and the sample size is not large enough to eliminate the influence of the prior). Two approaches to Bayesian linear regression modeling based on conjugate priors are considered in detail, namely the standard approach, as described in Fahrmeir, Kneib & Lang (2007), and an alternative adaptation of the general construction procedure for exponential family sampling models. We find that, in contrast to some standard i.i.d. models such as the scaled normal model and the Beta-Binomial / Dirichlet-Multinomial model, where prior-data conflict is completely ignored, these models may show some reaction to prior-data conflict, albeit in a rather unspecific way. Finally, we briefly sketch the extension to a corresponding imprecise probability model, where, by considering sets of prior distributions instead of a single prior, prior-data conflict can be handled in a very appealing and intuitive way.
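For concreteness, the standard conjugate approach the abstract mentions is the normal-inverse-gamma update sketched below; the parameter names (m0, V0, a0, b0) are this sketch's conventions, not the paper's. Loosely, a conflict between the prior mean m0 and the data enters the posterior only through the scale parameter bn, which is consistent with the "rather unspecific" reaction the abstract reports.

```python
import numpy as np

def nig_posterior(X, y, m0, V0, a0, b0):
    """Conjugate normal-inverse-gamma update for the standard Bayesian
    linear model: beta | sigma^2 ~ N(m0, sigma^2 V0), sigma^2 ~ IG(a0, b0).
    Returns the posterior hyperparameters (mn, Vn, an, bn)."""
    n = len(y)
    V0_inv = np.linalg.inv(V0)
    Vn_inv = V0_inv + X.T @ X
    Vn = np.linalg.inv(Vn_inv)
    mn = Vn @ (V0_inv @ m0 + X.T @ y)
    an = a0 + n / 2.0
    # bn absorbs both residual variation and prior-data disagreement.
    bn = b0 + 0.5 * (y @ y + m0 @ V0_inv @ m0 - mn @ Vn_inv @ mn)
    return mn, Vn, an, bn
```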
Current status linear regression
We construct \sqrt{n}-consistent and asymptotically normal estimates for
the finite dimensional regression parameter in the current status linear
regression model, which do not require any smoothing device and are based on
maximum likelihood estimates (MLEs) of the infinite dimensional parameter. We
also construct estimates, again only based on these MLEs, which are arbitrarily
close to efficient estimates, if the generalized Fisher information is finite.
This type of efficiency is also derived under minimal conditions for estimates
based on smooth non-monotone plug-in estimates of the distribution function.
Algorithms for computing the estimates and for selecting the bandwidth of the
smooth estimates with a bootstrap method are provided. The connection with
results in the econometric literature is also pointed out.
Comment: 64 pages, 6 figures.
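The estimators themselves are beyond a short sketch, but the current status observation scheme is easy to make concrete: the latent response is never seen, only a check-up time and an indicator of whether the response had already occurred. A hypothetical simulation, with all distributional choices being this sketch's assumptions:

```python
import numpy as np

def simulate_current_status(n, beta, rng=None):
    """Generate current-status data for the linear regression model:
    the response Y = X @ beta + eps is never observed directly; we only
    see a check-up time T and the indicator delta = 1{Y <= T}."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.normal(size=(n, len(beta)))
    eps = rng.normal(size=n)          # error with unknown distribution
    Y = X @ beta + eps                # latent response, never observed
    T = rng.uniform(-3, 3, size=n)    # observation (check-up) times
    delta = (Y <= T).astype(int)      # all we see besides (X, T)
    return X, T, delta
```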
Linear Regression Diagnostics
This paper attempts to provide the user of linear multiple regression with a battery of diagnostic tools to determine which, if any, data points have high leverage or influence on the estimation process, and how these possibly discrepant data points differ from the patterns set by the majority of the data. The point of view taken is that, when diagnostics indicate the presence of anomalous data, the choice remains open as to whether these data are in fact unusual and helpful, or possibly harmful and thus in need of modification or deletion. The methodology developed depends on differences, derivatives, and decompositions of basic regression statistics. There is also a discussion of how these techniques can be used with robust and ridge estimators. An example is given showing the use of diagnostic methods in the estimation of a cross-country savings rate model.
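A minimal sketch of the kind of leverage and row-deletion diagnostics described here, built from the hat matrix, studentized residuals, and Cook's distance; the function and variable names are this sketch's own, not the paper's.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverage, internally studentized residuals, and Cook's distance
    for an OLS fit of y on X (X should include an intercept column)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
    h = np.diag(H)                           # leverages
    resid = y - H @ y                        # OLS residuals
    s2 = resid @ resid / (n - p)             # residual variance
    student = resid / np.sqrt(s2 * (1 - h))  # studentized residuals
    cooks = student**2 * h / ((1 - h) * p)   # influence on the fit
    return h, student, cooks
```

Points with large h are high-leverage; large Cook's distance flags rows whose deletion would move the fitted coefficients most.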
Scaled Sparse Linear Regression
Scaled sparse linear regression jointly estimates the regression coefficients
and noise level in a linear model. It chooses an equilibrium with a sparse
regression method by iteratively estimating the noise level via the mean
residual square and scaling the penalty in proportion to the estimated noise
level. The iterative algorithm costs little beyond the computation of a path or
grid of the sparse regression estimator for penalty levels above a proper
threshold. For the scaled lasso, the algorithm is a gradient descent in a
convex minimization of a penalized joint loss function for the regression
coefficients and noise level. Under mild regularity conditions, we prove that
the scaled lasso simultaneously yields an estimator for the noise level and an
estimated coefficient vector satisfying certain oracle inequalities for
prediction, the estimation of the noise level and the regression coefficients.
These inequalities provide sufficient conditions for the consistency and
asymptotic normality of the noise level estimator, including certain cases
where the number of variables is of greater order than the sample size.
Parallel results are provided for the least squares estimation after model
selection by the scaled lasso. Numerical results demonstrate the superior
performance of the proposed methods over an earlier proposal of joint convex
minimization.
Comment: 20 pages.
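The iteration the abstract describes alternates two cheap steps: estimate the noise level from the mean residual square, then re-solve the lasso with penalty proportional to that estimate. A sketch, assuming sklearn's Lasso as the inner solver and the universal penalty level sqrt(2 log(p)/n) as a plausible default threshold:

```python
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, lam0=None, n_iter=20, tol=1e-6):
    """Scaled-lasso sketch: jointly estimate coefficients and noise
    level by alternating a lasso fit with a noise-level update."""
    n, p = X.shape
    lam0 = np.sqrt(2 * np.log(p) / n) if lam0 is None else lam0
    sigma = np.std(y)                        # crude initial noise level
    for _ in range(n_iter):
        # sklearn minimizes ||y - Xb||^2 / (2n) + alpha * ||b||_1,
        # so alpha = lam0 * sigma scales the penalty with the noise.
        fit = Lasso(alpha=lam0 * sigma, fit_intercept=False).fit(X, y)
        resid = y - X @ fit.coef_
        sigma_new = np.sqrt(resid @ resid / n)   # mean residual square
        if abs(sigma_new - sigma) < tol:
            sigma = sigma_new
            break
        sigma = sigma_new
    return fit.coef_, sigma
```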
Local linear spatial regression
A local linear kernel estimator of the regression function x\mapsto
g(x):=E[Y_i|X_i=x], x\in R^d, of a stationary (d+1)-dimensional spatial process
{(Y_i,X_i),i\in Z^N} observed over a rectangular domain of the form
I_n:={i=(i_1,...,i_N)\in Z^N| 1\leq i_k\leq n_k,k=1,...,N}, n=(n_1,...,n_N)\in
Z^N, is proposed and investigated. Under mild regularity assumptions,
asymptotic normality of the estimators of g(x) and its derivatives is
established. Appropriate choices of the bandwidths are proposed. The spatial
process is assumed to satisfy some very general mixing conditions, generalizing
classical time-series strong mixing concepts. The size of the rectangular
domain I_n is allowed to tend to infinity at different rates depending on the
direction in Z^N.
Comment: Published at http://dx.doi.org/10.1214/009053604000000850 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
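Stripped of the spatial-mixing asymptotics, the estimator itself is ordinary local linear smoothing: a weighted least squares fit of Y on (1, X - x) around the target point, where the intercept estimates g(x) and the slope its gradient. A minimal sketch with a Gaussian kernel and a single scalar bandwidth h (both choices are this sketch's assumptions):

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear estimate of g(x0) = E[Y | X = x0] from samples
    (X, Y), X of shape (n, d): kernel-weighted least squares of Y
    on the centered design (1, X - x0)."""
    D = X - x0                                      # centered design
    w = np.exp(-0.5 * np.sum((D / h) ** 2, axis=1)) # Gaussian weights
    Z = np.column_stack([np.ones(len(Y)), D])
    WZ = Z * w[:, None]
    coef = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)      # weighted normal eqs
    return coef[0], coef[1:]                        # g_hat(x0), gradient
```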
Partially linear censored quantile regression
Censored regression quantile (CRQ) methods provide a powerful and flexible approach to the analysis of censored survival data when standard linear models are felt to be appropriate. In many cases, however, greater flexibility is desired to go beyond the usual multiple regression paradigm. One area of common interest is that of partially linear models: one (or more) of the explanatory covariates is assumed to act on the response through a non-linear function. Here the CRQ approach of Portnoy (J Am Stat Assoc 98:1001–1012, 2003) is extended to this partially linear setting. Basic consistency results are presented. A simulation experiment and an unemployment example demonstrate the value of the partially linear approach over methods based on the Cox proportional hazards model and over methods not permitting nonlinearity.
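A sketch of the partially linear idea in the uncensored case: expand the nonlinear covariate in a spline basis and fit a single linear quantile regression. It assumes statsmodels' QuantReg and a truncated-power basis with quartile knots; Portnoy's censored recursion, which the paper actually extends, is not reproduced here.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def partially_linear_quantreg(y, X_lin, z, tau=0.5, knots=None):
    """Partially linear quantile regression (uncensored sketch):
    y ~ X_lin @ beta + f(z), with f represented by a quadratic
    truncated-power spline basis in z."""
    knots = np.quantile(z, [0.25, 0.5, 0.75]) if knots is None else knots
    basis = [np.ones_like(z), z, z**2]
    basis += [np.clip(z - k, 0, None) ** 2 for k in knots]
    design = np.column_stack([X_lin] + basis)
    fit = QuantReg(y, design).fit(q=tau)    # pinball-loss estimation
    return fit.params
```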
On-line predictive linear regression
We consider the on-line predictive version of the standard problem of linear
regression; the goal is to predict each consecutive response given the
corresponding explanatory variables and all the previous observations. We are
mainly interested in prediction intervals rather than point predictions. The
standard treatment of prediction intervals in linear regression analysis has
two drawbacks: (1) the classical prediction intervals guarantee that the
probability of error is equal to the nominal significance level epsilon, but
this property per se does not imply that the long-run frequency of error is
close to epsilon; (2) it is not suitable for prediction of complex systems as
it assumes that the number of observations exceeds the number of parameters. We
state a general result showing that in the on-line protocol the frequency of
error for the classical prediction intervals does equal the nominal
significance level, up to statistical fluctuations. We also describe
alternative regression models in which informative prediction intervals can be
found before the number of observations exceeds the number of parameters. One
of these models, which only assumes that the observations are independent and
identically distributed, is popular in machine learning but greatly underused
in the statistical theory of regression.
Comment: 34 pages; 6 figures; 1 table. arXiv admin note: substantial text overlap with arXiv:0906.312
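The on-line protocol behind the stated result is easy to make concrete: at each step, fit OLS to the data seen so far, issue the classical level-(1 - epsilon) prediction interval for the next response, observe it, and tally errors. A sketch, under this sketch's own naming, that checks the long-run error frequency:

```python
import numpy as np
from scipy import stats

def online_interval_coverage(X, y, eps=0.1):
    """Sequentially issue classical OLS prediction intervals and
    return the empirical error frequency, which the abstract's
    result says should match eps up to statistical fluctuations."""
    n, p = X.shape
    errors, trials = 0, 0
    for i in range(p + 2, n):                 # need residual df > 0
        Xi, yi = X[:i], y[:i]
        G = np.linalg.inv(Xi.T @ Xi)
        beta = G @ Xi.T @ yi
        resid = yi - Xi @ beta
        s = np.sqrt(resid @ resid / (i - p))  # residual std. error
        x_new = X[i]
        half = (stats.t.ppf(1 - eps / 2, i - p)
                * s * np.sqrt(1 + x_new @ G @ x_new))
        errors += int(abs(y[i] - x_new @ beta) > half)
        trials += 1
    return errors / trials
```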
