Finite sample performance of linear least squares estimators under sub-Gaussian martingale difference noise
Linear least squares is a well-known parameter estimation technique, used even
when sub-optimal because of its very low computational requirements and because
exact knowledge of the noise statistics is not required. Surprisingly, bounding
the probability of large errors with finitely many samples has remained an open
problem, especially for correlated noise with unknown covariance. In this paper
we analyze the finite sample performance of the linear least squares estimator
under sub-Gaussian martingale difference noise. Using concentration of measure
inequalities, we obtain tight bounds on the tail of the estimator's error
distribution, and we show that the error probability decays exponentially fast
in the number of samples, so that few samples suffice to ensure a given
accuracy with high probability. Our analysis method is simple, relying on
elementary tail bounds on the estimation error, and the tightness of the bounds
is verified through simulation. The proposed bounds make it possible to predict
the number of samples required for least squares estimation even when least
squares is sub-optimal and used only for computational simplicity. The finite
sample analysis of least squares under this general noise model is novel.
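As a quick illustrative sketch of the setting described above (not the paper's actual analysis), the following snippet estimates the tail probability of the least squares error norm as the sample size grows. The design matrix, true parameter, error threshold, and bounded (hence sub-Gaussian) i.i.d. noise are all invented for illustration; i.i.d. noise is only a special case of a martingale difference sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])            # true parameter (illustrative)

def trial(n):
    """One least squares fit with n samples; returns the error norm."""
    X = rng.normal(size=(n, 2))          # random design (illustrative)
    # bounded noise is sub-Gaussian; i.i.d. is a special case of a
    # martingale difference sequence
    e = rng.uniform(-1.0, 1.0, size=n)
    y = X @ theta + e
    est, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.linalg.norm(est - theta))

for n in (50, 200, 800):
    errs = [trial(n) for _ in range(500)]
    tail = np.mean([err > 0.15 for err in errs])   # empirical P(||error|| > 0.15)
    print(n, round(float(tail), 3))
```

The empirical tail probabilities shrink rapidly as n grows, consistent with exponential tail bounds.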
A Provable Smoothing Approach for High Dimensional Generalized Regression with Applications in Genomics
In many applications, linear models fit the data poorly. This article studies
an appealing alternative, the generalized regression model, which assumes only
that an unknown monotonically increasing link function connects the response to
a single index of the explanatory variables. The generalized regression model
is flexible and covers many widely used statistical models. It fits the data
generating mechanism well in many real problems, which makes it useful in a
variety of applications where regression models are regularly employed. In low
dimensions, rank-based M-estimators are recommended for the generalized
regression model, giving root-n consistent estimators of the index
coefficients. Applications of these estimators to high dimensional data,
however, are questionable. This article studies, both theoretically and
practically, a simple yet powerful smoothing approach to handle the high
dimensional generalized regression model. Theoretically, a family of smoothing
functions is provided, and the amount of smoothing necessary for efficient
inference is carefully calculated. Practically, our study is motivated by an
important and challenging scientific problem: decoding gene regulation by
predicting transcription factors that bind to cis-regulatory elements. Applying
our proposed method to this problem shows substantial improvement over the
state-of-the-art alternative on real data.
Comment: 53 pages
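To make the smoothing idea concrete, here is a hypothetical sketch, not the article's estimator: a maximum rank correlation objective for a single-index model, with the pairwise rank indicator replaced by a sigmoid of bandwidth h (one possible member of a family of smoothing functions). The simulated link, sample size, bandwidth, and crude grid search are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, 0.5])                 # illustrative index direction
# unknown increasing link g(t) = exp(t), plus small additive noise
y = np.exp(X @ beta_true) + 0.1 * rng.normal(size=n)

def smoothed_mrc(beta, h=0.1):
    """Smoothed maximum rank correlation: the indicator 1{x_i'b > x_j'b}
    is replaced by a sigmoid with bandwidth h."""
    s = X @ beta
    ds = s[:, None] - s[None, :]                 # pairwise index differences
    dy = (y[:, None] > y[None, :]).astype(float) # pairwise response ranks
    return np.mean(dy / (1.0 + np.exp(-ds / h)))

# the index is identified only up to scale, so search over unit vectors;
# a crude grid over the angle stands in for a real optimizer
angles = np.linspace(0.0, np.pi, 181)
vals = [smoothed_mrc(np.array([np.cos(a), np.sin(a)])) for a in angles]
best = angles[int(np.argmax(vals))]
beta_hat = np.array([np.cos(best), np.sin(best)])
print(beta_hat)
```

The recovered unit vector should point close to the true index direction, since the smoothed objective is maximized near it when the link is monotone.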
Multichannel sparse recovery of complex-valued signals using Huber's criterion
In this paper, we generalize Huber's criterion to the multichannel sparse
recovery problem with complex-valued measurements, where the objective is to
recover jointly sparse unknown signal vectors from multiple measurement vectors
that are different linear combinations of the same known elementary vectors.
This requires careful characterization of robust complex-valued loss functions,
as well as of Huber's criterion function for the multivariate sparse regression
problem. We devise a greedy algorithm based on the simultaneous normalized
iterative hard thresholding (SNIHT) algorithm. Unlike the conventional SNIHT
method, our algorithm, referred to as HUB-SNIHT, is robust under heavy-tailed
non-Gaussian noise conditions, yet incurs a negligible performance loss
compared to SNIHT under Gaussian noise. The usefulness of the method is
illustrated in a source localization application with sensor arrays.
Comment: To appear in CoSeRa'15 (Pisa, Italy, June 16-19, 2015)
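For orientation, here is a minimal real-valued sketch of a plain (non-robust) SNIHT-style iteration under Gaussian noise; the complex-valued, Huber-robust HUB-SNIHT variant is the paper's contribution and is not reproduced here. The dimensions, noise level, and normalized step-size rule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, q, k = 64, 128, 4, 5           # measurements, atoms, channels, row sparsity
A = rng.normal(size=(n, m)) / np.sqrt(n)       # random dictionary (illustrative)
X_true = np.zeros((m, q))
support = rng.choice(m, size=k, replace=False)
X_true[support] = rng.normal(size=(k, q))      # jointly (row-)sparse signal
Y = A @ X_true + 0.01 * rng.normal(size=(n, q))

def sniht(Y, A, k, iters=100):
    """SNIHT-style sketch: gradient step with a support-normalized step
    size, then keep the k rows with the largest joint (row-wise) energy."""
    X = np.zeros((A.shape[1], Y.shape[1]))
    S = None
    for _ in range(iters):
        G = A.T @ (Y - A @ X)                  # gradient, all channels at once
        if S is None:                          # initial support from gradient
            S = np.argsort(np.linalg.norm(G, axis=1))[-k:]
        Gs = np.zeros_like(G)
        Gs[S] = G[S]
        mu = np.linalg.norm(Gs) ** 2 / (np.linalg.norm(A @ Gs) ** 2 + 1e-12)
        X = X + mu * G
        S = np.argsort(np.linalg.norm(X, axis=1))[-k:]  # joint hard threshold
        Xk = np.zeros_like(X)
        Xk[S] = X[S]
        X = Xk
    return X

X_hat = sniht(Y, A, k)
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
print(round(float(rel_err), 4))
```

Thresholding on row norms, rather than per-entry magnitudes, is what enforces the joint sparsity shared across the multiple measurement vectors.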
Robust scaling in fusion science: case study for the L-H power threshold
In regression analysis for deriving scaling laws in the context of fusion studies, standard regression methods are usually applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to fusion data. More sophisticated statistical techniques are available, but they are not widely used in the fusion community and, moreover, the predictions by scaling laws may vary significantly depending on the particular regression technique. Therefore, we have developed a new regression method, which we call geodesic least squares regression (GLS), that is robust in the presence of significant uncertainty on both the data and the regression model. The method is based on probabilistic modeling of all variables involved in the scaling expression, using adequate probability distributions and a natural similarity measure between them (geodesic distance). In this work we revisit the scaling law for the power threshold for the L-to-H transition in tokamaks, using data from the multi-machine ITPA databases. Depending on model assumptions, OLS can yield different predictions of the power threshold for ITER. In contrast, GLS regression delivers consistent results. Consequently, given the ubiquity and importance of scaling laws and parametric dependence studies in fusion research, GLS regression is proposed as a robust and easily implemented alternative to classical regression techniques.
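A minimal sketch of the geodesic-distance idea, assuming Gaussian models with proportional uncertainties and using the closed-form Rao geodesic distance between univariate normals; the power-law form, noise level, and grid search are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1.0, 10.0, 40)
y = 2.0 * x ** 1.5 * (1.0 + 0.05 * rng.normal(size=x.size))  # noisy power law

def rao_dist(mu1, s1, mu2, s2):
    """Closed-form Rao geodesic distance between univariate Gaussians."""
    num = (mu1 - mu2) ** 2 + 2.0 * (s1 - s2) ** 2
    den = (mu1 - mu2) ** 2 + 2.0 * (s1 + s2) ** 2
    return 2.0 * np.sqrt(2.0) * np.arctanh(np.sqrt(num / den))

def gls_objective(c, a, rel=0.1):
    """Sum of squared geodesic distances between observed and modeled
    Gaussians, each with an (assumed) proportional standard deviation."""
    mu_mod = c * x ** a
    return np.sum(rao_dist(y, rel * y, mu_mod, rel * mu_mod) ** 2)

# crude grid search over the scaling parameters (c, a) for illustration
cs = np.linspace(1.0, 3.0, 81)
exps = np.linspace(1.0, 2.0, 81)
vals = [[gls_objective(c, a) for a in exps] for c in cs]
i, j = np.unravel_index(int(np.argmin(vals)), (81, 81))
print(cs[i], exps[j])
```

Because both the observation and the model prediction are treated as probability distributions, uncertainty on the regression model itself enters the fit symmetrically with uncertainty on the data.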