
    Finite sample performance of linear least squares estimators under sub-Gaussian martingale difference noise

    Linear least squares is a well-known technique for parameter estimation, used even when sub-optimal because of its very low computational requirements and because exact knowledge of the noise statistics is not required. Surprisingly, bounding the probability of large errors with finitely many samples has been left open, especially when dealing with correlated noise with unknown covariance. In this paper we analyze the finite sample performance of the linear least squares estimator under sub-Gaussian martingale difference noise. To analyze this question we use concentration of measure bounds, which yield tight bounds on the tail of the estimator's distribution. We show that the probability of exceeding a given accuracy decays exponentially fast in the number of samples, and we provide probability tail bounds on the norm of the estimation error. Our analysis method is simple and uses L∞-type bounds on the estimation error. The tightness of the bounds is tested through simulation. The proposed bounds make it possible to predict the number of samples required for least squares estimation even when least squares is sub-optimal and used for computational simplicity. The finite sample analysis of least squares under this general noise model is novel.
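    The setting above can be sketched numerically. This is a minimal simulation, not the paper's bound: it only checks that the sup-norm (L∞) error of ordinary least squares shrinks as the sample size grows, using bounded (hence sub-Gaussian) independent noise as the simplest martingale difference sequence; all dimensions and parameter values below are illustrative choices.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d = 3
    beta = np.array([1.0, -2.0, 0.5])  # illustrative true parameter

    def max_ls_error(n, trials=200):
        """Largest L_inf estimation error over repeated trials at sample size n."""
        worst = 0.0
        for _ in range(trials):
            X = rng.normal(size=(n, d))
            # Bounded, mean-zero, independent noise: a simple special case of a
            # sub-Gaussian martingale difference sequence.
            eps = rng.uniform(-1.0, 1.0, size=n)
            y = X @ beta + eps
            beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
            worst = max(worst, np.max(np.abs(beta_hat - beta)))
        return worst

    # The worst-case error over many trials drops sharply as n grows.
    print(max_ls_error(50), max_ls_error(5000))
    ```

    Plotting `max_ls_error` against n on a log scale would make the fast decay that the tail bounds predict directly visible.
    
    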

    A Provable Smoothing Approach for High Dimensional Generalized Regression with Applications in Genomics

    In many applications, linear models fit the data poorly. This article studies an appealing alternative, the generalized regression model. This model only assumes that there exists an unknown monotonically increasing link function connecting the response Y to a single index X^T β* of explanatory variables X ∈ R^d. The generalized regression model is flexible and covers many widely used statistical models. It fits the data generating mechanisms well in many real problems, which makes it useful in a variety of applications where regression models are regularly employed. In low dimensions, rank-based M-estimators are recommended for the generalized regression model, giving root-n consistent estimators of β*. Applications of these estimators to high dimensional data, however, are questionable. This article studies, both theoretically and practically, a simple yet powerful smoothing approach to handle the high dimensional generalized regression model. Theoretically, a family of smoothing functions is provided, and the amount of smoothing necessary for efficient inference is carefully calculated. Practically, our study is motivated by an important and challenging scientific problem: decoding gene regulation by predicting transcription factors that bind to cis-regulatory elements. Applying our proposed method to this problem shows substantial improvement over the state-of-the-art alternative on real data.
    Comment: 53 pages
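    The single-index structure Y = g(X^T β*) + noise can be illustrated with a simulation. This sketch does not reproduce the paper's smoothed rank-based estimator; it only demonstrates a classical related fact (due to Brillinger) that for a Gaussian design, even plain least squares recovers the *direction* of β* under a monotone link. The link g = tanh and all sizes are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 20000, 5
    beta_star = np.array([2.0, -1.0, 0.0, 0.5, 1.0])
    beta_star /= np.linalg.norm(beta_star)  # direction is what matters

    X = rng.normal(size=(n, d))
    index = X @ beta_star
    y = np.tanh(index) + 0.1 * rng.normal(size=n)  # monotone link g = tanh

    # OLS estimates beta* only up to an unknown scale, so compare directions.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    direction = beta_hat / np.linalg.norm(beta_hat)
    cosine = abs(direction @ beta_star)
    print(round(cosine, 3))  # close to 1: direction recovered
    ```

    Rank-based methods, as the abstract notes, achieve the same direction recovery without relying on the Gaussian-design assumption this shortcut needs.
    
    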

    Multichannel sparse recovery of complex-valued signals using Huber's criterion

    In this paper, we generalize Huber's criterion to the multichannel sparse recovery problem with complex-valued measurements, where the objective is to recover jointly sparse unknown signal vectors from multiple measurement vectors that are different linear combinations of the same known elementary vectors. This requires careful characterization of robust complex-valued loss functions as well as of Huber's criterion function for the multivariate sparse regression problem. We devise a greedy algorithm based on the simultaneous normalized iterative hard thresholding (SNIHT) algorithm. Unlike the conventional SNIHT method, our algorithm, referred to as HUB-SNIHT, is robust under heavy-tailed non-Gaussian noise conditions, yet has negligible performance loss compared to SNIHT under Gaussian noise. The usefulness of the method is illustrated in a source localization application with sensor arrays.
    Comment: To appear in CoSeRa'15 (Pisa, Italy, June 16-19, 2015).
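    A bare-bones version of the SNIHT-style iteration can be sketched as follows. This is an assumed simplification: real-valued data instead of complex, the plain squared loss instead of Huber's criterion (so it is not the paper's HUB-SNIHT), and a fixed gradient step instead of the normalized step-size selection. It shows the core mechanic: a gradient step on the multiple-measurement residual followed by row-wise hard thresholding to enforce joint sparsity.

    ```python
    import numpy as np

    def row_hard_threshold(B, k):
        """Keep the k rows of B with largest Euclidean norm; zero the rest."""
        norms = np.linalg.norm(B, axis=1)
        keep = np.argsort(norms)[-k:]
        out = np.zeros_like(B)
        out[keep] = B[keep]
        return out

    def siht(A, Y, k, iters=200):
        """Recover a row-sparse X from multiple measurement vectors Y = A X."""
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # conservative gradient step
        X = np.zeros((A.shape[1], Y.shape[1]))
        for _ in range(iters):
            X = row_hard_threshold(X + step * A.T @ (Y - A @ X), k)
        return X

    rng = np.random.default_rng(2)
    m, n, q, k = 40, 100, 4, 3  # measurements, atoms, channels, row sparsity
    A = rng.normal(size=(m, n)) / np.sqrt(m)
    X_true = np.zeros((n, q))
    X_true[[5, 17, 60]] = rng.normal(size=(3, q))  # 3 active rows
    Y = A @ X_true                                 # noiseless measurements

    X_hat = siht(A, Y, k)
    print(np.linalg.norm(X_hat - X_true))  # small recovery error
    ```

    Swapping the residual `Y - A @ X` for a Huber-weighted (pseudo-)residual is, per the abstract, what makes the method robust to heavy-tailed noise.
    
    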

    Robust scaling in fusion science: case study for the L-H power threshold

    In regression analysis for deriving scaling laws in the context of fusion studies, standard regression methods are usually applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to fusion data. More sophisticated statistical techniques are available, but they are not widely used in the fusion community and, moreover, the predictions by scaling laws may vary significantly depending on the particular regression technique. We have therefore developed a new regression method, which we call geodesic least squares regression (GLS), that is robust in the presence of significant uncertainty in both the data and the regression model. The method is based on probabilistic modeling of all variables involved in the scaling expression, using adequate probability distributions and a natural similarity measure between them (geodesic distance). In this work we revisit the scaling law for the power threshold for the L-to-H transition in tokamaks, using data from the multi-machine ITPA databases. Depending on model assumptions, OLS can yield different predictions of the power threshold for ITER. In contrast, GLS regression delivers consistent results. Consequently, given the ubiquity and importance of scaling laws and parametric dependence studies in fusion research, GLS regression is proposed as a robust and easily implemented alternative to classic regression techniques.
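    For context on the baseline the abstract criticizes: a power-law scaling such as P = c · x1^a1 · x2^a2 is conventionally fit by OLS after taking logarithms, since the model becomes linear in the exponents. The sketch below shows only this standard OLS baseline on synthetic data with made-up coefficients; the paper's GLS method itself is not reproduced here.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n = 500
    x1 = rng.uniform(1.0, 10.0, n)
    x2 = rng.uniform(1.0, 10.0, n)
    # Synthetic power law with multiplicative noise (illustrative coefficients).
    P = 2.0 * x1**0.7 * x2**1.5 * np.exp(0.05 * rng.normal(size=n))

    # log P = log c + a1 * log x1 + a2 * log x2 + noise  ->  linear regression
    A = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
    coef, *_ = np.linalg.lstsq(A, np.log(P), rcond=None)
    print(np.round(coef, 2))  # approximately [log 2, 0.7, 1.5]
    ```

    The abstract's point is that this log-OLS baseline ignores uncertainty in the predictor variables and in the model form itself, which is precisely what GLS's probabilistic modeling of all variables is designed to handle.
    
    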