172,662 research outputs found

    Functional linear regression analysis for longitudinal data

    Full text link
    We propose nonparametric methods for functional linear regression which are designed for sparse longitudinal data, where both the predictor and response are functions of a covariate such as time. Predictor and response processes have smooth random trajectories, and the data consist of a small number of noisy repeated measurements made at irregular times for a sample of subjects. In longitudinal studies, the number of repeated measurements per subject is often small and may be modeled as a discrete random number; accordingly, only a finite and asymptotically nonincreasing number of measurements are available for each subject or experimental unit. We propose a functional regression approach for this situation, using functional principal component analysis, where we estimate the functional principal component scores through conditional expectations. This allows the prediction of an unobserved response trajectory from sparse measurements of a predictor trajectory. The resulting technique is flexible and allows for different patterns regarding the timing of the measurements obtained for predictor and response trajectories. Asymptotic properties for a sample of $n$ subjects are investigated under mild conditions, as $n \to \infty$, and we obtain consistent estimation of the regression function. Besides convergence results for the components of functional linear regression, such as the regression parameter function, we construct asymptotic pointwise confidence bands for the predicted trajectories. A functional coefficient of determination, a measure of the variance explained by the functional regression model, is introduced, extending the standard $R^2$ to the functional case. The proposed methods are illustrated with a simulation study, longitudinal primary biliary liver cirrhosis data, and an analysis of the longitudinal relationship between blood pressure and body mass index.
    Comment: Published at http://dx.doi.org/10.1214/009053605000000660 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
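
    The paper's estimation scheme for sparse data is involved; below is a minimal sketch of the overall idea for densely and regularly sampled trajectories, where FPC scores can be obtained by direct numerical integration rather than by the conditional expectation step the paper uses for sparse designs. All function and variable names are illustrative.

```python
import numpy as np

def fpca_scores(X, n_components):
    """Toy functional PCA on a dense, regular grid.
    X: (n_subjects, n_timepoints) array of trajectories."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Empirical covariance surface and its eigendecomposition
    C = Xc.T @ Xc / X.shape[0]
    vals, vecs = np.linalg.eigh(C)
    idx = np.argsort(vals)[::-1][:n_components]
    phi = vecs[:, idx]               # eigenfunctions (columns)
    scores = Xc @ phi                # FPC scores (unit grid spacing assumed)
    return mu, phi, scores

def functional_linear_regression(X, Y, kx=3, ky=3):
    """Regress response trajectories Y on predictor trajectories X
    through their leading FPC scores (dense-data simplification)."""
    mu_x, phi_x, sx = fpca_scores(X, kx)
    mu_y, phi_y, sy = fpca_scores(Y, ky)
    # Least-squares map from predictor scores to response scores
    B, *_ = np.linalg.lstsq(sx, sy, rcond=None)
    def predict(X_new):
        s_new = (X_new - mu_x) @ phi_x
        return mu_y + s_new @ B @ phi_y.T
    return predict
```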

    Robust Linear Regression Analysis - A Greedy Approach

    Full text link
    The task of robust linear estimation in the presence of outliers is of particular importance in signal processing, statistics and machine learning. Although the problem was stated decades ago and solved using methods that are nowadays considered classical, it has recently attracted renewed attention in the context of sparse modeling, where several notable contributions have been made. In the present manuscript, a new approach is considered in the framework of greedy algorithms. The noise is split into two components: (a) the inlier bounded noise and (b) the outliers, which are explicitly modeled by employing sparsity arguments. Based on this scheme, a novel, efficient algorithm, the Greedy Algorithm for Robust Denoising (GARD), is derived. GARD alternates between a least-squares optimization criterion and an Orthogonal Matching Pursuit (OMP) selection step that identifies the outliers. The case where only outliers are present has been studied separately, where bounds on the Restricted Isometry Property guarantee that the recovery of the signal via GARD is exact. Moreover, theoretical results concerning convergence, as well as the derivation of error bounds in the case of additional bounded noise, are discussed. Finally, we provide extensive simulations, which demonstrate the comparative advantages of the new technique.
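
    A minimal sketch of a GARD-style loop, based only on the description in the abstract (alternate least squares with an OMP-type selection of the largest residual as the next outlier); the stopping rule and tolerance are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def gard(X, y, eps=1e-3, max_outliers=None):
    """Greedy robust regression in the spirit of GARD.
    Model: y = X @ theta + sparse outliers + bounded inlier noise."""
    n, d = X.shape
    if max_outliers is None:
        max_outliers = n - d - 1
    support = []                      # indices flagged as outliers
    A = X.copy()
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    while np.linalg.norm(r) > eps and len(support) < max_outliers:
        # OMP step: the largest residual identifies the next outlier
        k = int(np.argmax(np.abs(r)))
        if k in support:
            break
        support.append(k)
        e_k = np.zeros((n, 1))
        e_k[k] = 1.0
        A = np.hstack([A, e_k])       # grow the dictionary with e_k
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ coef
    theta = coef[:d]                  # regression part of the solution
    return theta, support
```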

    The Loss Rank Criterion for Variable Selection in Linear Regression Analysis

    Full text link
    Lasso and other regularization procedures are attractive methods for variable selection, subject to a proper choice of the shrinkage parameter. Given a set of potential subsets produced by a regularization algorithm, a consistent model selection criterion is proposed to select the best one among this preselected set. The approach leads to a fast and efficient procedure for variable selection, especially in high-dimensional settings. Model selection consistency of the suggested criterion is proven when the number of covariates d is fixed. Simulation studies suggest that the criterion still enjoys model selection consistency when d is much larger than the sample size. The simulations also show that our approach for variable selection works surprisingly well in comparison with existing competitors. The method is also applied to a real data set.
    Comment: 18 pages, 1 figure.
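
    The abstract does not define the loss rank criterion itself; the sketch below shows only the overall workflow it describes, with BIC standing in as the consistent criterion scored over the subsets visited by a lasso path.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def select_subset(X, y):
    """Pick the best variable subset among those visited by the lasso
    path. BIC is used as a stand-in consistent criterion; the paper's
    loss rank criterion plays this role in the original work."""
    n, d = X.shape
    _, coefs, _ = lasso_path(X, y)    # coefs: (n_features, n_alphas)
    best, best_score = None, np.inf
    seen = set()
    for j in range(coefs.shape[1]):
        idx = np.flatnonzero(np.abs(coefs[:, j]) > 1e-10)
        S = tuple(idx)
        if len(idx) == 0 or S in seen:
            continue
        seen.add(S)
        # Refit OLS on the candidate subset and score it
        beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        rss = float(np.sum((y - X[:, idx] @ beta) ** 2))
        score = n * np.log(rss / n) + len(idx) * np.log(n)
        if score < best_score:
            best, best_score = S, score
    return best
```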

    A Linear Regression Analysis for Understanding High School Grade Point Average (GPA)

    Get PDF
    This project analyzes different aspects that may contribute to the grade point average (GPA) of high school students. GPA is important because it is one of the fundamental measures of student success. I gathered data on 206 individuals from the Bureau of Labor Statistics' National Longitudinal Survey of Youth 1997-2010. I selected variables for the regression analysis from categories including general motivation for success and optimism; use of time; other academic measures; and health habits and lifestyle. Using SPSS to run the regressions, I found that the variables I believed most related to GPA, such as amount of sleep, time spent studying, and number of absences, were not significant. A student's SAT score was the only variable directly related to school that was significant in the regression. On the other hand, the variables for whether a student spends regular time in prayer and whether they consider themselves organized show a significant impact on GPA. Individual identity variables of race and gender were significant as well.
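
    The study itself was run in SPSS; a rough Python equivalent of such a multiple regression, with hypothetical column and file names standing in for the survey variables, might look like this:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names standing in for the NLSY97 variables
df = pd.read_csv("nlsy97_gpa.csv")
model = smf.ols(
    "gpa ~ sat_score + sleep_hours + study_hours + absences"
    " + prays_regularly + organized + race + gender",
    data=df,
).fit()
print(model.summary())  # coefficients, t-tests, R^2
```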

    Mathematical programming for piecewise linear regression analysis

    Get PDF
    In data mining, regression analysis is a computational tool that predicts continuous output variables from a number of independent input variables, by approximating their complex inner relationship. A large number of methods have been successfully proposed, based on various methodologies, including linear regression, support vector regression, neural networks and piecewise regression. In terms of piecewise regression, the existing methods in the literature are usually restricted to problems of very small scale, due to their inherent non-linear nature. In this work, a more efficient piecewise linear regression method is introduced, based on a novel integer linear programming formulation. The proposed method partitions one input variable into multiple mutually exclusive segments and fits one multivariate linear regression function per segment to minimise the total absolute error. Assuming that both the partition feature and the number of regions are known, a mixed integer linear model is proposed to simultaneously determine the locations of the break-points and the regression coefficients for each segment. Furthermore, an efficient heuristic procedure is presented to identify the key partition feature and the final number of break-points. Seven real-world problems covering several application domains have been used to demonstrate the efficiency of the proposed method. It is shown that the proposed piecewise regression method can be solved to global optimality for datasets of thousands of samples, and that it consistently achieves higher prediction accuracy than a number of state-of-the-art regression methods. Another advantage of the proposed method is that the learned model can be conveniently expressed as a small number of if-then rules that are easily interpretable. Overall, this work proposes an efficient rule-based multivariate regression method based on piecewise functions that achieves better prediction performance than state-of-the-art approaches. This novel method can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.
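
    A simplified version of such a formulation, sketched with PuLP for a known partition feature z and a fixed number of segments K; the big-M constant and the exact constraint layout are assumptions rather than the paper's model.

```python
import pulp

def piecewise_milp(X, y, z, K, M=1e3):
    """MILP sketch of one-feature piecewise linear regression:
    K segments along partition feature z, one linear model per
    segment, minimizing total absolute error."""
    n, d = X.shape
    prob = pulp.LpProblem("piecewise", pulp.LpMinimize)
    w = [[pulp.LpVariable(f"w_{k}_{j}") for j in range(d + 1)] for k in range(K)]
    a = [[pulp.LpVariable(f"a_{i}_{k}", cat="Binary") for k in range(K)] for i in range(n)]
    t = [pulp.LpVariable(f"t_{i}", lowBound=0) for i in range(n)]
    b = [pulp.LpVariable(f"b_{k}") for k in range(K - 1)]   # break-points
    prob += pulp.lpSum(t)                        # total absolute error
    for k in range(K - 2):
        prob += b[k] <= b[k + 1]                 # ordered break-points
    for i in range(n):
        prob += pulp.lpSum(a[i]) == 1            # each sample in one segment
        for k in range(K):
            pred = w[k][0] + pulp.lpSum(w[k][j + 1] * X[i, j] for j in range(d))
            # |residual| <= t_i, active only when sample i sits in segment k
            prob += y[i] - pred <= t[i] + M * (1 - a[i][k])
            prob += pred - y[i] <= t[i] + M * (1 - a[i][k])
            # Segment membership must respect the break-points on z
            if k < K - 1:
                prob += z[i] <= b[k] + M * (1 - a[i][k])
            if k > 0:
                prob += z[i] >= b[k - 1] - M * (1 - a[i][k])
    prob.solve()
    return prob
```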

    A new method of robust linear regression analysis: some monte carlo experiments

    Get PDF
    This paper elaborates on the deleterious effects of outliers and corruption of the dataset on the estimation of linear regression coefficients by the Ordinary Least Squares method. Motivated to ameliorate the estimation procedure, we introduce robust regression estimators based on Campbell's robust covariance estimation method. We investigate two possibilities: first, when the weights are obtained strictly as suggested by Campbell, and secondly, when the weights are assigned in view of Hampel's median absolute deviation measure of dispersion. Both types of weights are obtained iteratively. Using these two types of weights, two different weighted least squares procedures have been proposed. These procedures are applied to detect outliers in, and estimate regression coefficients from, some widely used datasets such as the stackloss, water salinity, Hawkins-Bradu-Kass, Hertzsprung-Russell Star and pilot-plant datasets. It has been observed that Campbell-II in particular detects the outlier data points quite well (although occasionally signaling false positives too, as very mild outliers). Subsequently, some Monte Carlo experiments have been carried out to assess the properties of these estimators. Findings of these experiments indicate that for a larger number and size of outliers, the Campbell-II procedure outperforms the Campbell-I procedure. Unless the perturbations introduced to the dataset are sizably numerous and very large in magnitude, the coefficients estimated by the Campbell-II method are also nearly unbiased. A Fortran program for the proposed method is appended.
    Keywords: robust regression; Campbell's robust covariance; outliers; stackloss; water salinity; Hawkins-Bradu-Kass; Hertzsprung-Russell Star; pilot-plant; datasets; Monte Carlo experiments; Fortran computer program.
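
    The exact Campbell-I/II weighting schemes are not spelled out in the abstract; the sketch below shows only the general shape of such an iteratively reweighted least squares procedure with MAD-based downweighting, not the paper's precise weights.

```python
import numpy as np

def mad_weighted_ls(X, y, n_iter=20, c=2.5):
    """Iteratively reweighted least squares with weights driven by the
    median absolute deviation (MAD) of the residuals; a sketch in the
    spirit of the paper's procedures, not the exact Campbell weights."""
    n = len(y)
    w = np.ones(n)                       # first pass is ordinary LS
    for _ in range(n_iter):
        W = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * W[:, None], y * W, rcond=None)
        r = y - X @ beta
        # Robust scale: MAD rescaled for consistency at the normal
        s = 1.4826 * np.median(np.abs(r - np.median(r)))
        u = np.abs(r) / max(s, 1e-12)
        # Downweight points whose standardized residual exceeds c
        w = np.where(u <= c, 1.0, c / u)
    return beta, w                       # small final weights flag outliers
```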

    Linear regression analysis using the relative squared error

    Get PDF
    In order to determine estimators and predictors in a generalized linear regression model, we apply a suitably defined relative squared error instead of the most frequently used absolute squared error. The general solution of a matrix problem is derived, leading to minimax estimators and predictors. Furthermore, we consider an important special case, where an analogue of a well-known relation between estimators and predictors holds and where generalized least squares estimators as well as Kuks–Olman and ridge estimators play a prominent role.
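
    The abstract leaves the loss unspecified; one plausible way the relative squared error contrasts with the absolute one (the paper's exact definition may differ) is:

```latex
% Absolute vs. a plausible relative squared error for \hat\beta estimating \beta
\[
L_{\mathrm{abs}}(\hat\beta,\beta) = \lVert \hat\beta - \beta \rVert^{2},
\qquad
L_{\mathrm{rel}}(\hat\beta,\beta) = \frac{\lVert \hat\beta - \beta \rVert^{2}}{\lVert \beta \rVert^{2}},
\]
% A minimax estimator then minimizes the worst-case expected relative loss:
\[
\hat\beta^{*} = \arg\min_{\hat\beta}\; \sup_{\beta}\; \mathbb{E}\, L_{\mathrm{rel}}(\hat\beta,\beta).
\]
```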

    Linear Regression Analysis Using a Programable Pocket Calculator

    Get PDF
    Linear regression is an extremely useful "least squares" technique for fitting a linear equation to a set of data. This program calculates means, sums of squares, and sums of cross-products of the dependent and independent values, which are entered only once. It also calculates the slope and intercept of the line as well as the coefficients of determination and correlation. The program calculates a predicted dependent variable from a given independent variable and vice versa. It also computes the reduction in sum of squares due to regression, the residual sum of squares and its degrees of freedom, the variance about regression, the standard error of the estimate, the standard deviation about the slope, and the t-test on the slope of the line.
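
    The quantities listed map directly onto the textbook computing formulas; a short sketch reproducing the core of such a program (simple linear regression from accumulated sums):

```python
import math

def regression_from_sums(xs, ys):
    """Compute the quantities the calculator program accumulates:
    sums, cross-products, slope, intercept, r^2, and a t-test on the
    slope, using the standard computing formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    ssx = sxx - sx * sx / n          # corrected sum of squares of x
    ssy = syy - sy * sy / n          # corrected sum of squares of y
    spxy = sxy - sx * sy / n         # corrected cross-product
    slope = spxy / ssx
    intercept = sy / n - slope * sx / n
    ss_reg = slope * spxy            # reduction in SS due to regression
    ss_res = ssy - ss_reg            # residual sum of squares, df = n - 2
    r2 = ss_reg / ssy                # coefficient of determination
    s2 = ss_res / (n - 2)            # variance about regression
    se_slope = math.sqrt(s2 / ssx)   # standard deviation about the slope
    t = slope / se_slope             # t-test on the slope, df = n - 2
    return slope, intercept, r2, t
```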