Functional linear regression analysis for longitudinal data
We propose nonparametric methods for functional linear regression which are
designed for sparse longitudinal data, where both the predictor and response
are functions of a covariate such as time. Predictor and response processes
have smooth random trajectories, and the data consist of a small number of
noisy repeated measurements made at irregular times for a sample of subjects.
In longitudinal studies, the number of repeated measurements per subject is
often small and may be modeled as a discrete random variable and, accordingly,
only a finite and asymptotically nonincreasing number of measurements are
available for each subject or experimental unit. We propose a functional
regression approach for this situation, using functional principal component
analysis, where we estimate the functional principal component scores through
conditional expectations. This allows the prediction of an unobserved response
trajectory from sparse measurements of a predictor trajectory. The resulting
technique is flexible and allows for different patterns regarding the timing of
the measurements obtained for predictor and response trajectories. Asymptotic
properties are investigated under mild conditions as the number of subjects
tends to infinity, and we obtain consistent estimation for the regression
function. Besides convergence results for the components of functional linear
regression, such as the regression parameter function, we construct asymptotic
pointwise confidence bands for the predicted trajectories. A functional
coefficient of determination as a measure of the variance explained by the
functional regression model is introduced, extending the standard R^2 to the
functional case. The proposed methods are illustrated with a simulation study,
longitudinal primary biliary liver cirrhosis data and an analysis of the
longitudinal relationship between blood pressure and body mass index.
Comment: Published at http://dx.doi.org/10.1214/009053605000000660 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
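The FPCA-plus-score-regression pipeline described in this abstract can be sketched for the simpler dense, regularly sampled case. The paper's key device, conditional-expectation estimation of the FPC scores for sparse irregular data, is not reproduced here; scores are plain numerical inner products, and all simulated quantities are illustrative.

```python
import numpy as np

# Sketch of functional linear regression via FPCA, dense-data case only.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
dt = t[1] - t[0]
n = 200

# predictor trajectories built from two known orthonormal components
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
xi = rng.normal(size=(n, 2)) * np.array([2.0, 1.0])   # FPC scores of X
X = xi @ np.vstack([phi1, phi2])

# response scores depend linearly on predictor scores (plus a little noise)
zeta = xi @ np.array([[0.8, 0.1], [0.2, -0.5]]) + 0.1 * rng.normal(size=(n, 2))
Y = zeta @ np.vstack([phi1, phi2])

# FPCA of X: eigendecompose the sample covariance on the grid
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(Xc.T @ Xc / n * dt)
top = np.argsort(vals)[::-1][:2]
phis = vecs[:, top].T / np.sqrt(dt)        # estimated eigenfunctions

# estimate scores by numerical integration, regress Y-scores on X-scores,
# and predict the response trajectories
sX = Xc @ phis.T * dt
sY = (Y - Y.mean(axis=0)) @ phis.T * dt
B, *_ = np.linalg.lstsq(sX, sY, rcond=None)
Y_hat = Y.mean(axis=0) + sX @ B @ phis
```

With dense trajectories the predicted curves recover the response up to the score noise; the paper's contribution is making the same pipeline work when each subject contributes only a handful of noisy, irregularly timed observations.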
Robust Linear Regression Analysis - A Greedy Approach
The task of robust linear estimation in the presence of outliers is of
particular importance in signal processing, statistics and machine learning.
Although the problem was posed a few decades ago and solved using methods
that are nowadays considered classical, it has recently attracted renewed
attention in the context of sparse modeling, where several notable
contributions have been made. In the present manuscript, a new approach is
considered in the framework of greedy algorithms. The noise is split into two
components: a) the inlier bounded noise and b) the outliers, which are
explicitly modeled by employing sparsity arguments. Based on this scheme, a
novel efficient algorithm (Greedy Algorithm for Robust Denoising - GARD), is
derived. GARD alternates between a least-squares fit and an Orthogonal
Matching Pursuit (OMP) selection step that identifies the outliers.
The case where only outliers are present has been studied separately, where
bounds on the Restricted Isometry Property guarantee that the recovery
of the signal via GARD is exact. Moreover, theoretical results concerning
convergence as well as the derivation of error bounds in the case of additional
bounded noise are discussed. Finally, we provide extensive simulations, which
demonstrate the comparative advantages of the new technique.
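The alternation described above, a least-squares fit followed by an OMP-style pick of the largest residual as an outlier, can be sketched as follows. Variable names and the stopping threshold `eps` are illustrative, not the paper's exact formulation.

```python
import numpy as np

def gard(X, y, eps=1e-3, max_outliers=None):
    """Greedy sketch: flag one outlier per iteration via the largest residual."""
    n, d = X.shape
    if max_outliers is None:
        max_outliers = n - d - 1
    support = []                                  # indices flagged as outliers
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ theta
    while np.linalg.norm(r) > eps and len(support) < max_outliers:
        k = int(np.argmax(np.abs(r)))             # OMP-style greedy selection
        if k in support:
            break
        support.append(k)
        # give sample k its own indicator column so it can absorb its error,
        # then re-solve the joint least-squares problem
        A = np.hstack([X, np.eye(n)[:, support]])
        z, *_ = np.linalg.lstsq(A, y, rcond=None)
        theta = z[:d]
        r = y - A @ z
    return theta, sorted(support)
```

On data with a few gross outliers and small inlier noise, the loop typically flags exactly the corrupted samples before the residual norm drops below the threshold.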
The Loss Rank Criterion for Variable Selection in Linear Regression Analysis
Lasso and other regularization procedures are attractive methods for variable
selection, subject to a proper choice of shrinkage parameter. Given a set of
potential subsets produced by a regularization algorithm, a consistent model
selection criterion is proposed to select the best one among this preselected
set. The approach leads to a fast and efficient procedure for variable
selection, especially in high-dimensional settings. Model selection consistency
of the suggested criterion is proven when the number of covariates d is fixed.
Simulation studies suggest that the criterion still enjoys model selection
consistency when d is much larger than the sample size. The simulations also
show that our approach for variable selection works surprisingly well in
comparison with existing competitors. The method is also applied to a real data
set.
Comment: 18 pages, 1 figure
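The selection step the abstract describes, scoring a preselected family of subsets and keeping the best, can be sketched as follows. The abstract does not spell out the loss rank criterion itself, so BIC stands in here as a familiar consistent criterion playing the same role.

```python
import numpy as np

def bic_select(X, y, candidate_subsets):
    """Fit OLS on each candidate subset (e.g., from a lasso path) and
    return the subset with the smallest BIC score."""
    n = len(y)
    best, best_score = None, np.inf
    for S in candidate_subsets:
        S = list(S)
        if S:
            beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
            rss = float(np.sum((y - X[:, S] @ beta) ** 2))
        else:
            rss = float(np.sum(y ** 2))           # empty model: no fit at all
        score = n * np.log(rss / n) + len(S) * np.log(n)
        if score < best_score:
            best, best_score = S, score
    return best
```

Because the criterion is only evaluated on the handful of subsets a regularization path produces, the whole procedure stays cheap even when the ambient dimension is large.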
A Linear Regression Analysis for Understanding High School Grade Point Average (GPA)
This project analyzes different aspects that may contribute to the grade point average (GPA) of high school students. GPA is important because it is one of the fundamental measures of student success. I gathered data on 206 individuals from the Bureau of Labor Statistics' National Longitudinal Survey of Youth 1997-2010. I selected variables for the regression analysis from categories including general motivation for success and optimism; use of time; other academic measures; and health habits and lifestyle. Using SPSS to run the regressions, I found that variables that I believed most related to GPA, such as amount of sleep, time spent studying, and number of absences, were not found to be significant. A student's SAT score was the only variable directly related to school that was significant in the regression. On the other hand, the variables for whether a student spends regular time in prayer and whether they consider themselves organized show a significant impact on GPA. Individual identity variables of race and gender were significant as well.
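The kind of multiple regression and significance check described above can be sketched on synthetic data. The variables and coefficients below are hypothetical stand-ins, chosen only to mimic the reported pattern (SAT and organization significant, sleep not); nothing here is estimated from the actual NLSY data.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 206                                    # sample size reported in the study
sat = rng.normal(1100, 150, n)             # hypothetical SAT scores
sleep = rng.normal(7, 1, n)                # hypothetical hours of sleep
organized = rng.integers(0, 2, n).astype(float)   # self-reported organization
# simulated GPA: driven by SAT and organization, not by sleep
gpa = 1.0 + 0.002 * sat + 0.3 * organized + rng.normal(0, 0.3, n)

# ordinary least squares with classical t-statistics
X = np.column_stack([np.ones(n), sat, sleep, organized])
beta, *_ = np.linalg.lstsq(X, gpa, rcond=None)
resid = gpa - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta / se                        # |t| > ~2 flags 5% significance
```

Running this reproduces the qualitative finding: predictors that genuinely drive the outcome show large |t| values, while a variable with no true effect hovers near zero.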
Mathematical programming for piecewise linear regression analysis
In data mining, regression analysis is a computational tool that predicts continuous output variables from a number of independent input variables, by approximating their complex inner relationship. A large number of methods have been successfully proposed, based on various methodologies, including linear regression, support vector regression, neural networks, piece-wise regression, etc. In terms of piece-wise regression, the existing methods in the literature are usually restricted to problems of very small scale, due to their inherent non-linear nature. In this work, a more efficient piece-wise linear regression method is introduced, based on a novel integer linear programming formulation. The proposed method partitions one input variable into multiple mutually exclusive segments and fits one multivariate linear regression function per segment to minimise the total absolute error. Assuming both the single partition feature and the number of regions are known, the mixed integer linear model is proposed to simultaneously determine the locations of multiple break-points and the regression coefficients for each segment. Furthermore, an efficient heuristic procedure is presented to identify the key partition feature and the final number of break-points. Seven real-world problems covering several application domains have been used to demonstrate the efficiency of the proposed method. It is shown that the proposed piece-wise regression method can be solved to global optimality for datasets of thousands of samples, and it consistently achieves higher prediction accuracy than a number of state-of-the-art regression methods. Another advantage of the proposed method is that the learned model can be conveniently expressed as a small number of if-then rules that are easily interpretable. Overall, this work proposes an efficient rule-based multivariate regression method based on piece-wise functions and achieves better prediction performance than state-of-the-art approaches.
This novel method can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.
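A heavily simplified, heuristic version of the idea, with one breakpoint found by enumeration rather than the paper's mixed-integer program, and ordinary least squares as a stand-in for the absolute-error fit, might look like:

```python
import numpy as np

def piecewise_fit(x, y, n_candidates=50):
    """Enumerate breakpoints on one feature; fit a line per segment;
    keep the split with the smallest total absolute error."""
    best = (np.inf, None, None, None)
    for b in np.linspace(x.min(), x.max(), n_candidates + 2)[1:-1]:
        left = x <= b
        right = ~left
        if left.sum() < 2 or right.sum() < 2:
            continue                       # skip degenerate segments
        err, fits = 0.0, []
        for m in (left, right):
            A = np.column_stack([np.ones(m.sum()), x[m]])
            coef, *_ = np.linalg.lstsq(A, y[m], rcond=None)
            err += np.abs(y[m] - A @ coef).sum()   # total absolute error
            fits.append(coef)
        if err < best[0]:
            best = (err, b, fits[0], fits[1])
    return best                            # (error, breakpoint, left, right)
```

Each recovered segment translates directly into an if-then rule of the form "if x <= b then y = a0 + a1*x", which is the interpretability advantage the abstract highlights.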
A new method of robust linear regression analysis: some Monte Carlo experiments
This paper elaborates on the deleterious effects of outliers and corruption of the dataset on the estimation of linear regression coefficients by the Ordinary Least Squares method. Motivated to ameliorate the estimation procedure, we introduce robust regression estimators based on Campbell's robust covariance estimation method. We investigate two possibilities: first, when the weights are obtained strictly as suggested by Campbell, and secondly, when weights are assigned in view of Hampel's median absolute deviation measure of dispersion. Both types of weights are obtained iteratively. Using these two types of weights, two different weighted least squares procedures are proposed. These procedures are applied to detect outliers in, and estimate regression coefficients from, some widely used datasets such as the stackloss, water salinity, Hawkins-Bradu-Kass, Hertzsprung-Russell Star and pilot-plant datasets. It has been observed that Campbell-II in particular detects the outlier data points quite well (although occasionally signaling false positives for very mild outliers). Subsequently, some Monte Carlo experiments have been carried out to assess the properties of these estimators. Findings of these experiments indicate that for a larger number and size of outliers, the Campbell-II procedure outperforms the Campbell-I procedure. Unless the perturbations introduced to the dataset are numerous and very large in magnitude, the coefficients estimated by the Campbell-II method are also nearly unbiased. A Fortran program for the proposed method is appended.
Keywords: robust regression; Campbell's robust covariance; outliers; stackloss; water salinity; Hawkins-Bradu-Kass; Hertzsprung-Russell Star; pilot-plant; Monte Carlo experiment; Fortran computer program
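An iteratively reweighted least-squares loop in the spirit of the MAD-based weighting described above might look like this. The Huber-type weight function is an illustrative choice, not Campbell's exact covariance-based scheme.

```python
import numpy as np

def robust_ols(X, y, c=2.5, n_iter=20):
    """Weighted least squares where weights are refreshed each iteration
    from residuals scaled by the median absolute deviation (MAD)."""
    w = np.ones(len(y))
    beta = None
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = y - X @ beta
        # robust scale: MAD rescaled to be consistent for Gaussian noise
        mad = np.median(np.abs(r - np.median(r))) / 0.6745
        u = np.abs(r) / max(mad, 1e-12)
        w = np.where(u <= c, 1.0, c / u)   # down-weight large residuals
    return beta, w
```

The final weights double as an outlier diagnostic: corrupted observations end up with weights near zero, mirroring the outlier-detection use of the procedures in the abstract.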
A NEW METHOD OF ROBUST LINEAR REGRESSION ANALYSIS: SOME MONTE CARLO EXPERIMENTS
This paper has elaborated upon the deleterious effects of outliers and corruption of the dataset on the estimation of linear regression coefficients by the Ordinary Least Squares method. Motivated to ameliorate the estimation procedure, it introduces robust regression estimators based on Campbell's robust covariance estimation method. It investigates two possibilities: first, when the weights are obtained strictly as suggested by Campbell, and secondly, when weights are assigned in view of Hampel's median absolute deviation measure of dispersion. Both types of weights are obtained iteratively, and using those weights, two different weighted least squares procedures have been proposed. These procedures are applied to detect outliers in, and estimate regression coefficients from, some widely used datasets such as the stackloss, water salinity, Hawkins-Bradu-Kass, Hertzsprung-Russell Star and pilot-plant datasets. It has been observed that Campbell-II in particular detects the outlier data points quite well. Subsequently, some Monte Carlo experiments have been carried out to assess the properties of these estimators; their findings indicate that for a larger number and size of outliers, the Campbell-II procedure outperforms the Campbell-I procedure. Unless the perturbations introduced to the dataset are numerous and very large in magnitude, the estimated coefficients are also nearly unbiased.
Keywords: robust regression; Campbell's robust covariance; outliers; Monte Carlo experiment; median absolute deviation
Linear regression analysis using the relative squared error
In order to determine estimators and predictors in a generalized linear regression model, we apply a suitably defined relative squared error instead of the most frequently used absolute squared error. The general solution of a matrix problem is derived, leading to minimax estimators and predictors. Furthermore, we consider an important special case in which an analogue of a well-known relation between estimators and predictors holds, and in which generalized least squares estimators as well as Kuks-Olman and ridge estimators play a prominent role.
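As a toy illustration of the distinction, weighting each squared residual by 1/y_i^2 turns the usual absolute squared error into a relative one. The paper's matrix/minimax setting is far more general; this scalar weighted least-squares version only conveys the basic idea.

```python
import numpy as np

def fit_relative(X, y):
    """Least squares under a relative loss: minimize sum(((yhat-y)/y)**2).
    Requires all y values to be nonzero."""
    w = 1.0 / y ** 2                       # per-observation relative weights
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta
```

Relative weighting makes the fit care equally about proportional errors at small and large responses, rather than being dominated by observations with large magnitudes.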
Linear Regression Analysis Using a Programable Pocket Calculator
Linear regression is an extremely useful "least squares" technique for fitting a linear equation to a set of data. This program calculates means, sums of squares, and sums of cross-products of the dependent and independent values which are entered only once. It also calculates the slope and intercept of the line as well as the coefficients of determination and correlation. The program calculates a predicted dependent variable from a given independent variable and vice versa. It also computes the reduction in sum of squares due to regression, residual sum of squares and degrees of freedom, variance about regression, standard error of the estimate, the standard deviation about the slope, and the t-test on the slope of the line.
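The one-pass accumulation such a program relies on can be sketched directly: only the six running sums are stored, and the regression quantities are recovered from them at the end, exactly as a programmable calculator with a handful of memory registers would do.

```python
def linreg_sums(pairs):
    """One-pass linear regression from running sums (calculator style)."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:                 # each (x, y) pair is entered only once
        n += 1
        sx += x; sy += y
        sxx += x * x; syy += y * y; sxy += x * y
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    r = (n * sxy - sx * sy) / ((n * sxx - sx * sx) * (n * syy - sy * sy)) ** 0.5
    return slope, intercept, r, r * r  # slope, intercept, correlation, R^2
```

For example, the points (0, 1), (1, 3), (2, 5), (3, 7) lie exactly on y = 2x + 1, so the routine returns slope 2, intercept 1, and correlation 1.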