Average group effect of strongly correlated predictor variables is estimable
It is well known that individual parameters of strongly correlated predictor
variables in a linear model cannot be accurately estimated by least squares
regression because of the multicollinearity generated by such variables. Surprisingly,
an average of these parameters can be estimated extremely accurately. We find
this average and briefly discuss its applications in least squares
regression.
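The claim in this abstract can be illustrated with a small simulation. The sketch below is not the paper's construction: it assumes two nearly identical predictors, a no-intercept model, and a plain unweighted average of the two coefficients, and all names and parameter values (`simulate`, `delta`, `beta`) are hypothetical choices for illustration.

```python
import random
import statistics


def ols_two_predictors(x1, x2, y):
    """Least squares fit of y = b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(reps=300, n=50, delta=0.05, beta=(1.0, 3.0), seed=0):
    """Refit the model on fresh data `reps` times and summarize the estimates.

    Assumption: x1 and x2 share a common factor z and differ only by small
    noise of scale `delta`, so they are strongly correlated.
    """
    rng = random.Random(seed)
    b1_draws, avg_draws = [], []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        x1 = [v + delta * rng.gauss(0, 1) for v in z]
        x2 = [v + delta * rng.gauss(0, 1) for v in z]
        y = [beta[0] * a + beta[1] * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        b1, b2 = ols_two_predictors(x1, x2, y)
        b1_draws.append(b1)
        avg_draws.append((b1 + b2) / 2)  # unweighted average of the two estimates
    return (statistics.stdev(b1_draws),
            statistics.stdev(avg_draws),
            statistics.mean(avg_draws))


sd_b1, sd_avg, mean_avg = simulate()
print(f"spread of b1 estimates:      {sd_b1:.3f}")
print(f"spread of average estimates: {sd_avg:.3f}")
print(f"mean of average estimates:   {mean_avg:.3f} (true average is 2.0)")
```

Under these assumptions the individual estimate varies widely from sample to sample, while the average of the two estimates stays close to the true average of the coefficients.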
About hidden influence of predictor variables: Suppressor and mediator variables
This paper presents a procedure for investigating the hidden influence of predictor variables in regression models and for identifying suppressor and mediator variables. It also shows that detecting suppressor and mediator variables can provide refined information about the research problem. As an example of applying this procedure, the relation between Atlantic atmospheric centers and air temperature and precipitation amount in Serbia is chosen. [Project of the Ministry of Science of the Republic of Serbia, no. 47007
Geometric programming prediction of design trends for OMV protective structures
The global optimization trends of protective honeycomb structural designs for spacecraft subject to hypervelocity meteoroid and space debris impacts are presented. This nonlinear problem is first formulated for weight minimization of the orbital maneuvering vehicle (OMV) using a generic monomial predictor. Five problem formulations are considered, each dependent on the selection of independent design variables. Each case is optimized by considering the dual geometric programming problem. The dual variables are solved for in terms of the generic estimated exponents of the monomial predictor. The primal variables are then solved for by conversion. Finally, parametric design trends are developed for ranges of the estimated regression parameters. Results specify nonmonotonic relationships for the optimal first and second sheet mass per unit areas in terms of the estimated exponents.
Unbiased split selection for classification trees based on the Gini Index
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides formal support for variable selection bias in favor of variables with a large number of missing values when the Gini gain is used as the split selection criterion, and we suggest using the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation and real-data studies from veterinary gynecology in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method is extensible to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion.
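For readers unfamiliar with the criterion, the following is a minimal sketch of the maximally selected Gini gain for a binary response and one continuous predictor. It only computes the gain and the cutpoint that attains it; it does not implement the exact distribution or the p-value criterion proposed in the abstract, and the function names are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity 2*p*(1-p) of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def max_gini_gain(x, y):
    """Return (best_gain, best_cutpoint) over all splits of a continuous predictor.

    Candidate cutpoints are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent = gini_impurity([label for _, label in pairs])
    best_gain, best_cut = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutpoint between tied predictor values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Gain = parent impurity minus the size-weighted child impurities.
        gain = (parent
                - (len(left) / n) * gini_impurity(left)
                - (len(right) / n) * gini_impurity(right))
        if gain > best_gain:
            best_gain = gain
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_cut


gain, cut = max_gini_gain([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(gain, cut)
```

Because the gain is maximized over every available cutpoint, the number of candidate cutpoints a variable offers affects the maximum it can reach by chance, which is why the abstract argues for a p-value rather than the raw gain.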
Variables Predicting the Severity of a Mass Shooting: the connection to white supremacy
Since mass shootings have become increasingly relevant in today’s society, the subject of what makes a mass shooting deadly has become more and more popular. This project focuses on how selected variables correlate with the severity of a mass shooting, and especially focuses on the impact of white supremacy ideology. Theoretically, a shooter imbued with this ideology will likely be more violent, thus causing a higher victim count (injuries + deaths). The other variables included in the model are: the use of a long gun, the use of multiple guns, the use of semi-automatic guns, mental illness, and shooter suicide. This project seeks to assess the relationships of these variables to the victim count, and the statistical significance of each of these relationships. By drawing from two prominent mass-shooting databases and associated media sources, a dataset was constructed, then analyzed with correlation, regression, and ANOVA. These analyses confirmed all of the hypotheses, with each predictor variable correlating positively and significantly with victim count. Most importantly, the findings confirmed the significance of the white supremacy ideology variable in predicting the violence of a mass shooting, and the effect withstood the introduction of a variety of important control variables; in short, shooters with a white supremacy background tend to inflict a higher victim count during a mass shooting. Based on these findings, suggestions for further research include separating active-shooter mass shootings from other types of mass shootings; standardizing the operational definition of a mass shooting; and increasing the number of possible predictor variables in current mass shooting databases.
Robust rank correlation based screening
Independence screening is a variable selection method that uses a ranking
criterion to select significant variables, particularly for statistical models
with nonpolynomial dimensionality or "large p, small n" paradigms when p can be
as large as an exponential of the sample size n. In this paper we propose a
robust rank correlation screening (RRCS) method to deal with ultra-high
dimensional data. The new procedure is based on the Kendall \tau correlation
coefficient between response and predictor variables rather than the Pearson
correlation of existing methods. The new method has four desirable features
compared with existing independence screening methods. First, the sure
independence screening property holds under only the existence of a second-order
moment of the predictor variables, rather than exponential tails or
the like, even when the number of predictor variables grows
exponentially with the sample size. Second, it can be used to deal with
semiparametric models such as transformation regression models and single-index
models under a monotonic constraint on the link function without involving
nonparametric estimation, even when there are nonparametric functions in the
models. Third, the procedure is largely robust against outliers and influence
points in the observations. Last, the use of indicator functions in rank
correlation screening greatly simplifies the theoretical derivation due to the
boundedness of the resulting statistics, compared with previous studies on
variable screening. Simulations are carried out for comparisons with existing
methods and a real data example is analyzed.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1024 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org). arXiv admin note: text overlap with
arXiv:0903.525
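The screening idea in this abstract, ranking predictors by their Kendall tau correlation with the response, can be sketched as follows. This is an illustration under stated assumptions rather than the RRCS procedure itself: the fixed cutoff `keep`, the simulated model, and all function names are hypothetical, and the tau computed here is the simple O(n^2) tau-a, which assumes no ties.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs divided by all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1
            elif prod < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def rank_screen(columns, y, keep):
    """Return indices of the `keep` predictors with the largest |tau| against y."""
    utilities = [abs(kendall_tau(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda k: utilities[k], reverse=True)
    return order[:keep]


# Assumed toy model: only predictors 0 and 1 matter, and the response is a
# monotone (exponential) transformation of a linear index plus noise.
rng = random.Random(1)
n, p = 100, 30
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [math.exp(columns[0][i] + columns[1][i] + 0.5 * rng.gauss(0, 1))
     for i in range(n)]

selected = rank_screen(columns, y, keep=5)
print(sorted(selected))
```

Because Kendall tau depends only on the ordering of the observations, the exponential transformation of the response leaves the ranking of predictors unchanged, which is the property the abstract relies on for transformation models.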
