118 research outputs found

    Investigation of New Criteria for Detecting First-Order Autocorrelation of Residuals in Regression Models

    When estimating regression models by the least squares method, one of its prerequisites is the absence of autocorrelation in the regression residuals. The presence of autocorrelation in the residuals renders the least-squares regression estimates inefficient and the standard errors of these estimates inconsistent. Quantitatively, autocorrelation in the residuals of a regression model has traditionally been assessed using the Durbin-Watson statistic, which is the ratio of the sum of squared differences of consecutive residuals to the sum of squared residuals. Unfortunately, this analytical form of the Durbin-Watson statistic does not allow it to be integrated, as linear constraints, into the problem of selecting informative regressors, which is, in essence, a mathematical programming problem in the regression model. The task of selecting informative regressors is to extract, from a given set of possible regressors, a given number of variables on the basis of a certain quality criterion. The aim of the paper is to develop and study new criteria for detecting first-order autocorrelation in the residuals of regression models that can later be integrated into the problem of selecting informative regressors in the form of linear constraints. To this end, the paper proposes modular autocorrelation statistics, for which the ranges of possible values and the limit values, depending on the value of the sample autoregression coefficient, were first determined experimentally using the Gretl package. These results were then confirmed by model experiments using the Monte Carlo method. A disadvantage of the proposed modular statistics is that their dependencies on the sample autoregression coefficient are not even functions. To address this, double modular autocorrelation criteria are proposed which, with the help of special techniques, can be used as linear constraints in mathematical programming problems for selecting informative regressors in regression models.
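The abstract defines the Durbin-Watson statistic but not the modular statistic itself; a minimal Python sketch, assuming the modular variant simply replaces the squares in the Durbin-Watson ratio with absolute values (an illustrative guess, not the paper's exact definition), with a small Monte Carlo check on AR(1) residuals:

```python
import numpy as np

def durbin_watson(e):
    """Classic Durbin-Watson statistic: sum of squared differences of
    consecutive residuals over the sum of squared residuals."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def modular_statistic(e):
    """Hypothetical 'modular' analogue using absolute values instead of
    squares, so each term is linear in |e_t| and |e_t - e_{t-1}|."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.abs(np.diff(e))) / np.sum(np.abs(e))

# Monte Carlo check: AR(1) residuals e_t = rho * e_{t-1} + u_t
rng = np.random.default_rng(0)
for rho in (-0.8, 0.0, 0.8):
    e = np.zeros(10_000)
    u = rng.standard_normal(10_000)
    for t in range(1, e.size):
        e[t] = rho * e[t - 1] + u[t]
    print(rho, round(durbin_watson(e), 2), round(modular_statistic(e), 2))
```

Because the modular form is piecewise linear in the residuals, constraints of the type "statistic within given bounds" can be linearized with standard absolute-value tricks, which is what would let them enter a mathematical programming formulation.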

    Approaches for Outlier Detection in Sparse High-Dimensional Regression Models

    Modern regression studies often encompass a very large number of potential predictors, possibly larger than the sample size, and sometimes growing with the sample size itself. This increases the chances that a substantial portion of the predictors is redundant, as well as the risk of data contamination. Tackling these problems is of utmost importance to facilitate scientific discoveries, since model estimates are highly sensitive both to the choice of predictors and to the presence of outliers. In this thesis, we contribute to this area considering the problem of robust model selection in a variety of settings, where outliers may arise both in the response and the predictors. Our proposals simplify model interpretation, guarantee predictive performance, and allow us to study and control the influence of outlying cases on the fit. First, we consider the co-occurrence of multiple mean-shift and variance-inflation outliers in low-dimensional linear models. We rely on robust estimation techniques to identify outliers of each type, exclude mean-shift outliers, and use restricted maximum likelihood estimation to down-weight and accommodate variance-inflation outliers into the model fit. Second, we extend our setting to high-dimensional linear models. We show that mean-shift and variance-inflation outliers can be modeled as additional fixed and random components, respectively, and evaluated independently. Specifically, we perform feature selection and mean-shift outlier detection through a robust class of nonconcave penalization methods, and variance-inflation outlier detection through the penalization of the restricted posterior mode. The resulting approach satisfies a robust oracle property for feature selection in the presence of data contamination – which allows the number of features to exponentially increase with the sample size – and detects truly outlying cases of each type with asymptotic probability one. 
This provides an optimal trade-off between a high breakdown point and efficiency. Third, focusing on high-dimensional linear models affected by mean-shift outliers, we develop a general framework in which L0-constraints coupled with mixed-integer programming techniques are used to perform simultaneous feature selection and outlier detection with provably optimal guarantees. In particular, we provide necessary and sufficient conditions for a robustly strong oracle property, where again the number of features can increase exponentially with the sample size, and prove optimality for parameter estimation and the resulting breakdown point. Finally, we consider generalized linear models and rely on logistic slippage to perform outlier detection and removal in binary classification. Here we use L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem of feature selection and outlier detection, and the framework again allows us to pursue optimality guarantees. For all the proposed approaches, we also provide computationally lean heuristic algorithms, tuning procedures, and diagnostic tools which help to guide the analysis. We consider several real-world applications, including the study of the relationships between childhood obesity and the human microbiome, and of the main drivers of honey bee loss. All methods developed and data used, as well as the source code to replicate our analyses, are publicly available.
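The idea of modeling mean-shift outliers as an additional sparse component can be sketched in a few lines of Python. This is an illustrative alternating scheme for the L0 idea (fit coefficients, then keep the k largest residuals as shift estimates), not the thesis's exact penalized or mixed-integer formulation; all names and data here are synthetic:

```python
import numpy as np

def fit_with_meanshift_outliers(X, y, k, n_iter=50):
    """Alternating scheme for y = X @ beta + gamma + noise, where gamma
    is k-sparse (one nonzero entry per mean-shift outlier)."""
    n = len(y)
    gamma = np.zeros(n)
    for _ in range(n_iter):
        # Update beta by least squares on the shift-corrected response
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # Update gamma: keep the k largest absolute residuals (L0 step)
        r = y - X @ beta
        gamma = np.zeros(n)
        idx = np.argsort(np.abs(r))[-k:]
        gamma[idx] = r[idx]
    return beta, np.flatnonzero(gamma)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.standard_normal(100)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(100)
y[:3] += 10.0                      # plant three mean-shift outliers
beta, outliers = fit_with_meanshift_outliers(X, y, k=3)
print(beta, sorted(outliers))      # beta near [1, 2]; outliers 0, 1, 2
```

The mixed-integer formulations in the thesis replace this greedy thresholding step with binary indicator variables, which is what yields the provable guarantees.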

    Prediction of hierarchical time series using structured regularization and its application to artificial neural networks

    This paper discusses the prediction of hierarchical time series, where each upper-level time series is calculated by summing appropriate lower-level time series. Forecasts for such hierarchical time series should be coherent, meaning that the forecast for an upper-level time series equals the sum of forecasts for corresponding lower-level time series. Previous methods for making coherent forecasts consist of two phases: first computing base (incoherent) forecasts and then reconciling those forecasts based on their inherent hierarchical structure. With the aim of improving time series predictions, we propose a structured regularization method for completing both phases simultaneously. The proposed method is based on a prediction model for bottom-level time series and uses a structured regularization term to incorporate upper-level forecasts into the prediction model. We also develop a backpropagation algorithm specialized for application of our method to artificial neural networks for time series prediction. Experimental results using synthetic and real-world datasets demonstrate the superiority of our method in terms of prediction accuracy and computational efficiency.
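The coherence requirement and the structured regularization term can be illustrated with a tiny least-squares stand-in. The paper embeds such a term in a neural-network loss; here a two-series hierarchy (total = b1 + b2) is solved in closed form, with the summing matrix S and the penalty weight lam as the only assumed ingredients:

```python
import numpy as np

# Two bottom-level series and their sum as the upper level.
# S maps bottom-level forecasts to forecasts at all levels.
S = np.array([[1.0, 1.0],    # total = b1 + b2
              [1.0, 0.0],
              [0.0, 1.0]])

def regularized_bottom_forecast(base_all, lam=1.0):
    """Find bottom-level forecasts b minimizing
        ||S b - base_all||^2 + lam * ||b - base_bottom||^2,
    a least-squares stand-in for the structured regularization idea."""
    base_bottom = base_all[1:]
    A = np.vstack([S, np.sqrt(lam) * np.eye(2)])
    rhs = np.concatenate([base_all, np.sqrt(lam) * base_bottom])
    b, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return b, S @ b          # coherent forecasts at all levels

base = np.array([10.0, 6.0, 3.0])   # incoherent base forecasts: 6 + 3 != 10
b, coherent = regularized_bottom_forecast(base)
print(coherent)  # upper forecast now equals the sum of the bottom forecasts
```

Because the final forecasts are S applied to a single bottom-level vector, coherence holds by construction, while the regularization pulls the solution toward the upper-level base forecasts.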

    Analysis and Modeling of U.S. Army Recruiting Markets

    The United States Army Recruiting Command (USAREC) is charged with finding, engaging, and ultimately enlisting young Americans for service as Soldiers in the U.S. Army. USAREC must decide how to allocate monthly enlistment goals, by aptitude and education level, across its 38 subordinate recruiting battalions in order to maximize the number of enlistment contracts produced each year. In our research, we model the production of enlistment contracts as a function of recruiting supply and demand factors which vary over the recruiting battalion areas of responsibility. Using county-level data for the period of recruiting year RY2010 through RY2013 mapped to recruiting battalion areas, we find that a set of five variables, along with categorical indicators for battalions and quarters of the fiscal year, accounts for 70%, 74%, and 81% of the variation in contract production for high-aptitude high school seniors, high-aptitude high school graduates, and all others, respectively. We find indications that high-aptitude seniors and graduates should be modeled as separate entities, contrary to current procedure. Finally, our models perform consistently well against a validation dataset from RY2014, and we ultimately achieve 530%, 119%, and 170% relative increases in the respective correlation coefficients over previous comparable literature.
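The modeling setup described (continuous factors plus one-hot categorical indicators for battalion and fiscal quarter, fit by least squares) can be sketched as follows. All data here are synthetic stand-ins, with four battalions instead of 38 for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Hypothetical stand-ins for the five supply/demand factors
factors = rng.standard_normal((n, 5))
battalion = rng.integers(0, 4, n)        # 4 battalions for illustration
quarter = rng.integers(0, 4, n)          # fiscal-year quarters

# One-hot encode the categorical indicators, dropping one reference level each
B = np.eye(4)[battalion][:, 1:]
Q = np.eye(4)[quarter][:, 1:]
X = np.column_stack([np.ones(n), factors, B, Q])

# Synthetic contract production driven by the same design matrix
y = X @ rng.uniform(0.5, 1.5, X.shape[1]) + rng.standard_normal(n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(f"R^2 = {r2:.2f}")   # share of variation explained, as in the 70-81% range
```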

    Leadership Styles and RN Turnover Intentions in Long-Term Care Facilities

    Employee turnover is a concern for leaders in the nursing home industry because employees with turnover intentions may negatively impact the continuity of operations and strategic plans, resulting in poor quality of care for residents. Grounded in House's path-goal theory, the purpose of this quantitative, correlational study was to examine the relationship among idealized attributes, idealized behaviors, inspirational motivation, intellectual stimulation, individualized consideration, contingent reward, management by exception-active, management by exception-passive, and turnover intentions in RNs. The independent variables were the subcategories of transformational and transactional leadership. The dependent variable was turnover intentions. Participants included 110 nonmanagement RNs working in long-term care facilities in Illinois. Data were collected using the Multifactor Leadership Questionnaire (MLQ-5X short) and the Turnover Intentions Scale (TIS-6). The multiple linear regression analysis indicated that the model significantly predicted turnover intentions: F(8, 101) = 8.53, p < .001, R² = .40, R²adj = .36. In the final model, three predictors were significant: inspirational motivation (t = -1.87, p < .010, β = -.323), contingent reward (t = 2.15, p < .015, β = .289), and management by exception-passive (t = 5.29, p < .001, β = .387). A key recommendation is for nursing home leaders to encourage development, foster positive morale, and recognize employees for good performance. The implications for positive social change include the potential to minimize employee turnover and enhance the quality of healthcare for nursing home patients.
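The reported fit statistics are internally related; a short check, assuming the abstract's rounded R² of .40 (the small differences from the reported F = 8.53 and R²adj = .36 are consistent with rounding of the underlying R²):

```python
# With p = 8 predictors and n = 110 participants:
#   F = (R^2 / p) / ((1 - R^2) / (n - p - 1))
#   adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n, p, r2 = 110, 8, 0.40
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(f_stat, 2), round(r2_adj, 2))  # ≈ 8.42 and 0.35
```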

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking is established for these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions concerning efficiency improvement are made for each hotel studied.
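The error decomposition behind Stochastic Frontier Analysis can be illustrated with a simulation: observed (log) output equals a frontier plus symmetric noise v minus a non-negative inefficiency term u, and technical efficiency is exp(-u). This is only an illustrative simulation with made-up numbers; real SFA estimates the v/u split by maximum likelihood:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4                                     # four hotels, as in the study
frontier = 5.0                            # common (log) frontier level
v = 0.1 * rng.standard_normal(n)          # symmetric measurement error
u = np.abs(0.3 * rng.standard_normal(n))  # half-normal inefficiency, u >= 0
output = frontier + v - u                 # observed (log) output

efficiency = np.exp(-u)                   # technical efficiency in (0, 1]
ranking = np.argsort(-efficiency)         # most to least efficient hotel
print(np.round(efficiency, 3), ranking)
```

Ranking by exp(-u) rather than by raw output is what separates genuine inefficiency from measurement noise, which is the point of the SFA approach used in the paper.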