
    Robust Diagnostics In Logistic Regression Model

    In recent years, because the Maximum Likelihood Estimator (MLE) is inconsistent and sensitive in the presence of high leverage points and residual outliers, diagnostics have become an essential part of the logistic regression model. High leverage points and residual outliers have a strong tendency to break the covariate pattern, resulting in biased parameter estimates. Identifying high leverage points and residual outliers is believed to be vital for improving the performance of the MLE. High leverage points and residual outliers adversely affect inferences by inducing large values of the Influence Function (IF). For the identification of high leverage points, Imon (2006) proposed the Distance from the Mean (DM) diagnostic method. The weakness of the DM method is that, although it identifies the high leverage points correctly, it tends to swamp some low leverage points, that is, wrongly flag them as high leverage points. Deleting these low leverage points may lead to a loss of efficiency and precision in the parameter estimates. The Robust Logistic Diagnostic (RLGD) method is proposed as an alternative that performs better than the DM method. The RLGD method combines robust approaches with diagnostic procedures. A robust approach is first used to identify suspected high leverage points by computing the Robust Mahalanobis Distance (RMD) based on the Minimum Volume Ellipsoid (MVE) estimator or the Minimum Covariance Determinant (MCD) estimator. For confirmation, a diagnostic procedure is then used to compute potentials. The RLGD method ensures that only genuine high leverage points are identified and that the identification is free from swamping and masking effects. The performance of the RLGD method is investigated using real examples and a Monte Carlo simulation study.
The real examples and the simulation results indicate that the RLGD method correctly identifies the high leverage points (increasing the Detection Capability (DC) probability) and reduces the number of swamped low leverage points (decreasing the False Alarm Rate (FAR)). The Standardized Pearson Residual (SPR) is only successful in identifying a single residual outlier, and it is less effective when residual outliers are present in the covariates. The Generalized Standardized Pearson Residual (GSPR) proposed by Imon and Hadi (2008) successfully identifies residual outliers. However, the initial stage of the GSPR method relies on graphical methods, which depend on the observer's judgement and are not suitable for higher-dimensional covariates. A more reliable Modified Standardized Pearson Residual (MSPR) based on the RLGD method is therefore proposed. The MSPR method is an alternative to the GSPR method that produces similar results, with the attractive feature of being easier to apply. This research also uses the RLGD method in bootstrap procedures. The Classical Bootstrap (CB) procedure based on Random-x Re-sampling is not robust to high leverage points. To address this problem, two newly developed bootstrap procedures based on the RLGD method, the Diagnostic Logistic Before Bootstrap (DLGBB) and the Weighted Logistic Bootstrap with Probability (WLGBP), are proposed. In the DLGBB procedure, the high leverage points are excluded before the re-sampling process is applied. In the WLGBP procedure, by contrast, the high leverage points are assigned low probabilities and consequently have low chances of being selected during re-sampling. Simulation results show that the DLGBB and WLGBP procedures are more robust to high leverage points than the CB procedure.
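The two re-sampling schemes described above can be sketched as index generators, assuming a boolean array `flags` that marks the high leverage points found by some diagnostic (for example RLGD). The exact WLGBP probability assignment is not given in the abstract, so `low_weight=0.1` is a hypothetical choice, as are the function names and the DLGBB resample size (the number of retained points). Each returned index array would then be used to refit the logistic model.

```python
import numpy as np

def dlgbb_indices(flags, n_boot, rng):
    """DLGBB-style: drop flagged high leverage points, then resample the rest uniformly."""
    keep = np.flatnonzero(~flags)
    return [rng.choice(keep, size=keep.size, replace=True) for _ in range(n_boot)]

def wlgbp_indices(flags, n_boot, rng, low_weight=0.1):
    """WLGBP-style: keep every point but give flagged points a small selection probability."""
    w = np.where(flags, low_weight, 1.0)  # hypothetical weight for flagged points
    prob = w / w.sum()
    n = flags.size
    return [rng.choice(n, size=n, replace=True, p=prob) for _ in range(n_boot)]

# Usage: 10 observations, the last two flagged as high leverage points.
flags = np.array([False] * 8 + [True] * 2)
rng = np.random.default_rng(0)
dl_samples = dlgbb_indices(flags, 5, rng)
wl_samples = wlgbp_indices(flags, 5, rng)
```

The difference between the schemes is that DLGBB removes flagged points entirely, whereas WLGBP only makes them rare, so setting `low_weight=0.0` reduces WLGBP to an exclusion rule while keeping the original sample size.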

    A logistic regression model for microalbuminuria prediction in overweight male population

    Background: Obesity promotes progression to microalbuminuria and increases the risk of chronic kidney disease. Current screening protocols do not recommend microalbuminuria screening for overweight or obese individuals.

Design and Methods: A cross-sectional study was conducted. The relationship between metabolic risk factors and microalbuminuria was investigated. A regression model based on metabolic risk factors was developed and evaluated for predicting microalbuminuria in the overweight or obese.

Results: The prevalence of microalbuminuria (MA) reached 17.6% in Chinese overweight men. Obesity, hypertension, hyperglycemia and hyperuricemia were important risk factors for microalbuminuria in the overweight population. The area under the ROC curve of the regression model based on these risk factors was 0.82 for predicting microalbuminuria. A decision threshold of 0.2 predicted microalbuminuria with a sensitivity of 67.4%, a specificity of 79.0% and a global predictive value of 75.7%. A decision threshold of 0.1 was chosen for screening, giving a sensitivity of 90.0%, a specificity of 56.5% and a global predictive value of 61.7%.

Conclusions: The prediction model was an effective tool for screening microalbuminuria using routine data in overweight populations.
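To make the threshold trade-off in the results concrete, the following minimal sketch classifies a predicted probability at or above a decision threshold as positive and reports sensitivity, specificity and overall accuracy. It assumes that overall accuracy is what the abstract calls the global predictive value, which the paper may define differently; the function name and the toy data are hypothetical and are not the study's data.

```python
import numpy as np

def threshold_metrics(y_true, p_hat, threshold):
    """Classify p_hat >= threshold as positive; return (sensitivity, specificity, accuracy)."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(p_hat) >= threshold
    tp = np.sum(pred & y_true)    # true positives
    tn = np.sum(~pred & ~y_true)  # true negatives
    sens = tp / y_true.sum()
    spec = tn / (~y_true).sum()
    acc = (tp + tn) / y_true.size
    return sens, spec, acc

# Toy data: two cases and three non-cases with hypothetical model probabilities.
y = [1, 1, 0, 0, 0]
p = [0.9, 0.15, 0.3, 0.05, 0.12]
at_02 = threshold_metrics(y, p, 0.2)  # sensitivity 0.5, specificity 2/3, accuracy 0.6
at_01 = threshold_metrics(y, p, 0.1)  # sensitivity 1.0, specificity 1/3, accuracy 0.6
```

Lowering the threshold from 0.2 to 0.1 catches more cases at the cost of more false positives, which mirrors why the lower threshold was preferred for screening.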

    Collinearity diagnostics of binary logistic regression model

    Multicollinearity is a statistical phenomenon in which predictor variables in a logistic regression model are highly correlated. It is not uncommon when there are a large number of covariates in the model. Multicollinearity has been the thousand-pound monster of statistical modeling, and taming it has proven to be one of the great challenges of statistical modeling research. Multicollinearity can cause unstable estimates and inaccurate variances, which affect confidence intervals and hypothesis tests. Collinearity inflates the variances of the parameter estimates and consequently leads to incorrect inferences about the relationships between explanatory and response variables. Examining the correlation matrix may help detect multicollinearity, but it is not sufficient. Much better diagnostics are produced by running a linear regression with the tolerance, VIF (variance inflation factor), condition index and variance proportion options. For moderate to large sample sizes, dropping one of the correlated variables was found to be an entirely satisfactory way to reduce multicollinearity. In light of the different collinearity diagnostics, we may safely conclude that, without increasing the sample size, the second option, omitting one of the correlated variables, can reduce multicollinearity to a great extent.
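Because collinearity is a property of the predictors alone, the diagnostics mentioned above can be computed without the binary response. The following sketch computes the VIF of each predictor as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others (tolerance is 1 / VIF), and condition indices from the singular values of the unit-length-scaled design matrix with an intercept column (Belsley-style scaling, an assumption here). The function names are hypothetical, and variance proportions are not shown.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x k, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)  # tolerance would be 1 - r2
    return out

def condition_indices(X):
    """Condition indices: largest singular value divided by each singular value."""
    Z = np.column_stack([np.ones(X.shape[0]), X])
    Z = Z / np.linalg.norm(Z, axis=0)  # scale every column to unit length
    s = np.linalg.svd(Z, compute_uv=False)
    return s.max() / s

# Usage: x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
v = vif(X)
ci = condition_indices(X)
```

Common rules of thumb treat a VIF above 10 or a condition index above 30 as a sign of serious collinearity; dropping `x2` here, as the abstract suggests, would bring the VIF of `x1` back close to 1.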

    A Fused Elastic Net Logistic Regression Model for Multi-Task Binary Classification

    Multi-task learning has been shown to significantly enhance the performance of multiple related learning tasks in a variety of situations. We present the fused logistic regression, a sparse multi-task learning approach for binary classification. Specifically, we introduce sparsity-inducing penalties on the parameter differences of related logistic regression models to encode similarity across related tasks. The resulting joint learning task is cast into a form that lends itself to efficient optimization with a recursive variant of the alternating direction method of multipliers. We show results on synthetic data, describe the regime of settings in which our multi-task approach achieves significant improvements over the single-task learning approach, and discuss the implications of applying the fused logistic regression in different real-world settings.

Comment: 17 pages
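A fused elastic net objective of the kind described above can be written down as a sum of per-task logistic losses plus an elastic net penalty on each task's weights and an L1 penalty on the weight differences of related task pairs. The sketch below only evaluates such an objective; the exact penalty form and scaling used in the paper are assumptions, the names are hypothetical, and the recursive ADMM solver that would minimise it is not shown.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean negative log-likelihood of logistic regression with labels y in {0, 1}."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)  # log(1 + e^z) - y z, computed stably

def fused_objective(W, tasks, pairs, lam1, lam2, lam_fuse):
    """W: (T, d) weights, one row per task; tasks: list of (X, y); pairs: related task pairs."""
    loss = sum(logistic_loss(W[t], X, y) for t, (X, y) in enumerate(tasks))
    enet = lam1 * np.abs(W).sum() + 0.5 * lam2 * np.sum(W ** 2)  # elastic net per task
    fuse = lam_fuse * sum(np.abs(W[s] - W[t]).sum() for s, t in pairs)  # fusion penalty
    return loss + enet + fuse

# Usage: two related tasks sharing the same toy design.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
tasks = [(X, y), (X, y)]
W0 = np.zeros((2, 3))
obj0 = fused_objective(W0, tasks, [(0, 1)], 0.1, 0.1, 1.0)  # equals 2 * log(2)
```

The fusion term is what encodes task similarity: it is zero when related tasks share identical weights, and its L1 form encourages many coefficients to become exactly equal across tasks rather than merely close.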

    Obtaining adjusted prevalence ratios from logistic regression model in cross-sectional studies

    Over recent decades, the use of the epidemiological prevalence ratio (PR) rather than the odds ratio as the measure of association estimated in cross-sectional studies has been debated. The main difficulties in using statistical models to calculate the PR are convergence problems, the availability of adequate tools and strong assumptions. The goal of this study is to illustrate how to estimate the PR and its confidence interval directly from logistic regression estimates. We present three examples and compare the adjusted PR estimates with those obtained from log-binomial regression, robust Poisson regression and the adjusted prevalence odds ratio (POR). The marginal and conditional prevalence ratios estimated from logistic regression showed the following advantages: no numerical instability, ease of implementation in statistical software, and an appropriate probability distribution assumed for the outcome.
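One common way to obtain a marginal PR from a logistic model is marginal standardisation: fit the model, predict every subject's prevalence with the exposure set to 1 and then to 0, and take the ratio of the two averages. The sketch below follows that recipe under the assumption that it matches the paper's marginal PR; it fits the model by Newton-Raphson (IRLS) to stay self-contained, uses hypothetical names, and omits the confidence interval (which could come from the delta method or a bootstrap). A conditional PR would instead be the ratio of the two predicted probabilities at fixed covariate values.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson (IRLS) maximum likelihood for logistic regression; X includes an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        H = X.T @ (X * (p * (1 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(H, grad)
    return beta

def marginal_prevalence_ratio(X, y, exposure_col):
    """Mean predicted prevalence with exposure = 1 for everyone over that with exposure = 0."""
    beta = fit_logistic(X, y)
    X1, X0 = X.copy(), X.copy()
    X1[:, exposure_col] = 1.0
    X0[:, exposure_col] = 0.0
    p1 = 1 / (1 + np.exp(-X1 @ beta))
    p0 = 1 / (1 + np.exp(-X0 @ beta))
    return p1.mean() / p0.mean()

# Usage: 40/100 exposed and 20/100 unexposed subjects have the outcome.
e = np.r_[np.ones(100), np.zeros(100)]
y = np.r_[np.ones(40), np.zeros(60), np.ones(20), np.zeros(80)]
X = np.column_stack([np.ones(200), e])
pr = marginal_prevalence_ratio(X, y, exposure_col=1)  # 0.4 / 0.2 = 2.0
```

With exposure as the only covariate the model is saturated, so the standardised PR reproduces the crude ratio 0.4 / 0.2 = 2.0, while the odds ratio from the same fit would be (0.4/0.6)/(0.2/0.8), about 2.67, showing how far the two measures can diverge when the outcome is common.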

    Geographically weighted temporally correlated logistic regression model.

    Detecting temporally and spatially varying correlations is important for understanding biological and disease systems. Here we propose a geographically weighted temporally correlated logistic regression (GWTCLR) model to identify such dynamic correlations of predictors with binomial outcome data by incorporating spatial and temporal information for joint inference. The local likelihood method is adopted to estimate the spatial relationship, while a smoothing method is employed to estimate the temporal variation. We present the construction and implementation of GWTCLR and study the asymptotic properties of the proposed estimator. Simulation studies were conducted to evaluate the robustness of the proposed model. GWTCLR was applied to real epidemiological data to study the climatic determinants of human seasonal influenza epidemics. Our method obtained results largely consistent with previous studies but also revealed certain spatially and temporally varying patterns that previous models and methods could not detect.
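The spatial part of the local likelihood idea can be sketched as a kernel-weighted logistic regression fitted separately at each target location, so that nearby observations dominate the estimate. This is a generic geographically weighted logistic regression, not the authors' GWTCLR estimator: the Gaussian kernel, the fixed bandwidth and the function name are assumptions, and the temporal smoothing component is omitted.

```python
import numpy as np

def local_logistic(X, y, coords, target, bandwidth, n_iter=25):
    """Kernel-weighted logistic MLE at `target` (local likelihood via weighted IRLS)."""
    d2 = np.sum((coords - target) ** 2, axis=1)
    k = np.exp(-0.5 * d2 / bandwidth ** 2)  # Gaussian kernel weights by distance
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (k * (y - p))
        H = X.T @ (X * (k * p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(H, grad)
    return beta

# Usage: the predictor's effect is positive in region A and negative in region B.
rng = np.random.default_rng(0)
n = 400
xa, xb = rng.normal(size=n), rng.normal(size=n)
ya = (rng.random(n) < 1 / (1 + np.exp(-2 * xa))).astype(float)
yb = (rng.random(n) < 1 / (1 + np.exp(2 * xb))).astype(float)
X = np.column_stack([np.ones(2 * n), np.r_[xa, xb]])
y = np.r_[ya, yb]
coords = np.vstack([rng.random((n, 2)), 100 + rng.random((n, 2))])
beta_a = local_logistic(X, y, coords, np.array([0.5, 0.5]), bandwidth=5.0)
beta_b = local_logistic(X, y, coords, np.array([100.5, 100.5]), bandwidth=5.0)
```

A single global logistic regression on these data would average the two opposite effects towards zero, whereas the local fits recover slopes of opposite sign; the bandwidth controls this trade-off between locality and estimation variance.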

    Using the Analysis of Logistic Regression Model in Auditing and Detection of Frauds

    Fraud is defined as intentional actions in which one or more people, whether from management, employees or third parties, seek to obtain an unjust or illegal benefit. According to research, the average cost of fraud has been estimated at 5% of total revenue. Like a financial iceberg, fraud causes damage beyond its direct losses, such as loss of reputation and adverse effects on customer relations. Given such wide-ranging effects, auditing and detecting fraud is of great importance. In this study, we developed a logistic regression model designed to detect misconduct and abuse in the performance-based salary system in the health sector. To do so, fictitious surgery records were added to actual data on laparoscopic cholecystectomy operations performed in a public hospital in 2015, and the logistic regression model was tested on its ability to distinguish the fictitious records. The model achieved a success rate of 83.30% in detecting the false data added to the real data.