Robust Diagnostics In Logistic Regression Model
In recent years, due to the inconsistency and sensitivity of the Maximum Likelihood
Estimator (MLE) in the presence of high leverage points and residual outliers,
diagnostics have become an essential part of logistic regression modelling. High
leverage points and residual outliers have a strong tendency to break the covariate
pattern, resulting in biased parameter estimates. The identification of high
leverage points and residual outliers is therefore believed to be vital for improving
the performance of the MLE.
The presence of high leverage points and residual outliers adversely affects
inference by inducing large values of the Influence Function (IF). For the
identification of high leverage points, Imon (2006) proposed the Distance from
the Mean (DM) diagnostic method. The weakness of the DM method is that it
tends to swamp some low leverage points even though it identifies the high
leverage points correctly. Deleting the low leverage points may lead to a loss of
efficiency and precision in the parameter estimates.
The Robust Logistic Diagnostic (RLGD) method is proposed as an alternative
approach that performs well compared to the DM method. The RLGD method
combines robust approaches with diagnostic procedures. A robust approach is
first used to identify suspected high leverage points by computing the Robust
Mahalanobis Distance (RMD) based on the Minimum Volume Ellipsoid (MVE)
estimator or the Minimum Covariance Determinant (MCD) estimator. For
confirmation, a diagnostic procedure is then used to compute potentials. The RLGD
method ensures that only genuine high leverage points are identified, free from
swamping and masking effects. The performance of the RLGD method is
investigated using real examples and a Monte Carlo simulation study. Both the real
examples and the simulation results indicate that the RLGD method correctly
identifies the high leverage points (increasing the Detection Capability (DC)
probability) and manages to reduce the number of swamped low leverage
points (decreasing the False Alarm Rate (FAR) probability).
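The first (robust) stage of this two-stage idea can be sketched with off-the-shelf robust estimators. The snippet below is an illustration only, not the thesis's full RLGD procedure (it omits the confirmatory potentials step): it flags suspected high leverage points whose squared RMD, based on the MCD estimator, exceeds a chi-square cutoff.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def suspect_high_leverage(X, quantile=0.975, random_state=0):
    """First (robust) stage: flag suspected high leverage points whose
    squared Robust Mahalanobis Distance, computed from the MCD location
    and scatter, exceeds a chi-square cutoff."""
    mcd = MinCovDet(random_state=random_state).fit(X)
    rmd2 = mcd.mahalanobis(X)                  # squared robust distances
    cutoff = chi2.ppf(quantile, df=X.shape[1])
    return rmd2 > cutoff

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
X[:3] += 10.0                                  # plant three leverage points
flags = suspect_high_leverage(X)
```

Because the MCD location and scatter are barely affected by the planted points, all three are flagged; a classical Mahalanobis distance would instead be inflated (masked) by them.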
The Standardized Pearson Residual (SPR) is successful only in identifying a single
residual outlier, and the SPR method is less effective when residual outliers are
present in the covariates. The Generalized Standardized Pearson Residual
(GSPR) proposed by Imon and Hadi (2008) is a successful method for identifying
residual outliers. However, the initial stage of the GSPR method relies on
graphical methods, which depend on the observer's judgement and are not
suitable for higher-dimensional covariates. A more reliable Modified Standardized
Pearson Residual (MSPR) based on the RLGD method is therefore proposed.
The MSPR method provides an alternative to the GSPR method that produces
similar results, with the attractive feature of being easier to apply.
This research also utilizes the RLGD method in bootstrap procedures. The
Classical Bootstrap (CB) procedure with random-x re-sampling is not robust to
high leverage points. To address this problem, two newly developed
bootstrap procedures based on the RLGD method, called the
Diagnostic Logistic Before Bootstrap (DLGBB) and the Weighted Logistic
Bootstrap with Probability (WLGBP), are proposed. In the DLGBB procedure,
the high leverage points are excluded before the re-sampling process is applied.
In the WLGBP procedure, the high leverage points are assigned
low probabilities and consequently have a low chance of being selected in
the re-sampling process. Simulation results show that the DLGBB and
WLGBP procedures are more robust to high leverage points than the
CB procedure.
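The weighting idea behind the WLGBP procedure can be sketched as follows. The relative weight `eps` given to flagged points is an assumption for illustration, not the thesis's actual weighting scheme.

```python
import numpy as np

def weighted_bootstrap_indices(n, leverage_flags, eps=0.01, rng=None):
    """Draw one bootstrap sample in which flagged high leverage points
    receive relative weight eps instead of 1, so they are rarely selected."""
    if rng is None:
        rng = np.random.default_rng()
    w = np.where(leverage_flags, eps, 1.0)
    return rng.choice(n, size=n, replace=True, p=w / w.sum())

flags = np.zeros(50, dtype=bool)
flags[:5] = True                                # pretend 5 points are flagged
idx = weighted_bootstrap_indices(50, flags, rng=np.random.default_rng(0))
frac_flagged = np.isin(idx, np.flatnonzero(flags)).mean()
```

The DLGBB variant corresponds to the limit `eps = 0` combined with dropping the flagged rows before re-sampling.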
A logistic regression model for microalbuminuria prediction in overweight male population
Background: Obesity promotes progression to microalbuminuria (MA) and increases the risk of chronic kidney disease. Current protocols for screening microalbuminuria are not recommended for the overweight or obese.

Design and Methods: A cross-sectional study was conducted. The relationship between metabolic risk factors and microalbuminuria was investigated. A regression model based on metabolic risk factors was developed and evaluated for predicting microalbuminuria in the overweight or obese.

Results: The prevalence of MA reached 17.6% in Chinese overweight men. Obesity, hypertension, hyperglycemia and hyperuricemia were important risk factors for microalbuminuria in the overweight. The area under the ROC curve of the regression model based on these risk factors was 0.82 for predicting microalbuminuria. A decision threshold of 0.2 was found for predicting microalbuminuria, with a sensitivity of 67.4%, a specificity of 79.0% and a global predictive value of 75.7%. A decision threshold of 0.1 was chosen for screening microalbuminuria, with a sensitivity of 90.0%, a specificity of 56.5% and a global predictive value of 61.7%.

Conclusions: The prediction model was an effective tool for screening microalbuminuria using routine data among overweight populations.
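The threshold trade-off described above can be reproduced on any fitted risk score. The toy labels and scores below are illustrative only, not the study's data.

```python
import numpy as np

def screen_metrics(y_true, risk, threshold):
    """Sensitivity, specificity and global predictive value (accuracy)
    of the rule 'flag as microalbuminuria if risk >= threshold'."""
    pred = risk >= threshold
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    fp = np.sum(pred & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / y_true.size

y = np.array([1, 1, 0, 0, 0, 1])
risk = np.array([0.30, 0.15, 0.05, 0.25, 0.08, 0.40])
sens, spec, gpv = screen_metrics(y, risk, 0.2)
```

Lowering the threshold (as in the study's 0.1 screening rule) raises sensitivity at the cost of specificity.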
Collinearity diagnostics of binary logistic regression model
Multicollinearity is a statistical phenomenon in which the predictor variables in a logistic regression model are highly correlated. It is not uncommon when there is a large number of covariates in the model. Multicollinearity has been the thousand-pound monster of statistical modeling, and taming it has proven to be one of the great challenges of statistical modeling research. Multicollinearity can cause unstable estimates and inaccurate variances, which affect confidence intervals and hypothesis tests. The existence of collinearity inflates the variances of the parameter estimates, and consequently leads to incorrect inferences about the relationships between explanatory and response variables. Examining the correlation matrix may help to detect multicollinearity but is not sufficient. Much better diagnostics are produced by linear regression with the tolerance, VIF, condition index and variance proportion options. For moderate to large sample sizes, the approach of dropping one of the correlated variables was found entirely satisfactory for reducing multicollinearity. In light of the different collinearity diagnostics, we may safely conclude that, without increasing the sample size, the second option of omitting one of the correlated variables can reduce multicollinearity to a great extent.
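A minimal version of the VIF diagnostic named above can be sketched directly from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing column j on the remaining columns. The planted near-collinear column is illustrative.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor per column: 1 / (1 - R_j^2), where R_j^2
    comes from regressing column j on all remaining columns."""
    Xc = X - X.mean(axis=0)                     # center out the intercept
    out = []
    for j in range(X.shape[1]):
        yj = Xc[:, j]
        Z = np.delete(Xc, j, axis=1)
        beta, *_ = np.linalg.lstsq(Z, yj, rcond=None)
        resid = yj - Z @ beta
        out.append((yj @ yj) / (resid @ resid))  # SST / SSE = 1/(1 - R^2)
    return np.array(out)

rng = np.random.default_rng(0)
x1, x3 = rng.normal(size=200), rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)           # nearly collinear with x1
v = vif(np.column_stack([x1, x2, x3]))
```

The two collinear columns show VIFs far above the common rule-of-thumb cutoff of 10, while the independent column stays near 1, matching the drop-one-variable remedy the abstract recommends.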
A Fused Elastic Net Logistic Regression Model for Multi-Task Binary Classification
Multi-task learning has been shown to significantly enhance the performance of
multiple related learning tasks in a variety of situations. We present
fused logistic regression, a sparse multi-task learning approach for binary
classification. Specifically, we introduce sparsity-inducing penalties over the
parameter differences of related logistic regression models to encode
similarity across related tasks. The resulting joint learning task is cast into
a form that lends itself to efficient optimization with a recursive variant
of the alternating direction method of multipliers. We show results on
synthetic data, describe the regime of settings where our multi-task
approach achieves significant improvements over the single-task learning
approach, and discuss the implications of applying fused logistic regression
in different real-world settings.
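One plausible instantiation of such a penalized objective is sketched below; the exact penalty weights and fusion structure (here, an L1 penalty on differences between consecutive tasks' weights) are assumptions, not necessarily the paper's formulation.

```python
import numpy as np

def fused_objective(W, Xs, ys, lam_l1=0.1, lam_l2=0.1, lam_fuse=0.1):
    """Per-task logistic loss plus an elastic net penalty on each weight
    vector and an L1 fusion penalty on differences between consecutive
    tasks' weight vectors, encoding task similarity."""
    obj = 0.0
    for w, X, y in zip(W, Xs, ys):              # labels y in {0, 1}
        z = X @ w
        obj += np.mean(np.logaddexp(0.0, -z) + (1.0 - y) * z)
        obj += lam_l1 * np.abs(w).sum() + lam_l2 * (w @ w)
    for t in range(len(W) - 1):                 # fuse related tasks
        obj += lam_fuse * np.abs(W[t] - W[t + 1]).sum()
    return obj

rng = np.random.default_rng(0)
Xs = [rng.normal(size=(20, 3)) for _ in range(2)]
ys = [rng.integers(0, 2, 20) for _ in range(2)]
val = fused_objective([np.zeros(3), np.zeros(3)], Xs, ys)
```

The non-smooth L1 terms are what make operator-splitting schemes such as ADMM a natural fit for the actual optimization.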
Obtaining adjusted prevalence ratios from logistic regression model in cross-sectional studies
In recent decades, the use of the epidemiological prevalence ratio (PR),
rather than the odds ratio, as the measure of association to be
estimated in cross-sectional studies has been discussed. The main difficulties
in the use of statistical models for the calculation of the PR are convergence
problems, the availability of adequate tools, and strong assumptions. The goal
of this study is to illustrate how to estimate the PR and its confidence
interval directly from logistic regression estimates. We present three examples
and compare the adjusted estimates of the PR with the estimates obtained by
log-binomial and robust Poisson regression and with the adjusted prevalence
odds ratio (POR). The marginal and conditional prevalence ratios estimated from
logistic regression showed the following advantages: no numerical instability;
simple implementation in statistical software; and an adequate probability
distribution assumed for the outcome.
Geographically weighted temporally correlated logistic regression model.
Detecting temporally and spatially varying correlations is important for understanding biological and disease systems. Here we propose a geographically weighted temporally correlated logistic regression (GWTCLR) model to identify such dynamic correlations of predictors with binomial outcome data, by incorporating spatial and temporal information for joint inference. The local likelihood method is adopted to estimate the spatial relationship, while a smoothing method is employed to estimate the temporal variation. We present the construction and implementation of GWTCLR and study the asymptotic properties of the proposed estimator. Simulation studies were conducted to evaluate the robustness of the proposed model. GWTCLR was applied to real epidemiologic data to study the climatic determinants of human seasonal influenza epidemics. Our method obtained results largely consistent with previous studies but also revealed spatially and temporally varying patterns that were unobservable with previous models and methods.
Using the Analysis of Logistic Regression Model in Auditing and Detection of Frauds
Fraud is defined as an intentional action in which one or more people, whether from
management, the employees, or third parties, seek to obtain an unjust or illegal benefit.
According to research, the average cost of fraud has been estimated at 5% of total
income. Fraud, whose consequences resemble a financial iceberg, causes, beyond the
direct losses, damage such as loss of reputation and adverse effects on customer
relations. Auditing and detection of fraud, which has such vast effects, is therefore
of great importance.
In this study, we developed a logistic regression model designed to detect mistreatment
and abuse of the performance-based salary system in the health sector. For this purpose,
fictitious surgery records were added to the actual data of laparoscopic cholecystectomy
operations performed in a public hospital in 2015, and the success of the fitted logistic
regression model in distinguishing these fictitious records was tested. The model
achieved an 83.30% success rate in detecting the false data added to the real data.
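The evaluation design, planting fictitious records among real ones and measuring how many the model flags, can be sketched as follows. The features and their distributions are hypothetical stand-ins for illustration; the study's actual variables are not given in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in features (e.g. operation duration in minutes and a
# billing ratio); fictitious records drawn from a shifted distribution.
rng = np.random.default_rng(0)
real = rng.normal(loc=[60.0, 1.0], scale=[10.0, 0.2], size=(300, 2))
fake = rng.normal(loc=[45.0, 1.6], scale=[10.0, 0.2], size=(60, 2))
X = np.vstack([real, fake])
y = np.r_[np.zeros(300), np.ones(60)]           # 1 = fictitious record

clf = LogisticRegression().fit(X, y)
detection_rate = clf.predict(X[y == 1]).mean()  # share of fakes flagged
```

The detection rate computed this way corresponds to the success measure the study reports for its planted records.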