158,967 research outputs found

    Generalized Extreme Value Regression for Binary Rare Events Data: an Application to Credit Defaults

    The most widely used regression model for a binary dependent variable is the logistic regression model. When the dependent variable represents a rare event, the logistic regression model shows relevant drawbacks. To overcome these drawbacks we propose the Generalized Extreme Value (GEV) regression model. In particular, in a Generalized Linear Model (GLM) with a binary dependent variable we suggest the quantile function of the GEV distribution as the link function, so that our attention is focused on the tail of the response curve for values close to one. The parameters are estimated by maximum likelihood. This model accommodates skewness and generalizes GLMs with the log-log link function. In credit risk analysis a pivotal topic is default probability estimation. Since defaults are rare events, we apply the GEV regression to empirical data on Italian Small and Medium Enterprises (SMEs) to model their default probabilities.
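
As a rough illustration of the link described in this abstract, the sketch below assumes the response curve is the standard GEV distribution function with shape parameter `tau`, so that the link is its quantile function. The function names and the parametrization are assumptions made for illustration, not taken from the paper. As `tau` approaches zero the curve reduces to the log-log case, and the response at a zero linear predictor is not 0.5, which reflects the asymmetry mentioned above.

```python
import math


def gev_response(eta, tau):
    """Assumed response curve: GEV cdf, P(Y=1) = exp(-(1 + tau*eta)^(-1/tau))."""
    if abs(tau) < 1e-12:
        # Limit tau -> 0 (Gumbel case): inverse of the log-log link.
        return math.exp(-math.exp(-eta))
    base = 1.0 + tau * eta
    if base <= 0.0:
        # eta lies outside the support of the GEV distribution.
        return 0.0 if tau > 0 else 1.0
    return math.exp(-(base ** (-1.0 / tau)))


def gev_link(p, tau):
    """GEV quantile function: maps a probability p in (0, 1) to eta."""
    if abs(tau) < 1e-12:
        return -math.log(-math.log(p))
    return ((-math.log(p)) ** (-tau) - 1.0) / tau


if __name__ == "__main__":
    for tau in (-0.3, 0.0, 0.3):
        print(tau, [round(gev_response(eta, tau), 4) for eta in (-1.0, 0.0, 1.0, 3.0)])
```

In a full model the linear predictor `eta` would be a linear combination of covariates and the coefficients, together with `tau`, would be fitted by maximum likelihood; that fitting step is omitted here.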

    Analysis of binary spatial data by quasi-likelihood estimating equations

    The goal of this paper is to describe the application of quasi-likelihood estimating equations for spatially correlated binary data. In this paper, a logistic function is used to model the marginal probability of binary responses in terms of parameters of interest. With mild assumptions on the correlations, the Leonov-Shiryaev formula combined with a comparison of characteristic functions can be used to establish asymptotic normality for linear combinations of the binary responses. The consistency and asymptotic normality for quasi-likelihood estimates can then be derived. By modeling spatial correlation with a variogram, we apply these asymptotic results to test independence of two spatially correlated binary outcomes and illustrate the concepts with a well-known example based on data from Lansing Woods. The comparison of generalized estimating equations and the proposed approach is also discussed. Comment: Published at http://dx.doi.org/10.1214/009053605000000057 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Nonparametric Bayes dynamic modeling of relational data

    Symmetric binary matrices representing relations among entities are commonly collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being in inference on the relationship structure and prediction. We propose a nonparametric Bayesian dynamic model, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes. By using a logistic mapping function from the probability matrix space to the latent relational space, we obtain a flexible and computationally tractable formulation. Employing Pólya-Gamma data augmentation, an efficient Gibbs sampler is developed for posterior computation, with the dimension of the latent space automatically inferred. We provide some theoretical results on flexibility of the model, and illustrate performance via simulation experiments. We also consider an application to co-movements in world financial markets.

    Prediction by Nonparametric Posterior Estimation in Virtual Screening

    The ability to rank molecules according to their effectiveness in some domain, e.g. as a pesticide or drug, is important owing to the cost of synthesising and testing chemical compounds. Virtual screening seeks to do this computationally, with potential savings of millions of pounds and large profits associated with reduced time to market. Binary kernel discrimination (BKD) was recently introduced and is becoming popular in the chemoinformatics domain. It produces scores based on the estimated likelihood ratio of active to inactive compounds, and the compounds are then ranked by these scores. The likelihoods are estimated through a Parzen windows approach using the binomial distribution function (to accommodate binary descriptor or "fingerprint" vectors representing the presence, or not, of certain sub-structural arrangements of atoms) in place of the usual Gaussian choice. This research aims to compute the likelihood ratio via a direct estimate of the posterior probability, using a non-parametric generalisation of logistic regression, the so-called "Kernel Logistic Regression". Furthermore, complexity is controlled by penalising the likelihood function with an Lq-norm. The compounds are then ranked in descending order of posterior probability. The 11 activity classes from the MDL Drug Data Report (MDDR) database are used. The results are found to be less accurate than a currently leading approach but are still comparable in a number of cases.
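
The BKD scoring idea summarised in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the kernel is taken to be `lam**(n - d) * (1 - lam)**d`, where `d` is the Hamming distance between two fingerprints of length `n`, and the smoothing parameter `lam` and the toy fingerprints are hypothetical. It shows the likelihood-ratio baseline only, not the Kernel Logistic Regression method the abstract proposes.

```python
def hamming(a, b):
    """Number of positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def binomial_kernel(a, b, lam):
    """Binomial kernel: large when fingerprints agree (0.5 < lam < 1 assumed)."""
    n = len(a)
    d = hamming(a, b)
    return lam ** (n - d) * (1.0 - lam) ** d


def bkd_score(query, actives, inactives, lam=0.8):
    """Parzen-window estimate of the active/inactive likelihood ratio."""
    like_active = sum(binomial_kernel(query, a, lam) for a in actives) / len(actives)
    like_inactive = sum(binomial_kernel(query, i, lam) for i in inactives) / len(inactives)
    return like_active / like_inactive


if __name__ == "__main__":
    # Hypothetical 4-bit fingerprints, for illustration only.
    actives = [(1, 1, 0, 0), (1, 1, 1, 0)]
    inactives = [(0, 0, 1, 1), (0, 1, 1, 1)]
    candidates = [(1, 1, 0, 1), (0, 0, 1, 0), (1, 0, 0, 0)]
    ranked = sorted(candidates, key=lambda c: bkd_score(c, actives, inactives), reverse=True)
    print(ranked)
```

Real fingerprints have hundreds or thousands of bits, so a practical version would work with log-kernels to avoid underflow.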

    Logistic regression Model Effectiveness: Proportional Chance Criteria and Proportional Reduction in Error

    The importance of classification tables in binary logistic regression analysis has not been fully recognized. This may be due to an overreliance on statistical software or a lack of awareness of the value that computing the proportional by chance accuracy criteria (PCC) and the proportional reduction in error (PRE) statistic can add to binary logistic regression models. Case illustrations are used in this paper to demonstrate the usefulness of these computations. An overview of logistic regression is offered along with a discussion of the function of case classifications and strategies for applying the PCC and PRE. The paper offers guidance for others interested in understanding how classification tables can be used to assess the predictive effectiveness and utility of binary logistic regression models.
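
A minimal sketch of the two statistics named in this abstract, assuming their common textbook definitions: PCC as the sum of squared group proportions, and PRE as the share of the errors made by always predicting the larger group that the model eliminates. The classification table counts below are hypothetical, and the paper's exact definitions may differ.

```python
def proportional_chance_criterion(n_neg, n_pos):
    """Accuracy expected by chance: sum of squared group proportions."""
    total = n_neg + n_pos
    return (n_neg / total) ** 2 + (n_pos / total) ** 2


def proportional_reduction_in_error(tn, fp, fn, tp):
    """Share of baseline errors (always predict the modal class) removed by the model."""
    total = tn + fp + fn + tp
    errors_baseline = total - max(tn + fp, fn + tp)
    if errors_baseline == 0:
        raise ValueError("PRE is undefined when only one class is observed")
    errors_model = fp + fn
    return (errors_baseline - errors_model) / errors_baseline


if __name__ == "__main__":
    # Hypothetical classification table: rows are observed, columns are predicted.
    tn, fp, fn, tp = 50, 10, 15, 25
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    pcc = proportional_chance_criterion(tn + fp, fn + tp)
    pre = proportional_reduction_in_error(tn, fp, fn, tp)
    print(f"accuracy={accuracy:.3f} PCC={pcc:.3f} PRE={pre:.3f}")
```

Comparing the observed accuracy with the PCC indicates whether the model classifies better than chance, while the PRE expresses the gain relative to the naive modal-class rule.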

    A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome

    Unobserved confounding is a well-known threat to causal inference in non-experimental studies. The instrumental variable design can under certain conditions be used to recover an unbiased estimator of a treatment effect even if unobserved confounding cannot be ruled out with certainty. For continuous outcomes, two stage least squares is the most common instrumental variable estimator used in epidemiologic applications. For a rare binary outcome, an analogous linear-logistic two-stage procedure can be used. Alternatively, a control function approach is sometimes used, which entails entering the residual from the first stage linear model as a covariate in a second stage logistic regression of the outcome on the treatment. Both strategies for binary response have previously been formally justified only for continuous exposure, which has impeded widespread use of the approach outside of this setting. In this note, we consider the important setting of binary exposure in the context of a binary outcome. We provide an alternative motivation for the control function approach which is appropriate for binary exposure, thus establishing simple conditions under which the approach may be used for instrumental variable estimation when the outcome is rare. In the proposed approach, the first stage regression involves a logistic model of the exposure conditional on the instrumental variable, and the second stage regression is a logistic regression of the outcome on the exposure adjusting for the first stage residual. In the event of a non-rare outcome, we recommend replacing the second stage logistic model with a risk ratio regression.
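
The two-stage procedure described at the end of this abstract can be sketched as below. Everything beyond the two stages themselves is an assumption for illustration: the simulated data, the coefficient values, and the small Newton-Raphson logistic fitter (written with the standard library only) are hypothetical and are not the authors' code. No claim is made here about the quality of the resulting estimate.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def control_function_fit(z, a, y):
    # Stage 1: logistic model of the binary exposure given the instrument.
    gamma = fit_logistic([[1.0, zi] for zi in z], a)
    resid = [ai - sigmoid(gamma[0] + gamma[1] * zi) for ai, zi in zip(a, z)]
    # Stage 2: logistic regression of the outcome on the exposure,
    # adjusting for the first stage residual.
    beta = fit_logistic([[1.0, ai, ri] for ai, ri in zip(a, resid)], y)
    return gamma, beta


def simulate(n=4000, seed=0):
    """Hypothetical data: instrument z, confounded exposure a, rare outcome y."""
    rng = random.Random(seed)
    z, a, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unobserved confounder
        zi = 1.0 if rng.random() < 0.5 else 0.0
        ai = 1.0 if rng.random() < sigmoid(-0.5 + 1.5 * zi + u) else 0.0
        yi = 1.0 if rng.random() < sigmoid(-3.5 + 0.7 * ai + 0.8 * u) else 0.0
        z.append(zi)
        a.append(ai)
        y.append(yi)
    return z, a, y


if __name__ == "__main__":
    z, a, y = simulate()
    gamma, beta = control_function_fit(z, a, y)
    print("stage 1 coefficients:", gamma)
    print("stage 2 coefficients (intercept, exposure, residual):", beta)
```

The second element of the stage 2 coefficients is the exposure coefficient on the log-odds scale; standard errors would need to account for the estimated first stage, which this sketch does not attempt.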

    Comparison of Binary Logistic Regression and Radial Basis Function Network Classification Methods for Low Birth Weight Infants (Case Study: Puskesmas Pamenang, Jambi City)

    Low Birth Weight (LBW) is one of the main causes of infant mortality. LBW must be identified and predicted before birth by observing the historical data of expectant mothers. This research aims to classify the birth weight status of newborns in order to reduce the risk of LBW. The statistical methods used are Binary Logistic Regression and the Radial Basis Function Network. The data used in this final project are birth weights recorded at the Pamenang health center, Jambi City, in 2014. In this research, the data are divided into training data and testing data. The training data are used to build the model, while the testing data are used to measure how accurately the resulting model classifies data, by means of confusion tables. The results of the analysis show that the Binary Logistic Regression method gives 81.7% classification accuracy on the training data and 77.4% on the testing data, while the Radial Basis Function Network method gives 92.96% classification accuracy on the training data and 80.64% on the testing data. The Radial Basis Function Network method has better classification accuracy than the Binary Logistic Regression method.