
    Building predictive models for feature selection in genomic mining

    Building predictive models for genomic mining requires feature selection as an essential preliminary step to reduce the large number of variables available. Feature selection is the process of selecting the subset of features most essential for the intended task, such as classification, clustering or regression analysis. In gene expression microarray data, being able to select a few genes not only makes data analysis efficient but also helps with their biological interpretation. Microarray data typically have several thousand genes (features) but only tens of samples. Problems that can occur due to this small sample size have not been well addressed in the literature. Our aim is to discuss some issues in feature selection for microarray data in order to select the most predictive genes. We compare classical approaches based on statistical tests with a new approach based on marker selection. Finally, we compare the best predictive model with a model derived from a boosting method.
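    The classical statistical-test approach this abstract contrasts with marker selection can be sketched as a univariate ranking of genes by a two-sample t-statistic; the function names and toy data below are illustrative, not taken from the paper:

    ```python
    import math
    from statistics import mean, variance

    def t_statistic(xs, ys):
        """Welch two-sample t-statistic between the class samples xs and ys."""
        se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
        return (mean(xs) - mean(ys)) / se

    def select_top_genes(expr, labels, k):
        """Rank genes by |t| between the two classes and keep the k best.

        expr: dict mapping gene name -> list of expression values (one per sample)
        labels: list of 0/1 class labels, aligned with the samples
        """
        scores = {}
        for gene, values in expr.items():
            xs = [v for v, lab in zip(values, labels) if lab == 0]
            ys = [v for v, lab in zip(values, labels) if lab == 1]
            scores[gene] = abs(t_statistic(xs, ys))
        return sorted(scores, key=scores.get, reverse=True)[:k]
    ```

    With thousands of genes and tens of samples, such univariate rankings are cheap but ignore gene interactions, which is one motivation for the alternative approaches the abstract discusses.
    
    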

    Bayesian feature selection to estimate customer survival

    We consider the problem of estimating the lifetime value of customers when a large number of features are present in the data. To measure lifetime value, we use survival analysis models to estimate customer tenure. In such a context, a number of classical modelling challenges arise. We show how our proposed Bayesian methods perform and compare them with classical churn models on a real case study. More specifically, based on data from a media service company, our aim is to predict churn behaviour in order to undertake appropriate retention actions.
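    The abstract does not specify which survival model is used; as an illustration of estimating customer tenure from churn data with censoring, here is a minimal sketch of the non-parametric Kaplan-Meier estimator, a common baseline in survival analysis (the names and data are hypothetical):

    ```python
    def kaplan_meier(durations, churned):
        """Kaplan-Meier survival curve for customer tenure.

        durations: observation time (e.g. months) for each customer
        churned: True if the customer churned (event), False if censored
        Returns a list of (time, survival probability) pairs at event times.
        """
        at_risk = len(durations)
        surv = 1.0
        curve = []
        for t in sorted(set(durations)):
            events = sum(1 for d, e in zip(durations, churned) if d == t and e)
            if events:
                surv *= 1.0 - events / at_risk   # product-limit update
                curve.append((t, surv))
            at_risk -= sum(1 for d in durations if d == t)
        return curve
    ```

    Customers still active at the end of the observation window are censored rather than dropped, which is the key advantage of survival models over plain classification for tenure estimation.
    
    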

    A comparative analysis of the UK and Italian small businesses using Generalised Extreme Value models

    This paper presents a cross-country comparison of significant predictors of small business failure between Italy and the UK. Financial measures of profitability, leverage, coverage, liquidity and scale, together with non-financial information, are explored, and some commonalities and differences are highlighted. Several models are considered, starting with logistic regression, the standard approach in credit risk modelling. Some important improvements are then investigated. Generalised Extreme Value (GEV) regression is applied in place of logistic regression in order to produce more conservative estimates of the probability of default. The assumption of linearity is relaxed through the application of BGEVA, a non-parametric additive model based on the GEV link function. Two methods of handling missing values are compared: multiple imputation and the Weights of Evidence (WoE) transformation. The results suggest that the best predictive performance is obtained by BGEVA, implying the need to take into account the low volume of defaults and non-linear patterns when modelling SME performance. For the majority of the models considered, WoE gives better predictions than multiple imputation, suggesting that missing values may be informative.
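    The contrast between the symmetric logistic link and the asymmetric GEV link can be sketched as follows; the shape parameter value used here is purely illustrative, not a fitted value from the paper:

    ```python
    import math

    def logistic_link(eta):
        """Symmetric logistic response: P(default) = 1 / (1 + e^{-eta})."""
        return 1.0 / (1.0 + math.exp(-eta))

    def gev_link(eta, xi=-0.25):
        """GEV response: P(default) = exp(-(1 + xi*eta)^(-1/xi)).

        The GEV link is asymmetric, which lets it treat the rare default
        class more conservatively than the logistic link; xi is the shape
        parameter (xi -> 0 recovers the complementary log-log link).
        xi = -0.25 is an illustrative choice, not from the paper.
        """
        if abs(xi) < 1e-12:              # limiting cloglog case
            return math.exp(-math.exp(-eta))
        base = 1.0 + xi * eta
        if base <= 0:                    # outside the GEV support
            return 0.0 if xi > 0 else 1.0
        return math.exp(-base ** (-1.0 / xi))
    ```

    At the same linear predictor eta = 0, the logistic link returns 0.5 while the GEV link returns exp(-1) ≈ 0.37, illustrating the asymmetry that motivates its use for low-default portfolios.
    
    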