7 research outputs found

    Variable selection in gamma regression model using chaotic firefly algorithm with application in chemometrics

    Variable selection is a very helpful procedure for improving computational speed and prediction accuracy by identifying the most important variables related to the response variable. Regression modeling has received much attention in several scientific fields. The firefly algorithm is one of the recently proposed nature-inspired algorithms that can be employed efficiently for variable selection. In this work, a chaotic firefly algorithm is proposed to perform variable selection for the gamma regression model. A real data application in chemometrics is conducted to evaluate the performance of the proposed method in terms of prediction accuracy and variable selection criteria. Further, its performance is compared with that of other methods. The results demonstrate the efficiency of the proposed method, which outperforms other popular methods.

    Specification of Mixed Logit Models Using an Optimization Approach

    Mixed logit models are a widely used tool for studying discrete outcome problems. Model development entails answering three important questions that strongly affect the quality of the specification: (i) which variables are considered in the analysis? (ii) what are the coefficients for these variables going to be? and (iii) what density function will these coefficients follow? The literature provides guidance; however, a strong statistical background and an ad hoc search process are required to obtain the best model specification. Knowledge of the problem context and data is also required. Given a dataset including discrete outcomes and associated characteristics, the problem addressed in this thesis is to investigate to what extent a relatively simple metaheuristic, such as simulated annealing, can determine the best model specification for a mixed logit model and answer the above questions. A mathematical programming formulation is proposed, and simulated annealing is implemented to find solutions to it. Three experiments were performed to test the effectiveness of the proposed algorithm, and the results were compared with existing model specifications for the same datasets. The results suggest that the proposed algorithm is able to find an adequate model specification in terms of goodness of fit, thereby reducing the involvement of the analyst.
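
    The search described in this abstract can be illustrated with a minimal simulated annealing sketch over a binary vector that marks which variables enter the model. This is not the thesis's formulation: the function names, the cooling schedule, and the toy `hamming_score` objective (a stand-in for a real goodness-of-fit criterion such as a penalized log-likelihood) are all assumptions made for illustration.

```python
import math
import random


def simulated_annealing(score, n_vars, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimize score(mask) over boolean inclusion masks of length n_vars."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_vars)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(n_iter):
        # Neighbor: flip the inclusion status of one randomly chosen variable.
        candidate = current[:]
        j = rng.randrange(n_vars)
        candidate[j] = not candidate[j]
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with prob. exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, cand_score
            if current_score < best_score:
                best, best_score = current[:], current_score
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best, best_score


# Hypothetical objective: distance to a known "true" specification.
# A real application would fit the model for each mask and return a fit criterion.
TRUE_MASK = [True, False, True, True, False, False, True, False]


def hamming_score(mask):
    return sum(m != t for m, t in zip(mask, TRUE_MASK))


best, best_score = simulated_annealing(hamming_score, n_vars=len(TRUE_MASK))
```

    In a realistic setting each call to `score` would estimate a model, so the number of iterations is bounded mainly by estimation cost rather than by the search logic itself.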

    Feature Subset Selection for Logistic Regression via Mixed Integer Optimization


    Assisted specification of discrete choice models

    Determining appropriate utility specifications for discrete choice models is time-consuming and prone to errors. With the availability of larger and larger datasets, and because the number of possible specifications grows exponentially with the number of variables under consideration, analysts need to spend increasing amounts of time searching for good models through trial and error, while expert knowledge is required to ensure these models are sound. This paper proposes an algorithm that aims to assist modelers in their search. Our approach translates the task into a multi-objective combinatorial optimization problem and makes use of a variant of the variable neighborhood search algorithm to generate sets of promising model specifications. We apply the algorithm both to semi-synthetic data and to real mode choice datasets as a proof of concept. The results demonstrate its ability to provide relevant insights in reasonable amounts of time so as to effectively assist the modeler in developing interpretable and powerful models.
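
    A multi-objective search returns a set of specifications rather than a single winner, because no candidate needs to be best on every objective at once. The sketch below shows only that filtering step (keeping the non-dominated candidates), not the paper's variable neighborhood search. The two objectives (negative log-likelihood and number of parameters, both minimized) and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates whose objectives are not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(o["objectives"], c["objectives"]) for o in candidates)
    ]


# Hypothetical specifications: (negative log-likelihood, number of parameters).
candidates = [
    {"name": "A", "objectives": (1050.0, 3)},
    {"name": "B", "objectives": (1000.0, 5)},
    {"name": "C", "objectives": (1010.0, 7)},  # worse than B on both objectives
    {"name": "D", "objectives": (990.0, 9)},
    {"name": "E", "objectives": (1050.0, 4)},  # same fit as A, more parameters
]

front = pareto_front(candidates)
front_names = [c["name"] for c in front]
```

    Here the surviving specifications trade fit against parsimony, which is the kind of shortlist a modeler could then inspect by hand.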

    Design and performance evaluation of failure prediction models

    Prediction of corporate bankruptcy (or distress) is one of the major activities in auditing firms’ risks and uncertainties. The design of reliable models to predict distress is crucial for many decision-making processes. Although a variety of models have been designed to predict distress, the relative performance evaluation of competing prediction models remains an exercise that is unidimensional in nature. To be more specific, although some studies use several performance criteria and their measures to assess the relative performance of distress prediction models, the assessment of competing prediction models is restricted to ranking them by a single measure of a single criterion at a time, which leads to conflicting results being reported. The first essay of this research overcomes this methodological issue by proposing an orientation-free super-efficiency Data Envelopment Analysis (DEA) model as a multi-criteria assessment framework. Furthermore, the study performs an exhaustive comparative analysis of the most popular bankruptcy modelling frameworks for UK data. It also addresses two important research questions, namely: do some modelling frameworks perform better than others by design, and to what extent do the choice and/or the design of explanatory variables and their nature affect the performance of modelling frameworks? Further, using different static and dynamic statistical frameworks, this chapter proposes new Failure Prediction Models (FPMs). However, within a super-efficiency DEA framework, the reference benchmark changes from one prediction model evaluation to another, which in some contexts might be viewed as “unfair” benchmarking. The second essay overcomes this issue by proposing a Slacks-Based Measure Context-Dependent DEA (SBM-CDEA) framework to evaluate the competing Distress Prediction Models (DPMs).
    Moreover, it performs an exhaustive comparative analysis of the most popular corporate distress prediction frameworks under both a single criterion and multiple criteria using data on UK firms listed on the London Stock Exchange (LSE). Further, this chapter proposes new DPMs using different static and dynamic statistical frameworks. Another shortcoming of the existing studies on performance evaluation lies in the use of static frameworks to compare the performance of DPMs. The third essay overcomes this methodological issue by suggesting a dynamic multi-criteria performance assessment framework, namely Malmquist SBM-DEA, which, by design, can monitor the performance of competing prediction models over time. Further, this study proposes new static and dynamic distress prediction models. The study also addresses several research questions: What is the effect of information on the performance of DPMs? How does the out-of-sample performance of dynamic DPMs compare to that of static ones? What is the effect of the length of the training sample on the performance of static and dynamic models? Which models perform better in forecasting distress during years with a Higher Distress Rate (HDR)? On feature selection, studies have used different types of information as predictors, including accounting, market, and macroeconomic variables and management efficiency scores. Two-stage models are a recently applied technique for taking the management efficiency of firms into account. Two-stage DPMs incorporate multiple inputs and outputs to estimate the efficiency of a corporation relative to the most efficient ones in the first stage, and use the efficiency score as a predictor in the second stage. A survey of the literature reveals that most existing studies fail to provide a comprehensive comparison of two-stage DPMs.
    Moreover, the choice of inputs and outputs for DEA models that estimate the efficiency measures of a company has been restricted to accounting variables and features of the company. The fourth essay adds to the current literature on two-stage DPMs in several respects. First, the study proposes decomposing the Slacks-Based Measure (SBM) of efficiency into Pure Technical Efficiency (PTE), Scale Efficiency (SE), and Mix Efficiency (ME), to analyse how each of these measures individually contributes to developing distress prediction models. Second, in addition to the conventional approach of using accounting variables as inputs and outputs of DEA models to estimate management efficiency, this study uses market information variables to calculate the market efficiency of companies. Third, this research provides a comprehensive analysis of two-stage DPMs by applying different DEA models at the first stage (e.g., input-oriented vs. output-oriented, radial vs. non-radial, static vs. dynamic) to compute the management efficiency and market efficiency of companies, and by using dynamic and static classifier frameworks at the second stage to design new distress prediction models.
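
    For reference, the decomposition mentioned in this abstract is usually written in the DEA literature in the input-oriented form below. The symbols are not taken from the abstract and are assumptions here: \rho^{*}_{\mathrm{SBM}} denotes the input-oriented SBM score, and \theta^{*}_{\mathrm{CCR}} and \theta^{*}_{\mathrm{BCC}} denote the radial efficiency scores under constant and variable returns to scale. Whether the thesis uses exactly this form is not stated in the abstract.

```latex
\rho^{*}_{\mathrm{SBM}} = \mathrm{ME} \times \mathrm{PTE} \times \mathrm{SE},
\qquad
\mathrm{ME} = \frac{\rho^{*}_{\mathrm{SBM}}}{\theta^{*}_{\mathrm{CCR}}},\quad
\mathrm{PTE} = \theta^{*}_{\mathrm{BCC}},\quad
\mathrm{SE} = \frac{\theta^{*}_{\mathrm{CCR}}}{\theta^{*}_{\mathrm{BCC}}}
```

    Multiplying the three factors cancels the radial scores and recovers the SBM score, so each factor can be used as a separate predictor without losing information about overall efficiency.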