Variable selection in gamma regression model using chaotic firefly algorithm with application in chemometrics
Variable selection is a useful procedure for improving computational speed and prediction accuracy by identifying the most important variables related to the response variable. Regression modeling has received much attention in several scientific fields. The firefly algorithm is a recently proposed nature-inspired algorithm that can be employed efficiently for variable selection. In this work, a chaotic firefly algorithm is proposed to perform variable selection for the gamma regression model. A real-data application in chemometrics is conducted to evaluate the performance of the proposed method in terms of prediction accuracy and variable selection criteria, and its performance is compared with that of other methods. The results demonstrate the efficiency of the proposed method, which outperforms other popular methods.
Specification of Mixed Logit Models Using an Optimization Approach
Mixed logit models are a widely used tool for studying discrete outcome problems. Model development entails answering three important questions that strongly affect the quality of the specification: (i) which variables are considered in the analysis? (ii) what are the coefficients for these variables? and (iii) what density function will these coefficients follow? The literature provides guidance; however, a strong statistical background and an ad hoc search process are required to obtain the best model specification. Knowledge of the problem context and data is also required. Given a dataset including discrete outcomes and associated characteristics, the problem addressed in this thesis is to investigate to what extent a relatively simple metaheuristic, such as simulated annealing, can determine the best model specification for a mixed logit model and answer the above questions. A mathematical programming formulation is proposed, and simulated annealing is implemented to find solutions for it. Three experiments were performed to test the effectiveness of the proposed algorithm, and the results were compared with existing model specifications for the same datasets. The results suggest that the proposed algorithm is able to find an adequate model specification in terms of goodness of fit, thereby reducing the involvement of the analyst.
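As a rough illustration of how simulated annealing can search over model specifications, the following minimal sketch anneals over a binary inclusion mask for candidate variables. It is not the thesis's formulation: the names `simulated_annealing` and `toy_score`, the cooling schedule, and the toy objective (a stand-in for a fit criterion to be minimized) are all assumptions, and the sketch covers only variable inclusion, not the choice of random coefficients or their densities.

```python
import math
import random


def simulated_annealing(score, n_vars, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimize score(mask) over boolean masks of length n_vars."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_vars)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(n_iter):
        # Neighbor move: flip the inclusion of one randomly chosen variable.
        candidate = current[:]
        i = rng.randrange(n_vars)
        candidate[i] = not candidate[i]
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, cand_score
            if current_score < best_score:
                best, best_score = current[:], current_score
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best, best_score


def toy_score(mask):
    """Hypothetical objective: variables 0 and 2 matter, the rest only add a penalty."""
    useful = {0, 2}
    missing = sum(1 for j in useful if not mask[j])
    extra = sum(1 for j, on in enumerate(mask) if on and j not in useful)
    return 10.0 * missing + 1.0 * extra


best_mask, best_value = simulated_annealing(toy_score, n_vars=6)
```

In a real application, `toy_score` would be replaced by a function that estimates the model for the given specification and returns a goodness-of-fit criterion, which is far more expensive per evaluation than this toy objective.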
Assisted specification of discrete choice models
Determining appropriate utility specifications for discrete choice models is time-consuming and prone to errors. As ever-larger datasets become available and the number of possible specifications grows exponentially with the number of variables under consideration, analysts need to spend increasing amounts of time searching for good models through trial and error, while expert knowledge is required to ensure these models are sound. This paper proposes an algorithm that aims at assisting modelers in their search. Our approach translates the task into a multi-objective combinatorial optimization problem and makes use of a variant of the variable neighborhood search algorithm to generate sets of promising model specifications. We apply the algorithm both to semi-synthetic data and to real mode choice datasets as a proof of concept. The results demonstrate its ability to provide relevant insights in reasonable amounts of time so as to effectively assist the modeler in developing interpretable and powerful models.
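To make the multi-objective framing concrete, the sketch below filters a set of candidate specifications down to its non-dominated (Pareto) set. The abstract does not name the objectives, so the pair used here (negative log-likelihood and number of parameters, both minimized), the specification names, and all numeric values are hypothetical; the variable neighborhood search itself is not shown.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates not dominated by any other candidate."""
    front = []
    for name, objs in candidates.items():
        dominated = any(
            dominates(other, objs)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical candidates: (negative log-likelihood, number of parameters).
specs = {
    "spec_A": (1520.0, 4),
    "spec_B": (1490.0, 7),
    "spec_C": (1525.0, 6),  # worse than spec_A on both objectives
    "spec_D": (1490.0, 9),  # same fit as spec_B with more parameters
}

front = pareto_front(specs)
```

Returning a set of trade-off specifications rather than a single winner leaves the final choice to the modeler, which matches the assistive role described in the abstract.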
Design and performance evaluation of failure prediction models
Prediction of corporate bankruptcy (or distress) is one of the major activities in auditing
firms’ risks and uncertainties. The design of reliable models to predict distress is crucial
for many decision-making processes. Although a variety of models have been designed to
predict distress, the relative performance evaluation of competing prediction models
remains an exercise that is unidimensional in nature. To be more specific, although some
studies use several performance criteria and their measures to assess the relative
performance of distress prediction models, the assessment exercise of competing
prediction models is restricted to their ranking by a single measure of a single criterion at
a time, which leads to reporting conflicting results. The first essay of this research
overcomes this methodological issue by proposing an orientation-free super-efficiency
Data Envelopment Analysis (DEA) model as a multi-criteria assessment framework.
Furthermore, the study performs an exhaustive comparative analysis of the most popular
bankruptcy modelling frameworks for UK data. It also addresses two important research
questions: do some modelling frameworks perform better than others by design,
and to what extent do the choice and/or design of explanatory variables and their nature
affect the performance of modelling frameworks? Further, using different static and
dynamic statistical frameworks, this chapter proposes new Failure Prediction Models
(FPMs).
However, within a super-efficiency DEA framework, the reference benchmark changes
from one prediction model evaluation to another one, which in some contexts might be
viewed as “unfair” benchmarking. The second essay overcomes this issue by proposing a
Slacks-Based Measure Context-Dependent DEA (SBM-CDEA) framework to evaluate
the competing Distress Prediction Models (DPMs). Moreover, it performs an exhaustive
comparative analysis of the most popular corporate distress prediction frameworks under
both a single criterion and multiple criteria using data on UK firms listed on the London Stock
Exchange (LSE). Further, this chapter proposes new DPMs using different static and
dynamic statistical frameworks.
Another shortcoming of the existing studies on performance evaluation lies in the use of
static frameworks to compare the performance of DPMs. The third essay overcomes this
methodological issue by suggesting a dynamic multi-criteria performance assessment
framework, namely, Malmquist SBM-DEA, which by design, can monitor the
performance of competing prediction models over time. Further, this study proposes new
static and dynamic distress prediction models. The study also addresses several research
questions: What is the effect of information on the performance of DPMs? How does
the out-of-sample performance of dynamic DPMs compare to the out-of-sample
performance of static ones? What is the effect of the length of training sample on the
performance of static and dynamic models? Which models perform better in forecasting
distress during the years with Higher Distress Rate (HDR)?
On feature selection, studies have used different types of information including
accounting, market, macroeconomic variables and the management efficiency scores as
predictors. The recently applied techniques to take into account the management
efficiency of firms are two-stage models. The two-stage DPMs incorporate multiple inputs
and outputs to estimate the efficiency measure of a corporation relative to the most
efficient ones, in the first stage, and use the efficiency score as a predictor in the second
stage. A survey of the literature reveals that most existing studies fail to provide a
comprehensive comparison of two-stage DPMs. Moreover, the choice of inputs and
outputs for DEA models that estimate the efficiency measures of a company has been
restricted to accounting variables and features of the company. The fourth essay adds to
the current literature of two-stage DPMs in several respects. First, the study proposes to
consider the decomposition of the Slacks-Based Measure (SBM) of efficiency into Pure
Technical Efficiency (PTE), Scale Efficiency (SE), and Mix Efficiency (ME), to analyse
how each of these measures individually contributes to developing distress prediction
models. Second, in addition to the conventional approach of using accounting variables
as inputs and outputs of DEA models to estimate the measure of management efficiency,
this study uses market information variables to calculate the measure of the market
efficiency of companies. Third, this research provides a comprehensive analysis of two-stage
DPMs by applying different DEA models at the first stage (e.g., input-oriented
vs. output-oriented, radial vs. non-radial, static vs. dynamic) to compute the measures of
management efficiency and market efficiency of companies, and by using dynamic and
static classifier frameworks at the second stage to design new distress prediction models.