5 research outputs found
Robust DEA efficiency scores: A probabilistic/combinatorial approach
In this paper we propose robust efficiency scores for the scenario in which
the specification of the inputs/outputs to be included in the DEA model is
modelled with a probability distribution. This probabilistic approach allows
us to obtain three different robust efficiency scores: the Conditional Expected
Score, the Unconditional Expected Score and the Expected Score under the
Maximum Entropy principle. Computing the three efficiency scores requires
solving an exponential number of linear problems. The algorithm presented in
this paper can solve over 200 million linear problems in affordable time when
considering up to 20 inputs/outputs and 200 DMUs. The proposed approach is
illustrated with an application to the assessment of professional tennis players.
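The combinatorial structure behind such scores can be sketched as follows: assuming a uniform distribution over input/output subsets (an illustrative simplification, not the paper's algorithm), an expected robust score is the average of standard CCR efficiencies over every non-empty subset of the candidate inputs and outputs. The toy data and the use of scipy.optimize.linprog are assumptions for demonstration.

```python
# Illustrative sketch: expected efficiency as an average of CCR scores
# over all non-empty input/output subsets (uniform subset probabilities).
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j):
    """Input-oriented CCR multiplier LP for DMU j:
    max u.y_j  s.t.  v.x_j = 1,  u.y_k - v.x_k <= 0 for all k,  u, v >= 0.
    Variables are ordered [v (inputs), u (outputs)]."""
    n, m = X.shape                                    # DMUs x inputs
    _, s = Y.shape                                    # DMUs x outputs
    c = np.concatenate([np.zeros(m), -Y[j]])          # minimise -u.y_j
    A_ub = np.hstack([-X, Y])                         # u.y_k - v.x_k <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([X[j], np.zeros(s)]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (m + s))
    return -res.fun

def expected_score(X, Y, j):
    """Average CCR score of DMU j over all non-empty variable subsets."""
    m, s = X.shape[1], Y.shape[1]
    scores = []
    for ins in range(1, 2 ** m):
        icols = [i for i in range(m) if ins >> i & 1]
        for outs in range(1, 2 ** s):
            ocols = [o for o in range(s) if outs >> o & 1]
            scores.append(ccr_efficiency(X[:, icols], Y[:, ocols], j))
    return float(np.mean(scores))

X = np.array([[2.0, 1.0], [4.0, 3.0], [3.0, 2.0]])   # 3 DMUs, 2 inputs
Y = np.array([[2.0], [3.0], [2.0]])                  # 1 output
print(expected_score(X, Y, 0))
```

The exponential blow-up the abstract mentions is visible in the nested subset loops: with 20 candidate variables the number of LPs per DMU already exceeds a million, which is why a dedicated algorithm is needed.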
An Alternative Approach to Reduce Dimensionality in Data Envelopment Analysis
Principal component analysis reduces dimensionality; however, uncorrelated components imply the existence of variables with weights of opposite signs. This complicates their application in data envelopment analysis. To overcome the problems caused by these signs, a modification to the component axes is proposed and verified using Monte Carlo simulations.
New models and methods for classification and feature selection: a mathematical optimization perspective
The objective of this PhD dissertation is the development of new models for Supervised
Classification and Benchmarking, making use of Mathematical Optimization and Statistical
tools. Particularly, we address the fusion of instruments from both disciplines,
with the aim of extracting knowledge from data. In this way, we obtain innovative
methodologies that outperform existing ones, bridging theoretical Mathematics
with real-life problems.
The work developed in this thesis has focused on two fundamental methodologies
in Data Science: support vector machines (SVM) and Benchmarking. Regarding
the first one, the SVM classifier is based on the search for the separating hyperplane
of maximum margin, and is formulated as a convex quadratic problem. In the Benchmarking
context, the goal is to calculate the different efficiencies through a non-parametric,
deterministic approach. In this thesis we focus on Data Envelopment Analysis
(DEA), which relies on a Linear Programming formulation.
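As a concrete illustration of the first base model, the hard-margin SVM is the convex quadratic problem min ½‖w‖² subject to y_i(w·x_i + b) ≥ 1. The following minimal sketch solves it on a toy separable dataset; the data and the choice of SciPy's SLSQP solver are assumptions for illustration, not the formulations used in the thesis.

```python
# Minimal hard-margin SVM sketch: minimise 0.5 * ||w||^2 subject to
# y_i * (w.x_i + b) >= 1 for every training point.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])  # toy data
y = np.array([-1.0, -1.0, 1.0, 1.0])                            # labels

def objective(z):
    w = z[:2]                     # z = [w1, w2, b]
    return 0.5 * w @ w            # maximising the margin 2/||w||

# One margin constraint per training point: y_i * (w.x_i + b) - 1 >= 0.
cons = [{"type": "ineq", "fun": lambda z, i=i: y[i] * (z[:2] @ X[i] + z[2]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, np.zeros(3), constraints=cons)
w, b = res.x[:2], res.x[2]
margin = 2.0 / np.linalg.norm(w)  # geometric margin of the separator
print(w, b, margin)
```

Every feasible point of this program classifies the training set correctly; the optimizer then picks the hyperplane whose margin is widest, which is the geometric idea the thesis builds on.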
This dissertation is structured as follows. In Chapter 1 we briefly present the
different challenges this thesis faces, together with their state of the art. We
also present the different formulations used as base models, as well as the
notation used throughout the chapters of this thesis.
In Chapter 2, we tackle the construction of a version of the SVM that accounts
for misclassification errors. To do this, we incorporate new performance
constraints into the SVM formulation, imposing upper bounds on the misclassification
errors. The resulting formulation is a convex quadratic problem with linear constraints.
Chapter 3 continues with the SVM as its basis, and sets out the problem of providing
not only a hard label for each individual in the dataset, but also a
class probability estimate. Furthermore, confidence intervals for both the score values
and the posterior class probabilities are provided. In addition, as in the previous
chapter, we carry the obtained results over to the setting in which misclassification
errors are considered. For this purpose, we have to solve either a convex quadratic
problem or a convex quadratic problem with linear constraints and integer variables,
always taking advantage of the SVM parameter tuning, which is usually discarded.
Based on the results of Chapter 2, in Chapter 4 we handle the problem of feature
selection, again taking misclassification errors into account. To build this
technique, feature selection is embedded in the classifier model. The process is
divided into two steps. In the first step, feature selection is performed while
the data is simultaneously separated via a hyperplane or linear classifier, subject
to the performance constraints. In the second step, we build the maximum margin
classifier (SVM) using the features selected in the first step, again taking
the same performance constraints into account.
In Chapter 5, we move to the problem of Benchmarking, where the practices of
different entities are compared through the products or services they provide,
with the aim of making changes or improvements to each of them. Concretely,
in this chapter we propose a Mixed Integer Linear Programming formulation based on
Data Envelopment Analysis (DEA) to perform feature selection, improving
the interpretability and comprehension of the obtained model and efficiencies.
Finally, in Chapter 6 we collect the conclusions of this thesis as well as future lines
of research.
Essays on building and evaluating two-stage DEA models of efficiency and effectiveness
Researchers are not consistent in their choice of input and output variables when using two-stage data envelopment analysis (DEA) models to measure efficiency and effectiveness. This inconsistency has resulted in the development of many different two-stage DEA models of efficiency and effectiveness for the financial industry.
In this dissertation, I improved the statistical method from the MASc dissertation (Attarwala, 2016) by adding more features, documented in Chapter 2 on pages 4 and 5.
This statistical method evaluates efficiency and effectiveness models in the banking industry. It relies on the semi-strong version of the efficient market hypothesis (EMH). The EMH is motivated by the wisdom of the crowds, discussed in Section 2.2.2.
Previously (Attarwala, 2016), I found that the two-stage DEA model of Kumar and Gulati (2010) is not consistent with the semi-strong EMH for Indian and American banks.
In this dissertation, using my improved statistical method, I show that the two-stage DEA model of Kumar and Gulati (2010) is not consistent with the semi-strong EMH for banks in Brazil, Canada, China, India, Japan, Mexico, South Korea and the USA from 2000 to 2017.
I address the question of whether a universal two-stage DEA model of efficiency and effectiveness exists by building a variable selection framework.
This variable selection framework automatically generates two-stage DEA models of efficiency and effectiveness.
To do this, it uses the improved statistical method and a genetic search (GS) algorithm.
The variable selection framework finds the best universal two-stage DEA model of efficiency and effectiveness consistent with the semi-strong definition of the EMH for banks in Brazil, Canada, China, India, Japan, Mexico, South Korea and the USA from 2000 to 2017.
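The genetic search over candidate variable sets can be sketched as an evolution of binary selection masks. This is a hypothetical illustration only: the fitness function here is a stand-in placeholder (real fitness would come from the statistical consistency test), and TARGET is an invented "best model" used purely so the sketch has something to converge to.

```python
# Hypothetical genetic-search sketch over binary variable-selection masks.
import random

random.seed(0)
N_VARS = 8                               # number of candidate variables
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]        # invented "best" mask (illustration)

def fitness(mask):
    # Placeholder: agreement with TARGET. In the dissertation's setting this
    # would instead score a two-stage DEA model's consistency with the EMH.
    return sum(m == t for m, t in zip(mask, TARGET))

def crossover(a, b):
    cut = random.randrange(1, N_VARS)    # single-point crossover
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.1):
    return [1 - g if random.random() < rate else g for g in mask]

# Evolve: keep the fittest half, refill with mutated offspring of the elite.
pop = [[random.randint(0, 1) for _ in range(N_VARS)] for _ in range(20)]
for _ in range(40):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(10)]
best = max(pop, key=fitness)
```

Each mask decides which inputs and outputs enter the generated two-stage DEA model; selection pressure then drives the population toward models that score best under the evaluation criterion.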
I investigated the causal relationship between (a) the quantitative measures of efficiency and effectiveness from the best two-stage DEA model generated by the variable selection framework and (b) Tobin’s Q ratio, a financial market-based measure of bank performance.
Not only do I provide bank managers with a reasonable proxy for measuring efficiency and effectiveness, but I also address the question of whether acting on these input and output variables improves the performance of banks in the financial market.
Finally, I set up an optimization problem and find an optimal path from the two-stage DEA model of Kumar and Gulati (2010) to the best two-stage DEA model found by the variable selection framework.
This optimal path provides a set of actionable items for converting a two-stage DEA model that is not consistent with the semi-strong EMH into one that is.