    Robust DEA efficiency scores: A probabilistic/combinatorial approach

    In this paper we propose robust efficiency scores for the scenario in which the specification of the inputs/outputs to be included in the DEA model is modelled with a probability distribution. This probabilistic approach yields three different robust efficiency scores: the Conditional Expected Score, the Unconditional Expected Score, and the Expected Score under the Maximum Entropy principle. Calculating the three efficiency scores involves solving an exponential number of linear problems. The algorithm presented in this paper solves over 200 million linear problems in affordable time when considering up to 20 inputs/outputs and 200 DMUs. The proposed approach is illustrated with an application to the assessment of professional tennis players.
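    The combinatorics can be made concrete with a small sketch. The Python code below is illustrative only: it brute-forces every nonempty input/output subset, solves an input-oriented CCR model per subset, and averages the scores, assuming (as a simplification not stated in the abstract) independent inclusion probabilities per variable. The paper's Conditional, Unconditional and Maximum Entropy scores, and the algorithm that makes the enumeration affordable at the reported scale, involve machinery the abstract does not detail.

```python
# Illustrative sketch only: brute-force enumeration of input/output
# subsets, with one input-oriented CCR linear program per subset.
# Independent per-variable inclusion probabilities are an assumption.
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

def ccr_score(X, Y, j):
    """Input-oriented CCR efficiency of DMU j (multiplier form).
    X: (n, m) inputs, Y: (n, s) outputs."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[j], np.zeros(m)])          # maximize u'y_j
    A_ub = np.hstack([Y, -X])                         # u'y_k - v'x_k <= 0
    A_eq = np.concatenate([np.zeros(s), X[j]])[None]  # v'x_j = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (s + m))
    return -res.fun

def expected_score(X_all, Y_all, j, p_in, p_out):
    """Probability-weighted average CCR score of DMU j over all
    nonempty input/output subsets (illustrative 'expected score')."""
    m, s = X_all.shape[1], Y_all.shape[1]
    total = weight = 0.0
    for r in range(1, m + 1):
        for I in combinations(range(m), r):
            w_in = np.prod([p_in[i] for i in I])
            for t in range(1, s + 1):
                for O in combinations(range(s), t):
                    w = w_in * np.prod([p_out[o] for o in O])
                    total += w * ccr_score(X_all[:, list(I)],
                                           Y_all[:, list(O)], j)
                    weight += w
    return total / weight
```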

    An Alternative Approach to Reduce Dimensionality in Data Envelopment Analysis

    Principal component analysis reduces dimensionality; however, uncorrelated components imply the existence of variables with weights of opposite signs, which complicates their application in data envelopment analysis. To overcome the problems caused by these signs, a modification to the component axes is proposed and verified using Monte Carlo simulations.
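    The abstract does not spell out the proposed axis modification, so the sketch below shows only the generic difficulty and one common workaround: principal-component scores mix positive and negative weights, so they can be negative, and translating each axis restores the nonnegative data that a DEA model typically requires. The paper's actual modification may well differ.

```python
# Illustrative workaround only, not the paper's method: shift each
# principal-component score so it is strictly positive before it
# enters a DEA model (which typically requires nonnegative data).
import numpy as np
from sklearn.decomposition import PCA

def pca_for_dea(data, n_components, eps=1e-6):
    """Project data onto principal components, then translate each
    component axis so all scores are positive and DEA-admissible."""
    scores = PCA(n_components=n_components).fit_transform(data)
    # Components mix loadings of opposite signs, so scores can be
    # negative; a per-axis translation restores positivity.
    return scores - scores.min(axis=0) + eps
```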

    New models and methods for classification and feature selection: A mathematical optimization perspective

    The objective of this PhD dissertation is the development of new models for supervised classification and benchmarking, making use of mathematical optimization and statistical tools. In particular, we address the fusion of instruments from both disciplines with the aim of extracting knowledge from data. In this way, we obtain innovative methodologies that outperform existing ones, bridging theoretical mathematics with real-life problems. The work developed throughout this thesis focuses on two fundamental methodologies in data science: support vector machines (SVM) and benchmarking. Regarding the first, the SVM classifier is based on the search for the separating hyperplane of maximum margin and is formulated as a convex quadratic problem. In the benchmarking context, the goal is to calculate efficiencies through a non-parametric deterministic approach; in this thesis we focus on Data Envelopment Analysis (DEA), which consists of a linear programming formulation.

    This dissertation is structured as follows. In Chapter 1 we briefly present the different challenges this thesis faces, as well as their state of the art. In the same vein, the formulations used as base models are presented, together with the notation used throughout the thesis.

    In Chapter 2, we tackle the construction of a version of the SVM that accounts for misclassification errors. To do this, we incorporate new performance constraints into the SVM formulation, imposing upper bounds on the misclassification errors. The resulting formulation is a convex quadratic problem with linear constraints.

    Chapter 3 continues with the SVM as the basis and sets out the problem of providing not only a hard label for each individual in the dataset, but also a class probability estimate. Furthermore, confidence intervals for both the score values and the posterior class probabilities are provided. In addition, as in the previous chapter, we carry the obtained results over to the setting in which misclassification errors are considered. For this purpose, we have to solve either a convex quadratic problem or a convex quadratic problem with linear constraints and integer variables, always taking advantage of the parameter tuning of the SVM, which is usually discarded.

    Building on the results of Chapter 2, in Chapter 4 we handle the problem of feature selection, again taking misclassification errors into account. The feature selection is embedded in the classifier model, in a process divided into two steps. In the first step, feature selection is performed while the data are simultaneously separated via a hyperplane or linear classifier, subject to the performance constraints. In the second step, we build the maximum-margin classifier (SVM) using the features selected in the first step, again under the same performance constraints.

    In Chapter 5, we move to the problem of benchmarking, where the practices of different entities are compared through the products or services they provide, with the aim of making changes or improvements in each of them. Concretely, in this chapter we propose a Mixed Integer Linear Programming formulation based on Data Envelopment Analysis (DEA) that performs feature selection, improving the interpretability and comprehension of the resulting model and efficiencies.

    Finally, in Chapter 6 we collect the conclusions of this thesis as well as future lines of research.
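    As a rough illustration of the recurring idea of Chapters 2-4, and not the thesis's exact formulation, the following Python/cvxpy sketch adds linear performance constraints to a standard soft-margin SVM, capping the average hinge loss per class so the problem remains a convex quadratic program. The constraint form and the bound max_class_loss are assumptions made for illustration.

```python
# Hedged sketch: soft-margin SVM plus per-class performance
# constraints. The exact constraint form in the thesis may differ;
# here the average hinge loss of each class is capped, which keeps
# the problem a convex QP (it may be infeasible if the cap is tight).
import cvxpy as cp
import numpy as np

def constrained_svm(X, y, C=1.0, max_class_loss=0.1):
    """X: (n, d) features; y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    xi = cp.Variable(n, nonneg=True)          # slack (hinge) variables
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
    for cls in (-1, 1):                       # one performance constraint per class
        idx = np.where(y == cls)[0]
        constraints.append(cp.sum(xi[idx]) / len(idx) <= max_class_loss)
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    cp.Problem(objective, constraints).solve()
    return w.value, b.value
```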

    Essays on building and evaluating two-stage DEA models of efficiency and effectiveness

    Researchers are not consistent in their choice of input and output variables when using two-stage data envelopment analysis (DEA) models to measure efficiency and effectiveness. This inconsistency has resulted in the development of many different two-stage DEA models of efficiency and effectiveness for the financial industry. In this dissertation, I improved the statistical method from my MASc dissertation (Attarwala, 2016) by adding more features, documented in Chapter 2. This statistical method evaluates efficiency and effectiveness models in the banking industry. It relies on the semi-strong version of the efficient market hypothesis (EMH), which is motivated by the wisdom of crowds, discussed in Section 2.2.2. Previously (Attarwala, 2016), I found that the two-stage DEA model of Kumar and Gulati (2010) is not consistent with the semi-strong EMH for Indian and American banks. In this dissertation, using my improved statistical method, I show that the two-stage DEA model of Kumar and Gulati (2010) is not consistent with the semi-strong EMH for banks in Brazil, Canada, China, India, Japan, Mexico, South Korea and the USA from 2000-2017.

    I address the question of whether a universal two-stage DEA model of efficiency and effectiveness exists by building a variable selection framework that automatically generates two-stage DEA models of efficiency and effectiveness, using the improved statistical method and a genetic search (GS) algorithm. The variable selection framework finds the best universal two-stage DEA model of efficiency and effectiveness consistent with the semi-strong EMH for banks in Brazil, Canada, China, India, Japan, Mexico, South Korea and the USA from 2000-2017.

    I investigated the causal relationship between (a) the quantitative measures of efficiency and effectiveness from the best two-stage DEA model generated by the variable selection framework and (b) Tobin's Q ratio, a financial market-based measure of bank performance. Not only do I provide bank managers with a reasonable proxy for measuring efficiency and effectiveness, but I also address the question of whether acting on these input and output variables improves the performance of banks in the financial market.

    Finally, I set up an optimization problem and find an optimal path from the two-stage DEA model of Kumar and Gulati (2010) to the best two-stage DEA model found by the variable selection framework. This optimal path provides a set of actionable items for converting a two-stage DEA model that is not consistent with the semi-strong EMH into one that is.
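    As a hedged illustration of the search component only: the sketch below runs a small genetic algorithm over binary masks that switch candidate variables in and out of a model. The fitness function is a placeholder for the dissertation's statistical test of consistency with the semi-strong EMH, whose computable details the abstract does not provide; population size, crossover and mutation settings are likewise illustrative.

```python
# Hedged sketch of a genetic search over variable-selection masks.
# `fitness(mask) -> float` stands in for the dissertation's EMH
# consistency test, which is not specified in the abstract.
import numpy as np

rng = np.random.default_rng(0)

def genetic_search(n_vars, fitness, pop_size=30, generations=50,
                   mutation_rate=0.05):
    """Evolve binary masks, keeping the fittest half each generation
    and refilling by one-point crossover plus bit-flip mutation."""
    pop = rng.integers(0, 2, size=(pop_size, n_vars))
    for _ in range(generations):
        scores = np.array([fitness(mask) for mask in pop])
        keep = pop[np.argsort(scores)[-pop_size // 2:]]   # elitism
        children = []
        while len(children) < pop_size - len(keep):
            a, b = keep[rng.integers(len(keep), size=2)]  # two parents
            cut = rng.integers(1, n_vars)                 # crossover point
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_vars) < mutation_rate
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([keep, children])
    return max(pop, key=fitness)
```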