13 research outputs found

    ROC curves for regression

    NOTICE: this is the author's version of a work that was accepted for publication in Pattern Recognition. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition, Volume 46, Issue 12, December 2013, Pages 3395–3411, DOI: 10.1016/j.patcog.2013.06.014.
    Receiver Operating Characteristic (ROC) analysis is one of the most popular tools for the visual assessment and understanding of classifier performance. In this paper we present a new representation of regression models in the so-called regression ROC (RROC) space. The basic idea is to represent over-estimation against under-estimation. The curves are drawn by adjusting a shift, a constant that is added to (or subtracted from) the predictions and plays a role similar to that of a threshold in classification. From here, we develop the notions of optimal operating condition, convexity, and dominance, and explore several evaluation metrics that can be shown graphically, such as the area over the RROC curve (AOC). In particular, we show a novel and significant result: the AOC is equivalent to the error variance. We illustrate the application of RROC curves to resource estimation, namely the estimation of software project effort.
    I would like to thank Peter Flach and Nicolas Lachiche for some very useful comments and corrections on earlier versions of this paper, especially the suggestion of drawing normalised curves (dividing x-axis and y-axis by n). This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project Prometeo/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the respective national research councils and ministries.
    Hernández-Orallo, J. (2013). ROC curves for regression. Pattern Recognition. 46(12):3395-3411. https://doi.org/10.1016/j.patcog.2013.06.014
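    The shift idea in the abstract above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it assumes the error is defined as prediction minus actual value, that the axes hold the unnormalised sums of positive and negative shifted errors, and the function names `rroc_curve` and `area_over_curve` are hypothetical. Under those assumptions the area over the curve comes out as n²/2 times the population variance of the errors, which is one way to read the stated equivalence between AOC and error variance.

```python
import numpy as np

def rroc_curve(y_true, y_pred):
    """Return (over, under) totals at every shift where the curve bends.

    Assumption: error = prediction - actual, so positive errors are
    over-estimations and negative errors are under-estimations.
    """
    errors = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    # A shifted error changes sign exactly at shift = -error, so these
    # are the only shifts where the piecewise-linear curve changes slope.
    shifts = np.sort(-errors)
    shifted = errors[None, :] + shifts[:, None]
    over = np.clip(shifted, 0.0, None).sum(axis=1)   # sum of positive parts
    under = np.clip(shifted, None, 0.0).sum(axis=1)  # sum of negative parts
    return over, under

def area_over_curve(over, under):
    """Trapezoidal area between the curve and the line under = 0."""
    widths = np.diff(over)
    heights = -(under[1:] + under[:-1]) / 2.0
    return float(np.sum(widths * heights))

# Hypothetical data, for illustration only.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 19.0, 35.0, 36.0])

over, under = rroc_curve(y_true, y_pred)
aoc = area_over_curve(over, under)

errors = y_pred - y_true
n = len(errors)
print(aoc, n**2 * np.var(errors) / 2)  # the two values agree
```

    Dividing both axes by n, as suggested in the acknowledgements, would rescale the area by 1/n², leaving half the error variance in this sketch.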

    Selecting cash management models from a multiobjective perspective

    This paper addresses the problem of selecting cash management models under different operating conditions from a multiobjective perspective, considering not only cost but also risk. A number of models have been proposed to optimize corporate cash management policies, so the impact of different operating conditions on model performance becomes an important issue. Here, we provide a range of visual and quantitative tools imported from Receiver Operating Characteristic (ROC) analysis. More precisely, we show the utility of ROC analysis from a triple perspective, as a tool for: (1) showing model performance; (2) choosing models; and (3) assessing the impact of operating conditions on model performance. We illustrate the selection of cash management models by means of a numerical example.
    Work partially funded by projects Collectiveware TIN2015-66863-C2-1-R (MINECO/FEDER) and 2014 SGR 118.
    Salas-Molina, F.; Rodríguez-Aguilar, JA.; Díaz-García, P. (2018). Selecting cash management models from a multiobjective perspective. Annals of Operations Research. 261(1-2):275-288. https://doi.org/10.1007/s10479-017-2634-9

    Probabilistic reframing for cost-sensitive regression

    © ACM, 2014. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 8, Iss. 4 (October 2014), http://doi.acm.org/10.1145/2641758
    Everyday applications of predictive models usually involve the full use of the available contextual information. When the operating context changes, one may fine-tune the default (context-free) prediction or may even abstain from predicting a value (a reject). Global reframing solutions, where the same function is applied to adapt the estimated outputs to a new cost context, are possible solutions here. An alternative approach, which has not been studied in a comprehensive way for regression in the knowledge discovery and data mining literature, is the use of a local (e.g., probabilistic) reframing approach, where decisions are made according to the estimated output and a reliability, confidence, or probability estimation. In this article, we advocate a simple two-parameter (mean and variance) approach, working with a normal conditional probability density. Given the conditional mean produced by any regression technique, we develop lightweight “enrichment” methods that produce good estimates of the conditional variance, which are used by the probabilistic (local) reframing methods. We apply these methods to some very common families of cost-sensitive problems, such as optimal predictions in (auction) bids, asymmetric loss scenarios, and rejection rules.
    This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022, TIN 2010-21062-C02-02, and TIN 2013-45732-C4-1-P, and GVA projects PROMETEO/2008/051 and PROMETEO2011/052. Finally, part of this work was motivated by the REFRAME project (http://www.reframe-d2k.org) granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA) and funded by Ministerio de Economia y Competitividad in Spain (PCIN-2013-037).
    Hernández Orallo, J. (2014). Probabilistic reframing for cost-sensitive regression. ACM Transactions on Knowledge Discovery from Data. 8(4):1-55. https://doi.org/10.1145/2641758
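    As a rough illustration of the two-parameter (mean and variance) reframing described in this entry, the sketch below adapts a prediction to an asymmetric cost context. It is not the article's method: it assumes a piecewise-linear ("lin-lin") loss in which each unit of under-estimation costs `cost_under` and each unit of over-estimation costs `cost_over`, for which the expected-loss minimiser is the quantile of the conditional distribution at level cost_under / (cost_under + cost_over). The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def reframed_prediction(mean, std, cost_under, cost_over):
    """Shift a prediction to minimise expected asymmetric linear loss.

    Assumptions: the true value is normally distributed around `mean`
    with standard deviation `std` (std > 0), and the loss is linear on
    each side of the prediction with the given per-unit costs.
    """
    # Quantile level at which the expected lin-lin loss is minimised.
    tau = cost_under / (cost_under + cost_over)
    return NormalDist(mean, std).inv_cdf(tau)

# Symmetric costs leave the conditional mean unchanged.
print(reframed_prediction(100.0, 10.0, cost_under=1.0, cost_over=1.0))

# Under-estimating is three times as costly, so the prediction moves up.
print(reframed_prediction(100.0, 10.0, cost_under=3.0, cost_over=1.0))
```

    The same mean yields different reframed predictions depending on the estimated variance, which is why the quality of the conditional variance estimate matters in this setting.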

    DATA-DRIVEN DECISION-MAKING AND ITS APPLICATION TO THE CORPORATE CASH MANAGEMENT PROBLEM

    This thesis investigates the cash management problem from a multidimensional perspective. Cash management focuses on finding the balance between cash holdings and short-term investments. Typically, cash managers make decisions based on a firm's optimal cash balance for operational and precautionary purposes. Here we explore the opportunities for improved decision-making derived from modeling cash flow uncertainty with the help of data-driven procedures within a multiobjective context. On the one hand, cash managers may achieve cost savings by forecasting future cash flows. To this end, we perform an empirical analysis of daily cash flow time series to take advantage of modern machine learning techniques as a key step to connect data analysis and optimization methods in cash management. On the other hand, cash managers may be interested not only in the cost but also in the risk associated with their decisions. Thus, we address the cash management problem from a multiobjective perspective focusing on both cost and risk. In addition, given the current time-varying financial circumstances, we explore the selection of cash management models according to different operating conditions and their robustness. We also show the utility of forecasts through a new cash management model that outperforms the state of the art by guaranteeing optimal solutions. Since most firms deal with cash management systems with multiple bank accounts, we develop a framework to formulate and solve the multiple bank accounts cash management problem. Finally, in an attempt to bridge the gap between theory and practice, we also provide a software library in Python for practitioners interested in building decision support systems for cash management.
    Salas Molina, F. (2017). DATA-DRIVEN DECISION-MAKING AND ITS APPLICATION TO THE CORPORATE CASH MANAGEMENT PROBLEM [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/95408

    Modelos de previsão em gestão hospitalar recorrendo a técnicas de data mining

    Master's dissertation in Information Systems Engineering and Management.
    Failures in hospital management are usually associated with a lack of information and poor management of resources. These aspects are crucial for the management of any organizational entity. It is on this principle that the Data Mining (DM) process was addressed in this project, in order to identify relevant data about patient management and thus provide the managers of Centro Hospitalar do Porto (CHP) with important information to support their decisions. During this dissertation, several DM models were developed to make predictions in a hospital setting (discharge management). The predictive models were developed in a real environment with real data from CHP. The project followed the Action Research methodology and was guided by the Cross-Industry Standard Process for Data Mining (CRISP-DM). Among DM techniques, Decision Trees, Regression Trees, Naïve Bayes and Support Vector Machine (SVM) were used to carry out the Classification and Regression tasks. The Classification models were evaluated and validated using accuracy. For the Regression models several metrics were used, namely Mean Squared Error, Mean Absolute Error, Relative Absolute Error and Regression Error Characteristic. In addition to these metrics, Cross Validation and Leave-One-Out Cross Validation were used to evaluate the generalization capacity of the predictive models. The Classification models were able to predict patient discharges with accuracy values ranging from ≈82.69% to ≈94.23%. Some of the Regression models performed similarly to or worse than the naïve mean predictor, with results overall ranging between ≈38.26% and ≈94.89%.
    The results obtained can support decisions in discharge management. This work also showed that the Classification models give less satisfactory results for the Orthopedics and Obstetrics services, and the Regression models for the Childbirth service. However, Classification provided good predictive models for the Childbirth and Nursery services, and Regression for the Orthopedics, Obstetrics and Nursery services.
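    The comparison against a naïve mean predictor mentioned in this entry can be made concrete with two of the listed regression metrics. The sketch below is illustrative only and uses hypothetical values, not data from the dissertation: Relative Absolute Error is taken here in its common form (total absolute error divided by the total absolute error of always predicting the mean of the actual values), and a Regression Error Characteristic curve is taken as the fraction of predictions whose absolute error falls within each tolerance.

```python
def relative_absolute_error(y_true, y_pred):
    """Total absolute error relative to a predictor that always outputs the mean.

    A value of 1.0 means the model is no better than the naive mean
    predictor; lower is better. Requires y_true not to be constant.
    """
    mean_y = sum(y_true) / len(y_true)
    model_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    naive_error = sum(abs(t - mean_y) for t in y_true)
    return model_error / naive_error

def rec_curve(y_true, y_pred, tolerances):
    """Fraction of predictions with absolute error within each tolerance."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return [sum(e <= tol for e in errors) / len(errors) for tol in tolerances]

# Hypothetical lengths of stay in days, for illustration only.
actual = [1.0, 2.0, 3.0]
naive = [2.0, 2.0, 2.0]   # always predict the mean of the actual values
model = [1.5, 2.0, 2.5]

print(relative_absolute_error(actual, naive))  # 1.0 by construction
print(relative_absolute_error(actual, model))  # 0.5
print(rec_curve(actual, model, tolerances=[0.0, 0.5, 1.0]))
```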

    Desenvolvimento de um índice de conforto para os ocupantes de edifícios via técnicas de data mining

    Integrated master's dissertation in Civil Engineering.
    Indoor environmental conditions are increasingly becoming a key factor in clients' decisions to buy or rent a building, to the point that they have become a line of research in the construction industry. There is therefore a growing need to develop methodologies that make it possible to model the human response to different environmental stimuli, whether for a single person or for a group of individuals. Given the lack of studies in this area within Portugal, the main goal of this thesis is to develop an index that makes it possible to determine the weight of each indicator of the “Health and Comfort” category of the SBToolPT sustainability evaluation system. To achieve this goal, a methodology was developed to evaluate the global indoor discomfort level of a building through objective evaluations (physical parameters) and subjective evaluations (occupants' dissatisfaction responses) of the indicators considered: level of thermal discomfort, level of visual discomfort, level of acoustic discomfort and level of discomfort with the air quality. The physical parameters considered were the operative temperature for the thermal environment, the illuminance for lighting, the A-weighted equivalent continuous sound level for acoustics and the concentration of carbon dioxide for indoor air quality. This study is based on the analysis of 344 subjective responses to a survey conducted by Mateus (2009). The purpose was to obtain the occupants' perceived level of discomfort at different levels of each indicator. These evaluations were complemented with objective evaluations taken in the same 64 compartments where the subjective evaluations were collected. To analyze these data, several data mining algorithms were explored, including Multiple Regression, for which the best modeling results were obtained. This model can be used to simulate the occupants' level of comfort according to the expected ranges of the physical comfort parameters. This study is very useful for supporting decision-making from the early design phases.