New models and methods for classification and feature selection: a mathematical optimization perspective
The objective of this PhD dissertation is the development of new models for Supervised
Classification and Benchmarking, making use of Mathematical Optimization and Statistical
tools. In particular, we address the fusion of instruments from both disciplines,
with the aim of extracting knowledge from data. In this way, we obtain innovative
methodologies that outperform existing ones, bridging theoretical Mathematics
with real-life problems.
The works developed along this thesis focus on two fundamental methodologies
in Data Science: support vector machines (SVM) and Benchmarking. Regarding
the former, the SVM classifier is based on the search for the separating hyperplane of
maximum margin, and it is written as a convex quadratic problem. In the Benchmarking
context, the goal is to compute the efficiencies of different entities through a non-parametric
deterministic approach. In this thesis we focus on Data Envelopment Analysis
(DEA), which consists of a Linear Programming formulation.
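As an illustration of the first of these two base models, the soft-margin SVM primal can be sketched with plain subgradient descent on the hinge loss (the dataset, step size, and iteration count below are illustrative assumptions, not taken from the thesis, which solves the exact convex quadratic problem):

```python
import numpy as np

# Minimal sketch of the soft-margin SVM primal:
#   minimize 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (w.x_i + b))
# solved here by subgradient descent on a synthetic two-class dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(500):
    viol = y * (X @ w + b) < 1            # points violating the margin
    gw = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    gb = -C * y[viol].sum()
    w, b = w - lr * gw, b - lr * gb

acc = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {acc:.2f}")
```

A dedicated QP solver would reach the exact maximum-margin solution; the subgradient loop above only approximates it, which is enough to separate the two well-spaced clusters.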
This dissertation is structured as follows. In Chapter 1 we briefly present the
different challenges this thesis addresses, as well as their state of the art. In the same
vein, the formulations used as base models are presented, together with the
notation used throughout the chapters of this thesis.
In Chapter 2, we tackle the construction of a version of the SVM
that controls misclassification errors. To do this, we incorporate new performance
constraints into the SVM formulation, imposing upper bounds on the misclassification
errors. The resulting formulation is a convex quadratic problem with linear constraints.
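The chapter's exact model enforces the error bounds inside the optimization problem itself. A much cruder stand-in, shown only to illustrate the trade-off between the two classes' error rates, is to train an unconstrained linear classifier and then shift its decision threshold until a chosen bound on one class's training error holds (all data and numbers below are assumptions):

```python
import numpy as np

# Hedged sketch, not the thesis' formulation: bound the training error of
# class -1 by moving the decision threshold of a fixed linear classifier.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.2, (60, 2)), rng.normal(1, 1.2, (60, 2))])
y = np.hstack([-np.ones(60), np.ones(60)])

# unconstrained hinge-loss fit by subgradient descent
w, b = np.zeros(2), 0.0
for _ in range(800):
    viol = y * (X @ w + b) < 1
    w -= 0.01 * (w - (y[viol, None] * X[viol]).sum(axis=0))
    b += 0.01 * y[viol].sum()

max_neg_err = 0.05                    # required bound on class -1 errors
scores = X @ w + b
neg_scores = np.sort(scores[y < 0])
k = int(np.ceil(len(neg_scores) * (1 - max_neg_err)))
t = neg_scores[min(k, len(neg_scores) - 1)]   # shifted threshold
pred = np.where(scores > t, 1.0, -1.0)

neg_err = np.mean(pred[y < 0] == 1)
pos_err = np.mean(pred[y > 0] == -1)
print(f"class -1 error {neg_err:.2f}, class +1 error {pos_err:.2f}")
```

Note how, exactly as the chapter describes, the bound on one class is typically achieved at the expense of the other class's error rate.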
Chapter 3 continues with the SVM as the basis, and addresses the problem of providing
not only a hard label for each individual in the dataset, but also a
class probability estimate. Furthermore, confidence intervals for both the score values
and the posterior class probabilities are provided. In addition, as in the previous
chapter, we carry the obtained results over to the setting in which misclassification errors are
considered. For this purpose, we have to solve either a convex quadratic problem
or a convex quadratic problem with linear constraints and integer variables, always
taking advantage of the information produced during the parameter tuning of the SVM, which is usually wasted.
Based on the results in Chapter 2, in Chapter 4 we handle the problem of feature selection, again taking the misclassification errors into account. In order to build this
technique, the feature selection is embedded in the classifier model. The process is
divided into two steps. In the first step, feature selection is performed while, at
the same time, the data are separated via a hyperplane or linear classifier, subject to the
performance constraints. In the second step, we build the maximum-margin classifier
(SVM) using the features selected in the first step, again taking
the same performance constraints into account.
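The two-step scheme can be sketched as follows (a simplification, not the thesis' model: the performance constraints are omitted, and an l1 penalty stands in for the embedded selection mechanism). Step 1 fits an l1-penalized hinge-loss classifier whose penalty drives weights of irrelevant features to zero; step 2 refits a plain hinge-loss classifier on the surviving features only:

```python
import numpy as np

# Step 1: l1-penalized linear classifier selects features.
# Step 2: plain hinge-loss (SVM-style) refit on the selected features.
rng = np.random.default_rng(2)
n = 200
X_inf = np.vstack([rng.normal(-2, 1, (n // 2, 2)), rng.normal(2, 1, (n // 2, 2))])
X = np.hstack([X_inf, rng.normal(0, 1, (n, 8))])   # plus 8 noise features
y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])

def hinge_fit(X, y, lam=0.0, lr=0.05, iters=2000):
    """Subgradient descent on mean hinge loss, with an l1 proximal step."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        viol = y * (X @ w + b) < 1
        w += lr * (y[viol, None] * X[viol]).sum(axis=0) / len(y)
        b += lr * y[viol].sum() / len(y)
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # prox_l1
    return w, b

w1, _ = hinge_fit(X, y, lam=0.05)                  # step 1: select
selected = np.flatnonzero(np.abs(w1) > 1e-6)
w2, b2 = hinge_fit(X[:, selected], y)              # step 2: refit
acc = np.mean(np.sign(X[:, selected] @ w2 + b2) == y)
print(f"kept features {selected.tolist()}, accuracy {acc:.2f}")
```

On this synthetic data the penalty typically retains the two informative features and discards most of the noise ones, after which the refit classifier separates the classes well.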
In Chapter 5, we move to the problem of Benchmarking, where the practices of
different entities are compared through the products or services they provide, with
the aim of making changes or improvements in each of them. Concretely,
in this chapter we propose a Mixed Integer Linear Programming formulation based on
Data Envelopment Analysis (DEA), with the aim of performing feature selection and thus improving
the interpretability and comprehension of the obtained model and efficiencies.
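The base model that this chapter extends is the classical input-oriented CCR DEA linear program. The sketch below solves it for a made-up dataset of five units with two inputs and one output (this is only the standard textbook model, not the thesis' mixed-integer feature-selection extension):

```python
import numpy as np
from scipy.optimize import linprog

# Input-oriented CCR envelopment model for unit o:
#   min theta  s.t.  sum_j lambda_j * x_j <= theta * x_o,
#                    sum_j lambda_j * y_j >= y_o,  lambda >= 0.
X = np.array([[2.0, 3.0], [4.0, 2.0], [4.0, 6.0],
              [6.0, 7.0], [3.0, 5.0]])              # inputs (made-up)
Y = np.array([[1.0], [1.0], [1.0], [1.0], [1.0]])   # outputs (made-up)

def ccr_efficiency(o):
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                 # variables: theta, lambdas
    A_in = np.hstack([-X[[o]].T, X.T])          # input constraints
    A_out = np.hstack([np.zeros((s, 1)), -Y.T]) # output constraints
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[o]],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.x[0]

effs = [round(ccr_efficiency(o), 3) for o in range(len(X))]
print("CCR efficiencies:", effs)
```

Units on the efficient frontier score 1.0; dominated units receive a score below 1 indicating the proportional input contraction the frontier would allow.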
Finally, in Chapter 6 we collect the conclusions of this thesis as well as future lines
of research.
Statistical models in pharmacokinetics and pharmacodynamics
Universidad de Sevilla. Grado en Matemática
Constrained support vector machines: theory and applications to health science
In recent years, data science has become a very important tool for handling data, as well as for discovering patterns and generating information that is useful for decision making. One of the most important tasks in data science is supervised classification, which has been applied successfully in many areas, such as biology and medicine. In this work we focus on Support Vector Machines (SVM), introduced by Vapnik in the early 1990s and nowadays among the most widely used methods in supervised classification. First, we briefly review the general theory of SVMs, focusing on the binary case and giving a brief overview of the multiclass case. We then present a new formulation in which additional constraints are added in order to guarantee minimum values of certain performance measures, such as the correct classification probabilities. In addition, experiments are carried out using the statistical software R, as well as AMPL.
Universidad de Sevilla. Máster Universitario en Matemática
Cost-sensitive probabilistic predictions for support vector machines
Support vector machines (SVMs) are widely used and constitute one of the most
thoroughly studied machine learning models for two-class classification.
Classification in SVM is based on a score procedure, yielding a deterministic
classification rule, which can be transformed into a probabilistic rule (as
implemented in off-the-shelf SVM libraries), but is not probabilistic in
nature. On the other hand, the tuning of the regularization parameters in SVM
is known to require a high computational effort, and it generates pieces of
information that are not fully exploited, since they are not used to build a
probabilistic classification rule. In this paper we propose a novel approach to
generate probabilistic outputs for the SVM. The new method has the following
three properties. First, it is designed to be cost-sensitive, and thus the
different importance of sensitivity (or true positive rate, TPR) and
specificity (true negative rate, TNR) is readily accommodated in the model. As
a result, the model can deal with imbalanced datasets, which are common in
operational business problems such as churn prediction or credit scoring. Second,
the SVM is embedded in an ensemble method to improve its performance, making
use of the valuable information generated in the parameter tuning process.
Finally, the probability estimation is done via bootstrap estimates, avoiding
the parametric models used by competing approaches. Numerical tests on a wide
range of datasets show the advantages of our approach over benchmark
procedures.
Comment: European Journal of Operational Research (2023).
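The bootstrap idea behind the third property can be sketched in a few lines (heavily simplified: a plain linear hinge-loss classifier stands in for the tuned SVM ensemble, and cost-sensitivity is omitted; all data are synthetic assumptions):

```python
import numpy as np

# Train one classifier per bootstrap resample and estimate P(y=+1|x)
# as the fraction of resampled classifiers labelling x positive.
rng = np.random.default_rng(3)
n = 120
X = np.vstack([rng.normal(-1, 1, (n // 2, 2)), rng.normal(1, 1, (n // 2, 2))])
y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])

def hinge_fit(X, y, lr=0.01, iters=400):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        viol = y * (X @ w + b) < 1
        w += lr * ((y[viol, None] * X[viol]).sum(axis=0) / len(y) - 0.01 * w)
        b += lr * y[viol].sum() / len(y)
    return w, b

B, votes = 30, np.zeros(n)
for _ in range(B):
    idx = rng.integers(0, n, n)               # bootstrap resample
    w, b = hinge_fit(X[idx], y[idx])
    votes += (X @ w + b > 0)
proba = votes / B                             # bootstrap estimate of P(y=+1|x)

print(f"mean P(+1) on true positives: {proba[y > 0].mean():.2f}")
```

Because the probabilities are empirical vote fractions, no parametric link function (such as a sigmoid) needs to be assumed, which mirrors the nonparametric flavour of the paper's proposal.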
Cost-sensitive feature selection for support vector machines
Feature Selection (FS) is a crucial procedure in Data Science tasks such as
Classification, since it identifies the relevant variables, thus making the classification procedures more interpretable and more effective by reducing noise and data overfitting. The relevance of features in a classification procedure is linked to the fact that misclassification costs are frequently asymmetric, since false positive and false negative cases may have very different consequences. However, off-the-shelf FS procedures seldom take such cost-sensitivity of errors into account. In this paper we propose a mathematical-optimization-based FS procedure embedded in one of the most popular classification procedures, namely Support Vector Machines (SVM), accommodating asymmetric misclassification costs. The key idea is to replace the traditional margin maximization by minimizing the number of features selected, while imposing upper bounds on the false positive and false negative rates. The problem is written as an integer linear problem plus a convex quadratic problem for SVM with both linear and radial kernels. The reported numerical experience demonstrates the usefulness of the proposed FS procedure. Indeed, our results on benchmark data sets show that a substantial decrease in the number of features is obtained, whilst the desired trade-off between false positive and false negative rates is achieved.
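The paper's key idea, minimizing the number of selected features subject to bounds on the false positive and false negative rates, is solved there as an integer linear program. As a loose illustration only, the greedy forward search below grows a feature set until both training-error bounds hold, using a simple nearest-centroid classifier in place of the SVM (every name and number is an assumption of this sketch):

```python
import numpy as np

# Greedy stand-in for "fewest features subject to FPR/FNR bounds".
rng = np.random.default_rng(4)
n = 200
X = np.hstack([
    np.vstack([rng.normal(-1.5, 1, (n // 2, 2)),
               rng.normal(1.5, 1, (n // 2, 2))]),   # 2 informative features
    rng.normal(0, 1, (n, 6)),                       # 6 irrelevant features
])
y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])
max_fpr, max_fnr = 0.10, 0.10

def rates(feats):
    """Training FPR/FNR of a nearest-centroid rule on the chosen features."""
    Z = X[:, feats]
    mu_n, mu_p = Z[y < 0].mean(axis=0), Z[y > 0].mean(axis=0)
    pred = np.where(((Z - mu_p) ** 2).sum(1) < ((Z - mu_n) ** 2).sum(1), 1, -1)
    return np.mean(pred[y < 0] == 1), np.mean(pred[y > 0] == -1)

chosen, remaining = [], list(range(X.shape[1]))
while remaining:
    best = min(remaining, key=lambda j: sum(rates(chosen + [j])))
    chosen.append(best)
    remaining.remove(best)
    fpr, fnr = rates(chosen)
    if fpr <= max_fpr and fnr <= max_fnr:
        break

print(f"selected {sorted(chosen)} with FPR {fpr:.2f}, FNR {fnr:.2f}")
```

Unlike the paper's exact formulation, a greedy search carries no optimality guarantee; it only conveys the shape of the problem: few features, rate bounds satisfied.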
Diseño para el consumo cultural, la innovación y la inclusión social
This volume presents various research works that share design proposals grounded in culture, inclusion, and social innovation, developed by national and international researchers affiliated with various universities and graduate programs.
On sparse ensemble methods: an application to short-term predictions of the evolution of COVID-19
Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors to generate a unique one have been proposed in the literature. The so-obtained regressor may have better accuracy than its components, but at the same time it may overfit, it may be distorted by base regressors with low accuracy, and it may be too complex to understand and explain. This paper proposes and studies a novel Mathematical Optimization model to build a sparse ensemble, which trades off the accuracy of the ensemble and the number of base regressors used. The latter is controlled by means of a regularization term that penalizes regressors with a poor individual performance. Our approach is flexible enough to incorporate desirable properties one may wish the ensemble to have, such as controlling the performance of the ensemble on critical groups of records, or the costs associated with the base regressors involved in the ensemble. We illustrate our approach with real data sets arising in the COVID-19 context.
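The sparse-ensemble idea can be illustrated with a toy sketch (not the paper's model: here an l1 penalty on the combination weights plays the role of the regularization term, and the base regressors are fabricated for the example):

```python
import numpy as np

# Learn combination weights over base-regressor predictions by least
# squares with an l1 proximal step, so weak regressors get weight zero.
rng = np.random.default_rng(5)
n = 1000
x = rng.uniform(0, 1, n)
target = np.sin(2 * np.pi * x)

preds = np.column_stack([
    target + rng.normal(0, 0.10, n),      # good base regressor
    target + rng.normal(0, 0.15, n),      # good base regressor
    rng.normal(0, 1, n),                  # pure-noise regressor
    np.full(n, target.mean()),            # constant regressor
])

lam, lr = 0.05, 0.001
beta = np.zeros(preds.shape[1])
for _ in range(3000):
    grad = preds.T @ (preds @ beta - target) / n
    beta -= lr * grad
    beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)  # prox_l1

print("ensemble weights:", np.round(beta, 3))
```

The two accurate base regressors end up carrying essentially all of the weight, while the noise and constant regressors are driven out of the ensemble, which is the sparsity behaviour the paper's regularization term is designed to induce.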
On support vector machines under a multiple-cost scenario
Support vector machine (SVM) is a powerful tool in binary classification, known to
attain excellent misclassification rates. On the other hand, many realworld classification
problems, such as those found in medical diagnosis, churn or fraud prediction,
involve misclassification costs which may be different in the different classes. However,
it may be hard for the user to provide precise values for such misclassification
costs, whereas it may be much easier to identify acceptable misclassification rates
values. In this paper we propose a novel SVM model in which misclassification costs
are considered by incorporating performance constraints in the problem formulation.
Specifically, our aim is to seek the hyperplane with maximal margin yielding misclassification
rates below given threshold values. Such maximal margin hyperplane
is obtained by solving a quadratic convex problem with linear constraints and integer
variables. The reported numerical experience shows that our model gives the user control
on the misclassification rates in one class (possibly at the expense of an increase
in misclassification rates for the other class) and is feasible in terms of running times
Population pharmacokinetics of colistin: implications for clinical use for Gram-negative pathogens
The objective of this study was to characterize the pharmacokinetics of colistin methanesulphonate (CMS) and colistin in critically ill patients
following the administration of a 4.5 MU CMS loading dose followed by 3 MU CMS every 8 h. A population PK model and Monte Carlo simulation were used to calculate the probability of target attainment (PTA) against Acinetobacter baumannii and Pseudomonas aeruginosa by considering a range of MIC values seen in the clinic.
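The PTA calculation itself can be illustrated with a deliberately simplified Monte Carlo sketch (all pharmacokinetic numbers below are invented for the example, and a one-compartment steady-state approximation stands in for the study's actual population PK model for CMS and colistin):

```python
import numpy as np

# Draw clearance values from a log-normal population distribution, compute
# the average steady-state concentration Css = daily dose / (CL * 24 h),
# and report the fraction of simulated patients with Css/MIC >= 4.
rng = np.random.default_rng(6)
n_sim = 10_000
dose_per_day = 270.0                                 # mg/day, assumed
cl = np.exp(rng.normal(np.log(3.0), 0.4, n_sim))     # L/h, assumed CL
css = dose_per_day / (cl * 24.0)                     # mg/L

mics = [0.25, 0.5, 1.0, 2.0]
ptas = [float(np.mean(css >= 4.0 * mic)) for mic in mics]
for mic, pta in zip(mics, ptas):
    print(f"MIC {mic:4.2f} mg/L -> PTA {pta:.2f}")
```

As in the study, the PTA decreases as the MIC rises, which is what makes the clinically observed MIC range the decisive input of the analysis.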