16 research outputs found

    Teoría de cópulas aplicada a la predicción (Copula theory applied to prediction)

    Thesis of the Universidad Complutense de Madrid, Facultad de Ciencias Matemáticas, Departamento de Estadística e Investigación Operativa, defended on 13-02-2007. This thesis proposes methodologies that use copula theory for predictive purposes, taking as a practical application the short- and medium-term forecasting of natural gas demand in Madrid. In both cases, the process starts from a forecast that does not account for the influence of weather on domestic consumption, and uses copula functions to estimate the expected deviation from that forecast under different scenarios defined by the values of temperature variables. In the medium term, where the goal is to predict the maximum daily consumption that can be expected over the next two years (the peak value, in energy terminology), the initial forecast comes from a linear model whose regressor is the annual cyclical behavior of the series, identified with curve-smoothing techniques such as wavelets and regression splines. Copula functions are then used to simulate the distribution of the maximum expected increase in demand under an especially adverse weather situation, for example a cold wave. In the short term (daily), an iterative algorithm is proposed that starts from the residual process of an ARIMA model fitted only to the demand history, and replaces transfer function models with copulas to explain the influence of the thermal factor. The copula function that best describes the demand/temperature dependence is selected with a distributional goodness-of-fit test based on Pearson's statistic.
    In a theoretical context, for the case in which the test does not favor any of the candidate copula families, a method is suggested for constructing empirical, nonparametric copulas that attain an optimal value of Pearson's expression.
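
The goodness-of-fit step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the details of the test, so everything below is an assumption: the rank-based pseudo-observations, the k x k grid, the Clayton and independence copulas as candidate families, and all function names are hypothetical choices made for illustration only.

```python
import random


def pseudo_observations(xs):
    """Map a sample into (0, 1) using ranks divided by n + 1."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def independence_cdf(u, v):
    """Independence copula C(u, v) = u * v."""
    return u * v


def clayton_cdf(u, v, theta):
    """Clayton copula CDF for theta > 0 (an assumed candidate family)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # value in (0, 1], avoids 0 ** negative
        w = 1.0 - rng.random()
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def pearson_statistic(u, v, copula_cdf, k=4):
    """Pearson chi-square statistic on a k x k grid of the unit square."""
    n = len(u)
    observed = [[0] * k for _ in range(k)]
    for ui, vi in zip(u, v):
        i = min(int(ui * k), k - 1)
        j = min(int(vi * k), k - 1)
        observed[i][j] += 1
    stat = 0.0
    for i in range(k):
        for j in range(k):
            a, b = i / k, (i + 1) / k
            c, d = j / k, (j + 1) / k
            # Probability that the candidate copula assigns to the cell.
            p = copula_cdf(b, d) - copula_cdf(a, d) - copula_cdf(b, c) + copula_cdf(a, c)
            expected = n * p
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat
```

Under these assumptions, the candidate family with the smallest statistic would be preferred; calling `pearson_statistic(u, v, lambda a, b: clayton_cdf(a, b, 2.0))` and `pearson_statistic(u, v, independence_cdf)` on the same pseudo-observations gives the two values to compare.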

    A method for K-Means seeds generation applied to text mining

    This paper proposes a methodology for producing a set of seeds that serve as the starting point for K-Means-type unsupervised classification algorithms in text mining. The proposal uses the eigenvectors obtained from principal component analysis to extract initial seeds, after appropriate treatment to search for slightly overlapping clusters that are also clearly identified by keywords. The work is motivated by the authors' interest in identifying previously unknown topics and themes in short texts. To validate the method, it was applied to a sample of labeled e-mails (NG20) that represents a gold standard within the field of text mining. Specifically, some corpora referenced in the literature were used, configured according to a mix of topics contained in the sample. The proposed method improves on the results of the other state-of-the-art methods to which it is compared.
    Ministerio de Economía y Competitividad; Depto. de Estadística e Investigación Operativa, Fac. de Ciencias Matemáticas
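
One way to turn principal components into K-Means seeds is sketched below. The abstract does not describe the paper's actual treatment of the eigenvectors, so the selection rule used here (averaging the `top_m` documents at each extreme of every leading component) is an assumption, as are all function names; the eigenvectors come from a simple power iteration that is only adequate for small matrices.

```python
import math


def center(rows):
    """Subtract the column means from a document-term matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_eigenvectors(matrix, k, iterations=200):
    """Power iteration with deflation for a small symmetric matrix."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    vectors = []
    for _ in range(k):
        # Asymmetric start vector, normalized to unit length.
        v = [float(j + 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(m[a][b] * v[b] for b in range(d)) for a in range(d))
        # Deflate so the next pass finds the following component.
        m = [[m[a][b] - eigenvalue * v[a] * v[b] for b in range(d)] for a in range(d)]
        vectors.append(v)
    return vectors


def mean_vector(rows):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pca_seeds(rows, n_components, top_m):
    """Assumed rule: two seeds per component, one from each extreme."""
    centered = center(rows)
    seeds = []
    for v in leading_eigenvectors(covariance(centered), n_components):
        scores = [sum(x * y for x, y in zip(row, v)) for row in centered]
        order = sorted(range(len(rows)), key=lambda i: scores[i])
        for group in (order[-top_m:], order[:top_m]):
            seeds.append(mean_vector([rows[i] for i in group]))
    return seeds
```

With a toy matrix of six documents over four terms, where three documents use only the first two terms and three use only the last two, `pca_seeds(docs, n_components=1, top_m=3)` returns the two group means, which could then be passed to a K-Means routine as initial centroids.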

    The importance of each variable.

    Each bar represents the gain in the Gini index attributable to each variable used to boost the weight of the tree. Only the first 20 variables are plotted.

    Prediction of in-hospital mortality after pancreatic resection in pancreatic cancer patients: A boosting approach via a population-based study using health administrative data

    Background: One reason for the aggressiveness of pancreatic cancer is that it is diagnosed late, which often limits both the available therapeutic options and patient survival. Long-term survival of pancreatic cancer patients is not possible if the tumor is not resected, even among patients who receive chemotherapy in the earliest stages. The main objective of this study was to create a prediction model for in-hospital mortality after a pancreatectomy in pancreatic cancer patients. Methods: We performed a retrospective study of all pancreatic resections in pancreatic cancer patients in Spanish public hospitals (2013). Data were obtained from records in the Minimum Basic Data Set. To develop the prediction model, we used a boosting method. Results: The in-hospital mortality of pancreatic resections in pancreatic cancer patients was 8.48% in Spain. Our model showed high predictive accuracy, with an AUC of 0.91 and a Brier score of 0.09, which indicated that the probabilities were well calibrated. In addition, a sensitivity analysis restricted to the information available prior to surgery showed that the model retains high predictive accuracy, with an AUC of 0.802. Conclusions: In this study, we developed a nationwide system capable of generating accurate and reliable predictions of in-hospital mortality after pancreatic resection in patients with pancreatic cancer. Our model could help surgeons understand the importance of patients' characteristics prior to surgery and the health effects that may follow resection.
    Depto. de Estadística e Investigación Operativa, Fac. de Ciencias Matemáticas
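
The two metrics quoted in this abstract, AUC and Brier score, can be computed as in the sketch below. It assumes binary labels coded 0/1 and predicted probabilities in [0, 1]; the function names are illustrative and this is not the study's code.

```python
def auc(labels, scores):
    """Area under the ROC curve as the share of correctly ordered pairs.

    Each (positive, negative) pair counts 1 if the positive case has the
    higher score, 0.5 if the scores are tied, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)
```

For example, with labels `[0, 0, 1, 1]` and predictions `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, so `auc` returns 0.75.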

    The PANDEMYC Score. An Easily Applicable and Interpretable Model for Predicting Mortality Associated With COVID-19.

    This study aimed to build an easily applicable prognostic model, based on routine clinical, radiological, and laboratory data available at admission, to predict mortality in patients hospitalized with coronavirus disease 2019 (COVID-19). We retrospectively collected clinical information from 1968 patients admitted to a hospital. We built a predictive score based on a logistic regression model in which explanatory variables were discretized using classification trees, which helped identify the optimal intervals for predicting inpatient mortality in patients admitted with COVID-19. These intervals were translated into a score indicating the probability of a patient's death, thus making the results easy to interpret. Median age was 67 years, 1104 patients (56.4%) were male, and 325 (16.5%) died during hospitalization. Our final model identified nine key features: age, oxygen saturation, smoking, serum creatinine, lymphocytes, hemoglobin, platelets, C-reactive protein, and sodium at admission. The discrimination of the model was excellent in the training, validation, and test samples (AUC: 0.865, 0.808, and 0.883, respectively). We constructed a prognostic scale to determine the probability of death associated with each score. We designed an easily applicable predictive model for early identification of patients at high risk of death due to COVID-19 during hospitalization.
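
The general idea of a score built from discretized variables and a logistic model can be sketched as follows. Every number below is hypothetical: the cut-points, the points per interval, and the intercept are invented for illustration and are not the PANDEMYC values, which the abstract does not report. Only two of the nine variables are used to keep the example short.

```python
import math
from bisect import bisect_right

# HYPOTHETICAL cut-points and points per interval (illustration only,
# not the published PANDEMYC score). Each entry maps a variable to
# (cut-points, points for each of the len(cut-points) + 1 intervals).
BINS = {
    "age": ([50, 70], [0.0, 1.0, 2.0]),
    "oxygen_saturation": ([90, 95], [2.0, 1.0, 0.0]),
}
INTERCEPT = -4.0  # hypothetical logistic intercept


def score(patient):
    """Sum the points of the interval each variable falls into."""
    total = 0.0
    for name, (cuts, points) in BINS.items():
        total += points[bisect_right(cuts, patient[name])]
    return total


def death_probability(patient):
    """Logistic transform of the intercept plus the score."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + score(patient))))
```

In this sketch, the classification trees mentioned in the abstract would be the source of the cut-points, and the fitted logistic coefficients would supply the points; here both are simply typed in as placeholders.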