57 research outputs found

    Development and application of new machine learning models for the study of colorectal cancer

    Healthcare is showing growing interest in Artificial Intelligence techniques, specifically Machine Learning, which have long delivered good results in fields such as industry, e-commerce, and education. Healthcare poses an even greater challenge, however: systems must be thoroughly tested, since their results directly affect people's health, and they must also strike a good balance in terms of interpretability. This matters because black-box methods, although they can be very precise, make it difficult to know what led the automatic system to make one decision rather than another. That can generate rejection among healthcare professionals, who may feel insecure when they cannot explain a clinical decision based on a decision support system. For this reason, we established from the very beginning that the interpretability of the results should be a premise guiding all the work carried out in this doctoral thesis. Accordingly, all the developments generate either classification trees (which yield interpretable rules) or association rules that describe relationships in the existing data.

    Colorectal cancer is a malignant neoplasm with high morbidity and mortality in both men and women. It unquestionably requires multidisciplinary care in which different healthcare professionals (family doctors, gastroenterologists, radiologists, surgeons, oncologists, pharmacists, nursing staff, etc.) take a joint approach to the pathology in order to offer the patient the best possible care. In the future it would also be very valuable to add data scientists to this multidisciplinary team, since much can be gained from the information generated daily on this pathology. This doctoral thesis studies a dataset of patients with colorectal cancer using a set of artificial intelligence techniques and develops new machine learning models for it.

    The results are as follows. (1) A literature review on the use of Machine Learning applied to colorectal cancer, from which a taxonomy of the works existing at the time of the state-of-the-art study has been produced. This taxonomy classifies the works according to criteria such as the type of dataset used, the type of algorithm implemented, the size of the dataset and its public availability, and whether feature selection algorithms or feature extraction techniques are used. (2) A class association rule extraction model intended to better explain why some patients might experience complications after surgery or a recurrence of their cancer. This work has given rise to a methodology for obtaining interpretable and manageable descriptions (the generated rules must be small in order to be useful to practitioners). (3) A feature and instance selection model to induce better classification trees. (4) A Grammatical Evolution algorithm to induce a wide variety of classification trees as accurate as those obtained by the well-known C4.5 and CART methods. For this algorithm, the PonyGE2 Python library was used and, because it is not specific to our problem, a series of operators was developed that induce more interpretable trees than those PonyGE2 produces by default.

    The results obtained in each development have been compared with those of well-known, well-regarded methods from the literature, both in classification and in association rule mining, demonstrating a better fit to the features of the study dataset and great efficiency in our models, which can also be applied to other cases. This demonstrates that it is possible to reach a good balance between precision and interpretability.
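As a rough illustration of what a class association rule is, the sketch below mines rules of the form "antecedent -> class label" from a toy table using the standard support and confidence measures, and caps the antecedent length so the rules stay small. It is not the thesis's method: the records, the item names, the thresholds, and the function `mine_class_rules` are all hypothetical and chosen only for this example.

```python
from itertools import combinations

# Hypothetical toy records: (set of attribute=value items, class label).
RECORDS = [
    ({"age>70", "stage=III", "open_surgery"}, "complication"),
    ({"age>70", "stage=II", "open_surgery"}, "complication"),
    ({"age<=70", "stage=II", "laparoscopy"}, "no_complication"),
    ({"age<=70", "stage=III", "laparoscopy"}, "no_complication"),
    ({"age>70", "stage=III", "laparoscopy"}, "complication"),
    ({"age<=70", "stage=II", "open_surgery"}, "no_complication"),
]


def mine_class_rules(records, min_support=0.3, min_confidence=0.8, max_len=2):
    """Return (antecedent, label, support, confidence) tuples.

    Brute-force enumeration, fine for a toy table only. max_len caps the
    antecedent size so every rule stays short enough to read.
    """
    n = len(records)
    items = sorted(set().union(*(feats for feats, _ in records)))
    labels = sorted({label for _, label in records})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            wanted = set(antecedent)
            # Class labels of the records that contain every antecedent item.
            covered = [label for feats, label in records if wanted <= feats]
            if not covered:
                continue
            for label in labels:
                hits = covered.count(label)
                support = hits / n                # share of all records
                confidence = hits / len(covered)  # share of covered records
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, label, support, confidence))
    return rules


if __name__ == "__main__":
    for antecedent, label, support, confidence in mine_class_rules(RECORDS):
        print(f"{' AND '.join(antecedent)} -> {label} "
              f"(support={support:.2f}, confidence={confidence:.2f})")
```

Lowering `max_len` trades coverage for readability, which mirrors the abstract's point that rules must be small to be useful to practitioners.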

    Bayesian probability encoding in medical decision analysis

    Ph.D. (Doctor of Philosophy)

    Computational Tools for the Untargeted Assignment of FT-MS Metabolomics Datasets

    Metabolomics is the study of metabolomes, the sets of metabolites observed in living systems. Metabolism interconverts these metabolites to provide the molecules and energy necessary for life processes. Many disease processes, including cancer, have a significant metabolic component that manifests as differences in which metabolites are present and in what quantities they are produced and utilized. Thus, using metabolomics, differences between metabolomes in disease and non-disease states can be detected, and these differences improve our understanding of disease processes at the molecular level. Despite the potential benefits of metabolomics, the comprehensive investigation of metabolomes remains difficult. A popular analytical technique for metabolomics is mass spectrometry. Advances in Fourier transform mass spectrometry (FT-MS) instrumentation have yielded simultaneous improvements in mass resolution, mass accuracy, and detection sensitivity. In the metabolomics field, these advantages permit more complicated but more informative experimental designs, such as the use of multiple isotope-labeled precursors in stable isotope-resolved metabolomics (SIRM) experiments. However, despite these potential applications, several outstanding problems hamper the use of FT-MS for metabolomics studies. First, artifacts and data quality problems in FT-MS spectra can confound downstream data analyses, confuse machine learning models, and complicate the robust detection and assignment of metabolite features. Second, the assignment of observed spectral features to metabolites remains difficult. Existing targeted approaches for assignment often employ databases of known metabolites; however, metabolite databases are incomplete, thus limiting or biasing assignment results. Additionally, FT-MS provides limited structural information for observed metabolites, which complicates the determination of metabolite class (e.g. lipid, sugar, etc.) for observed metabolite spectral features, a necessary step for many metabolomics experiments. To address these problems, a set of tools was developed. The first tool identifies artifacts with high peak density observed in many FT-MS spectra and removes them safely. Using this tool, two previously unreported types of high peak density artifact were identified in FT-MS spectra: fuzzy sites and partial ringing. Fuzzy sites were particularly problematic as they confused and reduced the accuracy of machine learning models trained on datasets containing these artifacts. Second, a tool called SMIRFE was developed to assign isotope-resolved molecular formulas to observed spectral features in an untargeted manner without a database of expected metabolites. This new untargeted method was validated on a gold-standard dataset containing both unlabeled and 15N-labeled compounds and was able to identify 18 of 18 expected spectral features. Third, a collection of machine learning models was constructed to predict whether a molecular formula corresponds to one or more lipid categories. These models accurately predict the correct one of eight lipid categories on our training dataset of known lipid and non-lipid molecular formulas, with precisions and accuracies over 90% for most categories. These models were used to predict lipid categories for untargeted SMIRFE-derived assignments in a non-small cell lung cancer dataset. Subsequent differential abundance analysis revealed a sub-population of non-small cell lung cancer samples with a significantly increased abundance of sterol lipids. This finding implies a possible therapeutic role of statins in the treatment and/or prevention of non-small cell lung cancer. Collectively, these tools represent a pipeline for FT-MS metabolomics datasets that is compatible with isotope labeling experiments. With these tools, more robust and untargeted metabolic analyses of disease will be possible.

    Preface

    Measuring the Efficiency of the Living Kidney Donor Candidate Evaluation Process

    Background: Living kidney donation is the ideal treatment for many patients with kidney failure. However, the living donor evaluation process has been criticized by patients and healthcare providers as inefficient. In the present research, we evaluated the inefficiency of the living donor evaluation process. Methods: We conducted a scoping review of the literature and obtained data from large administrative datasets (1256 living donors) and medical chart review (849 prospectively recruited living donors across 12 transplant centres plus retrospective analysis of 1065 living donor candidates from a single centre). Results: The median time to complete the entire evaluation was 9-11 months for donors and 4.3 months for candidates who were declined or withdrew from the evaluation. Up to 35% of recipients who could potentially have received a pre-emptive transplant (avoiding dialysis entirely) started dialysis before transplantation, costing the healthcare system $8.1M for dialysis alone. Shortening the evaluation time by only 10% translated to an annual cost savings of at least $1.3M in Ontario due to averted dialysis costs, and up to 38 intended recipients each year could have received a transplant they otherwise did not receive (a 17% increase in living donor transplantation). The cost to the healthcare system was $3,641 for the donor evaluation, $11,695 for the donor surgery (including perioperative costs), and $933 for the first year post-donation. Many factors may contribute to a longer living donor evaluation. Donation through kidney paired donation prolonged the time until donation by 6 months. The evaluation time doubled if the intended recipient started dialysis part-way through the donor's evaluation. Finally, every one-month delay in the recipient referral extended the time until donation by 0.4-0.9 months and increased the likelihood that the recipient would start dialysis before transplant. Between-centre differences were observed for evaluation times and donation costs. Conclusions: The living donor evaluation is time-consuming, resulting in potentially avoidable adverse consequences for donor candidates, their intended recipients, and the healthcare system. Potential strategies to improve the efficiency of this process include eliminating unnecessary or redundant tests, evaluating multiple donor candidates simultaneously, performing 1-day evaluations, and promoting earlier recipient referrals.