
    Predicting cervical cancer biopsy results using demographic and epidemiological parameters: a custom stacked ensemble machine learning approach

    Human papillomavirus (HPV) is responsible for most cervical cancer cases worldwide. This gynecological carcinoma causes many deaths, even though it can be treated by removing malignant tissue at an early stage. In many developing countries, patients do not undergo medical examinations because of a lack of awareness, limited hospital resources, and high testing costs. Hence, it is vital to design a computer-aided diagnostic method that can screen potential cervical cancer patients. In this research, we predict the risk of developing this deadly disease using a custom stacked ensemble machine learning approach. The technique combines the outputs of several machine learning algorithms on multiple levels to produce reliable predictions. First, a deep exploratory analysis is conducted using univariate and multivariate statistics. Next, one-way ANOVA, mutual information, and Pearson's correlation are used for feature selection. Because the data were imbalanced, the Borderline-SMOTE technique was used to balance them. The final stacked machine learning model obtained an accuracy, precision, recall, F1-score, area under the curve (AUC), and average precision of 98%, 97%, 99%, 98%, 100%, and 100%, respectively. To make the model explainable and interpretable to clinicians, explainable artificial intelligence algorithms such as SHapley Additive exPlanations (SHAP), local interpretable model-agnostic explanations (LIME), random forest, and ELI5 were used. These results indicate the potential of automated frameworks to assist doctors and medical professionals in diagnosing and screening potential cervical cancer patients.
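The abstract names Borderline-SMOTE as the balancing step but gives no parameters. As a rough illustration only, the stdlib-Python sketch below follows the Borderline-SMOTE1 idea: a minority sample is marked as "danger" when at least half, but not all, of its `m` nearest neighbours belong to the majority class, and synthetic points are interpolated between each danger sample and one of its `k` nearest minority neighbours. The function name `borderline_smote` and the defaults `m=5`, `k=3` are assumptions, not the authors' settings; in practice a maintained implementation such as imbalanced-learn's `BorderlineSMOTE` would normally be preferred.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def borderline_smote(
    X: Sequence[Point],
    y: Sequence[int],
    minority: int = 1,
    m: int = 5,
    k: int = 3,
    n_per_danger: int = 1,
    seed: int = 0,
) -> List[Point]:
    """Return synthetic minority samples in the style of Borderline-SMOTE1 (sketch)."""
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    min_idx = [i for i in all_idx if y[i] == minority]

    def nearest(i: int, pool: List[int], n: int) -> List[int]:
        # Indices of the n points in `pool` closest to X[i] (Euclidean), excluding i.
        others = [j for j in pool if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:n]

    # Step 1: flag "danger" minority samples that sit near the class boundary.
    danger = []
    for i in min_idx:
        neigh = nearest(i, all_idx, m)
        n_major = sum(1 for j in neigh if y[j] != minority)
        # All-majority neighbourhoods are treated as noise and skipped.
        if len(neigh) / 2 <= n_major < len(neigh):
            danger.append(i)

    # Step 2: interpolate between each danger sample and a random minority neighbour.
    synthetic: List[Point] = []
    for i in danger:
        neigh = nearest(i, min_idx, k)
        if not neigh:
            continue
        for _ in range(n_per_danger):
            j = rng.choice(neigh)
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic
```

For example, with four majority points at the corners of the unit square, two minority points at (0.5, 0.5) and (0.6, 0.5), and two more far away at (3, 3) and (3.1, 3), calling `borderline_smote(X, y, m=4, k=1, n_per_danger=2)` flags only the two central minority points as danger and returns four new points on the segment between them. As a general caution, oversampling should be applied to training folds only, so that synthetic points do not leak into evaluation.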

    Subgrouping factors influencing migraine intensity in women: A semi-automatic methodology based on machine learning and information geometry

    This is the peer-reviewed version of the following article: Pérez-Benito, F.J., Conejero, J.A., Sáez, C., García-Gómez, J.M., Navarro-Pardo, E., Florencio, L.L. and Fernández-de-las-Peñas, C. (2020), Subgrouping Factors Influencing Migraine Intensity in Women: A Semi-automatic Methodology Based on Machine Learning and Information Geometry. Pain Pract, 20: 297-309, which has been published in final form at https://doi.org/10.1111/papr.12854. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. Background: Migraine is a heterogeneous condition with multiple clinical manifestations. Machine learning algorithms permit the identification of population groups, providing analytical advantages over other modeling techniques. Objective: The aim of this study was to use machine learning algorithms to analyze the critical features that differentiate subgroups of patients with migraine according to the intensity and frequency of attacks. Methods: Sixty-seven women with migraine participated. Clinical features of migraine, related disability (Migraine Disability Assessment Scale), anxiety/depressive levels (Hospital Anxiety and Depression Scale), anxiety state/trait levels (State-Trait Anxiety Inventory), and pressure pain thresholds (PPTs) over the temporalis, neck, second metacarpal, and tibialis anterior were collected. Physical examination included the flexion-rotation test, cervical range of motion, forward head position while sitting and standing, passive accessory intervertebral movements (PAIVMs) with headache reproduction, and joint positioning sense error. Subgrouping was performed with machine learning methods: the nearest neighbors algorithm, multisource variability assessment, and a random forest model.
Results: For migraine intensity, group 2 (women with a usual migraine headache intensity of 7 on an 11-point Numeric Pain Rating Scale [where 0 = no pain and 10 = maximum pain]) were younger and had lower joint positioning sense error in cervical rotation, greater cervical mobility in rotation and flexion, lower flexion-rotation test scores, positive PAIVMs reproducing migraine, normal PPTs over the tibialis anterior, a shorter migraine history, and lower cranio-vertebral angles while standing than the remaining migraine intensity subgroups. The most discriminative variable was the flexion-rotation test score on the symptomatic side. For migraine frequency, no model was able to identify differences between groups (i.e., patients with episodic or chronic migraine). Conclusions: A subgroup of women with migraine who had a common migraine intensity was identified with machine learning algorithms. Perez-Benito, FJ.; Conejero, JA.; Sáez Silvestre, C.; Garcia-Gomez, JM.; Navarro-Pardo, E.; Florencio, LL.; Fernández-De-Las-Peñas, C. (2020). Subgrouping factors influencing migraine intensity in women: A semi-automatic methodology based on machine learning and information geometry. Pain Practice. 20(3):297-309. https://doi.org/10.1111/papr.12854
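The abstract lists a nearest neighbors algorithm among the subgrouping tools without further detail. The stdlib-Python sketch below shows a generic k-nearest-neighbours classifier with per-feature z-score standardization, which matters here because clinical variables such as age, pain thresholds, and test scores sit on very different scales. It is not the authors' pipeline; the function names, `k=3`, and the toy data in the usage note are hypothetical.

```python
import math
import statistics
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation (0 replaced by 1)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, sds


def scale(row: Sequence[float], means: List[float], sds: List[float]) -> List[float]:
    """Apply the z-score transform learned by fit_scaler to one row."""
    return [(v - mu) / sd for v, mu, sd in zip(row, means, sds)]


def knn_predict(train: Sequence[Sequence[float]], labels: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Majority vote among the k training rows closest to `query`."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    # Ties go to the label first encountered among the nearest neighbours.
    return votes.most_common(1)[0][0]
```

With two made-up clusters, rows `[[0, 0], [0, 1], [1, 0]]` labelled `"A"` and `[[10, 10], [10, 11], [11, 10]]` labelled `"B"`, fitting the scaler on all six rows and scaling both training rows and queries the same way, a query at `[0.5, 0.5]` is assigned `"A"` and one at `[10.5, 10.5]` is assigned `"B"`. The scaler must be fitted on training data only and reused unchanged for queries.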

    Healthcare data heterogeneity and its contribution to machine learning performance

    Thesis by compendium of publications. Data quality assessment has many dimensions, from obvious ones such as completeness and consistency to less evident ones such as correctness or the ability to represent the target population. In general, these dimensions can be classified into those produced by an external effect and those inherent in the data itself. This work focuses on the latter, namely temporal and multisource variability in healthcare data repositories. Processes usually improve over time, which has a direct impact on the data distribution. Similarly, the way a process is executed may vary across sources owing to many factors, such as differing human interpretations of standard protocols or the different previous experience of experts. Artificial intelligence has become one of the most widespread technological paradigms in almost all scientific and industrial fields, and advances not only in models but also in hardware have led to its use in almost every area of science. However, the problems solved with this technology often have the drawback of not being interpretable, or at least not as interpretable as classical mathematical or statistical techniques. This motivated the emergence of the concept of "explainable artificial intelligence", which studies methods to quantify and visualize the training process of machine learning models. On the other hand, real systems can often be represented as large networks (graphs), and one of the most relevant features of such networks is their community, or clustering, structure. Since sociological, biological, and clinical situations can usually be modeled with graphs, community detection algorithms are increasingly used in the biomedical field. This doctoral thesis makes contributions in all three of these areas.
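The thesis does not say which community detection algorithm it relies on. As one illustrative option, the stdlib-Python sketch below implements asynchronous label propagation: every node starts with its own label and repeatedly adopts the label most common among its neighbours until no node wants to change. The function name, the seed, and the rule of keeping the current label when it is already tied for most frequent (a common variant that helps the loop terminate) are assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def label_propagation(edges: Iterable[Tuple[Hashable, Hashable]],
                      seed: int = 0, max_iter: int = 100) -> Dict[Hashable, Hashable]:
    """Asynchronous label propagation on an undirected, unweighted graph (sketch)."""
    rng = random.Random(seed)
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    labels = {node: node for node in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # random update order on each sweep
        changed = False
        for node in nodes:
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            # Sort candidates so results do not depend on set iteration order.
            candidates = sorted((lab for lab, c in counts.items() if c == best), key=repr)
            if labels[node] not in candidates:
                labels[node] = rng.choice(candidates)
                changed = True
        if not changed:  # every node already holds a most-frequent neighbour label
            break
    return labels
```

On two disconnected triangles, for example edges `(0, 1), (1, 2), (0, 2)` and `(3, 4), (4, 5), (3, 5)`, each triangle converges to a single shared label and the two triangles keep different labels. On real graphs, label propagation is fast but non-deterministic across seeds, so results are often checked over several runs.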
    On the one hand, temporal and multisource variability assessment methods based on information geometry were used to detect variability in data distributions that may hinder data reuse and, hence, the conclusions that can be drawn from the data. The usefulness of this methodology was demonstrated by a temporal variability analysis that detected data anomalies in the electronic health records of a hospital over 7 years. It was also shown that applying this methodology before any study could have a positive impact. To this end, we first used machine learning techniques to extract the variables that most influenced headache intensity in migraine patients. A key characteristic of machine learning algorithms is their capacity to fit the training set; in datasets with few observations, the model can be biased by the training sample. The variability observed after applying the methodology, taking as sources the records of migraine patients with different headache intensities, served as evidence for the validity of the extracted features. Second, the approach was applied to measure the variability among the gray-level histograms of digital mammograms. We demonstrated that the observed variability was produced by the acquisition device and that, after an image preprocessing step was defined, the performance of a deep learning model that estimated an imaging marker of breast cancer risk increased. On the other hand, given a dataset of survey answers collected with psychometric scales, that is, questionnaires that measure psychological factors such as depression or coping, two deep learning architectures that exploit the structure of the data were defined. The first was designed using the conceptual structure of these psychometric scales.
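One plausible reading of an architecture built on "the conceptual structure of psychometric scales" is a network whose first layer is block-sparse: each questionnaire item connects only to a hidden unit representing its own scale. The stdlib-Python forward pass below illustrates that idea under this assumption; the scale names, weights, and the single `tanh` layer are hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, Sequence, Tuple


def scale_structured_forward(
    items: Sequence[float],
    scale_of_item: Sequence[str],
    w: Sequence[float],
    b: Dict[str, float],
    v: Dict[str, float],
    c: float,
) -> Tuple[float, Dict[str, float]]:
    """Forward pass of a hypothetical scale-structured network.

    Layer 1 is block-sparse: item i contributes only to the hidden unit of
    scale_of_item[i]. Layer 2 combines the per-scale units into one output.
    """
    pre = dict(b)  # one hidden unit per scale, initialised with its bias
    for x, scale, weight in zip(items, scale_of_item, w):
        pre[scale] += weight * x  # no cross-scale connections
    hidden = {s: math.tanh(z) for s, z in pre.items()}
    output = c + sum(v[s] * h for s, h in hidden.items())
    return output, hidden
```

Changing an answer on a "coping" item alters only the coping unit, leaving the depression unit untouched, which is what makes each hidden unit interpretable as a scale-level score. A trained version would learn `w`, `b`, `v`, and `c`, for example by gradient descent; the sketch only shows the sparsity pattern.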
    The scale-based architecture, trained to model the participants' degree of happiness, improved performance compared with classical statistical approaches. A second architecture, designed automatically using community detection in graphs, was not only a contribution in itself, owing to the automation of the design process, but also obtained results comparable to those of its predecessor.
    I would also like to mention the Instituto Tecnológico de la Informática, especially the Percepción, Reconocimiento, Aprendizaje e Inteligencia Artificial research group, not only for giving me the opportunity to keep growing in the world of science, but also for supporting me in pursuing my personal goals.
    Pérez Benito, FJ. (2020). Healthcare data heterogeneity and its contribution to machine learning performance [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/154414
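The thesis abstract describes the variability assessment only as being based on information geometry. A common building block for comparing the distributions of different sources, such as per-device gray-level histograms or per-period records, is the Jensen-Shannon distance. Assuming that choice, the stdlib-Python sketch below computes it with base-2 logarithms (so values lie in [0, 1]) and builds a pairwise table across sources. The function names and the toy histograms are hypothetical, and the thesis's actual metrics may differ.

```python
import math
from typing import Dict, Sequence, Tuple


def _kl_bits(p: Sequence[float], q: Sequence[float]) -> float:
    # Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Jensen-Shannon distance between two histograms with the same bins."""
    s1, s2 = sum(h1), sum(h2)
    p = [x / s1 for x in h1]
    q = [x / s2 for x in h2]
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * _kl_bits(p, mix) + 0.5 * _kl_bits(q, mix)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def pairwise_js(sources: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of sources (e.g. acquisition devices)."""
    names = sorted(sources)
    return {
        (a, b): js_distance(sources[a], sources[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Identical histograms give a distance of 0 and histograms with disjoint support give 1, so a device whose images sit far from the others in this table is a candidate source of the kind of acquisition-driven variability the abstract reports. Taking the square root of the divergence yields a proper metric, which is convenient when sources are later projected or clustered.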

    Deep Learning in Medical Image Analysis

    The accelerating power of deep learning in diagnosing diseases will empower physicians and speed up decision-making in clinical environments. Modern medical instruments and the digitalization of medical care have generated enormous amounts of medical images in recent years. At this scale, new deep learning methods and computational models for the efficient processing, analysis, and modeling of the generated data are crucial both for clinical applications and for understanding the underlying biological processes. This book presents and highlights novel algorithms, architectures, techniques, and applications of deep learning for medical image analysis.

    Improving Providers’ Survival Estimates and Selection of Prognosis- and Guidelines-Appropriate Radiotherapy Regimens for Patients with Symptomatic Bone Metastases: Development and Evaluation of the BMETS Model and Decision Support Platform

    In the management of symptomatic bone metastases, selection of appropriate palliative radiotherapy (RT) regimens should be based on patient-specific characteristics, including estimated survival time. Yet provider predictions of patient survival are notoriously inaccurate. Moreover, available evidence- and consensus-based guidelines do not provide clear criteria for selecting among the range of palliative RT regimens available. In an effort to improve the selection of prognosis- and guidelines-appropriate palliative bone treatments, we developed the Bone Metastases Ensemble Trees for Survival (BMETS) model. Built using an institutional database of 397 patients seen in consultation for symptomatic bone metastases, this machine-learning model estimates survival time following RT consultation using 27 prognostic covariates. Cross-validation procedures revealed excellent discrimination for survival, and BMETS outperformed simpler, validated statistical models, justifying its use in this population. To better characterize a component of the decisional uncertainty faced by providers, we next sought to identify the prevalence of "complicated" symptomatic bone metastases across a breadth of possible operational definitions. Our efforts identified up to 96 possible definitions of "complicated" bone metastases, present in up to 67.1% of patients in our database. Given that such "complicated" lesions may have been excluded from clinical trials in this setting, these data highlight the difficulty providers face when attempting to select appropriate RT regimens using inadequately defined selection criteria. Informed by these insights, we developed the BMETS Decision Support Platform (BMETS-DSP).
This provider-facing, web-based tool was created to (1) collect relevant patient-specific data, (2) display an individualized predicted survival curve based on the BMETS model, and (3) provide case-specific, evidence-based recommendations for the treatment of symptomatic bone metastases. We then conducted a pilot assessment of the clinical utility of the BMETS-DSP. In this preliminary assessment, the BMETS-DSP significantly improved physician accuracy in estimating survival and increased prognostic confidence, the likelihood of sharing prognosis, and the use of prognosis-appropriate RT regimens in the care of case patients. Collectively, this research provides early justification for using a machine-learning survival model and the resulting decision support platform to guide individualized selection of palliative RT regimens for symptomatic bone metastases. These data support a multi-institutional, randomized trial of the BMETS-DSP.
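The BMETS abstract reports "excellent discrimination for survival" under cross-validation without naming the metric. Harrell's concordance index is a common choice for survival models, so, as an assumption, the stdlib-Python sketch below computes it: a pair is comparable when the subject with the shorter time had an observed event, and it is concordant when that subject also received the higher predicted risk. Tied times are skipped and tied risks count one half, which is one of several conventions; the function name is hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index; higher `risk` should mean shorter survival."""
    concordant = 0.0
    comparable = 0
    for t_i, e_i, r_i in zip(times, events, risk):
        if not e_i:
            continue  # a censored subject cannot be the earlier member of a pair
        for t_j, r_j in zip(times, risk):
            if t_j > t_i:  # j is known to have survived longer than i
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the predicted risks order every comparable pair correctly, 0.5 is no better than chance, and 0.0 is a perfectly reversed ordering. In a cross-validation setting the index would typically be computed on each held-out fold from the model's predicted risks and then averaged.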

    Topics on Cervical Cancer With an Advocacy for Prevention

    Cervical cancer is one of the leading cancers among women, especially in developing countries. Prevention and control are the most important public health strategies. Empowerment of women, education, "earlier" screening with affordable technologies such as visual inspection, and treatment of precancers by cryotherapy or LEEP are the most promising interventions to reduce the burden of cervical cancer. Dr Rajamanickam Rajkumar had the privilege of establishing a rural population-based cancer registry in South India in 1996, as well as planning and implementing a large-scale screening program for cervical cancer in 2000. The program was able to show a 25% reduction in the incidence rate of cervical cancer and a 35% reduction in the mortality rate. This was his greatest inspiration to work on cervical cancer prevention, and he edited this book to inspire others to initiate such programs in developing countries. InTech - Open Access Publisher plays a major role in this crusade against cancer, and the authors have contributed to it very well.