268 research outputs found

    Prediction of dyslipidemia using gene mutations, family history of diseases and anthropometric indicators in children and adolescents: The CASPIAN-III study

    Dyslipidemia, the disorder of lipoprotein metabolism resulting in a high lipid profile, is an important modifiable risk factor for coronary heart disease. It is associated with more than four million deaths worldwide per year. Half of the children with dyslipidemia have hyperlipidemia during adulthood, and its prediction and screening are thus critical. We designed a new dyslipidemia diagnosis system. A sample of 725 subjects (age 14.66 ± 2.61 years; 48% male; dyslipidemia prevalence of 42%) was selected by multistage random cluster sampling in Iran. Single nucleotide polymorphisms (rs1801177, rs708272, rs320, rs328, rs2066718, rs2230808, rs5880, rs5128, rs2893157, rs662799, and Apolipoprotein-E2/E3/E4), anthropometric and lifestyle attributes, and family history of diseases were analyzed. A framework for classifying mixed-type data in imbalanced datasets was proposed. It included internal feature mapping and selection, re-sampling, an optimized group method of data handling using convex and stochastic optimizations, a new cost function for imbalanced data and an internal validation. Its performance was assessed using hold-out and 4-fold cross-validation. Four other classifiers, namely support vector machines, decision tree, multilayer perceptron neural network and multiple logistic regression, were also used. The average sensitivity, specificity, precision and accuracy of the proposed system were 93%, 94%, 94% and 92%, respectively, in cross-validation. It significantly outperformed the other classifiers and also showed excellent agreement and high correlation with the gold standard. A non-invasive, economical version of the algorithm, suitable for low- and middle-income countries, was also implemented. It is thus a promising new tool for the prediction of dyslipidemia. Peer reviewed. Postprint (published version).
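
    The four measures reported in this abstract (sensitivity, specificity, precision and accuracy) all derive from the same confusion-matrix counts. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name and the example counts are hypothetical and are not taken from the study.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives.
    """
    return {
        "sensitivity": safe_div(tp, tp + fn),   # share of actual positives detected
        "specificity": safe_div(tn, tn + fp),   # share of actual negatives rejected
        "precision": safe_div(tp, tp + fp),     # share of predicted positives that are correct
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only (not data from the CASPIAN-III study).
metrics = classification_metrics(tp=40, fp=5, tn=50, fn=5)
```

    On imbalanced data, accuracy alone can look high while sensitivity is poor, which is why abstracts of this kind typically report all four values together.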

    Construction of machine learning-based models for cancer outcomes in low and lower-middle income countries: A scoping review

    Background: The impact and utility of machine learning (ML)-based prediction tools for cancer outcomes, including assistive diagnosis, risk stratification, and adjunctive decision-making, have been largely described and realized in high-income and upper-middle-income countries. However, statistical projections have estimated higher cancer incidence and mortality risks in low and lower-middle-income countries (LLMICs). Therefore, this review aimed to evaluate the utilization, model construction methods, and degree of implementation of ML-based models for cancer outcomes in LLMICs. Methods: PubMed/Medline, Scopus, and Web of Science databases were searched, and articles describing the use of ML-based models for cancer among local populations in LLMICs between 2002 and 2022 were included. A total of 140 articles from 22,516 citations that met the eligibility criteria were included in this study. Results: ML-based models from LLMICs were more often based on traditional ML algorithms than on deep or deep hybrid learning. We found that the construction of ML-based models was skewed to particular LLMICs such as India, Iran, Pakistan, and Egypt, with a paucity of applications in sub-Saharan Africa. Moreover, models for breast, head and neck, and brain cancer outcomes were frequently explored. Many models were deemed suboptimal according to the Prediction model Risk of Bias Assessment Tool (PROBAST) due to sample size constraints and technical flaws in ML modeling, even though their performance accuracy ranged from 0.65 to 1.00. While development and internal validation were described for all models included (n=137), only 4.4% (6/137) have been validated in independent cohorts and 0.7% (1/137) have been assessed for clinical impact and efficacy. Conclusion: Overall, the application of ML for modeling cancer outcomes in LLMICs is increasing. However, model development is largely unsatisfactory. We recommend model retraining using larger sample sizes, intensified external validation practices, and increased impact assessment studies using randomized controlled trial design.

    Predicting postoperative complications for gastric cancer patients using data mining

    Gastric cancer refers to the development of malignant cells that can grow in any part of the stomach. With the vast amount of data being collected daily in healthcare environments, it is possible to develop new algorithms that can support decision-making in the treatment of gastric cancer patients. This paper aims to predict, using the CRISP-DM methodology, the outcome of the hospitalization of gastric cancer patients who have undergone surgery, as well as the occurrence of postoperative complications during surgery. The study showed that, on the one hand, the RF and NB algorithms are the best at detecting the outcome of hospitalization, taking into account patients’ clinical data. On the other hand, the J48, RF, and NB algorithms offer better results in predicting postoperative complications. FCT - Fundação para a Ciência e a Tecnologia (UID/CEC/00319/2013)

    Predicting cervical cancer biopsy results using demographic and epidemiological parameters: a custom stacked ensemble machine learning approach

    The human papillomavirus (HPV) is responsible for most cervical cancer cases worldwide. This gynecological carcinoma causes many deaths, even though it can be treated by removing malignant tissues at a preliminary stage. In many developing countries, patients do not undergo medical examinations due to a lack of awareness, limited hospital resources and high testing costs. Hence, it is vital to design a computer-aided diagnostic method which can screen cervical cancer patients. In this research, we predict the risk of contracting this deadly disease using a custom stacked ensemble machine learning approach. The technique combines the results of several machine learning algorithms on multiple levels to produce reliable predictions. First, a deep exploratory analysis is conducted using univariate and multivariate statistics. Then, one-way ANOVA, mutual information and Pearson’s correlation are used for feature selection. Since the data was imbalanced, the Borderline-SMOTE technique was used to balance it. The final stacked machine learning model obtained an accuracy, precision, recall, F1-score, area under curve (AUC) and average precision of 98%, 97%, 99%, 98%, 100% and 100%, respectively. To make the model explainable and interpretable to clinicians, explainable artificial intelligence algorithms such as SHapley Additive exPlanations (SHAP), local interpretable model-agnostic explanations (LIME), random forest and ELI5 have been effectively utilized. The optimistic results indicate the potential of automated frameworks to assist doctors and medical professionals in diagnosing and screening potential cervical cancer patients.

    Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications

    Analysis and predictive modeling of massive datasets is an extremely significant problem that arises in many practical applications. The task of predictive modeling becomes even more challenging when data are imperfect or uncertain. Real data are frequently affected by outliers, uncertain labels, and uneven distribution of classes (imbalanced data). Such uncertainties create bias and make predictive modeling an even more difficult task. In the present work, we introduce a cost-sensitive learning (CSL) method to deal with the classification of imperfect data. Most traditional approaches to classification perform poorly in an environment with imperfect data. We propose the use of CSL with the Support Vector Machine, which is a well-known data mining algorithm. The results reveal that the proposed algorithm produces more accurate classifiers and is more robust with respect to imperfect data. Furthermore, we explore the best performance measures to tackle imperfect data, along with addressing real problems in quality control and business analytics.
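
    The general idea behind cost-sensitive learning with a margin-based classifier can be illustrated by weighting the hinge loss per class, so that errors on the rare class cost more. This is a minimal sketch and not the method proposed in the work above: the weighting rule (cost inversely proportional to class frequency, a common heuristic), the function names and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def balanced_class_costs(labels):
    """Assumed heuristic: cost = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_hinge_loss(scores, labels, costs):
    """Mean hinge loss with a per-class misclassification cost.

    labels must be +1 or -1; scores are raw decision values.
    """
    total = 0.0
    for score, label in zip(scores, labels):
        total += costs[label] * max(0.0, 1.0 - label * score)
    return total / len(labels)


# Toy imbalanced data: 2 positives, 8 negatives (hypothetical).
labels = [1, 1] + [-1] * 8
# A degenerate classifier that scores every sample as the majority class.
scores = [-1.0] * len(labels)

costs = balanced_class_costs(labels)
cost_sensitive = weighted_hinge_loss(scores, labels, costs)
unweighted = weighted_hinge_loss(scores, labels, {1: 1.0, -1: 1.0})
```

    With these toy values the cost-sensitive loss penalizes the majority-only classifier more heavily than the unweighted loss does, which is the effect a cost-sensitive objective is meant to produce.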

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are essential disciplines heavily involved in translating data-driven findings into medical practice. This is accomplished especially by developing computational tools and algorithms that assist in detecting and clarifying the underlying causes of diseases. Continuous advancements in high-throughput technologies, coupled with recently promoted data sharing policies, have contributed to the presence of a massive wealth of data with remarkable potential to improve human health care. In line with this massive boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular and conventional clinical data categories. The aim of this thesis was to develop novel statistical and machine learning tools and to incorporate existing state-of-the-art methods to analyze bio-clinical data with medical applications. The findings of the studies demonstrate the impact of computational approaches in clinical decision making by improving patient risk stratification and prediction of disease outcomes. This thesis is comprised of five studies explaining method development for 1) genomic data, 2) conventional clinical data and 3) integration of genomic and clinical data. With genomic data, the main focus is the detection of differentially expressed genes, the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. To demonstrate the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC). In addition to previously known markers, novel genes with a potential prognostic and therapeutic role in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the importance of including genomic data for the predictive ability of clinical models. Again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex diseases with high heterogeneity. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of these markers would increase the chance of early detection and improve prognosis assessment and treatment choice.

    Beta Thalassemia Carriers detection empowered federated Learning

    Thalassemia is a group of inherited blood disorders that occur when hemoglobin, the protein in red blood cells that carries oxygen, is not produced in sufficient quantity. Hemoglobin is found throughout the body and is needed for survival. If both parents have thalassemia, a child's chance of getting it increases. Genetic counselling and early diagnosis are essential for treating thalassemia and stopping it from being passed on to future generations. It may be hard for healthcare professionals to differentiate between thalassemia carriers and non-carriers. The current blood tests for beta-thalassemia carriers are too expensive, take too long, and require too much screening equipment. The World Health Organization says there is a high death rate for people with thalassemia. Therefore, it is essential to find thalassemia carriers so that action can be taken quickly. High-performance liquid chromatography (HPLC), the standard test method, has problems such as cost, time, and equipment needs. So, there must be a quick and cheap way to find people carrying the thalassemia gene. Using federated learning (FL) techniques, this study shows a new way to find people with the beta-thalassemia gene. FL allows data to be collected and processed on-site while following privacy rules, making it an excellent choice for sensitive health data. Researchers used FL to train a model for beta-thalassemia carriers by looking at complete blood count results and red blood cell indices. The model was 92.38% accurate at telling the difference between beta-thalassemia carriers and people who did not have the disease. The proposed FL model is better than other published methods in terms of how well it works, how reliable it is, and how private it is. This research shows a promising, quick, accurate, and low-cost way to find thalassemia carriers and opens the door for screening them on a large scale. Comment: pages 17, figures
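
    The abstract does not say which aggregation scheme the federated model uses. As an illustration of how FL can train a shared model while the data stay on-site, the sketch below shows federated averaging, a widely used scheme in which a server combines locally trained parameters weighted by each site's sample count. The function name, the parameter vectors and the site sizes are hypothetical and are not taken from the study.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-client parameter vectors into one global vector.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   number of local training samples held by each client.
    Only parameters and counts are shared; raw records never leave a client.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one size is required per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical hospitals with locally trained two-parameter models.
hospital_a = [1.0, 2.0]   # trained on 100 local records
hospital_b = [3.0, 4.0]   # trained on 300 local records
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
```

    In a full training loop the server would send the averaged parameters back to each site for another round of local training; that loop is omitted here.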

    Comparative Analysis of Segment Anything Model and U-Net for Breast Tumor Detection in Ultrasound and Mammography Images

    The main objective of this study is to develop an algorithm capable of identifying and delineating tumor regions in breast ultrasound (BUS) and mammographic images. The technique employs two advanced deep learning architectures, namely U-Net and pretrained SAM, for tumor segmentation. The U-Net model is specifically designed for medical image segmentation and leverages its deep convolutional neural network framework to extract meaningful features from input images. On the other hand, the pretrained SAM architecture incorporates a mechanism to capture spatial dependencies and generate segmentation results. Evaluation is conducted on a diverse dataset containing annotated tumor regions in BUS and mammographic images, covering both benign and malignant tumors. This dataset enables a comprehensive assessment of the algorithm's performance across different tumor types. Results demonstrate that the U-Net model outperforms the pretrained SAM architecture in accurately identifying and segmenting tumor regions in both BUS and mammographic images. The U-Net exhibits superior performance in challenging cases involving irregular shapes, indistinct boundaries, and high tumor heterogeneity. In contrast, the pretrained SAM architecture exhibits limitations in accurately identifying tumor areas, particularly for malignant tumors and objects with weak boundaries or complex shapes. These findings highlight the importance of selecting appropriate deep learning architectures tailored for medical image segmentation. The U-Net model showcases its potential as a robust and accurate tool for tumor detection, while the results for the pretrained SAM architecture suggest the need for further improvements to enhance segmentation performance.

    Machine Learning for Diabetes and Mortality Risk Prediction From Electronic Health Records

    Data science can provide invaluable tools to better exploit healthcare data to improve patient outcomes and increase cost-effectiveness. Today, electronic health record (EHR) systems provide a fascinating array of data that data science applications can use to revolutionise the healthcare industry. Utilising EHR data to improve the early diagnosis of a variety of medical conditions/events is a rapidly developing area that, if successful, can help to improve healthcare services across the board. Specifically, as Type-2 Diabetes Mellitus (T2DM) represents one of the most serious threats to health across the globe, analysing the huge volumes of data provided by EHR systems to investigate approaches for the early and accurate prediction of the onset of T2DM, and of medical events such as in-hospital mortality, are two of the most important challenges data science currently faces. The present thesis addresses these challenges by examining the research gaps in the existing literature, pinpointing the un-investigated areas, and proposing a novel machine learning modelling approach given the difficulties inherent in EHR data. To achieve these aims, the present thesis firstly introduces a unique and large EHR dataset collected from Saudi Arabia. Then we investigate the use of state-of-the-art machine learning predictive models that exploit this dataset for diabetes diagnosis and the early identification of patients with pre-diabetes by predicting the blood levels of one of the main indicators of diabetes and pre-diabetes: elevated Glycated Haemoglobin (HbA1c) levels. A novel collaborative denoising autoencoder (Col-DAE) framework is adopted to predict the diabetes (high) HbA1c levels. We also employ several machine learning approaches (random forest, logistic regression, support vector machine, and multilayer perceptron) for the identification of patients with pre-diabetes (elevated HbA1c levels). The models employed demonstrate that a patient's risk of diabetes/pre-diabetes can be reliably predicted from EHR records. We then extend this work to include pioneering adoption of recent technologies to investigate the outcomes of the predictive models employed by using recent explainable methods. This work also investigates the effect of using longitudinal data and more of the features available in the EHR systems on the performance and feature ranking of the employed machine learning models for predicting elevated HbA1c levels in non-diabetic patients. This work demonstrates that longitudinal data and available EHR features can improve the performance of the machine learning models and can affect the relative order of importance of the features. Secondly, we develop a machine learning model for the early and accurate prediction of all in-hospital mortality events for such patients utilising EHR data. This work investigates a novel application of the Stacked Denoising Autoencoder (SDA) to predict in-hospital patient mortality risk. In doing so, we demonstrate how our approach uniquely overcomes the issues associated with imbalanced datasets to which existing solutions are subject. The proposed model, using clinical patient data on a variety of health conditions and without intensive feature engineering, is demonstrated to achieve robust and promising results using EHR patient data recorded during the first 24 hours after admission.