
    Breast cancer data analysis for survivability studies and prediction

    © 2017 Elsevier B.V. Background: Breast cancer is the most common cancer affecting women worldwide. Predicting breast cancer survivability is a challenging and complex research task. Existing approaches use statistical methods or supervised machine learning to assess or predict patients' survival prospects. Objective: The main objectives of this paper are to develop a robust data-analytical model that can assist in (i) a better understanding of breast cancer survivability in the presence of missing data, (ii) providing better insights into factors associated with patient survivability, and (iii) establishing cohorts of patients that share similar properties. Methods: Two unsupervised data mining methods, the self-organising map (SOM) and density-based spatial clustering of applications with noise (DBSCAN), were used to create patient cohort clusters. These clusters, with their associated patterns, were used to train a multilayer perceptron (MLP) model for improved patient survivability analysis. A large dataset from the SEER program was used to identify patterns associated with the survivability of breast cancer patients. Information gain was computed for variable selection. All of these methods are data-driven and require little (if any) input from users or experts. Results: SOM consolidated patients into cohorts with similar properties. From these, DBSCAN identified and extracted nine cohorts (clusters). Patients in each of the nine clusters were found to have different survival times. Separating patients into clusters improved the overall survival prediction accuracy of the MLP and revealed intricate conditions that affect prediction accuracy. Conclusions: A new, entirely data-driven approach based on unsupervised learning methods improves understanding and helps identify patterns associated with patient survivability.
The results of the analysis can be used to segment historical patient data into clusters or subsets that share common variable values and survivability. The survivability prediction accuracy of an MLP is improved by training on the identified patient cohorts rather than on raw historical data. Analysis of the variable values in each cohort provides better insights into the survivability of particular subgroups of breast cancer patients.
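The abstract states that information gain was computed for variable selection but does not give the calculation. As a hedged illustration (not the authors' code), the information gain of a categorical variable X with respect to an outcome Y is IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v), where H is Shannon entropy. The Python sketch below computes it; the function names, the variable `stage` and the toy records are hypothetical and are not SEER data.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in outcome entropy from splitting records on a categorical feature."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(labels)
    # Group the outcome labels by each value of the candidate variable.
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the outcome within each group, i.e. H(Y | X).
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records (not SEER data): one candidate variable and an outcome.
stage = ["I", "I", "II", "II", "III", "III"]
survived = ["yes", "yes", "yes", "no", "no", "no"]
print(f"IG(survived; stage) = {information_gain(stage, survived):.3f}")
```

Here the toy `stage` variable removes about 0.667 of the 1 bit of outcome entropy. In a variable-selection setting, one would typically score every candidate variable this way and keep the highest-ranked ones; the cut-off rule is an assumption, since the abstract does not state it.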

    Socio-Technical Perspective on Managing Type II Diabetes

    Social attributes such as education level, family history and place of residence all play a strong role in the probability of a person developing type II diabetes later in life. The aim of this paper is to develop a knowledge-based system that uses social attributes to estimate the prevalence of type II diabetes in a given area in Australia, in support of public health policymaking. The paper focuses on answering the research question: how can social determinants associated with type II diabetes be used to incrementally develop a supporting knowledge-based system (KBS)? The contribution of this paper is twofold: (1) the problem domain is analysed and a suitable KBS development framework is chosen; (2) a prototype is developed and presented. Initial results with preliminary data confirm the validity of the approach.

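    Review article: Breast cancer predictive systems based on artificial neural networks (original title: Artículo de revisión: Sistemas predictivos del cáncer de mama basados en redes neuronales artificiales)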

    This article reviews published research on breast cancer predictive systems based on neural networks, in order to identify trends both in how these systems are built and in the countries that develop them. To this end, the databases to which access was available were searched, and filters and inclusion and exclusion criteria were applied to obtain articles meeting a defined standard. This yielded twelve research papers, from which it could be concluded that most of the research comes from India and England, with a marked absence of Latin American countries. In addition, a list was compiled of the main features used to train the neural networks applied in the various computer systems. Finally, the authors hope that this article will serve as a reference and an inspiration for researching and implementing predictive software in the health field.

    Cohort profile: the MCC-Spain follow-up on colorectal, breast and prostate cancers: study design and initial results

    PURPOSE: Since 2016, the multicase-control study in Spain (MCC-Spain) has focused on identifying factors associated with cancer prognosis. Inception cohorts of patients with colorectal, breast and prostate cancers were assembled from the incident cases originally recruited. PARTICIPANTS: In total, 2140 new cases of colorectal cancer, 1732 of breast cancer and 1112 of prostate cancer were initially recruited in 12 Spanish provinces; all cancers were incident and pathologically confirmed. Follow-up was obtained for 2097 (98%), 1685 (97%) and 1055 (94.9%) patients, respectively. FINDINGS TO DATE: Information gathered at recruitment included sociodemographic factors, medical history, lifestyle and environmental exposures. Biological samples were obtained, and 80% of patients were genotyped using a commercial exome array. Follow-up was performed by (1) reviewing medical records; (2) interviewing patients by phone about quality of life; and (3) verifying vital status and cause of death in the Spanish National Death Index. Ninety-seven per cent of recruited patients were successfully followed up in 2017 or 2018, giving 30 914 patient-years of follow-up. Most colorectal cancers (52%) were at clinical stage II or lower at recruitment; 819 patients died during follow-up, and 5-year survival was better for women (74.4%) than for men (70.0%). Of the breast cancers, 71% were diagnosed at stage I or II; 206 women with breast cancer died during follow-up, and 5-year survival was 90.7%. Of the prostate cancers, 49% were diagnosed at stage II and 32% at stage III; 119 patients with prostate cancer died during follow-up, and 5-year survival was 93.7%. FUTURE PLANS: MCC-Spain has built three prospective cohorts of highly frequent cancers across Spain, making it possible to investigate socioeconomic, clinical, lifestyle, environmental and genetic variables, and the interrelationships among them, as putative prognostic factors for survival in patients with these three cancers.
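The follow-up percentages above can be checked directly from the reported counts. The short Python sketch below recomputes each cohort's follow-up rate and the pooled rate using only the numbers given in this abstract; the function name `follow_up_rate` is illustrative.

```python
from typing import Dict, Tuple

# Counts as reported in the MCC-Spain abstract: (recruited, followed up).
cohorts: Dict[str, Tuple[int, int]] = {
    "colorectal": (2140, 2097),
    "breast": (1732, 1685),
    "prostate": (1112, 1055),
}


def follow_up_rate(recruited: int, followed: int) -> float:
    """Percentage of recruited patients for whom follow-up was obtained."""
    if recruited <= 0:
        raise ValueError("recruited must be positive")
    return 100.0 * followed / recruited


for name, (recruited, followed) in cohorts.items():
    print(f"{name}: {follow_up_rate(recruited, followed):.1f}%")

# Pool all three cohorts to check the overall figure.
total_recruited = sum(r for r, _ in cohorts.values())
total_followed = sum(f for _, f in cohorts.values())
print(f"overall: {follow_up_rate(total_recruited, total_followed):.1f}%")
```

The recomputed rates are 98.0%, 97.3% and 94.9%, and 97.1% overall (4837 of 4984 patients), which agrees with the rounded per-cohort figures and the overall "ninety-seven per cent" stated in the abstract.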

    Feasibility study on data mining techniques in diagnosis of breast cancer

    © 2019 International Association of Computer Science and Information Technology. The survivability of patients with breast cancer varies according to disease stage, and early detection of breast cancer increases patients' longevity. However, the number of risk factors involved in detection increases exponentially with the number of medical examinations. Automated data mining techniques that enable cost-effective, early prediction of cancer are rapidly becoming a trend in the healthcare industry. The optimal techniques for prediction and diagnosis differ significantly depending on the risk factors. This review article provides a holistic view of the types of data mining techniques used in breast cancer prediction. On the whole, the computer-aided automatic data mining techniques commonly employed in the diagnosis and prognosis of chronic diseases include Decision Tree, Naïve Bayes, Association rule, Multilayer Perceptron (MLP), Random Forest and Support Vector Machines (SVM), among others. Because the accuracy and overall performance of the classifiers differ for every dataset, this article attempts to provide a means of understanding the approaches involved in early prediction.

    Subtyping Chronic Kidney Disease Patients And Adiposity-Obesity Related Metabolomics Analyses: Findings From The Chronic Renal Insufficiency Cohort Study

    Chronic kidney disease (CKD) is a heterogeneous condition that is often complicated by multiple serious comorbidities, which together create a large disease burden. Concurrent with the high prevalence of CKD is the obesity epidemic, which increases the risk of adverse outcomes among people with kidney dysfunction. However, due in part to patient heterogeneity, the complex relationship between obesity and CKD is not fully understood. We aimed to systematically examine phenotypic heterogeneity in patients with CKD and to study obesity- and adiposity-related CKD mechanisms by integrating rich clinical characteristics of patients with high-dimensional metabolomics data. A total of 3939 participants in the prospective Chronic Renal Insufficiency Cohort (CRIC) Study with stage 2-4 CKD at baseline were included. We conducted two parallel clustering analyses using the machine learning method of consensus clustering. First, we examined overall CKD heterogeneity using 72 markers covering patients' demographics, biomarkers and commonly collected clinical characteristics. Second, we identified adiposity-obesity-related (AOR) CKD subgroups using 22 markers of patients' obesity attributes, adiposity parameters and comorbidity profiles. Third, in a random subset of CRIC participants with metabolomics data, we investigated the metabolic signatures associated with the AOR CKD subgroups and, using Aalen additive hazards models and Cox regression, tested metabolites as potential mediators of the association between AOR CKD subgroups and various clinical endpoints. We identified three distinct CKD subgroups from the overall clinical data, and a separate set of three AOR CKD subgroups with distinct patient profiles of adiposity/obesity and diabetes. Both sets of CKD subgroups were significantly and independently associated with different rates of future clinical outcomes.
The metabolomics and mediation analyses identified numerous metabolites as mediators of the relationship between AOR CKD subgroups and clinical endpoints; among them, multiple lipid, nucleoside and amino acid metabolites emerged as key markers. In summary, our work quantitatively characterized CKD patient heterogeneity, shed light on adiposity-obesity-related disease mechanisms at both the phenotypic and molecular levels, and highlighted potential therapeutic targets as well as metabolomic pathways for disease management and treatment. Validation using longitudinal metabolomics data and/or independent cohorts is needed.
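The CRIC abstract names consensus clustering without giving implementation details. In the usual resampling-based formulation, a base clustering algorithm is run many times on subsamples of the participants, and the results are summarised in a consensus matrix whose entry (i, j) is the proportion of runs in which participants i and j were clustered together, out of the runs in which both were sampled. The Python sketch below computes only that matrix. The base algorithm, the resampling scheme and the label data are hypothetical, and the final step of clustering the consensus matrix itself is omitted.

```python
from typing import List, Optional, Sequence


def consensus_matrix(runs: Sequence[Sequence[Optional[int]]]) -> List[List[float]]:
    """Consensus matrix over repeated clusterings of resampled data.

    Each run assigns a cluster label to every item, or None when the item was
    not drawn in that resample. Entry (i, j) is the fraction of runs in which
    items i and j were placed in the same cluster, among the runs in which
    both were sampled (0.0 if they were never sampled together).
    """
    if not runs:
        raise ValueError("need at least one clustering run")
    n = len(runs[0])
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:
        if len(labels) != n:
            raise ValueError("every run must label the same items")
        for i in range(n):
            for j in range(n):
                if labels[i] is None or labels[j] is None:
                    continue
                sampled[i][j] += 1
                # Only co-membership matters, so label numbering may differ between runs.
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical labels from four resampled clustering runs over four participants.
runs = [
    [0, 0, 1, 1],
    [0, 0, 1, None],
    [1, 1, 0, 0],
    [0, None, 0, 1],
]
M = consensus_matrix(runs)
for row in M:
    print(" ".join(f"{x:.2f}" for x in row))
```

Entries near 1 or 0 indicate stable pairings, while intermediate values (0.25 and 0.67 in this toy example) flag participants whose assignment shifts between resamples. In practice, consensus matrices are typically compared across several candidate numbers of clusters to pick a stable one.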