    Validating Synthetic Health Datasets for Longitudinal Clustering

    This paper appeared at the Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2013), Adelaide, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 142. K. Gray and A. Koronios, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included. Clustering methods partition datasets into subgroups with some homogeneous properties, where the number and particular characteristics of each subgroup are unknown a priori. Cluster validation methods may help overcome the problem of predicting the number of clusters and the quality of each cluster. This paper presents such an approach, incorporating quantitative methods for comparison between original and synthetic versions of longitudinal health datasets. The use of the methods is demonstrated with two different clustering algorithms, K-means and Latent Class Analysis, applied to synthetic data derived from the 45 and Up Study baseline data from NSW, Australia.
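One common way to quantify agreement between two partitions of the same records (for example K-means and Latent Class Analysis labels on one synthetic dataset, or the planted labels of a synthetic dataset versus the recovered ones) is the adjusted Rand index (ARI). The abstract does not say which quantitative measures the paper uses, so ARI and the function name `adjusted_rand_index` are assumptions chosen for illustration. This is a minimal stdlib sketch, not the authors' method.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same n records.

    Hypothetical helper for illustration; 1.0 means identical partitions,
    values near 0 mean chance-level agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same records")
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0  # fewer than two records: nothing to disagree about
    # Contingency counts: records put in cluster i by A and cluster j by B.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    kmeans_labels = [0, 0, 1, 1, 2, 2]
    lca_labels = [1, 1, 0, 0, 2, 2]  # same grouping, different label names
    print(adjusted_rand_index(kmeans_labels, lca_labels))
```

Because ARI compares which pairs of records are grouped together, it ignores the arbitrary cluster numbering that different algorithms produce.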

    Constructing a Synthetic Longitudinal Health Dataset for Data Mining

    Published version reproduced here with permission from the publisher. The traditional approach to epidemiological research is to analyse data in an explicit statistical fashion, attempting to answer a question or test a hypothesis. However, increasing experience in the application of data mining and exploratory data analysis methods suggests that valuable information can be obtained from large datasets using these less constrained approaches. Available data mining techniques, such as clustering, have mainly been applied to cross-sectional, point-in-time data. However, health datasets often include repeated observations for individuals, so researchers are interested in following their health trajectories. This requires methods for analysing data collected at multiple points over time, that is, longitudinal data. Here, we describe an approach to constructing a synthetic longitudinal version of a major population health dataset in which clusters merge and split over time, to investigate the utility of clustering for discovering time-sequence-based patterns.
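The idea of clusters that merge and split between observation waves can be illustrated with a toy generator. This sketch assumes two waves of 2-D Gaussian points with made-up centres; it is not derived from the 45 and Up Study data, and the function name `make_waves` and all parameters are hypothetical. Each individual keeps the same id across waves, so their trajectory can be followed.

```python
import random
from collections import Counter


def make_waves(n_per_cluster=50, spread=0.5, seed=0):
    """Toy two-wave longitudinal data (hypothetical, for illustration only).

    Wave 1 has clusters A, B and C. In wave 2, A and B merge into AB,
    and C splits into C1 and C2. Returns (person_id, wave, x, y, label) tuples.
    """
    rng = random.Random(seed)
    wave1_centres = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 6.0)}
    wave2_centres = {"AB": (1.5, 0.0), "C1": (-2.0, 8.0), "C2": (2.0, 8.0)}
    records = []
    pid = 0
    for label, (cx, cy) in wave1_centres.items():
        for i in range(n_per_cluster):
            # Wave-2 membership: A and B merge; C members alternate between C1 and C2.
            if label in ("A", "B"):
                new_label = "AB"
            else:
                new_label = "C1" if i % 2 == 0 else "C2"
            nx, ny = wave2_centres[new_label]
            records.append((pid, 1, rng.gauss(cx, spread), rng.gauss(cy, spread), label))
            records.append((pid, 2, rng.gauss(nx, spread), rng.gauss(ny, spread), new_label))
            pid += 1
    return records


if __name__ == "__main__":
    recs = make_waves()
    # Count records per (wave, true cluster label).
    print(Counter((wave, label) for _, wave, _, _, label in recs))
```

With the defaults, wave 1 has 50 records in each of A, B and C, while wave 2 has 100 records in AB and 25 in each of C1 and C2. Keeping the generating labels lets a clustering run on each wave be scored against a known ground truth.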

    Comparing Data Mining with Ensemble Classification of Breast Cancer Masses in Digital Mammograms

    Medical diagnosis sometimes involves detecting subtle indications of a disease or condition amongst a background of diverse healthy individuals. The amount of information available for discovering such indications in mammography is large and has been growing at an exponential rate, due to population-wide screening programmes. To analyse this information, various researchers have used data mining techniques. A question that arises is: do flexible data mining techniques achieve accuracy comparable to dedicated classification techniques for medical diagnostic processes? This research compares a model-based data mining technique with a neural network classification technique, and the improvements possible using an ensemble approach. A publicly available breast cancer benchmark database is used to determine the utility of the techniques and compare the accuracies obtained.
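The abstract does not specify how the ensemble combines its members. A simple and widely used option is plurality (majority) voting over the base classifiers' predicted labels, sketched below as an assumption. The helper names `majority_vote` and `accuracy`, and the tie-breaking rule, are hypothetical choices for illustration.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine class labels from several classifiers by plurality vote.

    per_model_predictions[m][i] is model m's predicted label for case i.
    Ties go to the earliest model in the list, so callers can order the
    models by, for example, validation accuracy.
    """
    n_cases = len(per_model_predictions[0])
    if any(len(p) != n_cases for p in per_model_predictions):
        raise ValueError("every model must predict every case")
    combined = []
    for i in range(n_cases):
        votes = [preds[i] for preds in per_model_predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in model order, that reaches the highest vote count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def accuracy(predicted, actual):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up predictions from three hypothetical classifiers.
    model_a = ["malignant", "benign", "benign", "malignant"]
    model_b = ["malignant", "malignant", "benign", "benign"]
    model_c = ["benign", "benign", "benign", "malignant"]
    ensemble = majority_vote([model_a, model_b, model_c])
    print(ensemble)
```

Voting only needs each member's final labels, so it can combine a model-based technique and a neural network without changing either; weighted voting or averaging predicted probabilities are common alternatives.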

    Incremental predictive value of screening for anxiety and depression beyond current type 2 diabetes risk models: a prospective cohort study

    Objectives: We sought to determine whether screening for anxiety and depression, an emerging risk factor for type 2 diabetes (T2D), adds clinically meaningful information beyond current T2D risk assessment tools. Design: Prospective cohort. Participants and Setting: The 45 and Up Study is a large-scale prospective cohort of men and women aged 45 years and over, randomly sampled from the general population of New South Wales, Australia. 51 588 participants without self-reported diabetes at baseline (2006-2009) were followed up for approximately 3 years (2010). Methods: T2D status was determined by self-reported doctor-diagnosed diabetes after the age of 30 years and/or current use of metformin. Current symptoms of anxiety and/or depression were measured with the 10-item Kessler Psychological Distress Scale (K10). We determined the optimal K10 cut-off point for predicting T2D using Tjur's R² and tested risk models with and without the K10 using logistic regression. We assessed the incremental value of the K10 using the area under the receiver operating characteristic curve (AROC), net reclassification improvement (NRI) and net benefit (NB) decision analytics, with sensitivity analyses. Results: T2D developed in 1076 individuals (52.4% men). A K10 score of ≥19 (prevalence 8.97%), adjusted for age and gender, was optimal for predicting incident T2D (sensitivity 77%, specificity 53% and positive predictive value 3%; OR 1.70, 95% CI 1.41 to 2.03, P<0.001). The K10 score predicted incident T2D independently of current risk models, but did not improve the corresponding AROC, NRI and NB statistics. Sensitivity analyses showed that this was partially explained by the baseline model and by the small effect size of the K10, which was similar to that of other risk factors. Conclusions: Anxiety and depression screening with the K10 adds no meaningful incremental value to current T2D risk assessments. The clinical importance of anxiety and depression screening in preventing T2D requires ongoing consideration.
    Evan Atlantis, Shima Ghassem Pour, Federico Giros
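Three of the quantities named in this abstract have short definitions. Tjur's R² (the coefficient of discrimination) is the mean predicted probability among people who developed the outcome minus the mean among those who did not. Sensitivity, specificity and positive predictive value describe a screening rule such as "K10 ≥ 19". Net benefit at a threshold probability p_t is TP/n − (FP/n)·p_t/(1 − p_t). The sketch below computes these on made-up toy numbers, not study data; the function names are hypothetical, and NRI and AROC are omitted for brevity.

```python
def tjur_r2(y, p):
    """Tjur's coefficient of discrimination for binary outcomes y (0/1)
    and predicted probabilities p. Assumes at least one case and one non-case."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    non_cases = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(cases) / len(cases) - sum(non_cases) / len(non_cases)


def screen_metrics(y, scores, cutoff):
    """Sensitivity, specificity and PPV of the rule 'score >= cutoff'."""
    tp = sum(1 for yi, s in zip(y, scores) if s >= cutoff and yi == 1)
    fp = sum(1 for yi, s in zip(y, scores) if s >= cutoff and yi == 0)
    fn = sum(1 for yi, s in zip(y, scores) if s < cutoff and yi == 1)
    tn = sum(1 for yi, s in zip(y, scores) if s < cutoff and yi == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def net_benefit(y, p, threshold):
    """Decision-curve net benefit of intervening when p >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy data: 1 = developed the outcome during follow-up.
    y = [1, 0, 0, 1, 0, 0, 0, 1]
    k10 = [22, 12, 25, 15, 30, 10, 18, 20]
    p = [0.6, 0.2, 0.3, 0.5, 0.1, 0.2, 0.3, 0.7]  # hypothetical model probabilities
    print(screen_metrics(y, k10, cutoff=19))
    print(tjur_r2(y, p), net_benefit(y, p, threshold=0.25))
```

Comparing `net_benefit` for a model with and without an extra predictor, across a range of thresholds, is the kind of incremental-value check the abstract reports; a predictor can be statistically significant yet leave these summaries unchanged.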

    Comparing data mining with ensemble classification of breast cancer masses in digital mammograms

    Verma, B ORCiD: 0000-0002-4618-0479