2 research outputs found

    Risk stratification of cardiovascular patients using a novel classification tree induction algorithm with non-symmetric entropy measures

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 95-100). Risk stratification allows clinicians to choose treatments consistent with a patient's risk profile. Risk stratification models that integrate information from several risk attributes can aid clinical decision making. One of the technical challenges in developing risk stratification models from medical data is the class imbalance problem: typically, the patients who experience a serious medical event are a small subset of the entire population. The goal of my thesis work is to develop automated tools to build risk stratification models that can handle unbalanced datasets and improve risk stratification. We propose a novel classification tree induction algorithm that uses non-symmetric entropy measures to construct classification trees. We apply our methods to identifying patients at high risk of cardiovascular mortality. We tested our approach on a set of 4200 patients who had recently suffered a non-ST-elevation acute coronary syndrome. When compared to classification tree models generated using other measures proposed in the literature, the tree models constructed using non-symmetric entropy had higher recall and precision. Our models significantly outperformed models generated using logistic regression, a standard method of developing multivariate risk stratification models in the literature. By Anima Singh. S.M.
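The abstract above does not state which non-symmetric entropy measure the thesis uses, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses one asymmetric impurity function, chosen here for illustration, whose maximum sits at a chosen class proportion `w` (for example the minority-class prior) instead of at 0.5. The function names and the parameter `w` are hypothetical and do not come from the thesis.

```python
def asymmetric_entropy(p, w):
    """Illustrative asymmetric impurity (an assumption, not the thesis's formula).

    p: proportion of the positive (rare) class in a node, in [0, 1].
    w: proportion at which impurity should peak, in (0, 1).
    The value is 0 for pure nodes and 1 when p == w.
    With w == 0.5 it reduces to the symmetric form 4 * p * (1 - p).
    """
    return p * (1 - p) / ((1 - 2 * w) * p + w * w)


def split_impurity(left, right, w):
    """Size-weighted impurity of a binary split.

    left, right: lists of 0/1 labels falling on each side of the split.
    """
    n = len(left) + len(right)
    total = 0.0
    for part in (left, right):
        if part:  # skip empty children
            p = sum(part) / len(part)
            total += len(part) / n * asymmetric_entropy(p, w)
    return total
```

With `w` set to a small value such as 0.1, a node that holds 10% positive cases is treated as maximally uncertain, so a split that isolates even a modest concentration of the rare class is rewarded more than it would be under a symmetric measure.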

    Using Resampling Techniques for Better Quality Discretization

    Many supervised induction algorithms require discrete data; however, real data often comes in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that affects the accuracy, complexity, variance and understandability of the induction model. Usually, discretization and other types of statistical processes are applied to subsets of the population, as the entire population is practically inaccessible. For this reason we argue that a discretization performed on a sample of the population is only an estimate of the discretization for the entire population. Most existing discretization methods partition the attribute range into two or several intervals using a single cut point or a set of cut points. In this paper, we introduce two variants of a resampling technique (such as bootstrap) to generate a set of candidate discretization points, thereby improving the discretization quality by providing a better estimate of the entire population. The goal of this paper is to observe whether this type of resampling can lead to better quality discretization points, which opens up a new paradigm for the construction of soft decision trees.
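The abstract above does not describe the paper's two variants, so the following is only a rough sketch of the general idea: draw bootstrap resamples of the data, find an entropy-minimizing cut point on each resample, and keep the resulting cut points as candidates. The function names, the choice of class entropy as the split criterion, and the default of 200 resamples are all assumptions made for illustration.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def best_cut(values, labels):
    """Midpoint cut that minimizes size-weighted class entropy, or None."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_point = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_score = score
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point


def bootstrap_cut_points(values, labels, n_resamples=200, seed=0):
    """Collect one candidate cut point per bootstrap resample."""
    rng = random.Random(seed)
    n = len(values)
    cuts = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        cut = best_cut([values[i] for i in idx], [labels[i] for i in idx])
        if cut is not None:
            cuts.append(cut)
    return cuts
```

The spread of the returned cut points gives a picture of how stable a single-sample cut point is; a distribution of candidates of this kind is one way a soft, rather than crisp, split boundary could be motivated.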