2,562 research outputs found

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    Full text link
    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techniques of data-preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, and more accurate and robust classification results.Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625

    Sensor-AssistedWeighted Average Ensemble Model for Detecting Major Depressive Disorder

    Get PDF
    The present methods of diagnosing depression are entirely dependent on self-report ratings or clinical interviews. Those traditional methods are subjective, where the individual may or may not be answering genuinely to questions. In this paper, the data has been collected using self-report ratings and also using electronic smartwatches. This study aims to develop a weighted average ensemble machine learning model to predict major depressive disorder (MDD) with superior accuracy. The data has been pre-processed and the essential features have been selected using a correlation-based feature selection method. With the selected features, machine learning approaches such as Logistic Regression, Random Forest, and the proposedWeighted Average Ensemble Model are applied. Further, for assessing the performance of the proposed model, the Area under the Receiver Optimization Characteristic Curves has been used. The results demonstrate that the proposed Weighted Average Ensemble model performs with better accuracy than the Logistic Regression and the Random Forest approaches
    • …
    corecore