23 research outputs found

    EnsembleSVM: A Library for Ensemble Learning Using Support Vector Machines

    EnsembleSVM is a free software package containing efficient routines to perform ensemble learning with support vector machine (SVM) base models. It currently offers ensemble methods based on binary SVM models. Our implementation avoids duplicate storage and evaluation of support vectors which are shared between constituent models. Experimental results show that using ensemble approaches can drastically reduce training complexity while maintaining high predictive accuracy. The EnsembleSVM software package is freely available online at http://esat.kuleuven.be/stadius/ensemblesvm.
    Comment: 5 pages, 1 table
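The idea of storing and evaluating shared support vectors only once can be illustrated with a minimal Python sketch. This is not the EnsembleSVM API (the package itself is not shown in this listing); the class `SharedSVEnsemble`, the helper `rbf`, the RBF kernel choice, and the majority-vote aggregation are all assumptions made for illustration.

```python
import math


def rbf(x, z, gamma):
    """RBF kernel between two equal-length vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class SharedSVEnsemble:
    """Hypothetical binary SVM ensemble that stores each support vector once."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.sv_index = {}  # support vector -> position in self.svs
        self.svs = []       # unique support vectors across all models
        self.models = []    # per model: ([(sv position, coefficient)], bias)

    def add_model(self, support_vectors, coefs, bias):
        terms = []
        for sv, coef in zip(support_vectors, coefs):
            key = tuple(sv)
            if key not in self.sv_index:  # store a shared vector only once
                self.sv_index[key] = len(self.svs)
                self.svs.append(key)
            terms.append((self.sv_index[key], coef))
        self.models.append((terms, bias))

    def predict(self, x):
        # One kernel evaluation per unique support vector, reused by all models.
        k = [rbf(sv, x, self.gamma) for sv in self.svs]
        votes = 0
        for terms, bias in self.models:
            decision = sum(coef * k[i] for i, coef in terms) + bias
            votes += 1 if decision >= 0 else -1
        return 1 if votes >= 0 else -1  # majority vote (assumed aggregation)


# Three toy models with made-up coefficients that share support vectors.
ens = SharedSVEnsemble(gamma=1.0)
ens.add_model([(0.0, 0.0), (1.0, 1.0)], [-1.0, 1.0], 0.0)
ens.add_model([(0.0, 0.0), (2.0, 2.0)], [-1.0, 1.0], 0.0)
ens.add_model([(1.0, 1.0), (-1.0, -1.0)], [1.0, -1.0], 0.0)

total_terms = sum(len(terms) for terms, _ in ens.models)
print(len(ens.svs), "unique support vectors for", total_terms, "model terms")
print(ens.predict((1.5, 1.5)), ens.predict((-0.5, -0.5)))
```

In this toy setup the three models reference six support vectors in total, but only four distinct vectors are stored, and each prediction needs four kernel evaluations instead of six.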

    The Effects of Combined Exposure to Simulated Microgravity, Ionizing Radiation, and Cortisol on the In Vitro Wound Healing Process

    Human spaceflight is associated with several health-related issues as a result of long-term exposure to microgravity, ionizing radiation, and higher levels of psychological stress. Frequently reported skin problems in space include rashes, itches, and delayed wound healing. Access to space is restricted by financial and logistical issues; as a consequence, experimental sample sizes are often small, which limits the generalization of the results. Earth-based simulation models can be used to investigate cellular responses to specific spaceflight stressors. Here, we describe the development of an in vitro model of the simulated spaceflight environment, which we used to investigate the combined effect of simulated microgravity using the random positioning machine (RPM), ionizing radiation, and stress hormones on the wound-healing capacity of human dermal fibroblasts. Fibroblasts were exposed to cortisol, after which they were irradiated with different radiation qualities (including X-rays, protons, carbon ions, and iron ions) followed by exposure to simulated microgravity using the RPM. Data related to the inflammatory, proliferation, and remodeling phases of wound healing were collected. Results show that spaceflight stressors can interfere with the wound healing process at any phase. Moreover, several interactions between the different spaceflight stressors were found. This highlights the complexity that needs to be taken into account when studying the effect of spaceflight stressors on biological processes and when developing countermeasures.

    Machine Learning on Belgian Health Expenditure Data: Data-driven Screening for Type 2 Diabetes

    Diabetes mellitus is a metabolic disorder characterized by chronic hyperglycemia, which may cause serious harm to many of the body's systems. Diabetes is a deadly pandemic which presents a significant burden on healthcare systems worldwide, and will continue to do so as its global prevalence rises rapidly (particularly type 2 diabetes). In developed countries, the rising prevalence is primarily driven by population aging, lifestyle changes and greater longevity of diabetes patients. Diabetes can be managed effectively when detected early. Unfortunately, early detection proves difficult as the time between onset and clinical diagnosis may span several years. Furthermore, estimates indicate that over one third of diabetes patients in developed countries are undiagnosed. We investigated the potential of Belgian health expenditure data as a basis to build a cost-effective population-wide screening approach for (type 2) diabetes mellitus, aspiring to improve secondary prevention by speeding up the diagnosis of patients in order to initiate treatment before the disease has caused irrevocable damage. We used health expenditure data collected by the National Alliance of Christian Mutualities - the largest social health insurer in Belgium. This data comprises basic biographic information and records of all refunded medical interventions and drug purchases, thus providing a long-term longitudinal overview of over 4 million individuals' medical expenditure histories. Screening was formulated as a binary classification task, in which diabetes patients represent the positive class. Due to the nature of the problem and limitations of health expenditure data, we were unable to identify a set of known negatives (patients without diabetes). Hence, we had to learn classifiers from positive and unlabeled data. 
During this project we made two contributions to this subdomain of semi-supervised learning: (i) a novel learning method which is robust to false positives and (ii) an approach to evaluate classifiers using traditional metrics without known negatives in the test set. Additionally, we mapped the survival of patients starting various antidiabetic pharmacotherapies and developed two open-source machine learning packages: one for ensemble learning and another to automate hyperparameter search. We built a screening method with competitive performance to existing state-of-the-art approaches. This exceeded our expectations, since health expenditure data omits most info about the typical risk factors used by other screening methods (BMI, lifestyle, genetic predisposition, ...). As such, the combination of health expenditure data and additional information about risk factors is a promising avenue for future research in screening for diabetes mellitus. Finally, our approach has a very low operational cost since we only used readily-available data, which effectively removes one of the key barriers of population-wide screening for diabetes.

Table of contents:

Abstract
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Diabetes mellitus
  1.2 Early detection and intervention in type 2 diabetes
    1.2.1 Diagnosis of diabetes
    1.2.2 Existing screening and prescreening approaches
    1.2.3 Situation in Belgium
  1.3 Belgian mutual health insurance
    1.3.1 Data related to medical interventions
    1.3.2 Data related to drug purchases
    1.3.3 Quality of health expenditure data
  1.4 Machine learning challenges and contributions
    1.4.1 Learning from positive and unlabeled data
    1.4.2 Automated hyperparameter optimization
    1.4.3 Open-source software
  1.5 Structure of the thesis
  Appendices
    1.A Regulation of blood glucose levels
    1.B Complications and comorbidities of diabetes
    1.C Classification of diabetes mellitus
    1.D Prevalence and burden of diabetes
    1.E Treatment of diabetes mellitus

2 Mortality in individuals treated with glucose lowering agents: a large, controlled cohort study
  2.1 Introduction
  2.2 Research design and methods
    2.2.1 Study cohort selection
    2.2.2 Control cohort selection
    2.2.3 Therapy changes within cohorts
    2.2.4 Censoring
    2.2.5 Statistical analysis
  2.3 Results
    2.3.1 Baseline cohort characteristics
    2.3.2 Five-year survival in individuals on different glucose lowering agents
    2.3.3 Age-dependent 5-year survival of individuals on different glucose lowering agents
    2.3.4 Statins and survival in individuals on different glucose lowering therapy
  2.4 Conclusions

3 EnsembleSVM: A Library for Ensemble Learning Using SVMs
  3.1 Introduction
  3.2 Software Description
    3.2.1 Implementation
    3.2.2 Tools
  3.3 Benchmark Results
  3.4 Conclusions

4 SVM Ensemble Learning from Positive and Unlabeled Data
  4.1 Introduction
  4.2 Related work
    4.2.1 Class-weighted SVM
    4.2.2 Bagging SVM
  4.3 Robust Ensemble of SVMs
    4.3.1 Bootstrap resampling contaminated sets
    4.3.2 Bagging predictors
    4.3.3 Justification of the RESVM algorithm
    4.3.4 RESVM training
    4.3.5 RESVM prediction
  4.4 Experimental setup
    4.4.1 Simulation setup
    4.4.2 Data sets
  4.5 Results and discussion
    4.5.1 Results for supervised classification
    4.5.2 Results for PU learning
    4.5.3 Results of semi-supervised classification
    4.5.4 A note on the number of repetitions per experiment
    4.5.5 Trend across data sets
    4.5.6 Effect of contamination
    4.5.7 RESVM optimal parameters
  4.6 Conclusion

5 Hyperparameter Search in Machine Learning
  5.1 Introduction
    5.1.1 Example: controlling model complexity
    5.1.2 Formalizing hyperparameter search
  5.2 Challenges in hyperparameter search
    5.2.1 Costly objective function evaluations
    5.2.2 Randomness
    5.2.3 Complex search spaces
  5.3 Current approaches
  5.4 Conclusion

6 Easy Hyperparameter Search Using Optunity
  6.1 Introduction
  6.2 Optunity
    6.2.1 Functional Overview
    6.2.2 Available Solvers
    6.2.3 Software Design and Implementation
    6.2.4 Development and Documentation
  6.3 Related Work
  6.4 Solver Benchmark
  Appendices
    6.A Survey of hyperparameter optimization in NIPS 2014
    6.B Performance benchmark
      6.B.1 Setup
      6.B.2 Results & Discussion

7 Assessing Binary Classifiers Using Only Positive and Unlabeled Data
  7.1 Introduction
  7.2 Background and definitions
    7.2.1 Rank distributions and contingency tables
    7.2.2 ROC and PR curves
    7.2.3 Evaluation with partially labeled data
  7.3 Relationship between the rank CDF of positives and contingency tables
    7.3.1 Rank distributions and contingency tables based on subsets of positives within a ranking
    7.3.2 Contingency tables based on partially labeled data
  7.4 Efficiently computing the bounds
    7.4.1 Computing the contingency table with greatest-lower bound on FPR at given rank r
    7.4.2 Bounds on the rank distribution of P U
  7.5 Constructing ROC and PR curve estimates
  7.6 Discussion and Recommendations
    7.6.1 Determining betahat and its effect
    7.6.2 Model selection
    7.6.3 Empirical quality of the estimates
    7.6.4 Relative importance of known negatives compared to known positives
  7.7 Conclusion
  Appendices
    7.A Effect of betahat on contingency table entries and common performance metrics
    7.B The effect of the fraction of known positives, known negatives and betahat

8 Building Classifiers to Predict the Start of Glucose-Lowering Pharmacotherapy Using Belgian Health Expenditure Data
  8.1 Introduction
  8.2 Existing Type 2 Diabetes Risk Profiling Approaches
  8.3 Health Expenditure Data
    8.3.1 Records Related to Drug Purchases
    8.3.2 Records Related to Medical Provisions
    8.3.3 Advantages of Health Expenditure Data
    8.3.4 Limitations of Health Expenditure Data
  8.4 Methods
    8.4.1 Experimental Setup
    8.4.2 Data Set Construction
    8.4.3 Learning Methods
  8.5 Results and Discussion
    8.5.1 Benchmark of learning methods
    8.5.2 Performance Curves
    8.5.3 Feature Importance Analysis for the RESVM Model
  8.6 Conclusion

9 Conclusion
  9.1 Machine learning contributions
    9.1.1 Future work
  9.2 Screening for type 2 diabetes
    9.2.1 Weaknesses and limitations of our approach
    9.2.2 Future work
    9.2.3 Health expenditure data
    9.2.4 The elephant in the room

Bibliography
List of publications

Work done in collaboration with Landsbond der Christelijke Mutualiteiten.
nrpages: 220
status: published
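The positive-and-unlabeled setting described in the abstract above (labeled positives, no known negatives) can be sketched with a small bagging scheme in Python. This is only an illustration under stated assumptions, not the thesis's RESVM algorithm: the base model is a nearest-centroid scorer standing in for an SVM, both sets are bootstrap-resampled, scores are averaged, and all names (`pu_bagging`, `train_base`, the toy data) are hypothetical.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_base(pos_sample, unl_sample):
    """Stand-in base model (assumption): nearest-centroid scorer, not an SVM."""
    c_pos, c_unl = centroid(pos_sample), centroid(unl_sample)
    # Higher score means closer to the positive centroid than to the unlabeled one.
    return lambda x: sq_dist(x, c_unl) - sq_dist(x, c_pos)


def pu_bagging(positives, unlabeled, n_models=25, n_pos=5, n_unl=5, seed=0):
    """Train base models on bootstrap samples and average their scores."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        pos_sample = rng.choices(positives, k=n_pos)  # sampling with replacement
        unl_sample = rng.choices(unlabeled, k=n_unl)  # treated as negatives
        models.append(train_base(pos_sample, unl_sample))
    return lambda x: sum(m(x) for m in models) / len(models)


# Toy data: the last unlabeled point is a hidden positive.
positives = [(2.0, 2.0), (2.2, 1.8), (1.8, 2.1), (2.1, 2.3), (1.9, 1.9)]
unlabeled = [(0.1, -0.2), (-0.3, 0.2), (0.0, 0.3), (0.2, 0.1),
             (-0.1, -0.1), (0.3, -0.3), (-0.2, 0.0), (1.7, 1.7)]

score = pu_bagging(positives, unlabeled)
ranked = sorted(unlabeled, key=score, reverse=True)
print("highest-scoring unlabeled point:", ranked[0])
```

Ranking the unlabeled points by averaged score puts the hidden positive first in this toy data, which is how such a classifier would be used for screening: the top of the ranking is where undiagnosed cases are expected to concentrate.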

    Hyperparameter search in machine learning

    status: published