    A kernelized genetic algorithm decision tree with information criteria

    Decision trees are one of the most widely used data mining models, with a long history in machine learning, statistics, and pattern recognition. A main advantage of decision trees is that the resulting data-partitioning model can be easily understood by both the data analyst and the customer, in contrast to more powerful kernel-based models such as Radial Basis Function (RBF) networks and Support Vector Machines. In recent literature, the decision tree has been used as part of a two-step training algorithm for RBF networks; however, its primary function there is not model visualization but dividing the input data into initial potential radial basis spaces. In this dissertation, the kernel trick, via Mercer's condition, is applied while splitting the input data under the guidance of a decision tree. This allows the algorithm to search for the best split using information from the projected feature space while remaining in the current data space. The decision tree captures the linear split in the projected feature space and presents the corresponding non-linear split of the input data space. Within a genetic search algorithm, Bozdogan's Information Complexity criterion (ICOMP) serves as the fitness function to determine the best splits, control model complexity, subset the input variables, and select the optimal kernel function. The decision tree is then applied to radial basis function networks for regression, nominal classification, and ordinal prediction.
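
    The core idea, splitting in a kernel-induced feature space while the data stay in the input space, and searching for that split with a genetic algorithm, can be sketched compactly. The Python sketch below is illustrative only: it uses an RBF kernel, parameterises each candidate split by representer coefficients and a threshold, and scores splits with Gini impurity as a simple stand-in for the ICOMP fitness the dissertation actually uses; all names and parameters are hypothetical.

```python
# Minimal sketch of a kernel-induced split evolved by a genetic algorithm.
# Hypothetical names throughout; Gini impurity below is a stand-in for
# Bozdogan's ICOMP, which the dissertation actually uses as the fitness.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF (Gaussian) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def split_side(alpha, b, K):
    """Linear split in the projected feature space: sign(K @ alpha - b)."""
    return (K @ alpha - b) > 0

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def fitness(alpha, b, K, y):
    """Weighted impurity of the two children (lower is better)."""
    left = split_side(alpha, b, K)
    if left.all() or (~left).all():
        return np.inf
    n = len(y)
    return left.sum() / n * gini(y[left]) + (~left).sum() / n * gini(y[~left])

def genetic_split_search(X, y, gamma=1.0, pop=40, gens=60, rng=None):
    """Evolve (alpha, b) defining a non-linear split of the input space."""
    rng = np.random.default_rng(rng)
    K = rbf_kernel(X, X, gamma)
    population = [(rng.normal(size=len(X)), rng.normal()) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda ab: fitness(*ab, K, y))
        parents = population[: pop // 2]
        children = []
        for a, b in parents:  # mutate the better half to refill the population
            children.append((a + 0.1 * rng.normal(size=a.shape),
                             b + 0.1 * rng.normal()))
        population = parents + children
    return min(population, key=lambda ab: fitness(*ab, K, y))
```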

    Modeling Binary Time Series Using Gaussian Processes with Application to Predicting Sleep States

    Motivated by the problem of predicting sleep states, we develop a mixed effects model for binary time series with a stochastic component represented by a Gaussian process. The fixed component captures the effects of covariates on the binary-valued response, while the Gaussian process captures the residual variation in the binary response that is not explained by covariates and past realizations. We develop a frequentist modeling framework that provides efficient inference and more accurate predictions. Results demonstrate improved prediction rates over existing approaches such as logistic regression, generalized additive mixed models, models for ordinal data, gradient boosting, decision trees, and random forests. Using our proposed model, we show that the previous sleep state and heart rate are significant predictors of future sleep states. Simulation studies also show that our proposed method is promising and robust. To handle the computational complexity, we use the Laplace approximation, golden section search, and successive parabolic interpolation. With this paper, we also submit an R package (HIBITS) that implements the proposed procedure. Comment: Journal of Classification (2018)
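
    A rough sketch of the estimation core described above, under stated assumptions: a logit link, fixed effects X @ beta plus a latent Gaussian-process term with a squared-exponential covariance over time, and the Laplace approximation computed by Newton iterations on the log posterior. This is not the HIBITS implementation; the names, kernel choice, and link function are assumptions.

```python
# Sketch of a Laplace approximation for a binary time series with fixed
# effects plus a latent Gaussian process; illustrative, not the HIBITS code.
import numpy as np

def rbf_cov(t, length=5.0, var=1.0, jitter=1e-6):
    """Squared-exponential covariance over time points t (assumed kernel)."""
    d2 = (t[:, None] - t[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length ** 2) + jitter * np.eye(len(t))

def laplace_mode(y, X, beta, K, n_iter=50):
    """Newton iterations for the posterior mode of the latent GP values f."""
    f = np.zeros(len(y))
    K_inv = np.linalg.inv(K)
    for _ in range(n_iter):
        eta = X @ beta + f                  # fixed effects + latent GP
        p = 1.0 / (1.0 + np.exp(-eta))      # logistic link
        grad = (y - p) - K_inv @ f          # gradient of the log posterior
        W = np.diag(p * (1 - p))            # negative Hessian of the likelihood
        f = f + np.linalg.solve(W + K_inv, grad)
    return f
```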

    Automated Screening for Three Inborn Metabolic Disorders: A Pilot Study

    Background: Inborn metabolic disorders (IMDs) form a large group of rare, but often serious, metabolic disorders. Aims: Our objective was to construct a decision tree, based on a classification algorithm applied to data on three metabolic disorders, enabling decisions on the screening and clinical diagnosis of a patient. Settings and Design: A non-incremental concept learning classification algorithm was applied to a set of patient data, and the procedure was followed to obtain a decision on a patient's disorder. Materials and Methods: Initially, a training set containing 13 cases was investigated for three inborn errors of metabolism. Results: A total of thirty test cases were investigated for the three inborn errors of metabolism. The program identified 10 cases with galactosemia, another 10 with fructosemia, and the remaining 10 with propionic acidemia, correctly classifying all 30 cases. Conclusions: Decision support systems of this kind can greatly help healthcare delivery personnel in the early screening of IMDs.
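
    As an illustration of the classification step only: the study applies a non-incremental concept-learning algorithm, but the sketch below substitutes a standard CART decision tree trained on 13 made-up cases with hypothetical binary metabolite flags (none of the feature names or values come from the study) and prints the induced rules.

```python
# Illustrative stand-in for the study's classifier: a CART decision tree over
# hypothetical screening flags for the three disorders named in the abstract.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical binary features per patient:
# [elevated_galactose, elevated_fructose, elevated_propionic_acid]
X_train = [
    [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0],              # galactosemia
    [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0],              # fructosemia
    [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1],   # propionic acidemia
]
y_train = (["galactosemia"] * 4 + ["fructosemia"] * 4
           + ["propionic acidemia"] * 5)

tree = DecisionTreeClassifier().fit(X_train, y_train)
print(export_text(tree, feature_names=[
    "elevated_galactose", "elevated_fructose", "elevated_propionic_acid"]))
print(tree.predict([[0, 0, 1]]))  # -> ['propionic acidemia']
```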

    Encrypted statistical machine learning: new privacy preserving methods

    We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy-preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and naïve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods. Comment: 39 pages
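
    To illustrate the constraint that shapes these methods, the sketch below evaluates the leaf count of a random-split tree using only additions and multiplications, the operations an FHE scheme exposes on ciphertexts. The Enc wrapper is a mock placeholder rather than a real FHE scheme, the tree is only loosely modelled on completely/extremely random forests, and the paper's cryptographic stochastic fraction estimator is not reproduced; the point is simply that comparisons and division are avoided and left to the data owner after decryption.

```python
# Toy illustration of computing on "encrypted" data with addition and
# multiplication only. Enc is a mock placeholder, not a real FHE scheme.
import random

class Enc:
    """Mock ciphertext supporting only + and * (no comparisons, no division)."""
    def __init__(self, v): self.v = v
    def __add__(self, o): return Enc(self.v + o.v)
    def __mul__(self, o): return Enc(self.v * o.v)

def route(indicators, path):
    """Product of encrypted 0/1 indicators selecting one random-split leaf."""
    out = Enc(1)
    for j in path:
        out = out * indicators[j]
    return out

def leaf_vote(data, path):
    """Encrypted count of positive-class points falling in the leaf `path`.

    `data` is a list of (indicator_list, encrypted_label) pairs; everything is
    accumulated with additions and multiplications only, so it could in
    principle be evaluated homomorphically. The final fraction (count / total)
    is left to the data owner after decryption.
    """
    count = Enc(0)
    for indicators, label in data:
        count = count + route(indicators, path) * label
    return count

# Tiny usage example with binary indicator features (already one-hot encoded).
data = [([Enc(1), Enc(0), Enc(1)], Enc(1)),
        ([Enc(1), Enc(1), Enc(0)], Enc(0)),
        ([Enc(0), Enc(1), Enc(1)], Enc(1))]
path = random.sample(range(3), 2)   # a completely random split path
print(leaf_vote(data, path).v)      # decrypting here stands in for the owner
```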