
    Classification Trees for Problems with Monotonicity Constraints

    For classification problems with ordinal attributes, very often the class attribute should increase with each or some of the explaining attributes. These are called classification problems with monotonicity constraints. Classical decision tree algorithms such as CART or C4.5 generally do not produce monotone trees, even if the dataset is completely monotone. This paper surveys the methods that have so far been proposed for generating decision trees that satisfy monotonicity constraints. A distinction is made between methods that work only for monotone datasets and methods that work for monotone and non-monotone datasets alike.
    Keywords: classification tree; decision tree; monotone; monotonicity constraint; ordinal data
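The monotonicity constraint described above can be stated concretely: whenever one instance dominates another on every attribute, its class label must be at least as large. A minimal sketch of such a check on a small labeled dataset (the toy data and function names are illustrative, not from the paper):

```python
from itertools import combinations

def dominates(a, b):
    """True if every attribute of a is <= the corresponding attribute of b."""
    return all(ai <= bi for ai, bi in zip(a, b))

def is_monotone(X, y):
    """Check the monotonicity constraint: if one instance dominates another,
    its class label must not be smaller than the other's."""
    for (xi, yi), (xj, yj) in combinations(zip(X, y), 2):
        if dominates(xi, xj) and yi > yj:
            return False
        if dominates(xj, xi) and yj > yi:
            return False
    return True

# A monotone toy dataset: larger attribute values never get a smaller class.
X = [(1, 1), (2, 1), (2, 3), (3, 3)]
print(is_monotone(X, [0, 0, 1, 2]))  # True

# Swapping one label breaks monotonicity: (2, 3) dominates (2, 1)
# but would now receive a smaller class.
print(is_monotone(X, [0, 2, 1, 2]))  # False
```

A dataset passing this check is "completely monotone" in the sense used by the survey; the surveyed algorithms differ in whether they require this property of the training data.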

    Solving classification problem using ensemble binarization classifier

    The binarization strategy is broadly applied to multi-class classification problems. However, the learning complexity of the classifier model tends to increase when the problem is expanded into several replicas. One-Versus-All (OVA) is one such strategy: it transforms an ordinal multi-class classification problem into a series of two-class classification problems, and the outputs of the individual classifier models are combined to produce the final prediction. This binarization strategy has been shown to achieve better accuracy than a single ordinal multi-class classifier model. However, the complexity of the learning model (e.g. the Random Forest (RF) ensemble of decision trees) tends to increase when a large number of trees is employed. Even though a large number of trees might produce decent accuracy, generating the learning model takes significantly longer. Hence, self-tuning of the tree parameter is introduced to tackle this: the number of trees in the RF classifier is set according to the number of classes in the problem. In this paper, OVA with self-tuning is evaluated with respect to parameter initialization in the context of the RF ensemble decision tree, and its performance is compared with two other classifier models, J48 and boosting, on several well-known datasets.
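The OVA decomposition above can be sketched in a few lines. This is a minimal stand-in, not the paper's implementation: a trivial nearest-centroid scorer replaces the Random Forest base learner (whose tree count the paper tunes to the number of classes), but the one-binary-problem-per-class structure and the score-based combination are the same:

```python
# Minimal One-Versus-All (OVA) decomposition around a toy nearest-centroid
# base learner; any two-class scorer (e.g. a Random Forest) slots in here.

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train_binary(X, y_binary):
    """Fit a nearest-centroid scorer for one 'class vs. rest' replica."""
    pos = [x for x, t in zip(X, y_binary) if t == 1]
    neg = [x for x, t in zip(X, y_binary) if t == 0]
    c_pos, c_neg = centroid(pos), centroid(neg)
    def score(x):
        d = lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        return d(c_neg) - d(c_pos)  # larger = closer to the positive class
    return score

def train_ova(X, y):
    """One binary 'k vs. rest' model per class."""
    classes = sorted(set(y))
    return {k: train_binary(X, [1 if yi == k else 0 for yi in y])
            for k in classes}

def predict(models, x):
    # Combine the per-class scores: the highest-scoring replica wins.
    return max(models, key=lambda k: models[k](x))

X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (11, 0)]
y = [0, 0, 1, 1, 2, 2]
models = train_ova(X, y)
print(predict(models, (0, 0.5)))   # 0
print(predict(models, (5, 5.5)))   # 1
print(predict(models, (10.5, 0)))  # 2
```

The paper's self-tuning idea would enter in `train_binary`: instead of a fixed, large ensemble, the number of trees is derived from the number of classes, trading a little accuracy headroom for much shorter model-generation time.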

    Ensemble methods in ordinal data classification

    Ordinal classification problems can be found in various areas, such as product recommendation systems, intelligent health systems and image recognition. These problems have the goal of learning how to classify a given instance (e.g. a movie) on an ordinal scale (e.g. good, average, bad). The performance of supervised learning problems (such as ordinal classification) can be improved by using ensemble methods, where several models are combined to make better decisions. While there are various ensemble methods for nominal classification, ranking and regression, ordinal classification has not received the same level of attention. The goal of this dissertation is, therefore, to introduce novel ensemble methods for the classification of ordinal data. To do this, first a new ordinal classification algorithm based on decision trees and the data replication method is presented, whose results show that this classifier may perform better than other non-ordinal classifiers. Then, the main ideas of this method are exploited to try to improve ensembles whose models share similarities with decision trees (i.e. AdaBoost.M1 with Decision Stumps and Random Forests).
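The data replication method mentioned above exploits the order of the classes by turning a K-class ordinal problem into K-1 binary "is the class greater than k?" problems. A minimal sketch of that decomposition and recombination (the helper names are illustrative; the dissertation's models built on these replicas are decision trees, not shown here):

```python
# Data replication for ordinal labels: K classes become K-1 binary
# problems "class > k?", and a prediction is recovered by counting
# how many of those binary answers are positive.

def to_binary_targets(y, k):
    """Binary target vector for the 'class > k' replica."""
    return [1 if yi > k else 0 for yi in y]

def recombine(binary_answers):
    """Ordinal prediction = number of 'greater than k' replicas that fire."""
    return sum(binary_answers)

y = [0, 1, 2, 2, 1]             # ordinal labels on a 3-point scale
print(to_binary_targets(y, 0))  # [0, 1, 1, 1, 1]
print(to_binary_targets(y, 1))  # [0, 0, 1, 1, 0]

# An instance whose replicas answer ('>0?' yes, '>1?' no) is class 1.
print(recombine([1, 0]))  # 1
```

Unlike the nominal OVA decomposition, each binary replica here respects the class order, which is what makes the scheme attractive as a building block for ordinal ensembles.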

    Automated Screening for Three Inborn Metabolic Disorders: A Pilot Study

    Background: Inborn metabolic disorders (IMDs) form a large group of rare, but often serious, metabolic disorders. Aims: Our objective was to construct a decision tree, based on a classification algorithm, from data on three metabolic disorders, enabling decisions on the screening and clinical diagnosis of a patient. Settings and Design: A non-incremental concept-learning classification algorithm was applied to a set of patient data, and the procedure was followed to obtain a decision on a patient's disorder. Materials and Methods: Initially, a training set containing 13 cases was investigated for three inborn errors of metabolism. Results: A total of thirty test cases were investigated for the three inborn errors of metabolism. The program identified 10 cases with galactosemia, another 10 cases with fructosemia and the remaining 10 with propionic acidemia. The program successfully identified all 30 cases. Conclusions: Decision support systems of this kind can help healthcare delivery personnel immensely with the early screening of IMDs.

    Encrypted statistical machine learning: new privacy preserving methods

    We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy-preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and naïve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods.