
    Totally Corrective Multiclass Boosting with Binary Weak Learners

    In this work, we propose a new optimization framework for multiclass boosting. In the literature, AdaBoost.MO and AdaBoost.ECC are two successful multiclass boosting algorithms that can use binary weak learners. We explicitly derive the Lagrange dual problems of these two algorithms based on their regularized loss functions. We show that the Lagrange dual formulations enable us to design totally corrective multiclass algorithms using the primal-dual optimization technique. Experiments on benchmark data sets suggest that our multiclass boosting achieves generalization capability comparable to the state of the art, while converging much faster than stage-wise gradient descent boosting. In other words, the new totally corrective algorithms maximize the margin more aggressively.
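A minimal numpy sketch of the totally-corrective idea, not the paper's primal-dual solver: after each new weak learner joins the ensemble, all combination coefficients are re-optimized on the exponential loss, whereas stage-wise boosting would freeze earlier coefficients. The weak-learner pool `H`, the labels, and the plain gradient-descent inner solver are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): n samples, m candidate weak learners with +/-1 outputs.
n, m = 200, 10
H = rng.choice([-1.0, 1.0], size=(n, m))   # weak-learner predictions
y = np.sign(H[:, :3].sum(axis=1))          # labels driven by the first 3 learners

def exp_loss(alpha):
    """Exponential loss of the combined classifier sign(H @ alpha)."""
    return np.exp(-y * (H @ alpha)).mean()

def totally_corrective(T=5, steps=200, lr=0.1):
    """After each round, re-optimize ALL active coefficients by gradient
    descent on the exponential loss (a simple stand-in for the paper's
    primal-dual updates)."""
    active, alpha = [], np.zeros(m)
    w = np.ones(n) / n
    for _ in range(T):
        # pick the weak learner with the largest weighted edge, as in boosting
        edges = (w * y) @ H
        j = int(np.argmax(np.abs(edges)))
        if j not in active:
            active.append(j)
        for _ in range(steps):             # the totally-corrective step
            g = -(np.exp(-y * (H @ alpha)) * y) @ H / n
            alpha[active] -= lr * g[active]
        w = np.exp(-y * (H @ alpha))
        w /= w.sum()
    return alpha

alpha = totally_corrective()
print("final exponential loss:", round(exp_loss(alpha), 4))
```

A stage-wise variant would instead fix `alpha[active[:-1]]` and tune only the newest coefficient, which is exactly the slower margin growth the abstract contrasts against.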

    Real-Time Induction Motor Health Index Prediction in a Petrochemical Plant using Machine Learning

    This paper presents real-time health prediction of induction motors (IMs) utilised in a petrochemical plant through the application of intelligent sensors and machine learning (ML) models. At present, maintenance engineers of the company implement time-based and condition-based maintenance techniques, periodically examining and diagnosing the health of IMs, which results in sporadic breakdowns of IMs. Such breakdowns sometimes force the entire production process to stop for emergency maintenance, resulting in a huge loss in the company's revenue. Hence, top management decided to switch the operational practice to real-time predictive maintenance instead. Intelligent sensors are installed on IMs to collect the necessary information about their working statuses. ML exploits the real-time information received from the intelligent sensors to flag abnormalities in mechanical or electrical components of IMs before potential failures occur. Four ML models are investigated to evaluate which performs best: Artificial Neural Network (ANN), Particle Swarm Optimization (PSO), Gradient Boosting Tree (GBT) and Random Forest (RF). Standard performance metrics are used to compare the relative effectiveness of the ML models, including Precision, Recall, Accuracy, F1-score, and the AUC-ROC curve. The results reveal that PSO not only obtains the highest average weighted Accuracy but also differentiates the statuses (Class 0 – Class 3) of the IM more correctly than the other models.
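A small numpy sketch of how the per-class metrics named above can be computed from a confusion matrix for a four-state health index. The labels and predictions here are synthetic stand-ins, not data from the plant, and Class 0 – Class 3 are simply assumed to run from healthy to near-failure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for sensor-driven predictions on a 4-state health index
# (Class 0 assumed healthy ... Class 3 assumed near failure).
y_true = rng.integers(0, 4, size=500)
y_pred = np.where(rng.random(500) < 0.8, y_true, rng.integers(0, 4, size=500))

def per_class_metrics(y_true, y_pred, n_classes=4):
    """Precision, recall and F1 for each class, from the confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows = true class, cols = predicted
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1, cm

prec, rec, f1, cm = per_class_metrics(y_true, y_pred)
# Support-weighted average of per-class recall equals overall accuracy.
weighted_acc = np.diag(cm).sum() / cm.sum()
print("per-class F1:", np.round(f1, 3))
print("weighted accuracy:", round(weighted_acc, 3))
```

The same confusion matrix also feeds one-vs-rest AUC-ROC curves once per-class scores (rather than hard labels) are available.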

    Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies

    Lung cancer is among the most common cancers in the United States, in terms of both incidence and mortality. In 2009, it is estimated that more than 150,000 deaths will result from lung cancer alone. Genetic information is an extremely valuable data source in characterizing the personal nature of cancer. Over the past several years, investigators have conducted numerous association studies in which intensive genetic data is collected on relatively few patients compared to the number of gene predictors, with one scientific goal being to identify genetic features associated with cancer recurrence or survival. In this note, we propose high-dimensional survival analysis through a new application of boosting, a powerful tool in machine learning. Our approach is based on an accelerated lifetime model and minimizes the sum of pairwise differences in residuals. We apply our method to a recent microarray study of lung adenocarcinoma and find that our ensemble is composed of 19 genes, while a proportional hazards (PH) ensemble is composed of nine genes, a proper subset of the 19-gene panel. In one of our simulation scenarios, we demonstrate that PH boosting in a misspecified model tends to underfit and ignore moderately sized covariate effects, on average. Diagnostic analyses suggest that the PH assumption is not satisfied in the microarray data, which may explain, in part, the discrepancy in the sets of active coefficients. Our simulation studies and comparative data analyses demonstrate how statistical learning by PH models alone is insufficient. Published in the Annals of Applied Statistics (http://dx.doi.org/10.1214/10-AOAS426) by the Institute of Mathematical Statistics (http://www.imstat.org).
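A hedged numpy sketch of the "sum of pairwise differences in residuals" objective in an accelerated-lifetime setting, in the spirit of a Gehan-type rank loss rather than the paper's actual boosting ensemble. The covariates, censoring mechanism, and plain subgradient-descent fit below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative toy data: X mimics gene-expression covariates, log_t are log
# survival times from an assumed accelerated-lifetime model, delta marks
# observed events (1) vs censored observations (0).
n, p = 60, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
log_t = X @ beta_true + rng.normal(scale=0.3, size=n)
delta = (rng.random(n) < 0.7).astype(float)

def gehan_loss(beta):
    """Sum of pairwise residual differences: a pair (i, j) contributes when
    subject i had an observed event but a smaller residual than subject j."""
    e = log_t - X @ beta                   # AFT residuals
    diff = e[None, :] - e[:, None]         # diff[i, j] = e_j - e_i
    return np.sum(delta[:, None] * np.maximum(diff, 0.0)) / (n * n)

def gehan_subgrad(beta):
    e = log_t - X @ beta
    active = delta[:, None] * (e[None, :] > e[:, None])   # pairs in the loss
    # d(e_j - e_i)/d beta = x_i - x_j for each active pair
    return (active.sum(axis=1) @ X - active.sum(axis=0) @ X) / (n * n)

beta = np.zeros(p)
for _ in range(500):                       # plain subgradient descent
    beta -= 0.05 * gehan_subgrad(beta)
```

In the paper, base learners are boosted against this kind of objective; here a single linear fit stands in to keep the loss itself visible.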

    A low variance error boosting algorithm

    This paper introduces a robust variant of AdaBoost, cw-AdaBoost, that uses weight perturbation to reduce variance error, and is particularly effective when dealing with data sets, such as microarray data, that have large numbers of features and small numbers of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost on twelve gene expression datasets using 10-fold cross-validation. The new algorithm consistently achieves higher classification accuracy across all these datasets. In contrast to other AdaBoost variants, the algorithm is not susceptible to problems when a zero-error base classifier is encountered.
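A minimal numpy sketch of the weight-perturbation idea on a stump-based AdaBoost, under stated assumptions: the perturbation scheme, stump learner, and zero-error guard below are illustrative and are not the actual cw-AdaBoost update rules.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy two-class data with +/-1 labels; stands in for a gene-expression matrix.
n, d = 120, 8
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n))

def stump_predict(j, thr, pol):
    pred = pol * np.sign(X[:, j] - thr)
    pred[pred == 0] = pol
    return pred

def best_stump(w):
    """Weighted decision stump: best (feature, threshold, polarity)."""
    best = (1.0, 0, 0.0, 1)                       # (error, j, thr, pol)
    for j in range(d):
        for thr in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            for pol in (1, -1):
                err = w[stump_predict(j, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def perturbed_adaboost(T=20, noise=0.1):
    w = np.ones(n) / n
    stumps, alphas = [], []
    for _ in range(T):
        # perturb the weights before fitting the base learner,
        # in the spirit of cw-AdaBoost's variance reduction
        wp = np.clip(w * (1 + noise * rng.standard_normal(n)), 1e-12, None)
        wp /= wp.sum()
        err, j, thr, pol = best_stump(wp)
        err = np.clip(err, 1e-10, 1 - 1e-10)      # guards the zero-error case
        a = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-a * y * stump_predict(j, thr, pol))
        w /= w.sum()
        stumps.append((j, thr, pol))
        alphas.append(a)
    return stumps, alphas

def predict(stumps, alphas):
    F = sum(a * stump_predict(j, thr, pol)
            for (j, thr, pol), a in zip(stumps, alphas))
    return np.sign(F)

stumps, alphas = perturbed_adaboost()
print("training accuracy:", (predict(stumps, alphas) == y).mean())
```

Clipping the weighted error away from zero is one simple way to keep the coefficient `a` finite when a base classifier happens to fit the (perturbed) weighted sample perfectly.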