Tree-based boosting with functional data
In this article we propose a boosting algorithm for regression with
functional explanatory variables and scalar responses. The algorithm uses
decision trees constructed with multiple projections as the "base-learners",
which we call "functional multi-index trees". We establish identifiability
conditions for these trees and introduce two algorithms to compute them. We use
numerical experiments to investigate the performance of our method and compare
it with several linear and nonlinear regression estimators, including recently
proposed nonparametric and semiparametric functional additive estimators.
Simulation studies show that the proposed method is consistently among the top
performers, whereas the performance of any competitor relative to others can
vary substantially across different settings. In a real example, we apply our
method to predict electricity demand using price curves and show that our
estimator provides better predictions than its competitors, especially
when one adjusts for seasonality.
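The core idea of the abstract above, boosting trees built on scalar projections of a functional covariate, can be sketched in miniature. The sketch below is a hypothetical simplification, not the authors' algorithm: it uses a single fixed projection direction and depth-one trees (stumps), whereas functional multi-index trees use multiple, estimated projections.

```python
def project(curve, beta):
    # inner product of a discretised curve with a fixed projection direction
    return sum(x * b for x, b in zip(curve, beta))

def fit_stump(z, r):
    # best single-split regression stump on the scalar index z for residuals r
    best = (float("inf"), None)
    for s in sorted(set(z)):
        left = [ri for zi, ri in zip(z, r) if zi <= s]
        right = [ri for zi, ri in zip(z, r) if zi > s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - (ml if zi <= s else mr)) ** 2 for zi, ri in zip(z, r))
        if sse < best[0]:
            best = (sse, (s, ml, mr))
    return best[1]

def boost(curves, y, beta, n_rounds=50, lr=0.1):
    # L2 boosting: repeatedly fit a stump to the current residuals and
    # add a shrunken copy of it to the prediction
    pred = [0.0] * len(y)
    z = [project(c, beta) for c in curves]
    for _ in range(n_rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(z, r)
        if stump is None:
            break
        s, ml, mr = stump
        pred = [pi + lr * (ml if zi <= s else mr) for pi, zi in zip(pred, z)]
    return pred
```

The full method additionally searches over projection directions per tree, which is where the identifiability conditions of the paper come in.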
BoostFM: Boosted Factorization Machines for Top-N Feature-based Recommendation
Feature-based matrix factorization techniques such as Factorization Machines (FM) have been proven to achieve impressive accuracy for the rating prediction task. However, most common recommendation scenarios are formulated as a top-N item ranking problem with implicit feedback (e.g., clicks, purchases) rather than explicit ratings. To address this problem, with both implicit feedback and feature information, we propose a feature-based collaborative boosting recommender called BoostFM, which integrates boosting into factorization models during the process of item ranking. Specifically, BoostFM is an adaptive boosting framework that linearly combines multiple homogeneous component recommenders, which are repeatedly constructed on the basis of the individual FM model by a re-weighting scheme. Two ways are proposed to efficiently train the component recommenders from the perspectives of both pairwise and listwise Learning-to-Rank (L2R). The properties of our proposed method are empirically studied on three real-world datasets. The experimental results show that BoostFM outperforms a number of state-of-the-art approaches for top-N recommendation.
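The adaptive re-weighting loop described above can be illustrated schematically. In this hypothetical sketch the component recommender is a weighted item-popularity scorer standing in for the individual FM model, and the ranking-error and weight-update formulas are illustrative AdaBoost-style choices rather than the paper's exact scheme.

```python
import math

def boostfm_ensemble(interactions, n_items, n_rounds=3):
    # Adaptive boosting over implicit feedback: each round trains a component
    # recommender on re-weighted (user, item) pairs and adds it to a linear
    # combination. The component here is a weighted popularity scorer
    # (a stand-in for an FM model).
    w = {pair: 1.0 / len(interactions) for pair in interactions}
    components = []  # list of (coefficient, item_scores)
    for _ in range(n_rounds):
        scores = [0.0] * n_items
        for (u, i), wi in w.items():
            scores[i] += wi
        # ranking "error": weight mass on observed items ranked low
        ranked = sorted(range(n_items), key=lambda i: -scores[i])
        rank = {i: r for r, i in enumerate(ranked)}
        err = sum(wi * rank[i] / n_items for (u, i), wi in w.items())
        err = min(max(err, 1e-9), 1 - 1e-9)
        alpha = 0.5 * math.log((1 - err) / err)
        components.append((alpha, scores))
        # up-weight pairs the component still ranks poorly, then renormalise
        for (u, i) in w:
            w[(u, i)] *= math.exp(alpha * rank[i] / n_items)
        total = sum(w.values())
        for pair in w:
            w[pair] /= total
    return components

def predict(components, item):
    # linear combination of the component recommenders' scores
    return sum(alpha * scores[item] for alpha, scores in components)
```

The paper's actual components are FM models trained under pairwise or listwise L2R objectives; the boosting skeleton is the part sketched here.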
Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions
Boosting is one of the most important methods for fitting
regression models and building prediction rules from
high-dimensional data. A notable feature of boosting is that the
technique has a built-in mechanism for shrinking coefficient
estimates and variable selection. This regularization mechanism
makes boosting a suitable method for analyzing data characterized by
small sample sizes and large numbers of predictors. We extend the
existing methodology by developing a boosting method for prediction
functions with multiple components. Such multidimensional functions
occur in many types of statistical models, for example in count data
models and in models involving outcome variables with a mixture
distribution. As will be demonstrated, the new algorithm is suitable
for both the estimation of the prediction function and
regularization of the estimates. In addition, nuisance parameters
can be estimated simultaneously with the prediction function.
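A minimal sketch of boosting a multidimensional prediction function, in the spirit of GAMLSS-type boosting: a Gaussian mean and log-scale are updated cyclically, each by fitting a simple linear base learner to a likelihood gradient. This is an illustrative toy, not the proposed algorithm; for numerical stability the mean step uses the plain residual.

```python
import math

def fit_linear(x, g):
    # least-squares base learner: intercept and slope of g on x
    n = len(x)
    mx, mg = sum(x) / n, sum(g) / n
    sxx = sum((xi - mx) ** 2 for xi in x) or 1e-12
    b = sum((xi - mx) * (gi - mg) for xi, gi in zip(x, g)) / sxx
    return mg - b * mx, b

def boost_two_component(x, y, n_rounds=200, lr=0.1):
    # Cyclic component-wise boosting of a two-dimensional prediction
    # function: Gaussian mean mu and log-scale eta, one shrunken
    # base-learner step per component per round.
    n = len(y)
    mu = [sum(y) / n] * n
    eta = [0.0] * n  # eta_i = log sigma_i
    for _ in range(n_rounds):
        r = [yi - mi for yi, mi in zip(y, mu)]
        a, b = fit_linear(x, r)
        mu = [mi + lr * (a + b * xi) for mi, xi in zip(mu, x)]
        # negative log-likelihood gradient in eta: (y - mu)^2 / sigma^2 - 1
        g = [(yi - mi) ** 2 / math.exp(2 * ei) - 1
             for yi, mi, ei in zip(y, mu, eta)]
        a, b = fit_linear(x, g)
        eta = [ei + lr * (a + b * xi) for ei, xi in zip(eta, x)]
    return mu, eta
```

Early stopping of the number of rounds is what provides the built-in shrinkage and variable selection mentioned above; the scale component eta plays the role of a nuisance parameter estimated alongside the mean.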
Integrated Brier Score based Survival Cobra -- A regression based approach
Recently, Goswami et al. \cite{goswami2022concordance} introduced two novel
implementations of the combined regression strategy to estimate the conditional
survival function. The paper uses regression-based weak learners and provides
an alternative version of the combined regression strategy (COBRA) ensemble
using the Integrated Brier Score to predict conditional survival function. We
create a novel predictor based on a weighted version of all machine predictions
taking weights as a specific function of normalized Integrated Brier Score. We
use two different norms (Frobenius and Sup norm) to extract the proximity
points in the algorithm. Our implementations consider right-censored data too.
We illustrate the proposed algorithms through analyses of real-life data.
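The weighting idea, turning each machine's normalized Integrated Brier Score into a combination weight over predicted survival curves, might be sketched as follows. The exponential weight map is a hypothetical choice for illustration, not necessarily the specific function used in the paper.

```python
import math

def ibs_weights(ibs_scores):
    # Map each machine's Integrated Brier Score to a combination weight:
    # lower IBS (better calibrated survival predictions) -> larger weight.
    lo, hi = min(ibs_scores), max(ibs_scores)
    span = (hi - lo) or 1e-12
    norm = [(s - lo) / span for s in ibs_scores]  # normalized IBS in [0, 1]
    raw = [math.exp(-v) for v in norm]            # illustrative weight map
    total = sum(raw)
    return [r / total for r in raw]

def combine_survival(curves, weights):
    # Weighted average of the machines' predicted survival curves,
    # each evaluated on a common time grid.
    return [sum(w * c[t] for w, c in zip(weights, curves))
            for t in range(len(curves[0]))]
```

The COBRA ensemble additionally restricts the average to "proximal" machines selected via a norm on predictions (Frobenius or sup), which is the step the two norms in the abstract refer to.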
Explainable Software Defect Prediction from Cross Company Project Metrics Using Machine Learning
Predicting the number of defects in a project is critical for project test managers to allocate budget, resources, and schedule for testing, support, and maintenance efforts. Software Defect Prediction models predict the number of defects in given projects after training the model with historical defect-related information. The majority of defect prediction studies focused on predicting defect-prone modules from method- and class-level static information, whereas this study predicts defects from project-level information based on a cross-company project dataset. This study utilizes software sizing metrics, effort metrics, and defect density information, and focuses on developing defect prediction models that apply various machine learning algorithms. One notable issue in existing defect prediction studies is the lack of transparency in the developed models. Consequently, the explainability of the developed model has been demonstrated using the state-of-the-art post-hoc model-agnostic method called Shapley Additive exPlanations (SHAP). Finally, important features for predicting defects from cross-company project information were identified.
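The quantity SHAP reports can be grounded with a tiny brute-force computation of exact Shapley values: each feature's attribution is its average marginal contribution to the prediction over all feature orderings, relative to a baseline instance. The SHAP library approximates this efficiently for large models; the brute force below keeps the definition visible.

```python
import math
from itertools import permutations

def shapley_values(model, x, baseline):
    # Exact Shapley attributions for model(x) relative to a baseline:
    # average, over all feature orderings, of the change in prediction
    # when each feature is switched from its baseline to its actual value.
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]
            val = model(current)
            phi[j] += val - prev
            prev = val
    return [p / math.factorial(n) for p in phi]
```

By construction the attributions satisfy local accuracy: they sum to the gap between the prediction on the instance and on the baseline, which is what makes SHAP summaries of feature importance interpretable.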