Tree-based boosting with functional data
In this article we propose a boosting algorithm for regression with
functional explanatory variables and scalar responses. The algorithm uses
decision trees constructed with multiple projections as the "base-learners",
which we call "functional multi-index trees". We establish identifiability
conditions for these trees and introduce two algorithms to compute them. We use
numerical experiments to investigate the performance of our method and compare
it with several linear and nonlinear regression estimators, including recently
proposed nonparametric and semiparametric functional additive estimators.
Simulation studies show that the proposed method is consistently among the top
performers, whereas the performance of any competitor relative to others can
vary substantially across different settings. In a real example, we apply our
method to predict electricity demand using price curves and show that our
estimator provides better predictions than its competitors, especially
when one adjusts for seasonality.
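The core idea of the abstract above, boosting trees built on scalar projections of a functional covariate, can be sketched in miniature. The sketch below is a hypothetical simplification, not the authors' algorithm: it uses a single fixed projection direction and depth-one trees (stumps), whereas functional multi-index trees use multiple, estimated projections.

```python
def project(curve, beta):
    # inner product of a discretised curve with a fixed projection direction
    return sum(x * b for x, b in zip(curve, beta))

def fit_stump(z, r):
    # best single-split regression stump on the scalar index z for residuals r
    best = (float("inf"), None)
    for s in sorted(set(z)):
        left = [ri for zi, ri in zip(z, r) if zi <= s]
        right = [ri for zi, ri in zip(z, r) if zi > s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - (ml if zi <= s else mr)) ** 2 for zi, ri in zip(z, r))
        if sse < best[0]:
            best = (sse, (s, ml, mr))
    return best[1]

def boost(curves, y, beta, n_rounds=50, lr=0.1):
    # L2 boosting: repeatedly fit a stump to the current residuals and
    # add a shrunken copy of it to the prediction
    pred = [0.0] * len(y)
    z = [project(c, beta) for c in curves]
    for _ in range(n_rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(z, r)
        if stump is None:
            break
        s, ml, mr = stump
        pred = [pi + lr * (ml if zi <= s else mr) for pi, zi in zip(pred, z)]
    return pred
```

The full method additionally searches over projection directions per tree, which is where the identifiability conditions of the paper come in.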
BoostFM: Boosted Factorization Machines for Top-N Feature-based Recommendation
Feature-based matrix factorization techniques such as Factorization Machines (FM) have been proven to achieve impressive accuracy for the rating prediction task. However, most common recommendation scenarios are formulated as a top-N item ranking problem with implicit feedback (e.g., clicks, purchases) rather than explicit ratings. To address this problem, with both implicit feedback and feature information, we propose a feature-based collaborative boosting recommender called BoostFM, which integrates boosting into factorization models during the process of item ranking. Specifically, BoostFM is an adaptive boosting framework that linearly combines multiple homogeneous component recommenders, which are repeatedly constructed on the basis of the individual FM model by a re-weighting scheme. Two ways are proposed to efficiently train the component recommenders from the perspectives of both pairwise and listwise Learning-to-Rank (L2R). The properties of our proposed method are empirically studied on three real-world datasets. The experimental results show that BoostFM outperforms a number of state-of-the-art approaches for top-N recommendation.
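The adaptive re-weighting loop described above can be illustrated schematically. In this hypothetical sketch the component recommender is a weighted item-popularity scorer standing in for the individual FM model, and the ranking-error and weight-update formulas are illustrative AdaBoost-style choices rather than the paper's exact scheme.

```python
import math

def boostfm_ensemble(interactions, n_items, n_rounds=3):
    # Adaptive boosting over implicit feedback: each round trains a component
    # recommender on re-weighted (user, item) pairs and adds it to a linear
    # combination. The component here is a weighted popularity scorer
    # (a stand-in for an FM model).
    w = {pair: 1.0 / len(interactions) for pair in interactions}
    components = []  # list of (coefficient, item_scores)
    for _ in range(n_rounds):
        scores = [0.0] * n_items
        for (u, i), wi in w.items():
            scores[i] += wi
        # ranking "error": weight mass on observed items ranked low
        ranked = sorted(range(n_items), key=lambda i: -scores[i])
        rank = {i: r for r, i in enumerate(ranked)}
        err = sum(wi * rank[i] / n_items for (u, i), wi in w.items())
        err = min(max(err, 1e-9), 1 - 1e-9)
        alpha = 0.5 * math.log((1 - err) / err)
        components.append((alpha, scores))
        # up-weight pairs the component still ranks poorly, then renormalise
        for (u, i) in w:
            w[(u, i)] *= math.exp(alpha * rank[i] / n_items)
        total = sum(w.values())
        for pair in w:
            w[pair] /= total
    return components

def predict(components, item):
    # linear combination of the component recommenders' scores
    return sum(alpha * scores[item] for alpha, scores in components)
```

The paper's actual components are FM models trained under pairwise or listwise L2R objectives; the boosting skeleton is the part sketched here.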
Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions
Boosting is one of the most important methods for fitting
regression models and building prediction rules from
high-dimensional data. A notable feature of boosting is that the
technique has a built-in mechanism for shrinking coefficient
estimates and variable selection. This regularization mechanism
makes boosting a suitable method for analyzing data characterized by
small sample sizes and large numbers of predictors. We extend the
existing methodology by developing a boosting method for prediction
functions with multiple components. Such multidimensional functions
occur in many types of statistical models, for example in count data
models and in models involving outcome variables with a mixture
distribution. As will be demonstrated, the new algorithm is suitable
for both the estimation of the prediction function and
regularization of the estimates. In addition, nuisance parameters
can be estimated simultaneously with the prediction function.
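A minimal sketch of boosting a multidimensional prediction function, in the spirit of GAMLSS-type boosting: a Gaussian mean and log-scale are updated cyclically, each by fitting a simple linear base learner to a likelihood gradient. This is an illustrative toy, not the proposed algorithm; for numerical stability the mean step uses the plain residual.

```python
import math

def fit_linear(x, g):
    # least-squares base learner: intercept and slope of g on x
    n = len(x)
    mx, mg = sum(x) / n, sum(g) / n
    sxx = sum((xi - mx) ** 2 for xi in x) or 1e-12
    b = sum((xi - mx) * (gi - mg) for xi, gi in zip(x, g)) / sxx
    return mg - b * mx, b

def boost_two_component(x, y, n_rounds=200, lr=0.1):
    # Cyclic component-wise boosting of a two-dimensional prediction
    # function: Gaussian mean mu and log-scale eta, one shrunken
    # base-learner step per component per round.
    n = len(y)
    mu = [sum(y) / n] * n
    eta = [0.0] * n  # eta_i = log sigma_i
    for _ in range(n_rounds):
        r = [yi - mi for yi, mi in zip(y, mu)]
        a, b = fit_linear(x, r)
        mu = [mi + lr * (a + b * xi) for mi, xi in zip(mu, x)]
        # negative log-likelihood gradient in eta: (y - mu)^2 / sigma^2 - 1
        g = [(yi - mi) ** 2 / math.exp(2 * ei) - 1
             for yi, mi, ei in zip(y, mu, eta)]
        a, b = fit_linear(x, g)
        eta = [ei + lr * (a + b * xi) for ei, xi in zip(eta, x)]
    return mu, eta
```

Early stopping of the number of rounds is what provides the built-in shrinkage and variable selection mentioned above; the scale component eta plays the role of a nuisance parameter estimated alongside the mean.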
Integrated Brier Score based Survival Cobra -- A regression based approach
Recently, Goswami et al. \cite{goswami2022concordance} introduced two novel
implementations of the combined regression strategy to estimate the conditional
survival function. The paper uses regression-based weak learners and provides
an alternative version of the combined regression strategy (COBRA) ensemble
using the Integrated Brier Score to predict conditional survival function. We
create a novel predictor based on a weighted version of all machine predictions
taking weights as a specific function of normalized Integrated Brier Score. We
use two different norms (Frobenius and Sup norm) to extract the proximity
points in the algorithm. Our implementations consider right-censored data too.
We illustrate the proposed algorithms through analyses of real-life data.
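The weighting idea, turning each machine's normalized Integrated Brier Score into a combination weight over predicted survival curves, might be sketched as follows. The exponential weight map is a hypothetical choice for illustration, not necessarily the specific function used in the paper.

```python
import math

def ibs_weights(ibs_scores):
    # Map each machine's Integrated Brier Score to a combination weight:
    # lower IBS (better calibrated survival predictions) -> larger weight.
    lo, hi = min(ibs_scores), max(ibs_scores)
    span = (hi - lo) or 1e-12
    norm = [(s - lo) / span for s in ibs_scores]  # normalized IBS in [0, 1]
    raw = [math.exp(-v) for v in norm]            # illustrative weight map
    total = sum(raw)
    return [r / total for r in raw]

def combine_survival(curves, weights):
    # Weighted average of the machines' predicted survival curves,
    # each evaluated on a common time grid.
    return [sum(w * c[t] for w, c in zip(weights, curves))
            for t in range(len(curves[0]))]
```

The COBRA ensemble additionally restricts the average to "proximal" machines selected via a norm on predictions (Frobenius or sup), which is the step the two norms in the abstract refer to.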
Explainable Software Defect Prediction from Cross Company Project Metrics Using Machine Learning
Predicting the number of defects in a project is critical for project test managers to allocate budget, resources, and schedule for testing, support, and maintenance efforts. Software Defect Prediction models predict the number of defects in given projects after training the model with historical defect-related information. The majority of defect prediction studies focused on predicting defect-prone modules from method- and class-level static information, whereas this study predicts defects from project-level information based on a cross-company project dataset. This study utilizes software sizing metrics, effort metrics, and defect density information, and focuses on developing defect prediction models that apply various machine learning algorithms. One notable issue in existing defect prediction studies is the lack of transparency in the developed models. Consequently, the explainability of the developed model has been demonstrated using the state-of-the-art post-hoc model-agnostic method called Shapley Additive exPlanations (SHAP). Finally, important features for predicting defects from cross-company project information were identified.
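The quantity SHAP reports can be grounded with a tiny brute-force computation of exact Shapley values: each feature's attribution is its average marginal contribution to the prediction over all feature orderings, relative to a baseline instance. The SHAP library approximates this efficiently for large models; the brute force below keeps the definition visible.

```python
import math
from itertools import permutations

def shapley_values(model, x, baseline):
    # Exact Shapley attributions for model(x) relative to a baseline:
    # average, over all feature orderings, of the change in prediction
    # when each feature is switched from its baseline to its actual value.
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]
            val = model(current)
            phi[j] += val - prev
            prev = val
    return [p / math.factorial(n) for p in phi]
```

By construction the attributions satisfy local accuracy: they sum to the gap between the prediction on the instance and on the baseline, which is what makes SHAP summaries of feature importance interpretable.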