4,456 research outputs found

Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability

Author: AA Freitas
B Bischl
B Ustun
C Molnar
G Casalicchio
H Fanaee-T
H Schielzeth
J Fürnkranz
J Huysmans
J Knowles
JH Friedman
K Hamidieh
M Philipp
P Cortez
Q Zhou
R Guidotti
Publication date: 23/09/2019

Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce misleading and verbose results if the model is too complex, especially with respect to feature interactions. To quantify the complexity of arbitrary machine learning models, we propose model-agnostic complexity measures based on functional decomposition: number of features used, interaction strength and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures is more reliable and compact. Furthermore, we demonstrate the application of these measures in a multi-objective optimization approach which simultaneously minimizes loss and complexity.
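
As a rough illustration of the first measure named in this abstract (number of features used), the sketch below counts the features whose value can change a prediction when it is swapped for another value seen in the data. This is a simplified heuristic written for this listing, not the authors' estimator; `features_used`, `toy_model` and the toy data are hypothetical names and values.

```python
def features_used(predict, data):
    """Return the indices of features whose value can change a prediction."""
    n_features = len(data[0])
    used = set()
    for j in range(n_features):
        # Candidate replacement values are taken from the data itself.
        candidates = {row[j] for row in data}
        for row in data:
            baseline = predict(row)
            for value in candidates:
                changed = list(row)
                changed[j] = value
                if predict(changed) != baseline:
                    used.add(j)
                    break
            if j in used:
                break
    return used


def toy_model(row):
    # Hypothetical model that ignores feature 1 entirely.
    return 3.0 * row[0] + (1.0 if row[2] > 0 else 0.0)


data = [[0.0, 5.0, -1.0], [1.0, 7.0, 2.0], [2.0, 9.0, 0.5]]
used = features_used(toy_model, data)
number_of_features_used = len(used)
```

Because the check only probes values that occur in the data, it can miss a feature whose effect shows up only outside the observed range.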

arXiv.org e-Print Archive

Crossref

TreeGrad: Transferring Tree Ensembles to Neural Networks

Author: C Siu
DH Wolpert
F Pedregosa
JA Blackard
JH Friedman
K Nakai
L Breiman
SK Murthy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/12/2019

Gradient Boosting Decision Trees (GBDTs) are popular machine learning algorithms, with implementations such as LightGBM and in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees offline and greedily. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated online, and provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network.

Comment: Technical Report on Implementation of Deep Neural Decision Forests Algorithm. To accompany implementation here: https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.1179

arXiv.org e-Print Archive

Crossref

Position Bias Estimation for Unbiased Learning-to-Rank in eCommerce Search

Author: A Chuklin
G Casella
H Li
JH Friedman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/10/2019

The Unbiased Learning-to-Rank framework has been recently proposed as a general approach to systematically remove biases, such as position bias, from learning-to-rank models. The method takes two steps: estimating click propensities and using them to train unbiased models. Most common methods proposed in the literature for estimating propensities involve some degree of intervention in the live search engine. An alternative approach proposed recently uses an Expectation Maximization (EM) algorithm to estimate propensities by using ranking features for estimating relevances. In this work we propose a novel method to directly estimate propensities which does not use any intervention in live search or rely on modeling relevance. Rather, we take advantage of the fact that the same query-document pair may naturally change ranks over time. This typically occurs in eCommerce search because of changes in the popularity of items over time, the existence of time-dependent ranking features, or the addition or removal of items from the index (an item getting sold or a new item being listed). However, our method is general and can be applied to any search engine for which the rank of the same document may naturally change over time for the same query. We derive a simple likelihood function that depends on propensities only, and by maximizing the likelihood we are able to get estimates of the propensities. We apply this method to eBay search data to estimate click propensities for web and mobile search and compare these with estimates using the EM method. We also use simulated data to show that the method gives reliable estimates of the "true" simulated propensities. Finally, we train an unbiased learning-to-rank model for eBay search using the estimated propensities and show that it outperforms both baselines: one without position bias correction and one with position bias correction using the EM method.

Comment: 10 pages, 3 figure
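
To make the second of the two steps concrete (using click propensities once they are known), here is a minimal sketch of inverse propensity weighting under a position-based click model, where the click probability is assumed to equal propensity(rank) times relevance. It does not reproduce the likelihood-based estimator described in this abstract; the function name, the propensity values and the impression log are all made up for illustration.

```python
def ipw_relevance(impressions, propensity):
    """Estimate relevance of one query-document pair from (rank, clicked) logs.

    Each click is divided by the propensity of the rank where it was shown,
    so clicks at rarely examined ranks count for more.
    """
    weighted_clicks = sum(clicked / propensity[rank] for rank, clicked in impressions)
    return weighted_clicks / len(impressions)


# Hypothetical propensities: rank 2 is examined half as often as rank 1.
propensity = {1: 1.0, 2: 0.5}

# Hypothetical log for one pair that moved between ranks over time.
impressions = [(1, 1), (1, 0), (2, 1), (2, 0), (2, 0), (2, 0)]

naive_ctr = sum(clicked for _, clicked in impressions) / len(impressions)
corrected = ipw_relevance(impressions, propensity)
```

In this toy log the raw click-through rate understates relevance because most impressions happened at the less-examined rank; the weighted estimate corrects for that.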

arXiv.org e-Print Archive

Crossref

Factorizing LambdaMART for cold start recommendations

Author: Alexandros Kalousis
CJ Burges
D Cai
J Fürnkranz
JH Friedman
Jun Wang
M Hilario
N Srebro
Phong Nguyen
Publication date: 04/11/2015

Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user. This observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art algorithm in learning to rank which relies on such a metric. Despite its success, it does not have a principled regularization mechanism and relies instead on empirical approaches to control model complexity, leaving it prone to overfitting. Motivated by the fact that very often the users' and items' descriptions as well as the preference behavior can be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns a low-rank latent representation of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. The low rank is essentially a model complexity controller; on top of it we propose additional regularizers to constrain the learned latent representations that reflect the user and item manifolds as these are defined by their original feature-based descriptors and the preference behavior. Finally, we also propose to use a weighted variant of NDCG to reduce the penalty for similar items with large rating discrepancy. We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold start setting as well as in the simpler matrix completion setting. In both cases it significantly outperforms current state-of-the-art algorithms.
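
The two building blocks mentioned in this abstract, a relevance score defined as an inner product of latent factors and a rank-based metric, can be sketched as follows. This uses the standard NDCG with exponential gain, not the weighted variant the abstract proposes, and the latent vectors are hand-picked toy values rather than anything learned by gradient boosted trees.

```python
import math


def score(user_factors, item_factors):
    """Relevance score as the inner product of two latent vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))


def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(scores, relevances):
    """NDCG of the ranking induced by sorting items by descending score."""
    ranked = [rel for _, rel in sorted(zip(scores, relevances), key=lambda p: -p[0])]
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical latent factors for one user and three items.
items = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relevances = [3, 0, 1]

good_scores = [score([1.0, 0.5], item) for item in items]
bad_scores = [score([-1.0, 0.0], item) for item in items]

good_ndcg = ndcg(good_scores, relevances)
bad_ndcg = ndcg(bad_scores, relevances)
```

The first user vector ranks the items in order of true relevance, so its NDCG is 1; the second reverses that order and scores lower.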

arXiv.org e-Print Archive

Crossref

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Archive ouverte UNIGE

Energy Consumption Forecasting Using Ensemble Learning Algorithms

Author: F Pedregosa
GJ Osório
JH Friedman
Jin Gou
MQ Raza
Pei Du
Samir Touzani
Tanveer Ahmad
Tiago Pinto
Xiaobo Zhang
Yoav Freund
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/06/2019

DCAI 2019: Distributed Computing and Artificial Intelligence, 16th International Conference, Special Sessions

The increase of renewable energy sources of intermittent nature has brought several new challenges for power and energy systems. In order to deal with the variability from the generation side, there is the need to balance it by managing consumption appropriately. Forecasting energy consumption becomes, therefore, more relevant than ever. This paper presents and compares three different ensemble learning methods, namely random forests, gradient boosted regression trees and Adaboost. Hour-ahead electricity load forecasts are presented for the building N of GECAD at ISEP campus. The performance of the forecasting models is assessed, and results show that the Adaboost model is superior to the other considered models for the one-hour ahead forecasts. The results of this study, compared to previous works, indicate that ensemble learning methods are a viable choice for short-term load forecasting.

This work has received funding from National Funds through FCT (Fundaçao da Ciencia e Tecnologia) under the project SPET – 29165, call SAICT 2017.

Repositório Científico do Instituto Politécnico do Porto

Crossref

ZENODO

The wavelet-NARMAX representation : a hybrid model structure combining polynomial models with multiresolution wavelet decompositions

Author: Billings SA
Brown M
Campbell C
Chen S
Chen ZH
Chui CK
Daubechies I
Friedman JH
Haykin S
Lee KL
Leontaritis IJ
Ljung L
Pearson RK
Schumaker LL
Wang LX
Wei HL
Zhang Q
Publication venue: 'Informa UK Limited'
Publication date: 20/02/2005

A new hybrid model structure combining polynomial models with multiresolution wavelet decompositions is introduced for nonlinear system identification. Polynomial models play an important role in approximation theory, and have been extensively used in linear and nonlinear system identification. Wavelet decompositions, in which the basis functions have the property of localization in both time and frequency, outperform many other approximation schemes and offer a flexible solution for approximating arbitrary functions. Although wavelet representations can approximate even severe nonlinearities in a given signal very well, the advantage of these representations can be lost when wavelets are used to capture linear or low-order nonlinear behaviour in a signal. In order to sufficiently utilise the global property of polynomials and the local property of wavelet representations simultaneously, in this study polynomial models and wavelet decompositions are combined in a parallel structure to represent nonlinear input-output systems. As a special form of the NARMAX model, this hybrid model structure will be referred to as the WAvelet-NARMAX model, or simply WANARMAX. Generally, such a WANARMAX representation for an input-output system might involve a large number of basis functions and therefore a great number of model terms. Experience reveals that only a small number of these model terms are significant to the system output. A new fast orthogonal least squares algorithm, called the matching pursuit orthogonal least squares (MPOLS) algorithm, is also introduced in this study to determine which terms should be included in the final model.

Crossref

White Rose Research Online

Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model-Agnostic Interpretations

Author: A Goldstein
A Zien
C Molnar
C Rudin
E Štrumbelj
G Casalicchio
JH Friedman
L Breiman
S Cohen
S Lipovetsky
SM Lundberg
T Bartus
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/02/2020

Model-agnostic interpretation techniques allow us to explain the behavior of any predictive model. Due to different notations and terminology, it is difficult to see how they are related. A unified view on these methods has been missing. We present the generalized SIPA (sampling, intervention, prediction, aggregation) framework of work stages for model-agnostic interpretations and demonstrate how several prominent methods for feature effects can be embedded into the proposed framework. Furthermore, we extend the framework to feature importance computations by pointing out how variance-based and performance-based importance measures are based on the same work stages. The SIPA framework reduces the diverse set of model-agnostic techniques to a single methodology and establishes a common terminology to discuss them in future work.
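
One way to read the four SIPA stages is through a partial dependence curve, which is one of the feature-effect methods the abstract says can be embedded in the framework. The sketch below is an illustrative mapping of stages to code, not the authors' formulation; `partial_dependence`, `toy_model` and the data are hypothetical.

```python
from statistics import mean


def partial_dependence(predict, data, feature, grid):
    """Partial dependence of `predict` on one feature, written as SIPA stages."""
    curve = []
    for value in grid:
        # Sampling: use every row of `data` (a subsample would also work).
        sample = [list(row) for row in data]
        # Intervention: overwrite the feature of interest with the grid value.
        for row in sample:
            row[feature] = value
        # Prediction: query the black-box model on the modified rows.
        predictions = [predict(row) for row in sample]
        # Aggregation: average the predictions into one point of the curve.
        curve.append(mean(predictions))
    return curve


def toy_model(row):
    # Hypothetical model: additive in feature 0, so its curve is linear.
    return 2.0 * row[0] + row[1]


data = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pd_curve = partial_dependence(toy_model, data, feature=0, grid=[0.0, 1.0])
```

Swapping the aggregation step (for example, keeping the per-row predictions instead of averaging them) yields individual curves rather than one averaged curve, which is the kind of relationship between methods the framework is meant to expose.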

arXiv.org e-Print Archive

Crossref

Representing complex data using localized principal components with application to astronomical data

Author: A Gersho
A Gorban
AH Monaghan
AR Webb
B Chalmond
B Kégl
C Allende Prieto
CAL Bailer-Jones
DJ Marchette
E Diday
E Oja
EC Malthouse
EM Braverman
FL Hall
H Hotelling
H Späth
H Wold
IT Jolliffe
J Einbeck
JH Friedman
JJ Verbeek
JM Chambers
K Fukunaga
K Hornik
L Breiman
MAC Perryman
MG Kendall
N Kambhatla
P Delicado
PG Willemsen
R Tibshirani
RJ Bolton
S de Jong
T Aluja-Banet
T Duchamps
T Hastie
WS Cleveland
Z-Y Liu
Publication date: 01/01/2007

Often the relation between the variables constituting a multivariate data space might be characterized by one or more of the terms: "nonlinear", "branched", "disconnected", "bended", "curved", "heterogeneous", or, more generally, "complex". In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper will give a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore, we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems, it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider simulated astrophysical data from the future Galactic survey mission Gaia.

Comment: 25 pages. In "Principal Manifolds for Data Visualization and Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds), Lecture Notes in Computational Science and Engineering, Springer, 2007, pp. 180-204, http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-

arXiv.org e-Print Archive

Durham Research Online

Crossref

Enlighten

Explore Bristol Research

Regression with Linear Factored Functions

Author: CM Bishop
I-C Yeh
J Gerritsma
JA Nelder
JH Friedman
L Csató
LP Kaelbling
ME Tipping
P Cortez
P Tüfekci
W Böhmer
Z Wang
Publication date: 30/03/2015

Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling. This paper introduces a novel regression algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow certain integrals to be solved analytically and point-wise products to be calculated. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme that learns factored basis functions during training. The novel regression algorithm performs competitively with Gaussian processes on benchmark tasks, and the learned LFFs are very compact, with 4-9 factored basis functions on average.

Comment: Under review as conference paper at ECML/PKDD 201

arXiv.org e-Print Archive

Crossref

A Combined Deep Learning-Gradient Boosting Machine Framework for Fluid Intelligence Prediction

Author: A Pfefferbaum
D Shen
H Zou
JH Friedman
JR Gray
L Wang
M Havaei
M Luciana
MW Cole
Roberto Colom
SM Jaeggi
W Zhu
Y LeCun
Z Akkus
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/10/2019

The ABCD Neurocognitive Prediction Challenge is a community-driven competition asking competitors to develop algorithms to predict fluid intelligence scores from T1-w MRIs. In this work, we propose a framework that combines deep learning with a gradient boosting machine to solve this task. We train a convolutional neural network to compress the high-dimensional MRI data and learn meaningful image features by predicting the 123 continuous-valued derived data provided with each MRI. These extracted features are then used to train a gradient boosting machine that predicts the residualized fluid intelligence score. Our approach achieved mean square error (MSE) scores of 18.4374, 68.7868, and 96.1806 for the training, validation, and test sets, respectively.

Comment: Challenge in Adolescent Brain Cognitive Development Neurocognitive Predictio

arXiv.org e-Print Archive

Crossref