Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability
Post-hoc model-agnostic interpretation methods such as partial dependence
plots can be employed to interpret complex machine learning models. While these
interpretation methods can be applied regardless of model complexity, they can
produce misleading and verbose results if the model is too complex, especially
w.r.t. feature interactions. To quantify the complexity of arbitrary machine
learning models, we propose model-agnostic complexity measures based on
functional decomposition: number of features used, interaction strength and
main effect complexity. We show that post-hoc interpretation of models that
minimize the three measures is more reliable and compact. Furthermore, we
demonstrate the application of these measures in a multi-objective optimization
approach which simultaneously minimizes loss and complexity.
TreeGrad: Transferring Tree Ensembles to Neural Networks
Gradient Boosting Decision Trees (GBDTs) are popular machine learning
algorithms, with implementations such as LightGBM and in popular machine
learning toolkits like Scikit-Learn. Many implementations can only produce
trees offline and greedily. We explore ways to convert
existing GBDT implementations to known neural network architectures with
minimal performance loss in order to allow decision splits to be updated in an
online manner and provide extensions to allow split points to be altered as a
neural architecture search problem. We provide learning bounds for our neural
network.
Comment: Technical Report on Implementation of Deep Neural Decision Forests
Algorithm. To accompany implementation here:
https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019).
"Transferring Tree Ensembles to Neural Networks". International Conference on
Neural Information Processing. Springer, 2019. arXiv admin note: text overlap
with arXiv:1909.1179
Position Bias Estimation for Unbiased Learning-to-Rank in eCommerce Search
The Unbiased Learning-to-Rank framework has been recently proposed as a
general approach to systematically remove biases, such as position bias, from
learning-to-rank models. The method takes two steps - estimating click
propensities and using them to train unbiased models. Most common methods
proposed in the literature for estimating propensities involve some degree of
intervention in the live search engine. An alternative approach proposed
recently uses an Expectation Maximization (EM) algorithm to estimate
propensities by using ranking features for estimating relevances. In this work
we propose a novel method to directly estimate propensities which does not use
any intervention in live search or rely on modeling relevance. Rather, we take
advantage of the fact that the same query-document pair may naturally change
ranks over time. This typically occurs for eCommerce search because of change
of popularity of items over time, existence of time dependent ranking features,
or addition or removal of items to the index (an item getting sold or a new
item being listed). However, our method is general and can be applied to any
search engine for which the rank of the same document may naturally change over
time for the same query. We derive a simple likelihood function that depends on
propensities only, and by maximizing the likelihood we are able to get
estimates of the propensities. We apply this method to eBay search data to
estimate click propensities for web and mobile search and compare these with
estimates using the EM method. We also use simulated data to show that the
method gives reliable estimates of the "true" simulated propensities. Finally,
we train an unbiased learning-to-rank model for eBay search using the estimated
propensities and show that it outperforms both baselines - one without position
bias correction and one with position bias correction using the EM method.
Comment: 10 pages, 3 figures
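The observation that the same query-document pair changes rank over time can be illustrated with a deliberately simplified sketch. It is not the likelihood function derived in the paper. It assumes a position-based click model (click probability = propensity of the rank × relevance of the pair), under which the relevance term cancels when the click-through rates of one pair at two ranks are divided. The function name and the counts are hypothetical.

```python
def propensity_ratio(clicks_a, impressions_a, clicks_b, impressions_b):
    """Estimate propensity(rank_b) / propensity(rank_a) for ONE query-document pair.

    Assumption (not from the abstract): P(click) = propensity(rank) * relevance,
    so the unknown relevance cancels in the ratio of the two click-through rates.
    """
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    return ctr_b / ctr_a


# Hypothetical counts: the same pair shown 1000 times at rank 1 and 1000 times at rank 2.
ratio = propensity_ratio(clicks_a=100, impressions_a=1000,
                         clicks_b=60, impressions_b=1000)
print(f"estimated propensity(rank 2) / propensity(rank 1) = {ratio:.2f}")
```

A likelihood-based estimator, as described in the abstract, would pool evidence from many such pairs rather than rely on a single ratio.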
Factorizing LambdaMART for cold start recommendations
Recommendation systems often rely on point-wise loss metrics such as the mean
squared error. However, in real recommendation settings only a few items are
presented to a user. This observation has recently encouraged the use of
rank-based metrics. LambdaMART is the state-of-the-art algorithm in learning to
rank which relies on such a metric. Despite its success, it does not have a
principled regularization mechanism and relies on empirical approaches to
control model complexity, leaving it prone to overfitting.
Motivated by the fact that very often the users' and items' descriptions as
well as the preference behavior can be well summarized by a small number of
hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization
(LambdaMART-MF), that learns a low rank latent representation of users and
items using gradient boosted trees. The algorithm factorizes LambdaMART by
defining relevance scores as the inner product of the learned representations
of the users and items. The low rank is essentially a model complexity
controller; on top of it we propose additional regularizers to constrain the
learned latent representations that reflect the user and item manifolds as
these are defined by their original feature based descriptors and the
preference behavior. Finally we also propose to use a weighted variant of NDCG
to reduce the penalty for similar items with large rating discrepancy.
We experiment on two very different recommendation datasets, meta-mining and
movies-users, and evaluate the performance of LambdaMART-MF, with and without
regularization, in the cold start setting as well as in the simpler matrix
completion setting. In both cases it significantly outperforms
current state-of-the-art algorithms.
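As a rough illustration of two ingredients named in the abstract, the sketch below scores items by the inner product of latent user and item vectors and evaluates the resulting ranking with standard NDCG. It uses the plain, unweighted NDCG, not the weighted variant the authors propose, and the latent vectors are made-up numbers rather than representations learned by gradient boosted trees.

```python
import math


def inner_product_scores(user_vec, item_vecs):
    """Relevance score of each item = inner product of user and item latent vectors."""
    return [sum(u * v for u, v in zip(user_vec, item)) for item in item_vecs]


def dcg(relevances):
    """Discounted cumulative gain with the common (2^rel - 1) gain and log2 discount."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))


def ndcg(scores, relevances, k=None):
    """Standard NDCG@k of the ranking induced by `scores` (k=None uses all items)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order][:k]
    ideal = sorted(relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical 2-dimensional latent factors for one user and three items.
user = [1.0, 0.5]
items = [[0.2, 0.1], [0.9, 0.4], [0.5, 0.5]]
scores = inner_product_scores(user, items)

print(ndcg(scores, relevances=[0, 2, 1]))  # ranking agrees with the true relevances
print(ndcg(scores, relevances=[2, 0, 1]))  # ranking disagrees, so NDCG drops below 1
```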
Energy Consumption Forecasting Using Ensemble Learning Algorithms
DCAI 2019: Distributed Computing and Artificial Intelligence, 16th International Conference, Special Sessions.
The increase of renewable energy sources of intermittent nature has brought several new challenges for power and energy systems. To deal with the variability on the generation side, consumption must be managed appropriately to balance it. Forecasting energy consumption therefore becomes more relevant than ever. This paper presents and compares three different ensemble learning methods, namely random forests, gradient boosted regression trees and AdaBoost. Hour-ahead electricity load forecasts are presented for Building N of GECAD on the ISEP campus. The performance of the forecasting models is assessed, and the results show that the AdaBoost model is superior to the other considered models for the one-hour-ahead forecasts. Compared to previous works, the results of this study indicate that ensemble learning methods are a viable choice for short-term load forecasting.
This work has received funding from National Funds through FCT (Fundaçao da Ciencia e Tecnologia) under the project SPET – 29165, call SAICT 2017.
The wavelet-NARMAX representation : a hybrid model structure combining polynomial models with multiresolution wavelet decompositions
A new hybrid model structure combining polynomial models with multiresolution wavelet decompositions is introduced for nonlinear system identification. Polynomial models play an important role in approximation theory, and have been extensively used in linear and nonlinear system identification. Wavelet decompositions, in which the basis functions have the property of localization in both time and frequency, outperform many other approximation schemes and offer a flexible solution for approximating arbitrary functions. Although wavelet representations can approximate even severe nonlinearities in a given signal very well, the advantage of these representations can be lost when wavelets are used to capture linear or low-order nonlinear behaviour in a signal. In order to utilise the global property of polynomials and the local property of wavelet representations simultaneously, in this study polynomial models and wavelet decompositions are combined in a parallel structure to represent nonlinear input-output systems. As a special form of the NARMAX model, this hybrid model structure will be referred to as the WAvelet-NARMAX model, or simply WANARMAX. Generally, such a WANARMAX representation for an input-output system might involve a large number of basis functions and therefore a great number of model terms. Experience reveals that only a small number of these model terms are significant to the system output. A new fast orthogonal least squares algorithm, called the matching pursuit orthogonal least squares (MPOLS) algorithm, is also introduced in this study to determine which terms should be included in the final model.
Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model-Agnostic Interpretations
Model-agnostic interpretation techniques allow us to explain the behavior of
any predictive model. Due to different notations and terminology, it is
difficult to see how they are related. A unified view on these methods has been
missing. We present the generalized SIPA (sampling, intervention, prediction,
aggregation) framework of work stages for model-agnostic interpretations and
demonstrate how several prominent methods for feature effects can be embedded
into the proposed framework. Furthermore, we extend the framework to feature
importance computations by pointing out how variance-based and
performance-based importance measures are based on the same work stages. The
SIPA framework reduces the diverse set of model-agnostic techniques to a single
methodology and establishes a common terminology to discuss them in future
work.
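The four work stages can be sketched for one familiar feature-effect method, the partial dependence curve. This is a minimal illustration under assumptions, not the authors' implementation: the toy model, the data, and the function names are hypothetical, and the sampling stage simply uses the whole toy dataset.

```python
def partial_dependence(predict, data, feature, grid):
    """Partial dependence of `predict` on one feature, written as SIPA stages."""
    curve = []
    for value in grid:
        # Intervention: overwrite the feature of interest with the grid value.
        modified = [row[:feature] + [value] + row[feature + 1:] for row in data]
        # Prediction: query the black-box model on the modified rows.
        preds = [predict(row) for row in modified]
        # Aggregation: average the predictions into one point of the curve.
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical black-box model with an interaction between both features."""
    return 2.0 * row[0] + row[0] * row[1]


# Sampling: here the "sample" is the whole toy dataset (rows are [x0, x1]).
sample = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]

pd_curve = partial_dependence(toy_model, sample, feature=0, grid=[0.0, 1.0, 2.0])
print(pd_curve)
```

Swapping the aggregation step (for example, keeping the per-row predictions instead of averaging them) yields individual conditional expectation curves, which is the kind of relationship between methods that a shared set of stages makes visible.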
Representing complex data using localized principal components with application to astronomical data
Often the relation between the variables constituting a multivariate data
space might be characterized by one or more of the terms: ``nonlinear'',
``branched'', ``disconnected'', ``bended'', ``curved'', ``heterogeneous'', or,
more generally, ``complex''. In these cases, simple principal component analysis
(PCA) as a tool for dimension reduction can fail badly. Of the many alternative
approaches proposed so far, local approximations of PCA are among the most
promising. This paper will give a short review of localized versions of PCA,
focusing on local principal curves and local partitioning algorithms.
Furthermore we discuss projections other than the local principal components.
When performing local dimension reduction for regression or classification
problems it is important to focus not only on the manifold structure of the
covariates, but also on the response variable(s). Local principal components
only achieve the former, whereas localized regression approaches concentrate on
the latter. Local projection directions derived from the partial least squares
(PLS) algorithm offer an interesting trade-off between these two objectives. We
apply these methods to several real data sets. In particular, we consider
simulated astrophysical data from the future Galactic survey mission Gaia.
Comment: 25 pages. In "Principal Manifolds for Data Visualization and
Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds),
Lecture Notes in Computational Science and Engineering, Springer, 2007, pp.
180--204,
http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-
Regression with Linear Factored Functions
Many applications that use empirically estimated functions face a curse of
dimensionality, because the integrals over most function classes must be
approximated by sampling. This paper introduces a novel regression algorithm
that learns linear factored functions (LFF). This class of functions has
structural properties that allow certain integrals to be solved analytically
and point-wise products to be calculated. Applications like belief propagation
and reinforcement learning can exploit these properties to break the curse and
speed up computation. We derive a regularized greedy optimization scheme that
learns factored basis functions during training. The novel regression algorithm
performs competitively with Gaussian processes on benchmark tasks, and the
learned LFFs are very compact, with 4-9 factored basis functions on average.
Comment: Under review as conference paper at ECML/PKDD 201
A Combined Deep Learning-Gradient Boosting Machine Framework for Fluid Intelligence Prediction
The ABCD Neurocognitive Prediction Challenge is a community driven
competition asking competitors to develop algorithms to predict fluid
intelligence score from T1-w MRIs. In this work, we propose a deep learning
combined with gradient boosting machine framework to solve this task. We train
a convolutional neural network to compress the high dimensional MRI data and
learn meaningful image features by predicting the 123 continuous-valued derived
data provided with each MRI. These extracted features are then used to train a
gradient boosting machine that predicts the residualized fluid intelligence
score. Our approach achieved mean squared error (MSE) scores of 18.4374,
68.7868, and 96.1806 for the training, validation, and test sets, respectively.
Comment: Challenge in Adolescent Brain Cognitive Development Neurocognitive
Prediction