Entity Embeddings of Categorical Variables
We map categorical variables in a function approximation problem into
Euclidean spaces, which are the entity embeddings of the categorical variables.
The mapping is learned by a neural network during the standard supervised
training process. Entity embedding not only reduces memory usage and speeds up
neural networks compared with one-hot encoding, but, more importantly, by
mapping similar values close to each other in the embedding space, it reveals the
intrinsic properties of the categorical variables. We applied it successfully
in a recent Kaggle competition and were able to reach third place with
relatively simple features. We further demonstrate in this paper that entity
embedding helps the neural network to generalize better when the data is sparse
and statistics are unknown. It is thus especially useful for datasets with many
high-cardinality features, where other methods tend to overfit. We also
demonstrate that the embeddings obtained from the trained neural network boost
the performance of all tested machine learning methods considerably when used
as the input features instead. As entity embedding defines a distance measure
for categorical variables, it can be used for visualizing categorical data and
for data clustering.
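A minimal sketch of the idea, assuming PyTorch; the class name, cardinalities, and embedding sizes below are illustrative, not the paper's settings:

```python
import torch
import torch.nn as nn

class EntityEmbeddingNet(nn.Module):
    def __init__(self, cardinalities, emb_dims, n_numeric, hidden=64):
        super().__init__()
        # One learned embedding table per categorical variable.
        self.embeddings = nn.ModuleList(
            nn.Embedding(card, dim) for card, dim in zip(cardinalities, emb_dims)
        )
        self.mlp = nn.Sequential(
            nn.Linear(sum(emb_dims) + n_numeric, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_categorical) integer codes; x_num: (batch, n_numeric)
        embs = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat(embs + [x_num], dim=1))

# Two categorical features with 10 and 50 levels, plus 3 numeric features.
net = EntityEmbeddingNet([10, 50], [4, 8], n_numeric=3)
x_cat = torch.stack([torch.randint(0, 10, (32,)), torch.randint(0, 50, (32,))], dim=1)
out = net(x_cat, torch.randn(32, 3))
# After training, net.embeddings[i].weight holds the entity embedding of
# variable i and can be fed to other models as input features.
```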
Gradient tree boosting with random output projections for multi-label classification and multi-output regression
In many applications of supervised learning, multiple classification or
regression outputs have to be predicted jointly. We consider several extensions
of gradient boosting to address such problems. We first propose a
straightforward adaptation of gradient boosting exploiting multiple output
regression trees as base learners. We then argue that this method is only
expected to be optimal when the outputs are fully correlated, as it forces the
partitioning induced by the tree base learners to be shared by all outputs. We
then propose a novel extension of gradient tree boosting to specifically
address this issue. At each iteration of this new method, a regression tree
structure is grown to fit a single random projection of the current residuals
and the predictions of this tree are fitted linearly to the current residuals
of all the outputs, independently. Because of this linear fit, the method can
adapt automatically to any output correlation structure. Extensive experiments
are conducted with this method, as well as other algorithmic variants, on
several artificial and real problems. Randomly projecting the output space is
shown to provide a better adaptation to different output correlation patterns
and is therefore competitive with the best of the other methods in most
settings. Thanks to model sharing, the convergence speed is also improved,
reducing the computing times (or the complexity of the model) to reach a
specific accuracy.
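A minimal sketch of the method for multi-output squared loss, assuming numpy and scikit-learn; the hyperparameters are illustrative. Each round grows one tree on a random projection of the residuals, then fits that tree's predictions linearly and independently to every output's residuals:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_random_projection_boosting(X, Y, n_iter=100, lr=0.1, max_depth=3, seed=0):
    rng = np.random.default_rng(seed)
    F = np.zeros_like(Y, dtype=float)        # current ensemble predictions
    stages = []
    for _ in range(n_iter):
        R = Y - F                            # residuals for squared loss
        w = rng.standard_normal(Y.shape[1])  # random projection of the output space
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, R @ w)
        p = tree.predict(X)
        # Closed-form least-squares fit of the tree's predictions to each
        # output's residuals, independently (one scalar per output).
        rho = (p @ R) / (p @ p + 1e-12)
        F += lr * np.outer(p, rho)
        stages.append((tree, lr * rho))
    return stages

def predict(stages, X, d_out):
    F = np.zeros((X.shape[0], d_out))
    for tree, coef in stages:
        F += np.outer(tree.predict(X), coef)
    return F
```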
Predicting online user behaviour using deep learning algorithms
We propose a robust classifier to predict buying intentions based on user
behaviour within a large e-commerce website. In this work we compare
traditional machine learning techniques with the most advanced deep learning
approaches. We show that both Deep Belief Networks and Stacked Denoising
Auto-Encoders achieve a substantial improvement by extracting features from
high-dimensional data during the pre-training phase. They also prove better
suited to dealing with severe class imbalance.
Comment: 21 pages, 3 figures. arXiv admin note: text overlap with
arXiv:1412.6601, arXiv:1406.1231, arXiv:1508.03856 by other authors
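A minimal sketch of the unsupervised pre-training step behind a stacked denoising auto-encoder, assuming PyTorch; layer sizes, noise level, and epochs are illustrative assumptions:

```python
import torch
import torch.nn as nn

def pretrain_dae_layer(X, in_dim, hidden_dim, noise=0.3, epochs=10, lr=1e-3):
    enc = nn.Linear(in_dim, hidden_dim)
    dec = nn.Linear(hidden_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        # Corrupt inputs with masking noise, then reconstruct the clean input.
        mask = (torch.rand_like(X) > noise).float()
        recon = dec(torch.relu(enc(X * mask)))
        loss = nn.functional.mse_loss(recon, X)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return enc

# Stack layers: each encoder is trained on the previous layer's codes; the
# trained encoders then initialize a supervised classifier (fine-tuning not shown).
X = torch.randn(256, 100)
enc1 = pretrain_dae_layer(X, 100, 64)
enc2 = pretrain_dae_layer(torch.relu(enc1(X)).detach(), 64, 32)
```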
Sparse Projection Oblique Randomer Forests
Decision forests, including Random Forests and Gradient Boosting Trees, have
recently demonstrated state-of-the-art performance in a variety of machine
learning settings. Decision forests are typically ensembles of axis-aligned
decision trees; that is, trees that split only along feature dimensions. In
contrast, many recent extensions to decision forests are based on axis-oblique
splits. Unfortunately, these extensions forfeit one or more of the favorable
properties of decision forests based on axis-aligned splits, such as robustness
to many noise dimensions, interpretability, or computational efficiency. We
introduce yet another decision forest, called "Sparse Projection Oblique
Randomer Forests" (SPORF). SPORF uses very sparse random projections, i.e.,
linear combinations of a small subset of features. SPORF significantly improves
accuracy over existing state-of-the-art algorithms on a standard benchmark
suite for classification with >100 problems of varying dimension, sample size,
and number of classes. To illustrate how SPORF addresses the limitations of
both axis-aligned and existing oblique decision forest methods, we conduct
extensive simulated experiments. SPORF typically yields improved performance
over existing decision forests while maintaining computational efficiency,
scalability, and interpretability. SPORF can easily be incorporated
into other ensemble methods such as boosting to obtain potentially similar
gains.
Comment: 31 pages; submitted to the Journal of Machine Learning Research for
review
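A minimal sketch of the very sparse random projections SPORF uses as split candidates, assuming numpy and scikit-learn. Note one simplification: SPORF samples a fresh projection matrix at every tree node, whereas this sketch projects once per tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def sparse_random_projection(d, n_proj, density=0.1, rng=None):
    rng = rng or np.random.default_rng()
    # Entries are mostly zero; nonzeros are +1 or -1, so each candidate
    # split direction is a signed sum of a small subset of features.
    return rng.choice([0.0, 1.0, -1.0], size=(d, n_proj),
                      p=[1 - density, density / 2, density / 2])

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # signal lies on an oblique direction
A = sparse_random_projection(20, n_proj=40, rng=rng)
clf = DecisionTreeClassifier(max_depth=5).fit(X @ A, y)
```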
ENIGMA-NG: Efficient Neural and Gradient-Boosted Inference Guidance for E
We describe an efficient implementation of clause guidance in
saturation-based automated theorem provers extending the ENIGMA approach.
Unlike the first ENIGMA implementation, where a fast linear classifier is
trained and used together with manually engineered features, we have started to
experiment with more sophisticated state-of-the-art machine learning methods
such as gradient-boosted trees and recursive neural networks. In particular,
the latter approach poses challenges in terms of the efficiency of clause
evaluation; however, we show that deep integration of the neural evaluation
with the ATP data structures can largely amortize this cost and lead to competitive
real-time results. Both methods are evaluated on a large dataset of theorem
proving problems and compared with the previous approaches. The resulting
methods improve on the manually designed clause guidance, providing the first
practically convincing application of gradient-boosted and neural clause
guidance in saturation-style automated theorem provers.
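A minimal sketch of gradient-boosted clause guidance, assuming scikit-learn: a classifier is trained on feature vectors of clauses that did or did not contribute to past proofs, and its score prioritizes the unprocessed-clause queue. The featurization and labels below are synthetic stand-ins, not ENIGMA's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# X: clause feature vectors (e.g., symbol / term-walk counts);
# y: 1 if the clause appeared in a successful proof, 0 otherwise.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(2000, 64)).astype(float)
y = (X[:, :4].sum(axis=1) > 8).astype(int)   # synthetic stand-in labels

model = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X, y)

def clause_priority(features):
    # Higher predicted usefulness -> the clause is selected earlier
    # in the saturation loop's given-clause selection.
    return model.predict_proba(features.reshape(1, -1))[0, 1]
```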
Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression
For various applications, the relations between the dependent and independent
variables are highly nonlinear. Consequently, for large scale complex problems,
neural networks and regression trees are commonly preferred over linear models
such as Lasso. This work proposes learning the feature nonlinearities by
binning feature values and finding the best fit in each quantile using
non-convex regularized linear regression. The algorithm first captures the
dependence between neighboring quantiles by enforcing smoothness via
piecewise-constant/linear approximation and then selects a sparse subset of
good features. We prove that the proposed algorithm is statistically and
computationally efficient. In particular, it achieves a linear rate of
convergence while requiring a near-minimal number of samples. Evaluations on
synthetic and real datasets demonstrate that the algorithm is competitive with
the current state of the art and accurately learns feature nonlinearities. Finally,
we explore an interesting connection between the binning stage of our algorithm
and sparse Johnson-Lindenstrauss matrices.
Comment: 22 pages, 7 figures
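A minimal sketch of the binning stage, assuming scikit-learn: each feature is quantile-binned, the bins are one-hot encoded, and a sparse linear model is fit over the indicators. The paper's non-convex regularizer and smoothness constraints are replaced here by plain Lasso as a simplification:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1000, 10))
# Target is nonlinear in two features; the other eight are irrelevant.
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(1000)

model = make_pipeline(
    KBinsDiscretizer(n_bins=16, encode="onehot", strategy="quantile"),
    Lasso(alpha=0.01),
)
model.fit(X, y)
# The fitted coefficients form a piecewise-constant curve per feature;
# irrelevant features receive (near-)zero coefficients in every bin.
```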
Towards Deep and Representation Learning for Talent Search at LinkedIn
Talent search and recommendation systems at LinkedIn strive to match the
potential candidates to the hiring needs of a recruiter or a hiring manager
expressed in terms of a search query or a job posting. Recent work in this
domain has mainly focused on linear models, which do not take complex
relationships between features into account, as well as ensemble tree models,
which introduce non-linearity but are still insufficient for exploring all the
potential feature interactions, and strictly separate feature generation from
modeling. In this paper, we present the results of our application of deep and
representation learning models on LinkedIn Recruiter. Our key contributions
include: (i) Learning semantic representations of sparse entities within the
talent search domain, such as recruiter ids, candidate ids, and skill entity
ids, for which we utilize neural network models that take advantage of LinkedIn
Economic Graph, and (ii) Deep models for learning recruiter engagement and
candidate response in talent search applications. We also explore learning to
rank approaches applied to deep models, and show the benefits for the talent
search use case. Finally, we present offline and online evaluation results for
LinkedIn talent search and recommendation systems, and discuss potential
challenges along the path to a fully deep model architecture. The challenges
and approaches discussed generalize to any multi-faceted search engine.
Comment: This paper has been accepted for publication in ACM CIKM 2018
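A minimal sketch of learning semantic representations for sparse ids, assuming PyTorch: a hypothetical two-tower model embeds recruiter and candidate ids and predicts engagement from their interaction. All names and sizes are illustrative assumptions, not LinkedIn's architecture:

```python
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    def __init__(self, n_recruiters, n_candidates, dim=32):
        super().__init__()
        self.recruiter_emb = nn.Embedding(n_recruiters, dim)
        self.candidate_emb = nn.Embedding(n_candidates, dim)

    def forward(self, recruiter_ids, candidate_ids):
        # Dot product of the two learned embeddings -> engagement logit.
        r = self.recruiter_emb(recruiter_ids)
        c = self.candidate_emb(candidate_ids)
        return (r * c).sum(dim=1)

model = TwoTower(n_recruiters=1000, n_candidates=50000)
logits = model(torch.randint(0, 1000, (16,)), torch.randint(0, 50000, (16,)))
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.rand(16).round())  # 0/1 engagement labels (synthetic here)
```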
Probabilistic Matrix Factorization for Automated Machine Learning
In order to achieve state-of-the-art performance, modern machine learning
techniques require careful data pre-processing and hyperparameter tuning.
Moreover, given the ever increasing number of machine learning models being
developed, model selection is becoming increasingly important. Automating the
selection and tuning of machine learning pipelines consisting of data
pre-processing methods and machine learning models, has long been one of the
goals of the machine learning community. In this paper, we tackle this
meta-learning task by combining ideas from collaborative filtering and Bayesian
optimization. Using probabilistic matrix factorization techniques and
acquisition functions from Bayesian optimization, we exploit experiments
performed on hundreds of different datasets to guide the exploration of the
space of possible pipelines. In our experiments, we show that our approach
quickly identifies high-performing pipelines across a wide range of datasets,
significantly outperforming the current state of the art.
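A minimal sketch of the collaborative-filtering view, assuming numpy: factorize a partially observed pipelines-by-datasets performance matrix, then predict how unevaluated pipelines would do on a new dataset. Plain alternating gradient descent and a greedy choice stand in for the paper's probabilistic model and Bayesian acquisition function:

```python
import numpy as np

def factorize(P, mask, rank=5, steps=500, lr=0.05, reg=0.01, seed=0):
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((P.shape[0], rank))  # pipeline latent factors
    V = 0.1 * rng.standard_normal((P.shape[1], rank))  # dataset latent factors
    for _ in range(steps):
        E = mask * (U @ V.T - P)          # error on observed entries only
        U -= lr * (E @ V + reg * U)
        V -= lr * (E.T @ U + reg * V)
    return U, V

# P[i, j]: observed accuracy of pipeline i on dataset j; mask marks observations.
rng = np.random.default_rng(0)
P = rng.uniform(0.5, 1.0, size=(50, 30))
mask = (rng.random(P.shape) < 0.3).astype(float)
U, V = factorize(P * mask, mask)
scores = U @ V[0]                          # predicted performance on dataset 0
best_pipeline = int(np.argmax(scores))     # greedy pick; the paper uses an
                                           # acquisition function instead
```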
Learning Nonlinear Functions Using Regularized Greedy Forest
We consider the problem of learning a forest of nonlinear decision rules with
general loss functions. The standard methods employ boosted decision trees such
as AdaBoost for exponential loss and Friedman's gradient boosting for general
loss. In contrast to these traditional boosting algorithms that treat a tree
learner as a black box, the method we propose directly learns decision forests
via fully-corrective regularized greedy search using the underlying forest
structure. Our method achieves higher accuracy and smaller models than gradient
boosting (and AdaBoost with exponential loss) on many datasets.
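A minimal sketch of the fully-corrective idea, assuming numpy and scikit-learn: after each greedy tree is added, the weights of *all* trees are re-fit jointly with a ridge penalty instead of staying frozen as in standard boosting. Note the simplification: RGF corrects individual leaf weights and can modify forest structure, while this sketch corrects at whole-tree granularity:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge

def fit_fully_corrective_forest(X, y, n_trees=20, max_depth=3, reg=1.0):
    trees, H = [], []                      # H: one prediction column per tree
    F = np.zeros_like(y, dtype=float)
    weights = None
    for _ in range(n_trees):
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, y - F)  # greedy step
        trees.append(t)
        H.append(t.predict(X))
        # Fully-corrective step: jointly re-fit all tree weights under L2.
        ridge = Ridge(alpha=reg, fit_intercept=False).fit(np.column_stack(H), y)
        weights = ridge.coef_
        F = np.column_stack(H) @ weights
    return trees, weights
```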
Making Tree Ensembles Interpretable
Tree ensembles, such as random forest and boosted trees, are renowned for
their high prediction performance, whereas their interpretability is critically
limited. In this paper, we propose a post-processing method that improves the
model interpretability of tree ensembles. After learning a complex tree
ensemble in a standard way, we approximate it with a simpler model that is
interpretable for humans. To obtain the simpler model, we derive an EM
algorithm that minimizes the KL divergence from the complex ensemble. A
synthetic experiment showed that a complicated tree ensemble can be reasonably
approximated by an interpretable model.
Comment: presented at the 2016 ICML Workshop on Human Interpretability in
Machine Learning (WHI 2016), New York, NY
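A sketch of the post-processing setup, assuming scikit-learn. The paper derives an EM algorithm minimizing KL divergence; fitting a single shallow tree to the ensemble's predictions, as below, is a much cruder mimic-model stand-in shown only to illustrate the approximate-then-interpret workflow:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

# Learn the complex ensemble in a standard way...
ensemble = RandomForestClassifier(n_estimators=100).fit(X, y)
# ...query it for labels, then fit a small tree a human can actually read.
mimic = DecisionTreeClassifier(max_depth=3).fit(X, ensemble.predict(X))
print(export_text(mimic, feature_names=[f"x{i}" for i in range(5)]))
```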