OBOE: Collaborative Filtering for AutoML Model Selection
Algorithm selection and hyperparameter tuning remain two of the most
challenging tasks in machine learning. Automated machine learning (AutoML)
seeks to automate these tasks to enable widespread use of machine learning by
non-experts. This paper introduces OBOE, a collaborative filtering method for
time-constrained model selection and hyperparameter tuning. OBOE forms a matrix
of the cross-validated errors of a large number of supervised learning models
(algorithms together with hyperparameters) on a large number of datasets, and
fits a low rank model to learn the low-dimensional feature vectors for the
models and datasets that best predict the cross-validated errors. To find
promising models for a new dataset, OBOE runs a set of fast but informative
algorithms on the new dataset and uses their cross-validated errors to infer
the feature vector for the new dataset. OBOE can find good models under
constraints on the number of models fit or the total time budget. To this end,
this paper develops a new heuristic for active learning in time-constrained
matrix completion based on optimal experiment design. Our experiments
demonstrate that OBOE delivers state-of-the-art performance faster than
competing approaches on a test bed of supervised learning problems. Moreover,
the success of the bilinear model used by OBOE suggests that AutoML may be
simpler than previously understood.
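The low-rank idea above can be sketched numerically: factor the error matrix into dataset and model embeddings, then infer a new dataset's embedding from the errors of a few probe models. All matrices, the rank, and the probe indices below are synthetic illustrations, not OBOE's actual data or its experiment-design heuristic for choosing probes.

```python
import numpy as np

# Synthetic error matrix: rows = datasets, cols = models.
# Built as an exact rank-k product so the low-rank fit is exact here.
rng = np.random.default_rng(0)
k = 3                              # assumed latent rank
D = rng.random((50, k))            # synthetic dataset factors
M = rng.random((k, 40))            # synthetic model factors
E = D @ M                          # "observed" cross-validated errors

# Fit a rank-k bilinear model via truncated SVD.
U, s, Vt = np.linalg.svd(E, full_matrices=False)
X = U[:, :k] * s[:k]               # dataset embeddings
Y = Vt[:k, :]                      # model embeddings

# New dataset: run a few "fast" probe models, observe their errors,
# then infer the dataset's embedding by least squares.
probe = [0, 5, 10]                 # assumed probe-model indices
e_new = rng.random(k) @ M          # true errors of the new dataset (synthetic)
x_new, *_ = np.linalg.lstsq(Y[:, probe].T, e_new[probe], rcond=None)

# Predict errors on all models and pick the most promising one.
pred = x_new @ Y
best_model = int(np.argmin(pred))
```

Because the synthetic error matrix has exact rank k, the inferred embedding recovers the new dataset's full error row from just three probe observations; on real, noisy error matrices the prediction is only approximate.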
Inferring Networks of Substitutable and Complementary Products
In a modern recommender system, it is important to understand how products
relate to each other. For example, while a user is looking for mobile phones,
it might make sense to recommend other phones, but once they buy a phone, we
might instead want to recommend batteries, cases, or chargers. These two types
of recommendations are referred to as substitutes and complements: substitutes
are products that can be purchased instead of each other, while complements are
products that can be purchased in addition to each other.
Here we develop a method to infer networks of substitutable and complementary
products. We formulate this as a supervised link prediction task, where we
learn the semantics of substitutes and complements from data associated with
products. The primary source of data we use is the text of product reviews,
though our method also makes use of features such as ratings, specifications,
prices, and brands. Methodologically, we build topic models that are trained to
automatically discover topics from text that are successful at predicting and
explaining such relationships. Experimentally, we evaluate our system on the
Amazon product catalog, a large dataset consisting of 9 million products, 237
million links, and 144 million reviews.
Comment: 12 pages, 6 figures
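As a rough sketch of the supervised link-prediction setup, each product can be represented by a topic vector, a pairwise feature built from two products' vectors, and a classifier trained to predict whether a link exists. Everything below, including the topic vectors, the feature choice, and the labels, is an illustrative assumption, not the paper's actual topic model.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.random((100, 5))           # one synthetic topic vector per product

# Candidate pairs and a pairwise feature: absolute topic difference.
pairs = [(i, j) for i in range(100) for j in range(i + 1, 100)][:500]
feats = np.array([np.abs(T[i] - T[j]) for i, j in pairs])

# Synthetic labels: call a pair "linked" when its topics are close overall.
labels = (feats.sum(axis=1) < 1.0).astype(float)

# Logistic-regression link classifier trained by plain gradient descent.
w, b = np.zeros(T.shape[1]), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    g = p - labels
    w -= 0.5 * (feats.T @ g) / len(labels)
    b -= 0.5 * g.mean()

p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
accuracy = ((p > 0.5) == (labels > 0.5)).mean()
```

The real system learns the topics jointly with the link predictor and distinguishes substitute from complement links; this sketch only shows the supervised pairwise-classification skeleton.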
Collaborative filtering for recommender systems with implicit feedback
Recommending the right products to customers can significantly increase the
sales of an e-commerce platform, and the availability of huge amounts of
transactional data makes data-driven solutions the best choice for recommender
systems in many circumstances. In this work, a general overview of the
recommendation task is given, then several data-driven methods are compared on
real-world company data. In particular, the effort is centered on implicit
feedback, i.e. binary data such as sales, and on collaborative filtering,
i.e. the use of community behavior to compute suggestions. Finally, different
ways to handle cold starts, i.e. new customers, are discussed and compared.
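A minimal item-based collaborative-filtering sketch for implicit feedback, assuming a tiny hand-made binary purchase matrix: score each unseen item by its cosine co-purchase similarity to the user's past purchases. This is a generic baseline, not one of the specific methods compared in the work.

```python
import numpy as np

# Binary user-item purchase matrix (implicit feedback): 1 = bought.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Item-item cosine similarity from co-purchase counts.
norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(S, 0.0)

# Score items for user 0 by similarity to the items they already bought.
user = R[0]
scores = S @ user
scores[user > 0] = -np.inf         # do not re-recommend past purchases
recommended = int(np.argmax(scores))
```

Here user 0 bought items 0 and 1, and item 2 co-occurs with both, so it outranks item 3. A cold-start customer has an all-zero row, which is exactly why the fallback strategies discussed in the work are needed.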
A Statistical Comparison of Classification Algorithms on a Single Data Set
This research uses four classification algorithms, in standard and boosted forms, to predict members of a class for an online community. We compare two performance measures, area under the curve (AUC) and accuracy, in the standard and boosted forms. The research compares four popular algorithms: Bayes, logistic regression, J48, and Nearest Neighbor (NN). The analysis shows that there are significant differences among the base classification algorithms; J48 had the best accuracy. Additionally, the results show that boosting improved the accuracy of logistic regression. ANOVA was used to detect the differences between the algorithms; post hoc analysis shows the differences between specific algorithms.
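The testing pipeline described above can be sketched as a one-way ANOVA over per-fold scores. The accuracy numbers below are invented placeholders, not the study's results; the point is only the F-statistic computation that detects whether any classifier differs from the others.

```python
import numpy as np

# Hypothetical per-fold accuracies for the four classifiers (synthetic).
acc = {
    "Bayes":  [0.78, 0.80, 0.79, 0.81, 0.77],
    "LogReg": [0.82, 0.83, 0.81, 0.84, 0.82],
    "J48":    [0.88, 0.87, 0.89, 0.90, 0.88],
    "NN":     [0.80, 0.79, 0.81, 0.78, 0.80],
}

groups = [np.array(v) for v in acc.values()]
k = len(groups)                          # number of algorithms
n = sum(len(g) for g in groups)          # total observations
grand = np.concatenate(groups).mean()

# One-way ANOVA: between-group vs within-group variation.
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
F = (ss_between / (k - 1)) / (ss_within / (n - k))

best = max(acc, key=lambda name: float(np.mean(acc[name])))
```

A large F relative to the F(k-1, n-k) critical value rejects the hypothesis that all algorithms perform equally; pairwise post hoc tests (e.g. Tukey's HSD) then locate which specific algorithms differ.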