
    OBOE: Collaborative Filtering for AutoML Model Selection

    Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This paper introduces OBOE, a collaborative filtering method for time-constrained model selection and hyperparameter tuning. OBOE forms a matrix of the cross-validated errors of a large number of supervised learning models (algorithms together with hyperparameters) on a large number of datasets, and fits a low rank model to learn the low-dimensional feature vectors for the models and datasets that best predict the cross-validated errors. To find promising models for a new dataset, OBOE runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. OBOE can find good models under constraints on the number of models fit or the total time budget. To this end, this paper develops a new heuristic for active learning in time-constrained matrix completion based on optimal experiment design. Our experiments demonstrate that OBOE delivers state-of-the-art performance faster than competing approaches on a test bed of supervised learning problems. Moreover, the success of the bilinear model used by OBOE suggests that AutoML may be simpler than was previously understood.
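    The core of OBOE described above is a low-rank factorization of the dataset-by-model error matrix, with a new dataset's latent vector inferred from a few cheap probe runs. A minimal NumPy sketch of that idea on synthetic rank-2 data (the dimensions, probe choice, and names are illustrative, not the authors' code; the paper additionally chooses probes via optimal experiment design):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic rank-2 error matrix: rows = 30 known datasets, cols = 20 models.
    D, M, k = 30, 20, 2
    X = rng.random((D, k))            # latent dataset features
    Y = rng.random((k, M))            # latent model features
    E = X @ Y                         # "observed" cross-validated errors

    # Fit a rank-k model via truncated SVD; keep one k-vector per model.
    _, s, Vt = np.linalg.svd(E, full_matrices=False)
    model_factors = (np.diag(s[:k]) @ Vt[:k]).T     # shape (M, k)

    # A new dataset whose error profile lies in the same low-rank space.
    e_new = rng.random(k) @ Y

    # Run only a few cheap "probe" models on it and observe their errors...
    probes = [0, 3, 7]
    observed = e_new[probes]

    # ...infer the new dataset's latent vector by least squares on the probes...
    latent_hat, *_ = np.linalg.lstsq(model_factors[probes], observed, rcond=None)

    # ...then predict errors for all M models and pick the most promising one.
    predicted = model_factors @ latent_hat
    best_model = int(np.argmin(predicted))
    ```

    Because the synthetic errors are exactly rank 2, three probes suffice to recover the latent vector; on real error matrices the fit is approximate and the probe budget matters.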

    Inferring Networks of Substitutable and Complementary Products

    In a modern recommender system, it is important to understand how products relate to each other. For example, while a user is looking for mobile phones, it might make sense to recommend other phones, but once they buy a phone, we might instead want to recommend batteries, cases, or chargers. These two types of recommendations are referred to as substitutes and complements: substitutes are products that can be purchased instead of each other, while complements are products that can be purchased in addition to each other. Here we develop a method to infer networks of substitutable and complementary products. We formulate this as a supervised link prediction task, where we learn the semantics of substitutes and complements from data associated with products. The primary source of data we use is the text of product reviews, though our method also makes use of features such as ratings, specifications, prices, and brands. Methodologically, we build topic models that are trained to automatically discover topics from text that are successful at predicting and explaining such relationships. Experimentally, we evaluate our system on the Amazon product catalog, a large dataset consisting of 9 million products, 237 million links, and 144 million reviews. (Comment: 12 pages, 6 figures)
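    The supervised link-prediction formulation above can be sketched with a plain logistic regression over per-pair features. Here the "topic" vectors, the elementwise-product pair feature, and the labeling rule are synthetic stand-ins; the paper learns its topics from review text jointly with the predictor rather than using fixed vectors:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    # Synthetic "topic" vectors for 200 products (stand-in for review topics).
    n_products, n_topics = 200, 10
    topics = rng.random((n_products, n_topics))

    def make_pairs(n_pairs):
        """Sample product pairs; label a pair as linked when its combined
        topic mass is high (an illustrative rule, not the paper's model)."""
        i = rng.integers(0, n_products, n_pairs)
        j = rng.integers(0, n_products, n_pairs)
        feats = topics[i] * topics[j]            # elementwise-product pair feature
        labels = (feats.sum(axis=1) > n_topics * 0.25).astype(int)
        return feats, labels

    X_train, y_train = make_pairs(2000)
    X_test, y_test = make_pairs(500)

    # Train a link predictor on pair features and measure held-out accuracy.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    ```

    Since the synthetic labels are a linear function of the pair features, the classifier recovers the rule almost exactly; real substitute/complement labels are much noisier and benefit from the additional rating, price, and brand features the abstract mentions.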

    Collaborative filtering for recommender systems with implicit feedback

    Recommending the right products to customers can significantly increase the sales of an e-commerce site, and the presence of huge amounts of transactional data makes data-driven solutions the best choice for recommender systems in many circumstances. In this work, a general overview of the recommendation task is given, then several data-driven methods are compared on real-world company data. In particular, the effort is centered on implicit feedback, i.e. binary data such as sales, and on collaborative filtering, that is, the use of community behavior in computing suggestions. Finally, different ways to handle cold starts, that is, new customers, are discussed and compared.
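    A common baseline for implicit-feedback collaborative filtering of the kind described above is item-item similarity over a binary purchase matrix. A small NumPy sketch on synthetic data (the matrix, sizes, and scoring rule are illustrative; the thesis compares several such methods):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Binary implicit-feedback matrix: R[u, i] = 1 iff customer u bought item i.
    n_users, n_items = 50, 12
    R = (rng.random((n_users, n_items)) < 0.2).astype(float)

    # Item-item cosine similarity computed from co-purchase counts.
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                  # guard against items nobody bought
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)                 # an item should not recommend itself

    # Score unseen items for one customer: total similarity to what they own.
    user = 0
    scores = R[user] @ S
    scores[R[user] > 0] = -np.inf            # never re-recommend owned items
    top3 = np.argsort(scores)[::-1][:3]

    # Cold start: a brand-new customer has an all-zero row, so every score is
    # zero; a common fallback is to recommend the most popular items instead.
    ```

    With only binary signals there are no explicit ratings to regress on, which is why co-occurrence-based similarity (or matrix factorization with confidence weights) is the usual choice here.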

    A Statistical Comparison of Classification Algorithms on a Single Data Set

    This research uses four classification algorithms in standard and boosted forms to predict members of a class for an online community. We compare two performance measures, area under the curve (AUC) and accuracy, in the standard and boosted forms. The research compares four popular algorithms: Bayes, logistic regression, J48, and Nearest Neighbor (NN). The analysis shows that there are significant differences among the base classification algorithms; J48 had the best accuracy. Additionally, the results show that boosted methods improved the accuracy of logistic regression. ANOVA was used to detect the differences between the algorithms; post hoc analysis shows the differences between specific algorithms.
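    A comparison of this shape can be sketched with scikit-learn on synthetic data. Everything here is a stand-in for the study's setup: J48 (Weka's C4.5) is approximated by a CART decision tree, a default AdaBoost over stumps stands in for the boosted variants, and the dataset is generated, not the paper's online-community data:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic binary task standing in for the paper's class-membership data.
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

    # Stand-ins for the four algorithm families, plus one boosted model.
    models = {
        "Bayes": GaussianNB(),
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "Tree (J48 stand-in)": DecisionTreeClassifier(random_state=0),
        "NearestNeighbor": KNeighborsClassifier(),
        "Boosted": AdaBoostClassifier(random_state=0),
    }

    # Record both performance measures used in the study for each model.
    results = {}
    for name, clf in models.items():
        clf.fit(X_tr, y_tr)
        acc = accuracy_score(y_te, clf.predict(X_te))
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        results[name] = (acc, auc)
    ```

    The study goes further and tests whether the differences are statistically significant via ANOVA with post hoc tests; a single train/test split like this one only produces point estimates.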