4 research outputs found

Generic ontology learners on application domains

Authors: Fallucchi F, Pazienza MT, Zanzotto FM
Publication date: 01/01/2010

Model Adaptation via Model Interpolation and Boosting for Web Search Ranking

Authors: Chris Burges, Hongyan Zhou, Jianfeng Gao, Krysta Svore, Nazan Khan, Qiang Wu, Shalin Shah, Yi Su
Publication date: 01/01/2009

This paper explores two classes of model adaptation methods for Web search ranking: model interpolation, and error-driven learning approaches based on a boosting algorithm. The results show that model interpolation, though simple, achieves the best results on all open test sets, where the test data differ greatly from the training data. The tree-based boosting algorithm achieves the best performance on most of the closed test sets, where the test and training data are similar, but its performance drops significantly on the open test sets due to the instability of trees. Several methods are explored to improve the robustness of the algorithm, with limited success.

CiteSeerX

Crossref
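As a rough illustration of the interpolation idea, the sketch below linearly combines the score of a ranker trained on background data with the score of a ranker trained on target-domain data; the weight lam would typically be tuned on target-domain validation data. The linear form, the names interpolate, rank, background and in_domain, and the toy two-feature scorers are all assumptions for illustration, not the paper's exact formulation.

```python
from typing import Callable, List, Sequence

# A scorer maps a document's feature vector to a relevance score.
Scorer = Callable[[Sequence[float]], float]


def interpolate(background: Scorer, in_domain: Scorer, lam: float) -> Scorer:
    """Assumed linear interpolation: lam * background + (1 - lam) * in_domain."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    def score(x: Sequence[float]) -> float:
        return lam * background(x) + (1.0 - lam) * in_domain(x)

    return score


def rank(docs: List[Sequence[float]], scorer: Scorer) -> List[int]:
    """Return document indices ordered by descending score."""
    return sorted(range(len(docs)), key=lambda i: scorer(docs[i]), reverse=True)


# Hypothetical toy rankers over two features per document.
def background(x: Sequence[float]) -> float:
    return 2.0 * x[0] + 0.5 * x[1]


def in_domain(x: Sequence[float]) -> float:
    return 0.5 * x[0] + 2.0 * x[1]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
    for lam in (1.0, 0.5, 0.0):
        print(lam, rank(docs, interpolate(background, in_domain, lam)))
```

With lam=1.0 only the background ranker counts and the order is [0, 2, 1]; with lam=0.0 only the in-domain ranker counts and the order becomes [1, 2, 0]. Intermediate values trade off the two, which is why a single scalar is cheap to tune even when little target-domain data is available.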

Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning

Author: Tyree Stephen
Publication venue: Washington University Open Scholarship
Publication date: 15/12/2014

Large-scale machine learning requires tradeoffs. Commonly, this tradeoff has led practitioners to choose simpler, less powerful models (e.g., linear models) in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff: approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger data sets and significantly more training parallelism by strategically introducing approximation in certain optimization steps. For gradient boosted regression tree ensembles, we replace precise selection of tree splits with coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, particularly in the distributed setting. For metric learning with nearest neighbor classification, rather than explicitly training a neighborhood structure, we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries. We believe these optimization tradeoffs are widely applicable wherever machine learning is put into practice in large-scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and consequently can process more training examples for more iterations than competing exact methods. Although the model is seemingly learned with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training time budget.

Washington University St. Louis: Open Scholarship
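The coarse-grained split selection described above can be illustrated with histogram-style split finding, one common way to approximate it: feature values are bucketed into a few quantile bins and only bin boundaries are tried as thresholds, instead of every distinct value. This is a generic sketch under squared-error loss (for gradient boosting, the targets would be the current residuals or gradients); the names bin_edges, bin_stats and best_split and the quantile binning scheme are assumptions, and the thesis's actual procedure may differ.

```python
import bisect
from typing import List, Optional, Tuple


def bin_edges(values: List[float], n_bins: int) -> List[float]:
    """Approximate quantile boundaries, used as the only candidate thresholds."""
    s = sorted(values)
    return sorted({s[(k * len(s)) // n_bins] for k in range(1, n_bins)})


def bin_stats(values: List[float], targets: List[float],
              edges: List[float]) -> Tuple[List[int], List[float]]:
    """Per-bin (count, target sum); bin b holds v with edges[b-1] <= v < edges[b]."""
    counts = [0] * (len(edges) + 1)
    sums = [0.0] * (len(edges) + 1)
    for v, y in zip(values, targets):
        b = bisect.bisect_right(edges, v)
        counts[b] += 1
        sums[b] += y
    return counts, sums


def best_split(counts: List[int], sums: List[float],
               edges: List[float]) -> Optional[Tuple[float, float]]:
    """Scan bin boundaries; return (SSE reduction, threshold) for the rule x < threshold."""
    total_n, total_s = sum(counts), sum(sums)
    best = None
    left_n, left_s = 0, 0.0
    for i, t in enumerate(edges):
        left_n += counts[i]
        left_s += sums[i]
        right_n, right_s = total_n - left_n, total_s - left_s
        if left_n == 0 or right_n == 0:
            continue
        # Reduction in squared error: S_L^2/n_L + S_R^2/n_R - S^2/n.
        gain = left_s ** 2 / left_n + right_s ** 2 / right_n - total_s ** 2 / total_n
        if best is None or gain > best[0]:
            best = (gain, t)
    return best


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    targets = [0, 0, 0, 0, 10, 10, 10, 10]
    edges = bin_edges(values, n_bins=4)
    # Two hypothetical data shards compute their histograms independently...
    c1, s1 = bin_stats(values[:4], targets[:4], edges)
    c2, s2 = bin_stats(values[4:], targets[4:], edges)
    # ...and the histograms are merged by elementwise addition before the scan.
    counts = [a + b for a, b in zip(c1, c2)]
    sums = [a + b for a, b in zip(s1, s2)]
    print(edges, best_split(counts, sums, edges))
```

Because per-bin counts and sums are additive, workers holding different shards can build histograms independently and only small, fixed-size summaries need to be merged, provided every worker uses the same bin edges. This illustrates why approximate splits parallelize well; the distributed protocol used in the thesis itself is not reproduced here.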