9,678 research outputs found
Democracy is good for ranking: Towards multi-view rank learning and adaptation in web search
Web search ranking models are learned from features origi-nated from different views or perspectives of document rel-evancy, such as query dependent or independent features. This seems intuitively conformant to the principle of multi-view approach that leverages distinct complementary views to improve model learning. In this paper, we aim to obtain optimal separation of ranking features into non-overlapping subsets (i.e., views), and use such different views for rank learning and adaptation. We present a novel semi-supervised multi-view ranking model, which is then extended into an adaptive ranker for search domains where no training data exists. The core idea is to proactively strengthen view con-sistency (i.e., the consistency between different rankings each predicted by a distinct view-based ranker) especially when training and test data follow divergent distributions. For this purpose, we propose a unified framework based on list-wise ranking scheme to mutually reinforce the view con-sistency of target queries and the appropriate weighting of source queries that act as prior knowledge. Based on LETOR and Yahoo Learning to Rank datasets, our method signif-icantly outperforms some strong baselines including single-view ranking models commonly used and multi-view ranking models that do not impose view consistency on target data
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
- …