    A review of domain adaptation without target labels

    Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into what we refer to as sample-based, feature-based, and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around mapping, projecting, and representing features such that a source classifier performs well on the target domain. Inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research. Comment: 20 pages, 5 figures
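    As a minimal sketch of the sample-based idea described above (importance weighting under covariate shift), the toy example below uses a synthetic 1-D problem in which the source and target marginals are Gaussians we choose ourselves, so the density ratio is available in closed form. The function name `fit_weighted_logreg` and all distributions are illustrative assumptions, not constructs from the review itself:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Source inputs drawn from N(0, 1); the target marginal is N(1, 1).
    # Labels follow the same conditional rule in both domains (covariate shift).
    n = 1000
    x_src = rng.normal(0.0, 1.0, n)
    y_src = (x_src + rng.normal(0.0, 0.3, n) > 0.5).astype(float)

    def gauss_pdf(x, mu, sd):
        return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

    # Importance weight w(x) = p_target(x) / p_source(x); known in closed
    # form here only because we picked both marginals.
    w = gauss_pdf(x_src, 1.0, 1.0) / gauss_pdf(x_src, 0.0, 1.0)

    # Importance-weighted logistic regression via plain gradient descent:
    # each source sample's loss gradient is scaled by its weight.
    def fit_weighted_logreg(x, y, w, lr=0.1, steps=500):
        a, b = 0.0, 0.0
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(a * x + b)))
            a -= lr * np.mean(w * (p - y) * x)
            b -= lr * np.mean(w * (p - y))
        return a, b

    a_w, b_w = fit_weighted_logreg(x_src, y_src, w)
    ```

    In practice the density ratio is unknown and must itself be estimated from unlabeled target data, which is where much of the sample-based literature surveyed by the review lives.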

    Minimax Classifier with Box Constraint on the Priors

    Learning a classifier in safety-critical applications like medicine raises several issues. Firstly, the class proportions, also called priors, are in general imbalanced or uncertain. Sometimes, experts are able to provide some bounds on the priors, and taking this knowledge into account can improve the predictions. Secondly, it is also necessary to consider any arbitrary loss function given by experts to evaluate the classification decision. Finally, the dataset may contain both categorical and numeric features. In this paper, we propose a box-constrained minimax classifier which addresses all the mentioned issues. To deal with both categorical and numeric features, many works have shown that discretizing the numeric attributes can lead to interesting results. Here, we thus consider that numeric features are discretized. In order to address the class-proportion issues, we compute the priors which maximize the empirical Bayes risk over a box-constrained probabilistic simplex. This constraint is defined as the intersection between the simplex and a box constraint provided by experts, which aims at bounding each class proportion independently. Our approach allows us to find a compromise between the empirical Bayes classifier and the standard minimax classifier, which may appear too pessimistic. The standard minimax classifier, which has not yet been studied when considering discrete features, remains accessible through our approach. When considering only discrete features, we show that, for any arbitrary loss function, the empirical Bayes risk, considered as a function of the priors, is a concave non-differentiable multivariate piecewise affine function. To compute the box-constrained least favorable priors, we derive a projected subgradient algorithm. The convergence of our algorithm is established. The performance of our algorithm is illustrated with experiments on the Framingham study database to predict the risk of Coronary Heart Disease (CHD).
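    The search for box-constrained least favorable priors can be sketched as a projected subgradient loop over a concave piecewise affine objective, as the abstract describes. Everything below is a hedged toy illustration: the objective matrix `A`, the bounds `lo`/`hi`, the step sizes, and the helper `project_box_simplex` (which projects onto the intersection of the simplex and the box by bisecting on a Lagrange shift) are our own assumptions, not the paper's algorithm:

    ```python
    import numpy as np

    def project_box_simplex(v, lo, hi):
        """Euclidean projection of v onto {p : sum(p) = 1, lo <= p <= hi}.

        Bisects on the shift tau in p_i = clip(v_i - tau, lo_i, hi_i);
        the clipped sum is monotone decreasing in tau. Assumes the set is
        nonempty, i.e. sum(lo) <= 1 <= sum(hi).
        """
        a = np.min(v - hi) - 1.0   # here the clipped sum is sum(hi) >= 1
        b = np.max(v - lo) + 1.0   # here the clipped sum is sum(lo) <= 1
        for _ in range(100):
            tau = 0.5 * (a + b)
            if np.clip(v - tau, lo, hi).sum() > 1.0:
                a = tau
            else:
                b = tau
        return np.clip(v - tau, lo, hi)

    # Toy concave piecewise affine objective f(p) = min_k (A p)_k,
    # standing in for the empirical Bayes risk as a function of the priors.
    A = np.array([[1.0, 0.2, 0.1],
                  [0.1, 1.0, 0.2],
                  [0.2, 0.1, 1.0]])
    lo = np.array([0.1, 0.1, 0.1])   # expert lower bounds on each prior
    hi = np.array([0.6, 0.6, 0.6])   # expert upper bounds on each prior

    # Projected subgradient ascent with a diminishing step size.
    p = project_box_simplex(np.ones(3) / 3.0, lo, hi)
    for t in range(1, 500):
        k = np.argmin(A @ p)       # index of the active affine piece
        g = A[k]                   # a supergradient of f at p
        p = project_box_simplex(p + (0.5 / np.sqrt(t)) * g, lo, hi)
    ```

    The diminishing step size is the standard choice for subgradient methods, whose iterates are not monotone in the objective; the projection keeps every iterate inside the expert-specified box intersected with the simplex.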