3,077 research outputs found

    A review of domain adaptation without target labels

    Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into what we refer to as sample-based, feature-based, and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around mapping, projecting, and representing features such that a source classifier performs well on the target domain. Inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research. Comment: 20 pages, 5 figures
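As a rough illustration of the sample-based idea in this abstract, the sketch below reweights source observations by the density ratio w(x) = p_T(x) / p_S(x) to estimate a target-domain error rate. It is not taken from the review: the 1-D Gaussian domains, the assumption that both densities are known, the toy loss, and all function names are hypothetical choices made for the example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source, target):
    """Return w(x) = p_T(x) / p_S(x) for each source sample.

    `source` and `target` are (mean, std) pairs. Knowing both densities
    exactly is an assumption of this sketch; in practice the ratio has
    to be estimated from data.
    """
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]


def weighted_risk(losses, weights):
    """Self-normalized importance-weighted average of per-sample losses."""
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total


random.seed(0)
source, target = (0.0, 1.0), (1.0, 1.0)  # hypothetical source and target domains
xs = [random.gauss(*source) for _ in range(5000)]

# Toy per-sample loss: a fixed classifier that errs whenever x > 0.5.
losses = [1.0 if x > 0.5 else 0.0 for x in xs]

weights = importance_weights(xs, source, target)
source_risk = sum(losses) / len(losses)               # plain average over source samples
target_risk_estimate = weighted_risk(losses, weights) # reweighted toward the target
```

Because the target domain places more mass on x > 0.5 than the source does, the reweighted estimate should come out higher than the unweighted source average.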

    Robust and Fair Machine Learning under Distribution Shift

    Machine learning algorithms are widely used in real-world applications. The development of these techniques has brought substantial benefits to many AI-related tasks, such as natural language processing, image classification, and video analysis. Traditional machine learning algorithms usually assume that the training and test data are independently and identically distributed (iid), so that a model learned from the training data can be applied to the test data with good prediction performance. However, this assumption is quite restrictive because distribution shift between the training and test data occurs in many scenarios. In addition, traditional machine learning models aim to maximize prediction performance (e.g., accuracy) on historical training data, which may lead to unfair predictions for particular individuals or groups. In the literature, researchers focus either on building robust machine learning models under distribution shift or on achieving fairness, without addressing the two simultaneously. The goal of this dissertation is to address these challenges in fair machine learning under distribution shift. We start by building an agnostic fair framework in federated learning, where the data distribution is more diverse and distribution shift exists between the training and test data. Then we build a robust framework that addresses sample selection bias for fair classification. Next, we address sample selection bias for fair regression. Finally, we propose an adversarial framework for building a personalized model in the distributed setting, where distribution shift exists between different users. In this dissertation, we conduct the following research on fair machine learning under distribution shift:
    • We develop a fairness-aware agnostic federated learning framework (AgnosticFair) to deal with the challenge of an unknown testing distribution;
    • We propose a framework for robust and fair learning under sample selection bias;
    • We develop a framework for fair regression under sample selection bias when the dependent variable values of a set of training samples are missing as a result of another hidden process;
    • We propose a learning framework that allows an individual user to build a personalized model in a distributed setting, where distribution shift exists among different users.

    Target Contrastive Pessimistic Discriminant Analysis

    Domain-adaptive classifiers learn from a source domain and aim to generalize to a target domain. If the classifier's assumptions on the relationship between domains (e.g. covariate shift) are valid, then it will usually outperform a non-adaptive source classifier. Unfortunately, it can perform substantially worse when its assumptions are invalid. Validating these assumptions requires labeled target samples, which are usually not available. We argue that, in order to make domain-adaptive classifiers more practical, it is necessary to focus on robust methods; robust in the sense that the model still achieves a particular level of performance without making strong assumptions on the relationship between domains. With this objective in mind, we formulate a conservative parameter estimator that only deviates from the source classifier when a lower or equal risk is guaranteed for all possible labellings of the given target samples. We derive the corresponding estimator for a discriminant analysis model, and show that its risk is actually strictly smaller than that of the source classifier. Experiments indicate that our classifier outperforms state-of-the-art classifiers for geographically biased samples. Comment: 9 pages, no figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:1706.0808

    Robust Fairness under Covariate Shift

    Making predictions that are fair with regard to protected group membership (race, gender, age, etc.) has become an important requirement for classification algorithms. Existing techniques derive a fair model from sampled labeled data, relying on the assumption that training and testing data are independently and identically distributed (iid). In practice, distribution shift can and does occur between training and testing datasets as the characteristics of individuals interacting with the machine learning system change. We investigate fairness under covariate shift, a relaxation of the iid assumption in which the inputs or covariates change while the conditional label distribution remains the same. We seek fair decisions under these assumptions on target data with unknown labels. We propose an approach that obtains the predictor that is robust to the worst case in terms of target performance while satisfying target fairness requirements and matching statistical properties of the source data. We demonstrate the benefits of our approach on benchmark prediction tasks.
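The covariate shift assumption named in this abstract can be written compactly as follows. The notation is an assumption made here for illustration, not taken from the paper: p_S and p_T denote the source (training) and target (testing) distributions, x the inputs, and y the label.

```latex
p_S(y \mid x) = p_T(y \mid x), \qquad p_S(x) \neq p_T(x)
```

Only the input marginal differs between the two domains; the conditional label distribution is shared.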

    Robust Regression for Safe Exploration in Control

    We study the problem of safe learning and exploration in sequential control problems. The goal is to safely collect data samples from an operating environment to learn an optimal controller. A central challenge in this setting is how to quantify uncertainty in order to choose provably safe actions that allow us to collect useful data and reduce uncertainty, thereby achieving both improved safety and optimality. To address this challenge, we present a deep robust regression model that is trained to directly predict the uncertainty bounds for safe exploration. We then show how to integrate our robust regression approach with model-based control methods by learning a dynamic model with robustness bounds. We derive generalization bounds under domain shifts for learning and connect them with safety and stability bounds in control. We demonstrate empirically that our robust regression approach can outperform conventional Gaussian process (GP) based safe exploration in settings where it is difficult to specify a good GP prior.