4 research outputs found

    Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond

    Full text link
    The vast majority of existing algorithms for unsupervised domain adaptation (UDA) focus on adapting from a labeled source domain to an unlabeled target domain directly in a one-off way. Gradual domain adaptation (GDA), on the other hand, assumes a path of $(T-1)$ unlabeled intermediate domains bridging the source and target, and aims to provide better generalization in the target domain by leveraging the intermediate ones. Under certain assumptions, Kumar et al. (2020) proposed a simple algorithm, Gradual Self-Training, along with a generalization bound in the order of $e^{O(T)} \left(\varepsilon_0+O\left(\sqrt{\log(T)/n}\right)\right)$ for the target domain error, where $\varepsilon_0$ is the source domain error and $n$ is the data size of each domain. Due to the exponential factor, this upper bound becomes vacuous when $T$ is only moderately large. In this work, we analyze gradual self-training under more general and relaxed assumptions, and prove a significantly improved generalization bound as $\widetilde{O}\left(\varepsilon_0 + T\Delta + T/\sqrt{n} + 1/\sqrt{nT}\right)$, where $\Delta$ is the average distributional distance between consecutive domains. Compared with the existing bound with an exponential dependency on $T$ as a multiplicative factor, our bound only depends on $T$ linearly and additively. Perhaps more interestingly, our result implies the existence of an optimal choice of $T$ that minimizes the generalization error, and it also naturally suggests an optimal way to construct the path of intermediate domains so as to minimize the accumulative path length $T\Delta$ between the source and target. To corroborate the implications of our theory, we examine gradual self-training on multiple semi-synthetic and real datasets, which confirms our findings. We believe our insights provide a path forward toward the design of future GDA algorithms. Comment: The code will be released at https://github.com/Haoxiang-Wang/gradual-domain-adaptatio
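The gradual self-training procedure discussed in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the nearest-centroid classifier, the one-dimensional toy data, and the helper names (`fit_centroids`, `gradual_self_train`, `make_domain`). The idea shown is only the general loop: fit on the labeled source, then repeatedly pseudo-label the next unlabeled domain with the current model and refit on those pseudo-labels.

```python
def fit_centroids(xs, ys, previous=None):
    """Return (c0, c1), the mean of the points labeled 0 and labeled 1.

    If a class receives no points, its centroid is carried over from
    `previous` (an illustrative choice, not part of any published method).
    """
    centroids = []
    for label in (0, 1):
        members = [x for x, y in zip(xs, ys) if y == label]
        if members:
            centroids.append(sum(members) / len(members))
        else:
            centroids.append(previous[label])
    return tuple(centroids)


def predict(centroids, xs):
    """Assign each point to the class of its nearest centroid."""
    c0, c1 = centroids
    return [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]


def gradual_self_train(source_x, source_y, unlabeled_domains):
    """Fit on the source, then self-train through the domains in order."""
    model = fit_centroids(source_x, source_y)
    for xs in unlabeled_domains:
        pseudo_labels = predict(model, xs)  # label with the current model
        model = fit_centroids(xs, pseudo_labels, previous=model)  # refit
    return model


def accuracy(model, xs, ys):
    preds = predict(model, xs)
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def make_domain(shift):
    """Toy domain: class 0 clustered at shift - 1, class 1 at shift + 1."""
    offsets = [-0.2, 0.0, 0.2]
    xs = [shift - 1 + o for o in offsets] + [shift + 1 + o for o in offsets]
    ys = [0, 0, 0, 1, 1, 1]
    return xs, ys


T = 6  # number of adaptation steps; the last domain is the target
source_x, source_y = make_domain(0.0)
domains = [make_domain(3.0 * t / T)[0] for t in range(1, T + 1)]
target_x, target_y = make_domain(3.0)

gradual = gradual_self_train(source_x, source_y, domains)
direct = gradual_self_train(source_x, source_y, [target_x])

print("gradual:", accuracy(gradual, target_x, target_y))
print("direct: ", accuracy(direct, target_x, target_y))
```

In this toy setting the one-off adaptation mislabels every class-0 target point, while stepping through the intermediate domains keeps the pseudo-labels correct at each step. It says nothing about the constants or rates in the bounds quoted above.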

    Gradual Domain Adaptation: Theory and Algorithms

    Full text link
    Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and the target is large. Gradual domain adaptation (GDA) mitigates this limitation by using intermediate domains to gradually adapt from the source to the target domain. In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound compared with Kumar et al. (2020). Our theoretical analysis leads to an interesting insight: to minimize the generalization error on the target domain, the sequence of intermediate domains should be placed uniformly along the Wasserstein geodesic between the source and target domains. The insight is particularly useful under the situation where intermediate domains are missing or scarce, which is often the case in real-world applications. Based on the insight, we propose Generative Gradual Domain Adaptation with Optimal Transport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/yifei-he/GOAT. Comment: arXiv admin note: substantial text overlap with arXiv:2204.0820
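The idea of placing intermediate domains uniformly along a Wasserstein geodesic can be sketched in the simplest setting. The code below is an illustrative assumption and is not the GOAT implementation: it handles only one-dimensional empirical distributions with the same number of samples, where the optimal transport plan for a convex cost such as squared distance pairs the sorted samples, so a point on the geodesic is a linear interpolation of matched pairs. The function name `geodesic_domains` is hypothetical.

```python
def geodesic_domains(source, target, num_intermediate):
    """Generate evenly spaced intermediate sample sets between two 1D samples.

    Assumes equal sample sizes. In one dimension the optimal coupling for a
    convex cost matches the i-th smallest source point to the i-th smallest
    target point, so displacement interpolation reduces to interpolating
    the sorted samples.
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(source), sorted(target)
    domains = []
    for k in range(1, num_intermediate + 1):
        t = k / (num_intermediate + 1)  # uniform spacing in (0, 1)
        domains.append([(1 - t) * x + t * y for x, y in zip(xs, ys)])
    return domains


# Three intermediate domains between two tiny sample sets.
for domain in geodesic_domains([2.0, 0.0], [4.0, 10.0], 3):
    print(domain)
```

In higher-dimensional feature spaces the sorted matching no longer applies and a general optimal transport solver would be needed. The generated sets could then be passed, in order, to a gradual self-training loop.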

    Coordinate Descent Algorithm for Ramp Loss Linear Programming Support Vector Machines

    No full text
    © 2015, Springer Science+Business Media New York. In order to control the effects of outliers in training data and get sparse results, Huang et al. (J Mach Learn Res 15:2185–2211, 2014) proposed the ramp loss linear programming support vector machine. This combination of ℓ1 regularization and ramp loss not only leads to sparsity of the parameters in decision functions, but also limits the effects of outliers with a maximal penalty. However, due to its non-convexity, the computational cost to achieve a satisfying solution is often high. In this paper, we propose a modified coordinate descent algorithm, which deals with a series of one-variable piecewise linear subproblems. Considering that the obtained subproblems are DC programming problems, we linearize the concave part of the objective functions and solve the obtained convex problems. To test the performance of the proposed algorithm, numerical experiments have been carried out and analysed on benchmark data sets. To enhance the sparsity and robustness, the experiments are initialized from C-SVM solutions. The results confirm its excellent performance in classification accuracy, robustness and computational efficiency. status: publishe
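The DC (difference of convex functions) structure mentioned in this abstract can be made concrete with a small sketch. It assumes one common parameterization of the ramp loss, a hinge loss capped at 1, which may differ from the exact form used in the paper. The function names are hypothetical. The sketch shows the ramp loss as a difference of two convex hinge terms, and the convex surrogate obtained by linearizing the subtracted term at a point `z0`. It does not implement the paper's coordinate descent algorithm.

```python
def hinge(z):
    """Convex hinge term max(0, 1 - z), with z the margin y * f(x)."""
    return max(0.0, 1.0 - z)


def shifted_hinge(z):
    """Convex term max(0, -z); subtracting it caps the hinge at 1."""
    return max(0.0, -z)


def ramp_loss(z):
    """Ramp loss written as a difference of two convex functions."""
    return hinge(z) - shifted_hinge(z)


def convex_surrogate(z, z0):
    """Replace the concave part -max(0, -z) by its linearization at z0.

    A convex function lies above its tangent, so the surrogate upper-bounds
    the ramp loss everywhere and matches it at z = z0.
    """
    slope = -1.0 if z0 < 0 else 0.0  # a subgradient of max(0, -z) at z0
    linearized = shifted_hinge(z0) + slope * (z - z0)
    return hinge(z) - linearized


for z in (-3.0, 0.5, 2.0):
    print(z, ramp_loss(z), convex_surrogate(z, z0=-1.0))
```

Minimizing such a surrogate, re-linearizing at the new point, and repeating is the usual pattern for DC programs. The abstract applies this linearization inside one-variable piecewise linear subproblems.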