2 research outputs found

    Transductive Learning with String Kernels for Cross-Domain Text Classification

    Full text link
    For many text classification tasks, there is a major problem posed by the lack of labeled data in a target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification or automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches to further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification.Comment: Accepted at ICONIP 2018. arXiv admin note: substantial text overlap with arXiv:1808.0840

    Cross-Domain Sentiment Classification via Topic-Related TrAdaBoost

    No full text
    Cross-domain sentiment classification aims to tag sentiments for a target domain by labeled data from a source domain. Due to the difference between domains, the accuracy of a trained classifier may be very low. In this paper, we propose a boosting-based learning framework named TR-TrAdaBoost for cross-domain sentiment classification. We firstly explore the topic distribution of documents, and then combine it with the unigram TrAdaBoost. The topic distribution captures the domain information of documents, which is valuable for cross-domain sentiment classification. Experimental results indicate that TR-TrAdaBoost represents documents well and boost the performance and robustness of TrAdaBoost
    corecore