Extracting discriminative concepts for domain adaptation in text mining

By Bo Chen, Wai Lam, Ivor Tsang and Tak-lam Wong

Abstract

One common predictive modeling challenge in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributions. This poses a great difficulty for many statistical learning methods. However, when the distributions in the source domain and the target domain are not identical but related, there may exist a shared concept space that preserves the relation. Consequently, a good feature representation can encode this concept space and minimize the distribution gap. To formalize this intuition, we propose a domain adaptation method that parameterizes this concept space by a linear transformation, under which we explicitly minimize the distribution difference between the source domain, which has sufficient labeled data, and the target domains, which have only unlabeled data, while at the same time minimizing the empirical loss on the labeled data in the source domain. Another characteristic of our method is its capability to consider multiple classes and their interactions simultaneously. We have conducted extensive experiments on two common text mining problems, namely information extraction and document classification, to demonstrate the effectiveness of our proposed method.
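The abstract does not specify the distribution-difference measure, the loss, or the optimizer, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a linear-kernel maximum mean discrepancy (the squared distance between the projected source and target means) as the distribution difference, multinomial logistic (softmax cross-entropy) loss as the empirical loss, weight decay, and plain gradient descent. A matrix `W` maps the original features to a k-dimensional concept space, and a matrix `V` classifies all K classes jointly in that space, so the objective is J(W, V) = cross-entropy(Xs W V, ys) + lam * ||mean(Xs W) - mean(Xt W)||^2 + (mu / 2) * (||W||^2 + ||V||^2). The function names `objective` and `fit`, the hyperparameters `lam`, `mu`, `lr`, `steps`, and the synthetic data are all hypothetical.

```python
import numpy as np


def objective(W, V, Xs, ys, Xt, lam=1.0, mu=1e-3):
    """Return (J, dJ/dW, dJ/dV) for the illustrative (assumed) objective

    J = mean softmax cross-entropy of Xs @ W @ V on the source labels ys
        + lam * ||mean(Xs @ W) - mean(Xt @ W)||^2   (linear-kernel MMD)
        + mu / 2 * (||W||_F^2 + ||V||_F^2)          (weight decay)
    """
    ns, K = Xs.shape[0], V.shape[1]
    Y = np.eye(K)[ys]                      # one-hot source labels, shape (ns, K)
    H = Xs @ W                             # source points in the concept space
    Z = H @ V                              # class scores, shape (ns, K)
    Z = Z - Z.max(axis=1, keepdims=True)   # shift rows for numerical stability
    logP = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    P = np.exp(logP)                       # softmax probabilities
    ce = -np.mean(logP[np.arange(ns), ys])

    delta = Xs.mean(axis=0) - Xt.mean(axis=0)  # raw domain mean gap, shape (d,)
    m = delta @ W                              # gap after projection, shape (k,)
    mmd = m @ m

    J = ce + lam * mmd + 0.5 * mu * (np.sum(W ** 2) + np.sum(V ** 2))

    G = (P - Y) / ns                       # gradient of mean cross-entropy w.r.t. Z
    dV = H.T @ G + mu * V
    dW = Xs.T @ (G @ V.T) + 2.0 * lam * np.outer(delta, m) + mu * W
    return J, dW, dV


def fit(Xs, ys, Xt, k=2, lam=1.0, mu=1e-3, lr=0.05, steps=500, seed=0):
    """Plain gradient descent on objective(); returns W, V and the loss history."""
    rng = np.random.default_rng(seed)
    d, K = Xs.shape[1], int(ys.max()) + 1
    W = 0.1 * rng.standard_normal((d, k))  # maps d features to k concepts
    V = 0.1 * rng.standard_normal((k, K))  # multi-class classifier on concepts
    history = []
    for _ in range(steps):
        J, dW, dV = objective(W, V, Xs, ys, Xt, lam, mu)
        history.append(J)
        W -= lr * dW
        V -= lr * dV
    return W, V, history


# Hypothetical synthetic data: class signal in dims 0-2, domain shift in dims 3-4.
rng = np.random.default_rng(0)
n, d = 300, 5
centers = 2.0 * np.eye(3, d)
shift = np.array([0.0, 0.0, 0.0, 2.0, 2.0])
ys = rng.integers(0, 3, size=n)
Xs = centers[ys] + rng.standard_normal((n, d))
yt = rng.integers(0, 3, size=n)            # target labels are never used in training
Xt = centers[yt] + shift + rng.standard_normal((n, d))

W, V, hist = fit(Xs, ys, Xt, k=3, lam=1.0)
print(f"objective: {hist[0]:.3f} -> {hist[-1]:.3f}")
```

Setting `lam=0.0` turns off the discrepancy term, which gives a simple baseline for checking how much the learned `W` aligns the two domains. A linear-kernel discrepancy only matches the first moments (means) of the projected data; a kernel such as a Gaussian MMD would constrain more of the distribution at a higher computational cost.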

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.188.7296
Provided by: CiteSeerX