
    A Survey on Opinion Mining Techniques

    Mining opinions from customer reviews has received tremendous attention, in both domain-dependent and domain-independent settings, because reviews decide the overall rating of a product, and a product's sales and market standing depend heavily on them. Opinion identification over a single review corpus is straightforward but yields poor results; working with two or more corpora is more complex. A number of existing opinion mining techniques exist, but they are suited to a single corpus rather than multiple corpora. In this paper we propose a novel technique for mining opinion features from two or more review corpora. The technique uses two corpora, one domain-dependent and one domain-independent, and measures each candidate feature's domain relevance against both, which we call intrinsic domain relevance (IDR) and extrinsic domain relevance (EDR) respectively. Candidate features whose IDR exceeds the intrinsic domain relevance threshold and whose EDR falls below the extrinsic threshold are retained as opinion features; these play an important role in determining the grade of a product. Many users nowadays want to know not only the grade of a product but also which positive and negative factors decide that rating. The paper proposes several techniques along these lines for extracting opinion features from two or more review corpora.
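
    As a rough illustration of the filtering step described above, the sketch below keeps candidate features that are frequent in the domain-dependent corpus (high IDR) but rare in the domain-independent one (low EDR). The relevance statistic, function names, and thresholds are simplified placeholders, not the paper's exact formulation.

        def relevance(feature, corpus):
            """Fraction of documents mentioning the feature (a simple
            stand-in for a domain-relevance statistic)."""
            return sum(1 for doc in corpus if feature in doc) / len(corpus)

        def select_opinion_features(candidates, domain_corpus, general_corpus,
                                    idr_min=0.5, edr_max=0.1):
            """Keep candidates with intrinsic domain relevance (IDR) above
            the threshold and extrinsic domain relevance (EDR) below it."""
            selected = []
            for f in candidates:
                idr = relevance(f, domain_corpus)   # relevance to domain corpus
                edr = relevance(f, general_corpus)  # relevance to generic corpus
                if idr >= idr_min and edr <= edr_max:
                    selected.append((f, idr, edr))
            return selected

        # Toy usage: documents are pre-tokenized into sets of terms.
        domain_corpus = [{"battery", "screen", "great"}, {"battery", "poor"}]
        general_corpus = [{"great", "poor"}, {"weather", "great"}]
        print(select_opinion_features({"battery", "great"},
                                      domain_corpus, general_corpus))
        # -> [('battery', 1.0, 0.0)]; "great" is filtered out as generic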

    Text categorization by fuzzy domain adaptation

    Machine learning methods have attracted the attention of researchers in computational fields such as classification/categorization. However, these learning methods work under the assumption that the training and test data distributions are identical. In some real-world applications, the training data (from the source domain) and test data (from the target domain) come from different domains, which may result in different data distributions. Moreover, the values of the features and/or labels of the data sets could be non-numeric and contain vague values. In this study, we propose a fuzzy domain adaptation method, which offers an effective way to deal with both issues. It utilizes the similarity concept to modify the labels of target instances that were initially classified by a shift-unaware classifier. The proposed method is built on the given data and refines the labels, and in this way it performs completely independently of the shift-unaware classifier. As an example of text categorization, the 20Newsgroup data set is used in the experiments to validate the proposed method. The results, compared with those generated using different baselines, demonstrate a significant improvement in accuracy.
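
    A minimal sketch of the label-refinement idea: target labels produced by a shift-unaware classifier are revised by a similarity-weighted (fuzzy) vote over each instance's most similar neighbours. The membership rule here is a plain cosine-weighted vote, a simplification of the paper's fuzzy similarity machinery; all names and parameters are illustrative.

        import numpy as np

        def refine_labels(X, initial_labels, n_classes, k=10):
            """Refine noisy target labels from a shift-unaware classifier:
            each instance gets a fuzzy membership per class, aggregated from
            the cosine similarities to its k most similar instances, and is
            reassigned to the class with the highest membership."""
            X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
            sims = X @ X.T                    # pairwise cosine similarities
            np.fill_diagonal(sims, -np.inf)   # ignore self-similarity
            refined = np.array(initial_labels)
            for i in range(len(X)):
                neighbours = np.argsort(sims[i])[-k:]
                membership = np.zeros(n_classes)
                for j in neighbours:
                    membership[initial_labels[j]] += sims[i, j]
                refined[i] = int(np.argmax(membership))
            return refined

    Because the refinement uses only the target vectors and the initial label assignments, it runs entirely after (and independently of) whatever classifier produced those labels, matching the decoupling the abstract describes.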

    Cross-Domain Labeled LDA for Cross-Domain Text Classification

    Cross-domain text classification aims at building a classifier for a target domain by leveraging data from both the source and target domains. One promising idea is to minimize the feature distribution differences between the two domains. Most existing studies minimize such differences explicitly through an exact alignment mechanism (one-to-one feature alignment, a projection matrix, etc.). Such exact alignment, however, restricts the model's learning ability and further impairs its performance on classification tasks when the semantic distributions of the domains are very different. To address this problem, we propose a novel group alignment which aligns semantics at the group level. In addition, to help the model learn better semantic groups and the semantics within them, we also propose a partial supervision for the model's learning in the source domain. To this end, we embed the group alignment and the partial supervision into a cross-domain topic model, and propose Cross-Domain Labeled LDA (CDL-LDA). On the standard 20Newsgroup and Reuters datasets, extensive quantitative (classification, perplexity, etc.) and qualitative (topic detection) experiments are conducted to show the effectiveness of the proposed group alignment and partial supervision. Comment: ICDM 201
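
    CDL-LDA embeds group alignment inside a labeled topic model; as a rough, model-agnostic illustration of how group-level alignment differs from exact alignment, the sketch below clusters source features into semantic groups and penalizes only centroid-to-centroid distances. Everything here (the embeddings, the k-means grouping, the penalty) is an assumption for illustration, not the paper's actual model.

        import numpy as np
        from sklearn.cluster import KMeans

        def group_alignment_penalty(src_emb, tgt_emb, n_groups=5):
            """Cluster source features into semantic groups, assign each
            target feature to its nearest group, and penalize only the
            distance between matched group centroids; no one-to-one
            feature correspondence is ever required."""
            km = KMeans(n_clusters=n_groups, n_init=10).fit(src_emb)
            tgt_groups = km.predict(tgt_emb)
            penalty = 0.0
            for g in range(n_groups):
                members = tgt_emb[tgt_groups == g]
                if len(members):
                    penalty += np.linalg.norm(members.mean(axis=0)
                                              - km.cluster_centers_[g])
            return penalty / n_groups

    Because only group statistics are matched, individual features remain free to differ across domains, which is what lets the model cope with large semantic distribution gaps.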

    On Horizontal and Vertical Separation in Hierarchical Text Classification

    Hierarchy is a common and effective way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies cause difficulties in the estimation of "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy. In this paper, we present an investigation of the effect of separability in text-based entity classification and argue that in hierarchical classification, a separation property should be established between entities not only in the same layer but also in different layers. Our main findings are the following. First, we analyse the importance of separability of the data representation in the classification task and, based on that, introduce a "Strong Separation Principle" for optimizing the expected effectiveness of classifiers' decisions based on the separation property. Second, we present Hierarchical Significant Words Language Models (HSWLM), which capture all, and only, the essential features of hierarchical entities according to their relative position in the hierarchy, resulting in horizontally and vertically separable models. Third, we validate our claims on real-world data and demonstrate how HSWLM improves the accuracy of classification and provides models that are transferable over time. Although the discussion in this paper focuses on the classification problem, the models are applicable to any information access task on data that has, or can be mapped to, a hierarchical structure. Comment: Full paper (10 pages) accepted for publication in the proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR'16).
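
    HSWLM keeps "all, and only" an entity's essential terms by repeatedly parsimonizing its language model against ancestor models (removing overly general terms) and specializing it against sibling models (removing terms that are not distinctive). The sketch below shows just the single parsimonization step against one fixed background model, as a standard EM procedure; the input term frequencies, the interpolation weight lam, and the iteration count are illustrative assumptions.

        def parsimonize(entity_tf, background_lm, lam=0.9, iters=20):
            """One parsimonization step, estimated with EM: probability
            mass for terms that the background model (e.g., an ancestor
            in the hierarchy) already explains is pushed out, so the
            entity model keeps only its own significant words."""
            total = sum(entity_tf.values())
            lm = {w: c / total for w, c in entity_tf.items()}
            for _ in range(iters):
                # E-step: how much of each term the entity model accounts for
                e = {w: entity_tf[w] * lam * lm[w]
                        / (lam * lm[w] + (1 - lam) * background_lm.get(w, 1e-9))
                     for w in entity_tf}
                # M-step: renormalize into a probability distribution
                norm = sum(e.values())
                lm = {w: v / norm for w, v in e.items()}
            return lm

    Running this step against ancestors and, analogously, against siblings at each level is what yields models that are separable both vertically and horizontally in the hierarchy.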

    Transductive Distributional Correspondence Indexing for Cross-Domain Topic Classification

    Obtaining high-quality annotated data for training a classifier for a new domain is often costly. Domain Adaptation (DA) aims at leveraging the annotated data available from a different but related source domain in order to deploy a classification model for the target domain of interest, thus alleviating those costs. To that aim, the learning model is typically given access to a set of unlabelled documents collected from the target domain. These documents might consist of a representative sample of the target distribution, in which case they can be used to infer a general classification model for the domain (inductive inference). Alternatively, they could be the entire set of documents to be classified; this happens when there is only one set of documents we are interested in classifying (transductive inference). Many of the DA methods proposed so far have focused on transductive classification by topic, i.e., the task of assigning class labels to a specific set of documents based on the topics they are about. In this work, we report on new experiments in transductive classification by topic using the Distributional Correspondence Indexing (DCI) method, a DA method we recently developed that delivered state-of-the-art results in inductive classification by sentiment. The results obtained on three popular datasets show DCI to be competitive with the state of the art in this scenario as well, and superior to all compared methods in many cases.
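
    A minimal sketch of the indexing idea behind DCI, assuming a document-term count matrix X and a set of pivot term indices shared by the source and target domains: each term is profiled by its distributional correspondence to the pivots (here, cosine of occurrence profiles, one of several possible correspondence functions), and documents are then projected into that shared pivot space.

        import numpy as np

        def dci_term_vectors(X, pivot_idx):
            """Profile every term by its distributional correspondence
            (here: cosine of term-occurrence profiles) to the pivot terms,
            giving a term space that is comparable across domains."""
            T = X.T.astype(float)                     # terms x documents
            T /= np.linalg.norm(T, axis=1, keepdims=True) + 1e-12
            return T @ T[pivot_idx].T                 # terms x pivots

        def dci_doc_vectors(X, term_vecs):
            """Project documents into the pivot space by summing the
            correspondence vectors of their terms, then normalizing."""
            D = X.astype(float) @ term_vecs
            return D / (np.linalg.norm(D, axis=1, keepdims=True) + 1e-12)

    A standard learner (e.g., a linear SVM) trained on the projected source documents can then classify the projected target documents; in the transductive setting described above, the target-side profiles are computed from exactly the set of documents to be classified.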