A Survey on Opinion Mining Techniques

Author: Mr. A. V. Moholkar, Prof. S. S. Bere, Mr. S. P. Ghode, Prof. B. S. Salve
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/10/2014

Mining opinions from customer reviews has received tremendous attention, for both domain-dependent and domain-independent documents, because reviews determine the overall rating of a product. A product's sales and market depend heavily on these reviews. Identifying opinion features from a single review corpus is relatively simple but gives poor results; using two or more corpora is more complex. A number of opinion-mining techniques exist, but they suit a single corpus rather than multiple corpora. In this paper we propose a novel technique for mining opinion features from two or more review corpora. The technique uses two corpora, one domain-dependent and the other domain-independent. We measure the domain relevance of each candidate feature with respect to the domain-dependent and the domain-independent corpus, and call these the intrinsic domain relevance (IDR) and the extrinsic domain relevance (EDR), respectively. Candidate features whose IDR is greater than an IDR threshold and whose EDR is less than an EDR threshold are taken as opinion features. Such user opinions play an important role in determining the grade of a product: many users nowadays want to know a product's grade along with the positive and negative factors behind that rating. In this paper, different techniques are proposed to extract opinion features from two or more review corpora.

International Journal on Recent and Innovation Trends in Computing and Communication
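
The IDR/EDR selection rule in the abstract above can be sketched as a simple filter. This is a minimal sketch, assuming the IDR and EDR scores have already been computed for every candidate feature; the abstract does not say how they are computed, so the function name `select_opinion_features`, the threshold values and the example scores are all hypothetical.

```python
from typing import Dict, List


def select_opinion_features(
    idr: Dict[str, float],
    edr: Dict[str, float],
    idr_threshold: float,
    edr_threshold: float,
) -> List[str]:
    """Hypothetical helper: keep candidates that are relevant to the review
    domain (IDR above its threshold) but not generic across domains
    (EDR below its threshold)."""
    return sorted(
        term
        for term, score in idr.items()
        # Assumption: a candidate absent from the domain-independent corpus
        # is treated as having zero extrinsic relevance.
        if score > idr_threshold and edr.get(term, 0.0) < edr_threshold
    )


# Made-up scores, for illustration only.
idr_scores = {"battery": 0.82, "screen": 0.75, "thing": 0.70, "price": 0.30}
edr_scores = {"battery": 0.10, "screen": 0.15, "thing": 0.90, "price": 0.05}

print(select_opinion_features(idr_scores, edr_scores, 0.5, 0.4))
# ['battery', 'screen']
```

Here "thing" is frequent in the reviews but equally common in general text, so its high EDR rules it out, while "price" is not relevant enough to the review domain to pass the IDR threshold.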

Text categorization by fuzzy domain adaptation

Author: Behbood V, Lu J, Zhang G
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/11/2013

Machine learning methods have attracted the attention of researchers in computational fields such as classification/categorization. However, these methods work under the assumption that the training and test data distributions are identical. In some real-world applications, the training data (from the source domain) and test data (from the target domain) come from different domains, which may result in different data distributions. Moreover, the feature values and/or labels of the data sets may be non-numeric and contain vague values. In this study, we propose a fuzzy domain adaptation method that offers an effective way to deal with both issues. It uses the concept of similarity to modify the labels of target instances that were initially classified by a shift-unaware classifier. The proposed method is built on the given data and refines the labels; in this way it operates completely independently of the shift-unaware classifier. As an example of text categorization, the 20Newsgroup data set is used in experiments to validate the proposed method. The results, compared with those of several baselines, demonstrate a significant improvement in accuracy. © 2013 IEEE

OPUS - University of Technology Sydney
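
The idea of refining a shift-unaware classifier's labels using similarity among target instances can be illustrated, loosely, with a similarity-weighted nearest-neighbour vote. This is a crisp, non-fuzzy stand-in, not the authors' fuzzy method: it assumes numeric feature vectors, cosine similarity and a neighbourhood size `k`, and the helper names `cosine` and `refine_labels` and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def refine_labels(
    X: List[Sequence[float]], initial: List[str], k: int = 3
) -> List[str]:
    """Hypothetical helper: relabel each target instance by a similarity-weighted
    vote over the initial labels of its k most similar target instances."""
    refined: List[str] = []
    for i, xi in enumerate(X):
        # Rank every other target instance by similarity to instance i.
        neighbours = sorted(
            ((cosine(xi, xj), j) for j, xj in enumerate(X) if j != i),
            reverse=True,
        )[:k]
        votes: Dict[str, float] = defaultdict(float)
        for sim, j in neighbours:
            votes[initial[j]] += sim
        if votes and max(votes.values()) > 0:
            refined.append(max(votes, key=votes.get))
        else:
            # No informative neighbours: keep the classifier's original label.
            refined.append(initial[i])
    return refined


# Toy target-domain data: two tight clusters, each with one label flipped
# by the (imagined) shift-unaware classifier.
X = [
    [1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [1.0, 0.05],
    [0.0, 1.0], [0.1, 1.0], [0.0, 0.9], [0.05, 1.0],
]
initial = ["a", "a", "a", "b", "b", "b", "b", "a"]
print(refine_labels(X, initial, k=3))
# ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
```

Each instance takes the label favoured by its most similar neighbours' initial labels, so an isolated mislabel inside a tight cluster is corrected. The refinement touches only the target data and the initial labels, which mirrors the abstract's point that the method works independently of the classifier that produced them.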

Cross-Domain Labeled LDA for Cross-Domain Text Classification

Author: Jing Baoyu, Lu Chenwei, Niu Cheng, Wang Deqing, Zhuang Fuzhen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/09/2018

Cross-domain text classification aims to build a classifier for a target domain by leveraging data from both the source and the target domain. One promising idea is to minimize the differences between the feature distributions of the two domains. Most existing studies minimize these differences explicitly through an exact alignment mechanism (one-to-one feature alignment, a projection matrix, etc.). Such exact alignment, however, restricts a model's learning ability and further impairs its classification performance when the semantic distributions of the domains differ greatly. To address this problem, we propose a novel group alignment, which aligns semantics at the group level. In addition, to help the model learn better semantic groups and the semantics within them, we propose partial supervision of the model's learning in the source domain. We embed the group alignment and partial supervision into a cross-domain topic model, yielding Cross-Domain Labeled LDA (CDL-LDA). Extensive quantitative (classification, perplexity, etc.) and qualitative (topic detection) experiments on the standard 20Newsgroup and Reuters datasets show the effectiveness of the proposed group alignment and partial supervision. Comment: ICDM 201

arXiv.org e-Print Archive

Crossref

On Horizontal and Vertical Separation in Hierarchical Text Classification

Author: Chen M., Kim D.-k., Lewis D. D., McCallum A., Sigurbjörnsson B., Song Y., Sun A., Zhou D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Hierarchy is a common and effective way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies make it difficult to estimate "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative positions into account and to consider the different types of dependencies in the hierarchy. In this paper, we investigate the effect of separability in text-based entity classification and argue that, in hierarchical classification, a separation property should be established between entities not only in the same layer but also in different layers. Our main findings are as follows. First, we analyse the importance of separability for data representation in classification and, based on that, introduce a "Strong Separation Principle" for optimizing the expected effectiveness of a classifier's decisions based on the separation property. Second, we present Hierarchical Significant Words Language Models (HSWLM), which capture all, and only, the essential features of hierarchical entities according to their relative positions in the hierarchy, resulting in horizontally and vertically separable models. Third, we validate our claims on real-world data and demonstrate how HSWLM improves classification accuracy and how it provides models that remain transferable over time. Although this paper focuses on the classification problem, the models are applicable to any information access task on data that has, or can be mapped to, a hierarchical structure. Comment: Full paper (10 pages) accepted for publication in the proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR'16)

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Transductive Distributional Correspondence Indexing for Cross-Domain Topic Classification

Author: Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani
Publication date: 05/03/2020

Obtaining high-quality annotated data for training a classifier for a new domain is often costly. Domain Adaptation (DA) aims to leverage the annotated data available from a different but related source domain in order to deploy a classification model for the target domain of interest, thus alleviating these costs. To this end, the learning model is typically given access to a set of unlabelled documents collected from the target domain. These documents might be a representative sample of the target distribution, and could thus be used to infer a general classification model for the domain (inductive inference). Alternatively, they could be the entire set of documents to be classified; this happens when there is only one set of documents we are interested in classifying (transductive inference). Many of the DA methods proposed so far have focused on transductive classification by topic, i.e., the task of assigning class labels to a specific set of documents based on the topics they are about. In this work, we report on new experiments in transductive classification by topic using Distributional Correspondence Indexing (DCI), a DA method we recently developed that delivered state-of-the-art results in inductive classification by sentiment. The results we obtained on three popular datasets show DCI to be competitive with the state of the art in this scenario too, and superior to all compared methods in many cases.

CiteSeerX