5 research outputs found
Cross Lingual Sentiment Analysis: A Clustering-Based Bee Colony Instance Selection and Target-Based Feature Weighting Approach
The lack of sentiment resources for resource-poor languages poses challenges for machine-learning-based sentiment analysis. Cross-lingual and semi-supervised learning approaches are the most common ways to overcome this issue. However, the performance of existing methods degrades due to the poor quality of translated resources, data sparseness and, more specifically, language divergence. We propose an integrated learning model that combines a semi-supervised and an ensemble model while utilizing the available sentiment resources to tackle issues related to language divergence. Additionally, to reduce the impact of translation errors and handle the instance selection problem, we propose a clustering-based bee-colony sample selection method for the optimal selection of the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. Simulation results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performance. Furthermore, the statistical outcomes indicate the advantages of the proposed training data sampling and target-based feature selection in reducing the negative effect of translation errors. These results show that the proposed approach achieves performance close to that of in-language supervised models.
Predicate based association rules mining with new interestingness measure
Association Rule Mining (ARM) is one of the fundamental components of data mining; it discovers frequent itemsets and interesting relationships for predicting the associative and correlative behaviours of new data. However, traditional ARM techniques rely on the support-confidence framework, which discovers interesting association rules (ARs) using predefined minimum support (minsupp) and minimum confidence (minconf) thresholds. In addition, traditional AR techniques consider only frequent items while ignoring rare ones. To address these limitations, a new parameter-less predicate-based ARM technique was proposed and enhanced to handle frequent and rare items at the same time. Furthermore, a new interestingness measure, called the g measure, was developed to select only highly interesting rules. In the proposed technique, interesting combinations were first selected by considering both the frequent and the rare items in a dataset. They were then mapped to pseudo-implications using predefined logical conditions. Next, inference rules were used to validate the pseudo-implications and discover rules within the set of mapped pseudo-implications. The resulting set of interesting rules is referred to as the predicate-based association rules. The zoo, breast cancer, and car evaluation datasets were used for the experiments. The experimental results were evaluated by comparison with various classification techniques, the traditional ARM technique and the coherent rule mining technique. The predicate-based rule mining approach achieved an accuracy of 93.33%. In addition, the results of the g measure were compared with the h value, a state-of-the-art interestingness measure developed for a coherent rule mining technique. Predicate rules were discovered with an average confidence of 0.754 for the zoo dataset and 0.949 for the breast cancer dataset, while the average confidence of the predicate rules found in the car evaluation dataset was 0.582. The results of this study showed that a set of interesting and highly reliable rules was discovered, including frequent, rare and negative association rules that have a higher confidence value. This research resulted in a rule mining methodology that does not rely on minsupp and minconf thresholds. Also, a complete set of association rules is discovered by the proposed technique. Finally, the interestingness measure property used to select combinations from datasets makes it possible to reduce the exponential search for rules.
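The limitation of the traditional support-confidence framework described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's predicate-based technique or its g measure; it only computes the standard support and confidence of a rule. The toy transactions, the item names and the 0.3 minsupp value are assumptions chosen for illustration.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    support_a = support(a, transactions)
    return support(both, transactions) / support_a if support_a > 0 else 0.0


# Hypothetical toy data: one rare but perfectly reliable co-occurrence.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread"}),
    frozenset({"caviar", "champagne"}),
]

MINSUPP = 0.3  # assumed threshold, for illustration only

rare_support = support({"caviar", "champagne"}, transactions)
rare_confidence = confidence({"caviar"}, {"champagne"}, transactions)

# The rare rule always holds, yet a support threshold would discard it.
pruned_by_minsupp = rare_support < MINSUPP
```

In this toy data the rule caviar → champagne holds every time caviar appears, but its support falls below the assumed threshold, so a threshold-based miner would never report it. This is the kind of rare rule the abstract says the traditional approach ignores.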
Exploiting domain knowledge to enhance opinion mining using a hybrid semantic knowledgebase-machine learning approach
With the fast growth of World Wide Web 2.0, a great number of opinions about a variety of products have been published on blogs, forums, and social networks. Online opinions play an important role in helping consumers make decisions about purchasing products or services. In addition, customer reviews allow companies to understand the strengths and limitations of their products and services, which aids in improving their marketing campaigns. The challenge is that online opinions are predominantly expressed in natural language text, and hence opinion mining tools are required to facilitate the effective analysis of opinions from the unstructured text and to allow for qualitative information extraction. This research presents a Hybrid Semantic Knowledgebase-Machine Learning approach for mining opinions at the domain feature level and classifying the overall opinion on a multi-point scale. The proposed approach deploys a novel Semantic Knowledgebase approach to analyse a collection of reviews at the domain feature level and produce a set of structured information that associates the expressed opinions with specific domain features. The information in the knowledgebase is further supplemented with domain-relevant facts sourced from public Semantic datasets. The enriched, semantically tagged information is then used to infer valuable semantic information about the domain, as well as the expressed opinions on the domain features, by summarising the overall opinions about the domain across multiple reviews and by averaging the overall opinions about other cinematic features. The retrieved semantic information represents a valuable resource for training a Machine Learning classifier to predict the numerical rating of each review. Experimental evaluation revealed that the proposed Hybrid Semantic Knowledgebase-Machine Learning approach improved the precision and recall of the extracted domain features, and hence proved suitable for producing an enriched dataset of semantic features that resulted in higher classification accuracy.
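The precision and recall reported for extracted domain features can be made concrete with a minimal Python sketch. It uses the standard set-based definitions and is not the paper's evaluation code. The function name and the feature names are hypothetical and not taken from the paper's data.

```python
from typing import Iterable, Tuple


def precision_recall(extracted: Iterable[str],
                     gold: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall of extracted features against a gold list."""
    extracted_set, gold_set = set(extracted), set(gold)
    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical example: annotated features of a film review vs. extractor output.
gold_features = {"plot", "acting", "soundtrack", "director"}
extracted_features = {"plot", "acting", "popcorn"}

precision, recall = precision_recall(extracted_features, gold_features)
```

Here two of the three extracted features are correct, and two of the four annotated features are found, so precision and recall differ. This is why the abstract reports both measures rather than a single one.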