1,296 research outputs found

    Cross Lingual Sentiment Analysis: A Clustering-Based Bee Colony Instance Selection and Target-Based Feature Weighting Approach

    Get PDF
    The lack of sentiment resources in poor resource languages poses challenges for the sentiment analysis in which machine learning is involved. Cross-lingual and semi-supervised learning approaches have been deployed to represent the most common ways that can overcome this issue. However, performance of the existing methods degrades due to the poor quality of translated resources, data sparseness and more specifically, language divergence. An integrated learning model that uses a semi-supervised and an ensembled model while utilizing the available sentiment resources to tackle language divergence related issues is proposed. Additionally, to reduce the impact of translation errors and handle instance selection problem, we propose a clustering-based bee-colony-sample selection method for the optimal selection of most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted employing an English-Arabic cross-lingual data set. Simulations results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performances. Furthermore, the statistical outcomes indicate the advantages of the proposed training data sampling and target-based feature selection to reduce the negative effect of translation errors. These results highlight the fact that the proposed approach achieves a performance that is close to in-language supervised models

    Detecting Deceptive Opinions: Intra and Cross-domain Classification using an Efficient Representation

    Full text link
    Electronic versíon of an article published as International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, 25, 2, 2017, 151-174. DOI:10.1142/S0218488517400165 © copyright World Scientific Publishing Company. https://www.worldscientific.com/worldscinet/ijufks[EN] Online opinions play an important role for customers and companies because of the increasing use they do to make purchase and business decisions. A consequence of that is the growing tendency to post fake reviews in order to change purchase decisions and opinions about products and services. Therefore, it is really important to filter out deceptive comments from the retrieved opinions. In this paper we propose the character n-grams in tokens, an efficient and effective variant of the traditional character n-grams model, which we use to obtain a low dimensionality representation of opinions. A Support Vector Machines classifier was used to evaluate our proposal on available corpora with reviews of hotels, doctors and restaurants. In order to study the performance of our model, we make experiments with intra and cross-domain cases. The aim of the latter experiment is to evaluate our approach in a realistic cross-domain scenario where deceptive opinions are available in a domain but not in another one. After comparing our method with state-of-the-art ones we may conclude that using character n-grams in tokens allows to obtain competitive results with a low dimensionality representation.This publication was made possible by NPRP grant #9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.Cagnina, L.; Rosso, P. (2017). Detecting Deceptive Opinions: Intra and Cross-domain Classification using an Efficient Representation. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems. 25(2):151-174. https://doi.org/10.1142/S0218488517400165S151174252Cambria, E. (2016). Affective Computing and Sentiment Analysis. IEEE Intelligent Systems, 31(2), 102-107. doi:10.1109/mis.2016.31Cambria, E., & Hussain, A. (2015). Sentic Computing. Cognitive Computation, 7(2), 183-185. doi:10.1007/s12559-015-9325-0Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software. ACM SIGKDD Explorations Newsletter, 11(1), 10-18. doi:10.1145/1656274.1656278Hancock, J. T., Curry, L. E., Goorha, S., & Woodworth, M. (2007). On Lying and Being Lied To: A Linguistic Analysis of Deception in Computer-Mediated Communication. Discourse Processes, 45(1), 1-23. doi:10.1080/01638530701739181Hernández Fusilier, D., Montes-y-Gómez, M., Rosso, P., & Guzmán Cabrera, R. (2015). Detecting positive and negative deceptive opinions using PU-learning. Information Processing & Management, 51(4), 433-443. doi:10.1016/j.ipm.2014.11.001Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics, 18(1), 50-60. doi:10.1214/aoms/1177730491MONTAÑÉS, E., QUEVEDO, J. R., COMBARRO, E. F., DÍAZ, I., & RANILLA, J. (2007). A HYBRID FEATURE SELECTION METHOD FOR TEXT CATEGORIZATION. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 15(02), 133-151. doi:10.1142/s0218488507004492Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying Words: Predicting Deception from Linguistic Styles. Personality and Social Psychology Bulletin, 29(5), 665-675. doi:10.1177/0146167203029005010Raudys, S. J., & Jain, A. K. (1991). Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3), 252-264. doi:10.1109/34.75512Wang, G., Xie, S., Liu, B., & Yu, P. S. (2012). Identify Online Store Review Spammers via Social Review Graph. ACM Transactions on Intelligent Systems and Technology, 3(4), 1-21. doi:10.1145/2337542.2337546Webb, G. I. (2000). Machine Learning, 40(2), 159-196. doi:10.1023/a:100765951484

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    Get PDF
    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises the cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning with respect to the problem has not only revealed the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey not only presents an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up for a possible solution accordingly

    Data-Efficient Machine Learning with Focus on Transfer Learning

    Get PDF
    Machine learning (ML) has attracted a significant amount of attention from the artifi- cial intelligence community. ML has shown state-of-art performance in various fields, such as signal processing, healthcare system, and natural language processing (NLP). However, most conventional ML algorithms suffer from three significant difficulties: 1) insufficient high-quality training data, 2) costly training process, and 3) domain dis- crepancy. Therefore, it is important to develop solutions for these problems, so the future of ML will be more sustainable. Recently, a new concept, data-efficient ma- chine learning (DEML), has been proposed to deal with the current bottlenecks of ML. Moreover, transfer learning (TL) has been considered as an effective solution to address the three shortcomings of conventional ML. Furthermore, TL is one of the most active areas in the DEML. Over the past ten years, significant progress has been made in TL. In this dissertation, I propose to address the three problems by developing a software- oriented framework and TL algorithms. Firstly, I introduce a DEML framework and a evaluation system. Moreover, I present two novel TL algorithms and applications on real-world problems. Furthermore, I will first present the first well-defined DEML framework and introduce how it can address the challenges in ML. After that, I will give an updated overview of the state-of-the-art and open challenges in the TL. I will then introduce two novel algorithms for two of the most challenging TL topics: distant domain TL and cross-modality TL (image-text). A detailed algorithm introduction and preliminary results on real-world applications (Covid-19 diagnosis and image clas- sification) will be presented. Then, I will discuss the current trends in TL algorithms and real-world applications. Lastly, I will present the conclusion and future research directions
    corecore