
    Applications of Mining Arabic Text: A Review

    Since the emergence of text mining, Arabic has attracted growing interest as a target language for text mining tasks. These tasks include Arabic text summarization, which is one of the challenging open research areas in natural language processing (NLP) and text mining, as well as Arabic text categorization and Arabic sentiment analysis. Researchers face several challenges in each of them. This chapter reviews past and current research and trends in these areas, along with future challenges that need to be tackled. It also presents case studies for two of the reviewed approaches.

    Tourism Companies Assessment via Social Media Using Sentiment Analysis

    In recent years, social media use has grown markedly as a medium for users to express their emotions and feelings through thousands of posts and comments about tourism companies. As a consequence, it has become difficult for tourists to read all the comments and determine whether the opinions are positive or negative in order to assess a tourism company's success. In this paper, a modest model is proposed to assess e-tourism companies using Iraqi dialect reviews collected from Facebook. The reviews are analyzed using text mining techniques for sentiment classification. The generated sentiment words are classified into positive, negative, and neutral comments using Rough Set Theory, Naïve Bayes, and K-Nearest Neighbor methods. The experimental results show that, of the 71 Iraqi tourism companies tested, 28% have a very good assessment, 26% a good assessment, 31% a medium assessment, 4% an acceptable assessment, and 11% a bad assessment. These results helped the companies improve their work and programs and respond sufficiently and quickly to customer demands.

    Classification Arabic Twitter User’s Insights Using Rough Set Theory

    Nowadays, people around the world use social media to share their daily affairs. Arabic Twitter, for example, is a platform where users read, reply to, and post messages known as ‘tweets’. Users exchange opinions on different trends, which are not equal in importance and differ according to the users' power and interest. Tweets can provide rich information for decision making. The main objective of this paper is to present a framework for making valuable decisions by analyzing social media users' insights based on their proximity to a particular trend, while highlighting their power in that trend. Tweets are highly unstructured, which makes them difficult to analyze. Our proposed model differs from previous research in this field in that it combines supervised and unsupervised machine learning algorithms. The work proceeds as follows: users are classified by their degree of closeness/interest using Mendelow’s power/interest matrix, and rough set theory is used to eliminate features found in user profiles in order to find minimal sets of data. The proposed model applies two attribute reduction algorithms to our dataset to determine the optimal number of reducts for improving decision making from the user replies. In addition, unsupervised machine learning groups the replies into subcategories such as positive, negative, or neutral. The experimental evaluation shows that the Johnson algorithm reduced the user attributes by 71% relative to the genetic algorithm used in the classification model.

    Recent Trends in Computational Intelligence

    Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as inverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically inspired technologies, such as swarm intelligence as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book discusses the use of CI for solving various applications optimally, demonstrating its wide reach and relevance. Combining optimization methods with data mining strategies yields a strong and reliable prediction tool for handling real-life applications.

    An improved Arabic text classification method using word embedding

    Feature selection (FS) is a widely used method for removing redundant or irrelevant features to improve classification accuracy and decrease the model’s computational cost. In this paper, we present an improved method (referred to hereafter as RARF) for Arabic text classification (ATC) that employs term frequency-inverse document frequency (TF-IDF) and the Word2Vec embedding technique to identify words that have a particular semantic relationship. In addition, we compare our method with four benchmark FS methods, namely principal component analysis (PCA), linear discriminant analysis (LDA), chi-square, and mutual information (MI). Three machine learning algorithms are used in this work: support vector machine (SVM), k-nearest neighbors (K-NN), and naive Bayes (NB). Two different Arabic datasets are used to perform a comparative analysis of these algorithms. This paper also evaluates the efficiency of our method for ATC in terms of accuracy, precision, recall, and F-measure. The results show that the highest accuracy, 94.75%, was achieved by the SVM classifier on the Khaleej-2004 Arabic dataset, while the same classifier recorded an accuracy of 94.01% on the Watan-2004 Arabic dataset.
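
As a rough illustration of the TF-IDF weighting mentioned in this abstract, the following is a minimal sketch and not the authors' RARF method. It assumes whitespace tokenization, relative term frequency, and an unsmoothed logarithmic inverse document frequency; the abstract does not say which TF-IDF variant the paper uses, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus: "goal" occurs in both documents,
# so its weight is zero, while "sports" is specific to the first one.
weights = tf_idf(["sports match goal", "economy market goal"])
print(weights[0])
```

Terms that occur in every document receive zero weight under this variant, which is why TF-IDF tends to surface words that discriminate between categories.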

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility of mobile communications allows customers to switch quickly from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face several issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn, and how social media mining can be used to measure customer satisfaction and predict customer churn. It includes a systematic review addressing the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models do not integrate structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and its lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It was generated as a dialect sentiment lexicon extracted from a larger Twitter dataset that I collected to capture the characteristics of text in social media. I developed a new ASA prediction model for telecommunication that fills the gaps detected in the ASA literature and fits the telecommunication field. The proposed model proved effective for Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Different fields, such as education, have different features, which makes applying the proposed model to them interesting, since it is based on text mining.

    Applications of Nature-Inspired Algorithms for Dimension Reduction: Enabling Efficient Data Analytics

    In [1], we explored the theoretical aspects of feature selection and evolutionary algorithms. In this chapter, we focus on optimization algorithms for enhancing the data analytics process, i.e., we explore applications of nature-inspired algorithms in data science. Feature selection optimization is a hybrid approach that combines feature selection techniques with evolutionary algorithms to optimize the selected features. Prior works solve this problem iteratively to converge to an optimal feature subset. Feature selection optimization is not specific to any domain. Data scientists mainly seek advanced ways to analyze data with high computational efficiency and low time complexity, leading to efficient data analytics. As the volume of generated, measured, and sensed data from various sources increases, the analysis, manipulation, and illustration of data grow exponentially. Due to large-scale data sets, the curse of dimensionality (CoD) is one of the NP-hard problems in data science. Hence, several efforts have focused on leveraging evolutionary algorithms (EAs) to address the complex issues in large-scale data analytics problems. Dimension reduction, together with EAs, lends itself to solving CoD and to solving complex problems efficiently in terms of time complexity. In this chapter, we first provide a brief overview of previous studies that focused on solving CoD using a feature extraction optimization process. We then discuss practical examples of research studies that have successfully tackled application domains such as image processing, sentiment analysis, network traffic and anomaly analysis, credit score analysis, and other benchmark function and data set analyses.

    Optimal feature selection for learning-based algorithms for sentiment classification

    Sentiment classification is an important branch of cognitive computation, so further study of the properties of sentiment analysis is important. Sentiment classification on text data has been an active topic for the last two decades, and learning-based methods are very popular and widely used in various applications. Many enhanced technical strategies have been used to improve the performance of learning-based methods. Feature selection is one of these strategies and has been studied by many researchers. However, an unsolved and difficult problem is the choice of a suitable number of features for obtaining the best sentiment classification performance from learning-based methods. Therefore, we investigate the relationship between the number of features selected and the sentiment classification performance of learning-based methods. A new method for selecting a suitable number of features is proposed, in which the Chi Square feature selection algorithm is employed and the features are selected using a preset score threshold. It is discovered that there is a relationship between the logarithm of the number of features selected and the sentiment classification performance of the learning-based method, and that this relationship is independent of the learning-based method involved. The new findings in this research indicate that it is always possible for researchers to select the appropriate number of features for learning-based methods to obtain the best sentiment classification performance. This can guide researchers in selecting the proper features for optimizing the performance of learning-based algorithms. (A preliminary version of this paper received a Best Paper Award at the International Conference on Extreme Learning Machines 2018.)
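
The threshold-based selection described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it scores each term by binary presence with the standard 2x2 chi-square statistic, takes the maximum over classes, and keeps terms whose score meets a preset threshold. The function names, the max-over-classes aggregation, the toy reviews, and the threshold value are all hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest per-class chi-square statistic."""
    n = len(docs)
    doc_terms = [set(doc.split()) for doc in docs]
    vocab = set().union(*doc_terms)
    class_counts = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_counts.items():
            # 2x2 contingency table for (term present?, document in cls?).
            a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == cls)
            b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != cls)
            c = n_cls - a          # in class, term absent
            d = (n - n_cls) - b    # not in class, term absent
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
            best = max(best, chi2)
        scores[term] = best
    return scores


def select_features(docs, labels, threshold):
    """Keep the terms whose chi-square score reaches the preset threshold."""
    scores = chi_square_scores(docs, labels)
    return sorted(term for term, score in scores.items() if score >= threshold)


# Hypothetical toy reviews; the threshold 2.0 is an arbitrary choice.
docs = ["good great film", "good fine film", "bad awful film", "bad poor film"]
labels = ["pos", "pos", "neg", "neg"]
print(select_features(docs, labels, 2.0))  # prints ['bad', 'good']
```

Raising or lowering the threshold changes how many features survive, which is the quantity whose logarithm the abstract relates to classification performance.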

    Topic identification using filtering and rule generation algorithm for textual document

    Information stored digitally in text documents is seldom arranged according to specific topics. Having to read whole documents is time-consuming and reduces interest in searching for information. Most existing topic identification methods depend on the occurrence of terms in the text. However, not all frequently occurring terms are relevant. The term extraction phase of topic identification methods can produce extracted terms with similar meanings, which is known as the synonymy problem. Filtering and rule generation algorithms are introduced in this study to identify topics in textual documents. The proposed filtering algorithm (PFA) extracts the most relevant terms from the text and solves the synonymy problem among the extracted terms. The rule generation algorithm (TopId) is proposed to identify the topic of each verse based on the extracted terms. The PFA processes and filters each sentence based on nouns and predefined keywords to produce suitable terms for the topic. Rules are then generated from the extracted terms using the rule-based classifier. An experiment was performed on 224 English-translated Quran verses related to female issues. Topics identified by both TopId and the Rough Set technique were compared and later verified by experts. PFA extracted more relevant terms than other filtering techniques. TopId identified topics that are closer to the experts' topics, with an accuracy of 70%. The proposed algorithms were able to extract relevant terms without losing important terms and to identify the topic of each verse.