4,745 research outputs found

    Applications of Mining Arabic Text: A Review

    Since the emergence of text mining, there has been growing interest in applying text mining tasks to text written in Arabic, and researchers face several challenges in doing so. These tasks include Arabic text summarization, which is one of the challenging open research areas in natural language processing (NLP) and text mining, Arabic text categorization, and Arabic sentiment analysis. This chapter reviews past and current research and trends in these areas, along with future challenges that need to be tackled. It also presents case studies for two of the reviewed approaches.

    Topic identification using filtering and rule generation algorithm for textual document

    Information stored digitally in text documents is seldom arranged according to specific topics. Having to read whole documents is time-consuming and discourages information searching. Most existing topic identification methods depend on the occurrence of terms in the text. However, not all frequently occurring terms are relevant. The term extraction phase of topic identification can also produce extracted terms with similar meanings, which is known as the synonymy problem. Filtering and rule generation algorithms are introduced in this study to identify topics in textual documents. The proposed filtering algorithm (PFA) extracts the most relevant terms from the text and resolves the synonymy problem among the extracted terms. The rule generation algorithm (TopId) is proposed to identify the topic of each verse based on the extracted terms. The PFA processes and filters each sentence based on nouns and predefined keywords to produce suitable terms for the topic. Rules are then generated from the extracted terms using a rule-based classifier. An experiment was performed on 224 English-translated Quran verses related to female issues. Topics identified by TopId and by the Rough Set technique were compared and then verified by experts. PFA extracted more relevant terms than other filtering techniques. TopId identified topics closer to the experts' topics, with an accuracy of 70%. The proposed algorithms were able to extract relevant terms without losing important terms and to identify the topic of each verse.

    Tourism Companies Assessment via Social Media Using Sentiment Analysis

    In recent years, the use of social media has grown markedly as a medium for users to express their emotions and feelings through thousands of posts and comments about tourism companies. As a consequence, it has become difficult for tourists to read all the comments and determine whether the opinions are positive or negative in order to assess the success of a tourism company. In this paper, a modest model is proposed to assess e-tourism companies using Iraqi dialect reviews collected from Facebook. The reviews are analyzed using text mining techniques for sentiment classification. The generated sentiment words are classified into positive, negative and neutral comments using Rough Set Theory, Naïve Bayes and K-Nearest Neighbor methods. The experimental results showed that, of the 71 Iraqi tourism companies tested, 28% have a very good assessment, 26% a good assessment, 31% a medium assessment, 4% an acceptable assessment and 11% a bad assessment. These results helped the companies to improve their work and programs and to respond sufficiently and quickly to customer demands.
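As an illustration of the three-class sentiment classification mentioned in the abstract above, the following is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing. It is not the authors' model: the function names `train_nb` and `predict_nb`, the whitespace tokenization, and the toy English reviews are all assumptions made for the example, whereas the paper worked with Iraqi dialect reviews from Facebook.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Count documents per class and words per class.

    docs is a list of (tokens, label) pairs. Hypothetical helper, not from the paper.
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # log prior
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (invented for illustration only).
reviews = [
    ("great trip friendly staff", "positive"),
    ("excellent service great price", "positive"),
    ("bad service late bus", "negative"),
    ("terrible trip bad hotel", "negative"),
    ("trip starts monday", "neutral"),
    ("office opens monday", "neutral"),
]
model = train_nb([(text.split(), label) for text, label in reviews])
prediction = predict_nb(model, "great friendly service".split())
```

Per-company ratings such as those reported in the abstract would need an additional aggregation step over the classified comments, which the abstract does not describe and this sketch does not attempt.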

    A Comparative Study of Text Classification Methods: An Experimental Approach

    Text classification is the process in which a text document is assigned to one or more predefined categories based on its contents. This paper focuses on experiments with our implementation of three popular machine learning algorithms and a comparative evaluation of their performance on the categorization of sample English text documents. Three well-known classifiers, namely Naïve Bayes (NB), Centroid Based (CB) and K-Nearest Neighbor (KNN), were implemented and tested on the same dataset, R-52, chosen from the Reuters-21578 corpus. For performance evaluation, classical metrics such as precision, recall, and micro and macro F1-measures were used. For statistical comparison of the three classifiers, the Randomized Block Design method with a T-test was applied. The experimental results showed that the Centroid Based classifier performed best, with a 97% Micro F1 measure. NB and KNN also produced satisfactory performance on the test dataset, with Micro F1 measures of 91% and 89% respectively.
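The micro and macro F1-measures named in the abstract above differ in how they average over categories: micro F1 pools true positives, false positives and false negatives across all categories, while macro F1 averages the per-category F1 scores. The sketch below shows the standard definitions for single-label predictions. The function name `f1_scores` and the six toy labels are assumptions for illustration and are not taken from the paper or from the R-52 dataset.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    def f1(tp_, fp_, fn_):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Toy labels (invented); a rare category that is always missed pulls macro F1 down.
y_true = ["earn", "earn", "earn", "acq", "acq", "crude"]
y_pred = ["earn", "earn", "acq", "acq", "acq", "earn"]
micro, macro = f1_scores(y_true, y_pred)
```

On skewed category distributions the two measures can diverge considerably, because macro F1 weights every category equally regardless of its size.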

    A Survey of Arabic Text Classification Models

    There is a huge amount of Arabic text available online that needs to be organized. As a result, there are many natural language processing (NLP) applications concerned with text organization. One of them is text classification (TC). TC makes unorganized text easier to deal with by classifying it into suitable classes or labels. This paper is a survey of Arabic text classification. It also presents a comparison among different methods for classifying Arabic texts, where Arabic text is considered complex due to its vocabulary. Arabic is one of the richest languages in the world, with many linguistic bases. Research in Arabic language processing is scarce compared to English. As a result, these problems pose challenges in the classification and organization of specific Arabic text. Text classification (TC) helps to access documents or information that have already been classified into one or more specific classes or categories. In addition, classifying documents helps search engines reduce the number of documents to consider, making it easier to search and match against queries.

    Arabic text classification methods: Systematic literature review of primary studies

    Recent research on Big Data has proposed and evaluated a number of advanced techniques to gain meaningful information from the complex and large volume of data available on the World Wide Web. To achieve accurate text analysis, a process is usually initiated with a Text Classification (TC) method. A review of the most recent literature in this area shows that most studies focus on English (and other scripts), while attempts at classifying Arabic texts remain relatively limited. Hence, we intend to contribute the first Systematic Literature Review (SLR) that strictly follows a search protocol to summarize key characteristics of the different TC techniques and methods used to classify Arabic text. This work also aims to identify and share scientific evidence of the gap in the current literature to help suggest areas for further research. Our SLR explicitly uses empirical evidence as a decision factor for including studies, and then concludes which classifier produced more accurate results. Further, our findings identify the lack of standardized corpora for Arabic text; authors compile their own, and most of the work is focused on Modern Arabic, with very little done on Colloquial Arabic despite its wide use in social media networks such as Twitter. In total, 1464 papers were surveyed, from which 48 primary studies were included and analyzed.

    Recent Trends in Computational Intelligence

    Traditional models struggle to cope with complexity, noise, and the existence of a changing environment, while Computational Intelligence (CI) offers solutions to complicated problems as well as reverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically-inspired technologies such as swarm intelligence as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book aims to discuss the use of CI for optimally solving various applications, demonstrating its wide reach and relevance. Bounding of optimization methods and data mining strategies makes for a strong and reliable prediction tool for handling real-life applications.

    Developing conceptual glossaries for the Latin Vulgate Bible

    A conceptual glossary is a textual reference work that combines the features of a thesaurus and an index verborum. In it, the word occurrences within a given text are classified, disambiguated, and indexed according to their membership of a set of conceptual (i.e. semantic) fields. Since 1994, we have been working towards building a set of conceptual glossaries for the Latin Vulgate Bible. So far, we have published a conceptual glossary to the Gospel according to John and are at present completing the analysis of the Gospel according to Mark and the minor epistles. This paper describes the background to our project and outlines the steps by which the glossaries are developed within a relational database framework.

    Developing resources for sentiment analysis of informal Arabic text in social media

    Natural Language Processing (NLP) applications such as text categorization, machine translation, sentiment analysis, etc., need annotated corpora and lexicons to check quality and performance. This paper describes the development of resources for sentiment analysis specifically for Arabic text in social media. A distinctive feature of the corpora and lexicons developed is that they are derived from informal Arabic that does not conform to grammatical or spelling standards. We refer to Arabic social media content of this sort as Dialectal Arabic (DA) - informal Arabic originating from and potentially mixing a range of different individual dialects. The paper describes the process adopted for developing corpora and sentiment lexicons for sentiment analysis within different social media and their resulting characteristics. In addition to providing useful NLP data sets for Dialectal Arabic, the work also contributes to understanding the approach to developing corpora and lexicons.