7 research outputs found

    Arabic text classification methods: Systematic literature review of primary studies

    Get PDF
    Recent research on Big Data proposed and evaluated a number of advanced techniques to gain meaningful information from the complex and large volume of data available on the World Wide Web. To achieve accurate text analysis, a process is usually initiated with a Text Classification (TC) method. Reviewing the very recent literature in this area shows that most studies are focused on English (and other scripts) while attempts on classifying Arabic texts remain relatively very limited. Hence, we intend to contribute the first Systematic Literature Review (SLR) utilizing a search protocol strictly to summarize key characteristics of the different TC techniques and methods used to classify Arabic text, this work also aims to identify and share a scientific evidence of the gap in current literature to help suggesting areas for further research. Our SLR explicitly investigates empirical evidence as a decision factor to include studies, then conclude which classifier produced more accurate results. Further, our findings identify the lack of standardized corpuses for Arabic text; authors compile their own, and most of the work is focused on Modern Arabic with very little done on Colloquial Arabic despite its wide use in Social Media Networks such as Twitter. In total, 1464 papers were surveyed from which 48 primary studies were included and analyzed

    The classification of the modern arabic poetry using machine learning

    Get PDF
    In recent years, working on text classification and analysis of Arabic texts using machine learning has seen some progress, but most of this research has not focused on Arabic poetry. Because of some difficulties in the analysis of Arabic poetry, it was required the use of standard Arabic language on which “Al Arud”, the science of studying poetry is based. This paper presents an approach that uses machine learning for the classification of modern Arabic poetry into four types: love poems, Islamic poems, social poems, and political poems. Each of these species usually has features that indicate the class of the poem. Despite the challenges generated by the difficulty of the rules of the Arabic language on which this classification depends, we proposed a new automatic way of modern Arabic poems classification to solve these issues. The recommended method is suitable for the above-mentioned classes of poems. This study used Naïve Bayes, Support Vector Machines, and Linear Support Vector for the classification processes. Data preprocessing was an important step of the approach in this paper, as it increased the accuracy of the classification

    Utilizing artificial bee colony algorithm as feature selection method in Arabic text classification

    Get PDF
    A huge amount of crucial information is contained in documents. The vast increase in the number of E documents available for user access makes the utilization of automated text classification es sential. Classifying or arranging documents into predefined group s is called Text classification Feature Selection (is needed for minimizing the dimensionality of high dimensional data and extracting only the features that are most pertinent to a particular task. One of the widely used algorithms for feature selection in text classification is the Evolutionary algorithm . In this paper, the filter method chi square and the Artificial Bee Colony) ABC algorithm were both used as FS methods . The chi square method is a useful technique for reducing the number of features and removing those that are superfluous or redundant. The ABC technique considers the chi square methods' chosen features as viable solutions (food sources). The ABC algorithm searches for the most efficient selection of features that increase classification performance. Support Vector Machine and Naïve Bayes classifiers were used as a fitness function for the ABC algorithm. The experiment result s demonstrated that the proposed feature selection method was able of decreasing the number of features by approximately 89.5%, and 94%, respectively when NB and SVM were used as fitness functions in comparison to the original dataset, while also enhancing classification performance

    Detecting Spam Email With Machine Learning Optimized With Bio-Inspired Metaheuristic Algorithms

    Get PDF
    Electronic mail has eased communication methods for many organisations as well as individuals. This method is exploited for fraudulent gain by spammers through sending unsolicited emails. This article aims to present a method for detection of spam emails with machine learning algorithms that are optimized with bio-inspired methods. A literature review is carried to explore the efficient methods applied on different datasets to achieve good results. An extensive research was done to implement machine learning models using Naïve Bayes, Support Vector Machine, Random Forest, Decision Tree and Multi-Layer Perceptron on seven different email datasets, along with feature extraction and pre-processing. The bio-inspired algorithms like Particle Swarm Optimization and Genetic Algorithm were implemented to optimize the performance of classifiers. Multinomial Naïve Bayes with Genetic Algorithm performed the best overall. The comparison of our results with other machine learning and bio-inspired models to show the best suitable model is also discussed

    Mining Twitter for crisis management: realtime floods detection in the Arabian Peninsula

    Get PDF
    A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of doctor of Philosophy.In recent years, large amounts of data have been made available on microblog platforms such as Twitter, however, it is difficult to filter and extract information and knowledge from such data because of the high volume, including noisy data. On Twitter, the general public are able to report real-world events such as floods in real time, and act as social sensors. Consequently, it is beneficial to have a method that can detect flood events automatically in real time to help governmental authorities, such as crisis management authorities, to detect the event and make decisions during the early stages of the event. This thesis proposes a real time flood detection system by mining Arabic Tweets using machine learning and data mining techniques. The proposed system comprises five main components: data collection, pre-processing, flooding event extract, location inferring, location named entity link, and flooding event visualisation. An effective method of flood detection from Arabic tweets is presented and evaluated by using supervised learning techniques. Furthermore, this work presents a location named entity inferring method based on the Learning to Search method, the results show that the proposed method outperformed the existing systems with significantly higher accuracy in tasks of inferring flood locations from tweets which are written in colloquial Arabic. For the location named entity link, a method has been designed by utilising Google API services as a knowledge base to extract accurate geocode coordinates that are associated with location named entities mentioned in tweets. The results show that the proposed location link method locate 56.8% of tweets with a distance range of 0 – 10 km from the actual location. Further analysis has shown that the accuracy in locating tweets in an actual city and region are 78.9% and 84.2% respectively
    corecore