Applications of Mining Arabic Text: A Review
Since the emergence of text mining, there has been growing interest in applying text mining tasks to text written in Arabic, and researchers face several challenges in doing so. These tasks include Arabic text summarization, which is one of the challenging open research areas in natural language processing (NLP) and text mining, Arabic text categorization, and Arabic sentiment analysis. This chapter reviews past and current research and trends in these areas, along with future challenges that need to be tackled. It also presents case studies for two of the reviewed approaches.
Attribute Set Weighting and Decomposition Approaches for Reduct Computation
This research addresses Rough Set theory based knowledge reduction for data classification within the data mining framework. To facilitate Rough Set based classification, two main knowledge reduction models are proposed. The first model is an approximate approach for computing object reducts, intended particularly for data classification. This approach assigns a weight to each attribute in the attribute set; the weight indicates how important it is to include that attribute in the reduct. The proposed approach is named Object Reduct by Attribute Weighting (ORAW). A variation of this approach, named Full Reduct by Attribute Weighting (FRAW), is proposed to compute the full reduct. The second proposed approach deals with large datasets, particularly those with a large number of attributes. It uses the principle of incremental attribute set decomposition to generate an approximate reduct that represents the entire dataset. This approach is termed Reduct by Attribute Set Decomposition (RASD).
The proposed reduct computation approaches are extensively tested and evaluated. The evaluation is twofold. First, the approaches are evaluated as Rough Set based methods, with classification accuracy as the evaluation measure; the well-known 10-fold cross-validation method is used to estimate the classification accuracy. Second, the approaches are evaluated as knowledge reduction methods, with the size of the reduct as the reduction measure. The approaches are compared to other reduct computation methods and to other non-Rough Set based classification methods. They are applied to various standard domain datasets from the UCI repository.
The experimental results showed very good performance for the proposed approaches, both as classification methods and as knowledge reduction methods. The accuracy of the ORAW approach exceeded that of the Johnson approach on all the datasets. It also produced better accuracy than the Exhaustive and the Standard Integer Programming (SIP) approaches on the majority of the datasets used in the experiments. The RASD approach was compared to other classification methods and showed very competitive results in terms of classification accuracy and reduct size. In conclusion, the proposed approaches showed competitive and even better accuracy in most tested domains. The experimental results indicate that the proposed approaches, used as Rough classifiers, perform well across different classification problems and can be promising methods for solving classification problems. Moreover, the experiments showed that the incremental vertical decomposition framework is an appealing method for knowledge reduction over large datasets within the framework of Rough Set based classification.
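To make the idea of weight-guided reduct computation concrete, here is a minimal sketch. It is not the ORAW, FRAW, or RASD algorithm, whose details the abstract does not give. It assumes a generic frequency-based weight (how many pairs of objects with different decisions an attribute tells apart) and a greedy selection loop; the function names and the toy decision table are hypothetical.

```python
from itertools import combinations


def discernibility_weights(rows, attrs, decision):
    """Assumed heuristic: weight an attribute by the number of object
    pairs with different decisions that it distinguishes."""
    weights = {a: 0 for a in attrs}
    for x, y in combinations(rows, 2):
        if x[decision] != y[decision]:
            for a in attrs:
                if x[a] != y[a]:
                    weights[a] += 1
    return weights


def is_consistent(rows, attrs, decision):
    """True if objects that agree on `attrs` always share a decision."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if seen.setdefault(key, r[decision]) != r[decision]:
            return False
    return True


def weighted_reduct(rows, attrs, decision):
    """Add attributes in descending weight until the subset is
    consistent, then drop attributes that became redundant."""
    weights = discernibility_weights(rows, attrs, decision)
    ranked = sorted(attrs, key=lambda a: -weights[a])
    chosen = []
    for a in ranked:
        if is_consistent(rows, chosen, decision):
            break
        chosen.append(a)
    for a in list(chosen):
        rest = [b for b in chosen if b != a]
        if is_consistent(rows, rest, decision):
            chosen = rest
    return chosen


# Hypothetical toy decision table: attribute "b" alone determines "d".
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
reduct = weighted_reduct(rows, ["a", "b", "c"], "d")
```

The size of the returned subset corresponds to the reduct-size measure mentioned above; classification accuracy would be estimated separately, for example with 10-fold cross-validation.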
Bayes model for assessing the reading difficulty of English text for English education in Jordan
Predicting the reading difficulty level of English texts is a critical process for second language education and assessment. Reading difficulty is concerned with matching a reader's proficiency to an appropriate text. Readability assessment is the process of predicting the reading grade level required by an input text or document, so that materials can be matched to the reader. Students in Jordan, across academic levels, have difficulty finding relevant, readable material on a given subject at their level. This paper introduces a model that predicts the reading difficulty level of a given text in terms of a student's ability to read and understand English as a non-native speaker in Jordanian schools. Jordanian students were classified into four categories according to their knowledge of English. The reading difficulty level is predicted using a statistical model based on the Bayes model. The model compares the given text with standard predefined texts that strongly reflect the ability to read and understand English text. The accuracy of the proposed model was tested using the hold-out method. The overall prediction accuracy was 75.9%.
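As an illustration of how a Bayes-based difficulty predictor with hold-out testing might look, the sketch below uses a multinomial naive Bayes classifier with add-one smoothing. This is an assumption: the abstract says only that the model is based on the Bayes model, and does not specify features or formulation. The level labels and sample texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (text, level). Collects word counts per level."""
    word_counts = defaultdict(Counter)
    level_counts = Counter()
    for text, level in docs:
        level_counts[level] += 1
        word_counts[level].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, level_counts, vocab


def predict(text, model):
    """Return the level with the highest log posterior for `text`."""
    word_counts, level_counts, vocab = model
    total_docs = sum(level_counts.values())
    best_level, best_score = None, -math.inf
    for level in level_counts:
        total_words = sum(word_counts[level].values())
        score = math.log(level_counts[level] / total_docs)
        for w in text.lower().split():
            if w in vocab:  # words unseen in training are ignored
                score += math.log(
                    (word_counts[level][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_level, best_score = level, score
    return best_level


def holdout_accuracy(train_docs, test_docs):
    """Train on one split and report accuracy on the held-out split."""
    model = train(train_docs)
    hits = sum(predict(text, model) == level for text, level in test_docs)
    return hits / len(test_docs)


# Hypothetical reference texts for two of the proficiency categories.
train_docs = [
    ("the cat sat", "easy"),
    ("the dog ran", "easy"),
    ("economic policy analysis", "hard"),
    ("statistical inference methods", "hard"),
]
test_docs = [("the cat ran", "easy"), ("policy analysis", "hard")]
accuracy = holdout_accuracy(train_docs, test_docs)
```

The paper's model distinguishes four student categories; the same functions work unchanged with four labels and real reference texts.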
Automatic extraction of ontological relations from Arabic text
Automatic extraction of semantic relationships among Arabic concepts to formulate ontology models is crucial for providing rich semantic metadata. Due to the annual increase of Arabic content on the Internet, the need for specialized tools to analyze and understand Arabic text has emerged. This research proposes a methodology that extracts ontological relationships. The goals of this research are: to extract semantic features of Arabic text, propose syntactic patterns of relationships among concepts, and propose a formal model of extracting ontological relations.
The proposed methodology has been designed to analyze Arabic text using lexical semantic patterns of the Arabic language according to a set of features. Next, the features have been abstracted and enriched with formal descriptions for the purpose of generalizing the resulting rules. The rules are then used to formulate a classifier that accepts Arabic text, analyzes it, and displays related concepts labeled with their designated relationships. Moreover, to resolve the ambiguity of homonyms, a set of machine translation, text mining, and part-of-speech tagging algorithms have been reused. We performed extensive experiments to measure the effectiveness of our proposed tools. The results indicate that our proposed methodology is promising for automating the process of extracting ontological relations.
A New Ontology-Based Method for Arabic Sentiment Analysis
Arabic sentiment analysis is a process that aims to extract the subjective opinions of different users about different subjects, since these opinions and sentiments are used to recognize their perspectives and judgments in a particular domain. Few research studies have addressed semantic-oriented approaches for Arabic sentiment analysis based on domain ontologies and feature importance. In this paper, we built a semantic orientation approach for calculating the overall polarity of Arabic subjective texts based on a purpose-built domain ontology and the available sentiment lexicon. We used the ontology concepts to extract and weight the semantic domain features, considering their levels in the ontology tree and their frequencies in the dataset. The overall polarity of a given textual review is then computed based on the importance of each domain feature. For evaluation, an Arabic dataset from the hotel domain was selected to build the domain ontology and to test the proposed approach. The overall accuracy and F-measure reached 79.20% and 78.75%, respectively. Results showed that the approach outperformed the other semantic orientation approaches, and that it is an appealing approach for Arabic sentiment analysis.
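The weighting step can be sketched as follows. The abstract does not give the weighting formula, so the form used here (inverse ontology level multiplied by relative frequency) is purely an assumption for illustration, as are the feature names, levels, frequencies, and polarity scores.

```python
def feature_weight(level, freq, total_freq):
    """Assumed form: concepts closer to the ontology root (smaller level)
    and more frequent in the dataset receive a larger weight."""
    return (1.0 / level) * (freq / total_freq)


def overall_polarity(feature_scores, ontology):
    """feature_scores: {feature: lexicon polarity in [-1, 1]}
    ontology: {feature: (level in tree, frequency in dataset)}"""
    total_freq = sum(freq for _, freq in ontology.values())
    score = sum(
        feature_weight(*ontology[feature], total_freq) * polarity
        for feature, polarity in feature_scores.items()
        if feature in ontology  # features outside the ontology are skipped
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical hotel-domain ontology: feature -> (level, frequency).
ontology = {"hotel": (1, 50), "room": (2, 30), "wifi": (3, 20)}

# A review that praises the hotel overall but complains about the wifi.
review = {"hotel": 0.5, "wifi": -1.0}
label = overall_polarity(review, ontology)
```

Under these assumed weights, the mild praise for the root concept outweighs the strong complaint about a deep, infrequent feature, which is the kind of effect that feature-importance weighting is meant to capture.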