4,469 research outputs found
KACST Arabic Text Classification Project: Overview and Preliminary Results
Electronically formatted Arabic free-texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has different useful applications including, but not restricted to, E-Mail spam detection, web pages content filtering, and automatic message routing. In this paper an overview of King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project will be illustrated along with some preliminary results. This project will contribute to the better understanding and elaboration of Arabic text classification techniques
Arabic Text Mining
The rapid growth of the internet has increased the number of online texts.
This led to the rapid growth of the number of online texts in the Arabic
language. The enormous amount of text must be organized into classes to make
the analysis process and text retrieval easier. Text classification is,
therefore, a key component of text mining. There are numerous systems and
approaches for categorizing literature in English, European (French, German,
Spanish), and Asian (Chinese, Japanese). In contrast, there are relatively few
studies on categorizing Arabic literature due to the difficulty of the Arabic
language. In this work, a brief explanation of key ideas relevant to Arabic
text mining are introduced then a new classification system for the Arabic
language is presented using light stemming and Classifier Na\"ive Bayesian
(CNB). Texts from two classes: politics and sports, are included in our corpus.
Some texts are added to the system, and the system correctly classified them,
demonstrating the effectiveness of the system
- …