Search CORE

21,304 research outputs found

An Intelligent System For Arabic Text Categorization

Authors: Fayed Z.T.; Habib M.B.; Syiam M.M.
Publication venue: Faculty of Computers and Information Sciences, Ain Shams University
Publication date: 01/01/2006

Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. This paper presents an intelligent Arabic text categorization system built on machine learning algorithms. Several stemming and feature selection algorithms are evaluated. Documents are represented using several term weighting schemes, and the k-nearest neighbor and Rocchio classifiers are used for classification. Experiments on a self-collected corpus show that the suggested hybrid of statistical and light stemmers is the most suitable stemming algorithm for the Arabic language. The results also show that a hybrid of document frequency and information gain is the preferable feature selection criterion and that normalized tf-idf is the best weighting scheme. Finally, the Rocchio classifier outperforms the k-nearest neighbor classifier. The experimental results indicate that the proposed model is an efficient method, with a generalization accuracy of about 98%.
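The weighting and classification steps named in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: the function names (`tfidf_vectors`, `weigh`, `train_rocchio`, `classify`) are hypothetical, the idf formula is one common variant, the Rocchio step is reduced to its centroid-only form (no negative-prototype term), and stemming and feature selection are omitted. The toy documents are English tokens chosen for readability.

```python
import math
from collections import Counter, defaultdict

def weigh(doc, idf):
    """L2-normalized tf-idf vector (a dict) for one tokenized document."""
    tf = Counter(t for t in doc if t in idf)  # unseen terms are dropped
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def tfidf_vectors(docs):
    """Return normalized tf-idf vectors for all docs, plus the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed idf variant: log(N / df) + 1, so ubiquitous terms keep some weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [weigh(doc, idf) for doc in docs], idf

def train_rocchio(vectors, labels):
    """Centroid-only Rocchio: one mean vector per category."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {label: {t: w / counts[label] for t, w in terms.items()}
            for label, terms in sums.items()}

def classify(vec, centroids):
    """Pick the category whose centroid has the highest cosine similarity."""
    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_b = math.sqrt(sum(w * w for w in b.values())) or 1.0
        return dot / norm_b  # a is already unit length
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Toy usage with made-up data.
train = [["goal", "match", "team"], ["team", "league", "goal"],
         ["bank", "market", "stocks"], ["market", "prices", "bank"]]
labels = ["sport", "sport", "economy", "economy"]
vectors, idf = tfidf_vectors(train)
centroids = train_rocchio(vectors, labels)
print(classify(weigh(["team", "goal", "win"], idf), centroids))  # → sport
```

A k-nearest neighbor classifier would reuse the same vectors but compare the query against individual training documents instead of per-category centroids.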

Maastricht University Research Portal

University of Twente Research Information


Proposal for Arabic Mathematical Alphabetic Symbols

Authors: Allawi Adil; Anderson Deborah; Lazrek Azzeddine; Pournader Roozbeh; Sargent Murray
Publication venue: eScholarship, University of California
Publication date: 01/08/2010

This is a proposal to encode a set of characters used to write Arabic mathematical notation in the international character encoding standard Unicode. This set of characters was published in Unicode Standard version 6.1 in January 2012.

eScholarship - University of California

Early texts on Hindu-Arabic calculation

Author: Folkerts Menso
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2011

This article describes how the decimal place value system was transmitted from India via the Arabs to the West up to the end of the fifteenth century. Al-Khwārizmī's arithmetical work, ca. 825, is the oldest Arabic work on Indian arithmetic of which we have detailed knowledge. There is no known Arabic manuscript of this work; our knowledge of it is based on an early reworking of a Latin translation. Until some years ago, only one fragmentary manuscript of this twelfth-century reworking was known (Cambridge, UL, Ii.6.5). Another manuscript that transmits the complete text (New York, Hispanic Society of America, HC 397/726) has made possible a more exact study of al-Khwārizmī's work. This article gives an outline of this manuscript's contents and discusses some characteristics of its presentation.

Crossref

Open Access LMU

Component-based Segmentation of words from handwritten Arabic text

Authors: AlKhateeb J. H.; Ipson S.; Jiang J.; Ren Jinchang
Publication date: 28/05/2008

Efficient preprocessing is essential for automatic recognition of handwritten documents. This paper presents techniques for segmenting words in handwritten Arabic text. First, connected components (CCs) are extracted and the distances between components are analyzed. The statistical distribution of these distances is then used to determine an optimal threshold for word segmentation. An improved projection-based method is also employed for baseline detection. The proposed method has been tested on the IFN/ENIT database, which consists of 26,459 Arabic words handwritten by 411 different writers; the results were promising, showing more accurate baseline detection and word segmentation for subsequent recognition.
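The gap-threshold idea in this abstract can be illustrated with a minimal sketch. It assumes the connected components have already been extracted and reduced to horizontal extents `(x_min, x_max)`; the function name `segment_words` is hypothetical, and using the mean gap as the default threshold is a placeholder assumption, since the abstract does not say how the optimal threshold is derived from the distance distribution.

```python
from statistics import mean

def segment_words(boxes, threshold=None):
    """Group component extents (x_min, x_max) into words by horizontal gap.

    A gap wider than `threshold` starts a new word. If no threshold is
    given, the mean gap is used (an assumption, not the paper's rule).
    """
    boxes = sorted(boxes)
    if not boxes:
        return []

    # Gap between each component and the rightmost edge seen so far,
    # so overlapping components (e.g. dots above a stroke) give a gap of 0.
    gaps = []
    right = boxes[0][1]
    for x_min, x_max in boxes[1:]:
        gaps.append(max(0, x_min - right))
        right = max(right, x_max)

    if threshold is None:
        threshold = mean(gaps) if gaps else 0

    words = [[boxes[0]]]
    for box, gap in zip(boxes[1:], gaps):
        if gap > threshold:
            words.append([box])    # wide gap: a new word begins
        else:
            words[-1].append(box)  # narrow gap: same word
    return words

# Toy usage with made-up extents: gaps are 2, 20, 2, 25, so the mean is 12.25.
components = [(0, 10), (12, 20), (40, 55), (57, 60), (85, 100)]
print(len(segment_words(components)))  # → 3
```

Only horizontal positions are used, so the grouping is the same whether the line is later read right to left or left to right.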

University of Strathclyde Institutional Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

A Study for the Necessity of Risk Assessment for Heavy metal Pollution in the Barada Basin, Syria

Authors: Rimah Melhem; Yoshiro Higano

Manufacturing industries are growing rapidly in the Barada Basin, posing a high risk to the environment and human health by releasing large amounts of heavy metals into environmental media, particularly rivers. A few studies show that concentrations of chromium, cadmium, and lead exceed the standards in the downstream reaches of rivers. Human health risk assessment calls for immediate measures to control the emission of these metals. In this paper, we discuss the implementation of a comprehensive risk management policy as an element of an optimal policy to reduce water pollution and improve water quality in the Barada Basin.

Research Papers in Economics


Rumi Numeral System Symbols: Additional characters proposed to Unicode

Author: Lazrek Azzeddine
Publication venue: eScholarship, University of California
Publication date: 30/03/2006

This is a proposal to encode a set of Rumi numeral symbols in the international character encoding standard Unicode. This set of characters was published in Unicode Standard version 5.2 in October 2009. Rumi, also known today as Fasi, is a numeric system used from the 10th to the 17th century across a wide area spanning Egypt, the Maghreb, and al-Andalus on the Iberian Peninsula. Rumi numeral symbols originate from the Coptic or Greek-Coptic tradition.
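As a rough illustration of what the encoded characters look like to software, the sketch below lists Rumi code points using Python's standard `unicodedata` module. The range constants and the helper name `rumi_table` are assumptions made for this example, and the lookup only works if the interpreter's Unicode database is version 5.2 or later.

```python
import unicodedata

# Assumed bounds of the assigned Rumi characters; unassigned
# code points inside the range are skipped below.
RUMI_START, RUMI_END = 0x10E60, 0x10E7E

def rumi_table():
    """Return (code point label, character name, numeric value) rows."""
    rows = []
    for cp in range(RUMI_START, RUMI_END + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, None)
        if name is None:
            continue  # not assigned in this Unicode database
        rows.append((f"U+{cp:04X}", name, unicodedata.numeric(ch, None)))
    return rows

for label, name, value in rumi_table():
    print(label, name, value)
```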

eScholarship - University of California