4 research outputs found
PARALLEL CREATION OF GIGAWORD CORPORA FOR MEDIUM DENSITY LANGUAGES: AN INTERIM REPORT
To speed up the development of gigaword language resources for medium-resource-density languages, we integrated several FOSS tools into the HUN* toolkit. While the speed and efficiency of the resulting pipeline have surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated.
The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe
Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna, at the Austrian Academy of Sciences, on 12-13 February 2009
Kabul Times, May 1975
Extraction of Arabic word roots: An Approach Based on Computational Model and Multi-Backpropagation Neural Networks
Stemming is the process of extracting the root of a given word by stripping
off the affixes attached to it. Many attempts have been made
to address the problem of stemming Arabic words. The majority of
existing Arabic stemming algorithms require a complete set of morphological
rules and large vocabulary lookup tables. Furthermore, many of them return
more than one candidate stem or root for a given Arabic word. According to
Ahmad [11], Arabic stemming based on the language's morphological
rules remains a very difficult task because of the nature of the language itself.
The limitations of current Arabic stemming methods motivated this
research, in which we investigate a novel approach to extracting the word roots
of Arabic, named here MUAIDI-STEMMER². The approach attempts
to exploit numerical relations between Arabic letters, avoiding the need for a list
of the root and pattern of each word in the language, and yielding a single root.
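
The affix stripping described at the start of this abstract can be illustrated with a minimal Python sketch. It is not the MUAIDI-STEMMER method: the prefix and suffix lists, the minimum stem length, and the function name `light_stem` are all hypothetical choices made only for illustration.

```python
# Illustrative sketch of rule-based affix stripping (light stemming).
# Assumption: these tiny affix lists are hypothetical; real stemmers use
# much larger lists plus morphological rules.
PREFIXES = ["ال", "و"]        # definite article "al-", conjunction "wa-"
SUFFIXES = ["ون", "ات", "ة"]  # masculine plural, feminine plural, feminine marker
MIN_STEM_LEN = 3              # assumption: never cut a word below three letters


def light_stem(word: str) -> str:
    """Strip at most one listed prefix and one listed suffix from `word`."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    # "the teachers": article and plural ending removed, pattern letters kept
    print(light_stem("المعلمون"))
```

For المعلمون ("the teachers") the sketch returns معلم ("teacher"), which is a stem and not the three-letter root علم. Affix stripping alone does not remove the letters contributed by the word pattern, which is why root extraction usually needs more than an affix list.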
The approach is composed of two phases. Phase I relies on basic
calculations derived from a linguistic analysis of Arabic patterns and affixes.
Phase II is based on an artificial neural network trained with the backpropagation
learning rule. In this phase, we formulate root extraction
as a classification problem, with the neural network as the classifier.
This study demonstrates that a neural network can be used effectively to extract the word roots of Arabic.
The stemmer was tested on 46,895 Arabic word types³. Error-counting accuracy evaluation was used to measure the performance of
the stemmer. It successfully produced the stems of 44,107 Arabic words
from the given test datasets, an accuracy of 94.81%.
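
As a rough illustration of error-counting accuracy, the sketch below compares each predicted root with a reference root and reports the fraction that match. The function name `error_counting_accuracy` and the four-word toy data are assumptions made for this example; they are not taken from the study's test set or evaluation code.

```python
def error_counting_accuracy(predicted, gold):
    """Return the fraction of predictions that exactly match the reference.

    Assumption: an output counts as an error whenever it differs from the
    reference root in any way (exact string match).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    errors = sum(1 for p, g in zip(predicted, gold) if p != g)
    return (len(gold) - errors) / len(gold)


if __name__ == "__main__":
    # Toy data only: four reference roots and one wrong prediction.
    gold = ["كتب", "علم", "درس", "قول"]
    predicted = ["كتب", "علم", "درس", "قال"]
    print(f"{error_counting_accuracy(predicted, gold):.2%}")
```

With one error in four words, the toy run reports 75.00%.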
2. Muaidi is the author's father's name.
3. Types are distinct (unique) words.