
4 research outputs found

PARALLEL CREATION OF GIGAWORD CORPORA FOR MEDIUM DENSITY LANGUAGES: AN INTERIM REPORT

Author: Halácsy Péter
Kornai András
NEMETH P
Varga Dániel
Publication date: 01/01/2008

For increased speed in developing gigaword language resources for medium resource density languages, we integrated several FOSS tools in the HUN* toolkit. While the speed and efficiency of the resulting pipeline have surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated.

CiteSeerX

SZTAKI Publication Repository

The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

Author: Baroni Paola
Bel Núria
Budin Gerhard
Calzolari Nicoletta
Choukri Khalid
Goggi Sara
Mariani Joseph
Monachini Monica
Odijk Jan
Piperidis Stelios
Quochi Valeria
Soria Claudia
Toral Antonio
Publication venue: Istituto di Linguistica Computazionale del CNR - Pisa, ITALY

Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna at the Austrian Academy of Sciences on 12-13 February 2009.

PUblication MAnagement

Kabul Times, May 1975

Author: Afghanistan
Publication venue: DigitalCommons@UNO
Publication date: 01/05/1975

Kabul Times, May 1975

The University of Nebraska, Omaha

Extraction of Arabic word roots: An Approach Based on Computational Model and Multi-Backpropagation Neural Networks

Author: Al-Serhan Hasan Muaidi
Publication venue: De Montfort University
Publication date: 01/01/2008

Stemming is the process of extracting the root of a given word by stripping off the affixes attached to it. Many attempts have been made to address the problem of stemming Arabic words. The majority of existing Arabic stemming algorithms require a complete set of morphological rules and large vocabulary lookup tables. Furthermore, many of them give more than one potential stem or root for a given Arabic word. According to Ahmad [11], Arabic stemming based on the language's morphological rules is still a very difficult task due to the nature of the language itself. The limitations of current Arabic stemming methods have motivated this research, in which we investigate a novel approach to extracting the word roots of the Arabic language, named here MUAIDI-STEMMER (Muaidi is the author's father's name). This approach attempts to exploit numerical relations between Arabic letters, avoiding the need for a list of the root and pattern of each word in the language, and giving a single root solution. The approach is composed of two phases. Phase I depends on basic calculations extracted from linguistic analysis of Arabic patterns and affixes. Phase II is based on an artificial neural network trained by the backpropagation learning rule. In this proposed phase, we formulate the root extraction problem as a classification problem and use the neural network as a classifier tool. This study demonstrates that a neural network can be effectively used to extract the word roots of the Arabic language. The stemmer developed is tested using 46,895 Arabic word types (types are distinct or unique words). Error counting accuracy evaluation was employed to evaluate the performance of the stemmer. It was successful in producing the stems of 44,107 Arabic words from the given test datasets, with an accuracy of 94.81%.
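
The evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that "error counting accuracy" means the fraction of test words whose predicted root exactly matches a reference root; the abstract does not define the measure further. The function name error_counting_accuracy and the transliterated toy data are hypothetical and are not taken from the thesis.

```python
def error_counting_accuracy(predicted, gold):
    """Count exact matches between predicted and reference roots.

    Assumption: a prediction is correct only if it equals the
    reference root exactly. Returns (correct, total, accuracy).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    total = len(gold)
    return correct, total, correct / total


# Hypothetical toy test set (transliterated): reference roots for
# kitab, maktaba, darasa, mudarris.
gold_roots = ["ktb", "ktb", "drs", "drs"]
# Hypothetical stemmer output with one wrong root (the last one).
predicted_roots = ["ktb", "ktb", "drs", "mdr"]

correct, total, accuracy = error_counting_accuracy(predicted_roots, gold_roots)
print(f"{correct}/{total} correct, accuracy = {accuracy:.2%}")
# prints "3/4 correct, accuracy = 75.00%"
```

Because the measure is an exact-match count, a stemmer that returns several candidate roots for one word would need a rule for picking a single answer before it could be scored this way.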

De Montfort University Open Research Archive