Search CORE

8 research outputs found

Towards Corpus-Based Stemming for Arabic Texts

Author: Sabtan Y. (Yasser Muhammad Naguib)
Publication venue: 'Al-Kindi Center for Research and Development'
Publication date: 01/11/2018

Stemming is an essential processing step in a number of natural language processing (NLP) applications, such as information extraction, text analysis and machine translation. It is the process of reducing words to their stems. This paper presents a light stemmer for Arabic that uses a corpus-based approach. The stemmer groups morphological variants of words in an Arabic corpus based on shared characters, then strips off their affixes (prefixes and suffixes) to produce their common stem. Experimental results show that 86% of the words in the test set were correctly grouped under a similar reduced form (i.e. the possible stem). In some cases the reduced form is not the legitimate stem: the evaluation shows that 72.2% of the words in the test set were reduced to their legitimate stem. The stemmer was developed with the future aim of investigating how effective word stems are for extracting bilingual equivalents from an Arabic-English parallel corpus.
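The abstract's idea of light stemming can be illustrated with a minimal sketch. This is not the paper's implementation: the affix lists, the minimum stem length, and the function names `light_stem`, `group_by_stem` and `stem_accuracy` are all assumptions chosen for illustration. The sketch also simplifies the order of operations: it strips affixes from each word first and then groups words by the resulting form, whereas the paper groups variants by shared characters before stripping.

```python
from collections import defaultdict

# Illustrative affix inventories (assumptions, not taken from the paper).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و"]
SUFFIXES = ["ات", "ون", "ين", "ان", "ها"]
MIN_STEM_LEN = 3  # assumed lower bound so short words are not over-stripped


def light_stem(word):
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def group_by_stem(words):
    """Group surface forms under the reduced form produced by light_stem."""
    groups = defaultdict(list)
    for word in words:
        groups[light_stem(word)].append(word)
    return dict(groups)


def stem_accuracy(words, gold_stems):
    """Fraction of words whose reduced form equals a hand-assigned gold stem."""
    correct = sum(1 for w in words if light_stem(w) == gold_stems[w])
    return correct / len(words)


corpus = ["الكتاب", "والكتاب", "كتابان", "معلمون", "المعلمين"]
groups = group_by_stem(corpus)
```

In this toy corpus the three forms built on "كتاب" fall into one group and the two forms built on "معلم" into another. A reduced form produced this way is only a candidate stem, which mirrors the abstract's distinction between words that are correctly grouped and words that are reduced to their legitimate stem.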

International Journal of Linguistics, Literature and Translation

Bilingual Lexicon Extraction from Arabic-English Parallel Corpora with a View to Machine Translation

Author: A Kumano
A Ramsay
A Sharaf
D Fišer
D Tufis
E M Badawi
E Morin
H Abdul-Raof
H Kaji
H Mubarak
I A Mel'
I D Melamed
I D Melamed
I Dagan
I M Saleh
J Tiedemann
L Burnard
M A Attia
M Alabbas
M M Ghali
M Maamouri
P F Brown
P G Otero
P G Otero
P Resnik
R Hudson
W A Gale
X Gutierrez-Vasques
Y Sabtan
Y Sabtan
Yasser Muhammad Naguib Sabtan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Economic potential of brines of Sabkha Jayb Uwayyid, Eastern Saudi Arabia

Author: A James
AA Sabtan
AA Sabtan
Abdulaziz Al-Shaibani
AI Al-Amoud
DH Johnson
DJ Shearman
E Gavish
FB Phleger
FE Jones
HK Abdel-Aal
HK Abdel-Aal
HS Edgell
JK Warren
N Bottomley
NA Alsaaran
OSB Al-Amoudi
R Curtis
RL Dam Van
SG Fryberger
SW Tyler
W Backiewicz
WE Sanford
WW Wood
WW Wood
WW Wood
Y Yechieli
Publication venue: 'Springer Science and Business Media LLC'

Earth Fissures in Wadi Najran, Kingdom of Saudi Arabia

Author: AA Amin
AA Amin
Abdullah A. Sabtan
Ahmed M. Youssef
DC Helm
H Bouwer
H Sun
HL Vacher
M Sophocleous
Norbert H. Maerz
P Sahu
SM Mousavi
T Holzer
W Lund
Y Zhang
Yasser A. Zabramawi
YS Xu
Publication venue: 'Springer Science and Business Media LLC'

Large-Scale Model Swelling Potential of Expansive Soils in Comparison with Oedometer Swelling Methods

Author: A Djedid
AA Bahabri
AA Sabtan
AG Zhemchuzhnikov
AT Sudjianto
BH Rao
BM Das
BVV Reddy
DE McCormack
H Tu
HB Seed
HH Adem
HJ Pincus
J Pruska
J Yurkiewicz
JD Nelson
JM Utts
N Davies
NV Nayak
PC Kariuki
S Asuri
Y Erzin
Y Mawlood
Z Han
Ö Çimen
Publication venue: 'Springer Science and Business Media LLC'

Unsaturated characteristics of undisturbed expansive shale from Saudi Arabia

Author: A Benchouk
AA Sabtan
AI Al-Mhaidib
AM Brown
AS El-Hames
CA Burger
CA Burger
D Mallants
D Mc Garry
DG Fredlund
DG Fredlund
DG Fredlund
DG Fredlund
FH Chen
G Abderahim
GF Gitirana
GV Wilson
GW Wilson
H Nowamooz
H Peron
JK Mitchell
KRJ Smettem
L Miao
MA Dafalla
MA Dawoud
PH Simms
S Azam
S Taibi
SA Aiban
SL Barbour
SN Abduljauwad
SN Abduljauwad
Tamer Y. Elkady
W Durner
Publication venue: 'Springer Science and Business Media LLC'

Salt marsh vegetation promotes efficient tidal channel networks

Author: A Al-Farraj
A Al-Hurban
A D’Alpaos
A Rinaldo
A Sabtan
A Saleh
AB Murray
AC Redfield
B van Maanen
B van Maanen
BM Vlaswinkel
C Schwarz
CA Braudrick
CA Wilson
CM Duarte
D Garofalo
E Mcleod
EA Shinn
EB Barbier
ED Lazarus
EJ Gabet
EJ Gabet
EP Glenn
GE Tucker
GM King
I Möller
IA Mendelssohn
J Allen
J-P Belliard
JW Kirchner
L Lam
L Stefanon
M Marani
M Marani
M Marani
M Tal
MG Kleinhans
OSB Al-Amoudi
R Lakhdar
R Marciano
RE Horton
RW Thompson
S Fagherazzi
S Fagherazzi
S Temmerman
S Temmerman
S Temmerman
SA Schumm
T Bouma
W Vandenbruwaene
WE Dietrich
Y Yechieli
Publication venue: 'Springer Science and Business Media LLC'
