570 research outputs found

Arabic Information Retrieval: A Relevancy Assessment Survey

Authors: Ababneh Ahmad, Lu Joan, Xu Qiang
Publication venue: Omer Halisdemir Universitesi Iktisadi ve Idari Bilimler Fakultesi Dergisi
Publication date: 01/01/2016

This paper presents research on Arabic Information Retrieval (IR). It surveys the impact of statistical and morphological analysis of Arabic text on improving Arabic IR relevancy. We investigated the contributions of Stemming, Indexing, Query Expansion, Text Summarization (TS), Text Translation, and Named Entity Recognition (NER) to enhancing the relevancy of Arabic IR. Our survey emphasizes the quantitative relevancy measurements provided in the surveyed publications. The paper shows that researchers achieved significant enhancements, especially in building accurate stemmers, with accuracy reaching 97%, and in measuring the impact of different indexing strategies. Query expansion and Text Translation showed a positive effect on relevancy. However, other tasks such as NER and TS still need more research to establish their impact on Arabic IR.

University of Huddersfield Repository

AIS Electronic Library (AISeL)

Huddersfield Research Portal

Answering English queries in automatically transcribed Arabic speech

Authors: Nwesri A, Scholer F, Tahaghoghi S
Publication venue: IEEE (USA)
Publication date: 01/01/2007

There are several well-known approaches to parsing Arabic text in preparation for indexing and retrieval. Techniques such as stemming and stopping have been shown to improve search results on written newswire dispatches, but few comparisons are available on other data sources. In this paper, we apply several alternative stemming and stopping approaches to Arabic text automatically extracted from the audio soundtrack of news video footage, and compare these with approaches that rely on machine translation of the underlying text. Using the TRECVID video collection and queries, we show that normalisation, stopword removal, and light stemming increase retrieval precision, but that heavy stemming and trigrams have a negative effect. We also show that the choice of machine translation engine plays a major role in retrieval effectiveness.
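The preprocessing steps named in this abstract (normalisation, light stemming, character trigrams) can be illustrated with a minimal sketch. The specific rules below (folding alef variants, dropping tatweel, stripping the prefixes "al-" and "wal-") are assumptions based on common practice in Arabic IR, not the rules used in the paper, and the function names are hypothetical.

```python
import re

# Assumed normalisation rules; the paper does not list its exact rules.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
BARE_ALEF = "\u0627"
TATWEEL = "\u0640"  # elongation character

# Assumed light-stemming prefixes: "wal-" and "al-" (longest first).
PREFIXES = ("\u0648\u0627\u0644", "\u0627\u0644")


def normalise(word):
    """Remove tatweel and fold alef variants to a bare alef."""
    word = word.replace(TATWEEL, "")
    return ALEF_VARIANTS.sub(BARE_ALEF, word)


def light_stem(word):
    """Strip one leading prefix if at least three letters remain."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            return word[len(prefix):]
    return word


def char_trigrams(word):
    """Return overlapping character trigrams; short words are kept whole."""
    if len(word) < 3:
        return [word]
    return [word[i:i + 3] for i in range(len(word) - 2)]
```

Light stemming keeps the index term close to the surface word, whereas trigram indexing splits each word into many overlapping fragments. The abstract reports that the former helped and the latter hurt precision on this collection.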

RMIT Research Repository

Development of Arabic Information Retrieval Systems in the 21st Century

Author: Elmekawi Awatif
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 01/03/2018

The present study deals with the development of Arabic Information Retrieval Systems since 2000 and their vital role in the Text Retrieval Conference (TREC) and in the cross-language information retrieval track. It gives an overview of developments since the early 21st century concerning the Holy Qur'an, the Arabic language, terms relevant to Arabic information retrieval systems, and the characteristics of the Arabic language compared with other languages. These developments include rich resources of up-to-date information to support research in this area, modern developments in assessing and measuring Arabic information retrieval systems, relevant theses, and some research studies from contemporary universities on the use of TREC in Arabic information retrieval, including work by researchers with no prior knowledge of the Arabic language. The study ends with some studies from Arab universities. Keywords: Retrieval Systems, Arabic Information, Twenty-first centur

International Institute for Science, Technology and Education (IISTE): E-Journals

Effectiveness of query expansion in searching the Holy Quran

Authors: El-Haj Mahmoud, Hammo Bassam, Sleit Azzam
Publication date: 01/01/2007

Modern Arabic text is written without diacritical marks (short vowels), which causes considerable ambiguity at the word level in the absence of context. An exception is the Holy Quran, which is written with short vowels and other marks to preserve the pronunciation and hence the correct sense of its words. Searching for a word in vowelized text requires typing and matching all its diacritical marks, which is cumbersome and prevents learners from searching and understanding the text. The alternative is to ignore these marks and run into the problem of ambiguity. In this paper, we provide a novel diacritic-less searching approach to retrieve from the Quran relevant verses that match a user's query through automatic query expansion techniques. The proposed approach utilizes a relational database search engine that is scalable, portable across RDBMS platforms, and provides fast and sophisticated retrieval. The results are presented and the applied approach reveals future directions for search engines.
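The idea of matching an unvowelized query against vowelized text can be sketched as follows. This is not the paper's method, which uses query expansion inside a relational database; it is a simpler stand-in that strips diacritics from both the query and the stored text before comparing. The diacritic range and the function names are assumptions made for illustration.

```python
import re

# Assumed set of marks to ignore: Arabic harakat (U+064B to U+0652)
# plus the superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text):
    """Return the text with short-vowel and related marks removed."""
    return DIACRITICS.sub("", text)


def search(query, lines):
    """Return the original (vowelized) lines whose bare form contains the bare query."""
    bare_query = strip_diacritics(query)
    return [line for line in lines if bare_query in strip_diacritics(line)]
```

Because every vowelization of a word collapses to the same bare form, a query such as the three bare letters k-t-b matches all of its vowelized readings. This is the ambiguity the abstract mentions, so a ranking or expansion step would still be needed on top of this sketch.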

Lancaster E-Prints

Pre Processing Techniques for Arabic Documents Clustering

Author: Alhanjouri Mohammed A.
Publication venue: Vandana Publications
Publication date: 01/01/2017

Clustering of text documents is an important technique for document retrieval. It aims to organize documents into meaningful groups or clusters. Preprocessing plays a major role in enhancing the clustering of Arabic documents. This research examines and compares text preprocessing techniques in Arabic document clustering. It studies the effectiveness of these techniques: term pruning, term weighting (TF-IDF), morphological analysis (root-based stemming, light stemming, and raw text), and normalization. The experimental work examined the effect of the clustering algorithm by comparing the most widely used partitional algorithm, K-means, with another partitional algorithm, Expectation Maximization (EM). The Euclidean and Manhattan distance functions were also compared in order to find the best results in document clustering. Results were assessed by evaluating the clustered documents under many combinations of preprocessing techniques. Experimental results show that document clustering can be improved by applying term weighting (TF-IDF) and term pruning with a small minimum term frequency. For morphological analysis, light stemming is found to be more appropriate than root-based stemming and raw text. Normalization also improved the clustering of Arabic documents.
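As a rough illustration of the ingredients this abstract compares, the sketch below builds TF-IDF vectors with a simple term-pruning threshold and defines the two distance functions. It assumes the common weighting tf * log(N / df) and prunes by total collection frequency; the paper's exact formulas and thresholds are not given in the abstract, and the names `tfidf_vectors` and `min_count` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, min_count=1):
    """Build TF-IDF vectors from tokenised documents.

    docs: list of token lists.
    min_count: assumed pruning rule; terms occurring fewer than
    min_count times in the whole collection are dropped.
    Returns (vocabulary, list of vectors aligned with the vocabulary).
    """
    total = Counter(token for doc in docs for token in doc)
    vocab = sorted(t for t, c in total.items() if c >= min_count)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def euclidean(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Either distance function could then be passed to a K-means implementation as its dissimilarity measure. A term that appears in every document gets weight zero under this weighting, so it does not influence either distance.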

Institutional Repository of the Islamic University of Gaza

Arabic morphological tools for text mining

Authors: Ashour Wesam M., Saad Motaz K
Publication date: 01/01/2010

The Arabic language has complex morphology, which has so far led to a lack of standard Arabic morphological analysis tools. In this paper, we present and evaluate existing common Arabic stemming and light stemming algorithms. We also implement and integrate Arabic morphological analysis tools into the leading open source machine learning and data mining tools, Weka and RapidMiner.

Institutional Repository of the Islamic University of Gaza

An Intelligent System For Arabic Text Categorization

Authors: Fayed Z.T., Habib Mena Badieh, Syiam M.M.
Publication date: 01/01/2006

University of Twente Research Information