    Development of Arabic Information Retrieval Systems in the 21st Century

    This study surveys the development of Arabic information retrieval systems since 2000, including their vital role in the Text Retrieval Conference (TREC) and its cross-language information retrieval track. It reviews developments since the early 21st century concerning the Holy Qur'an, the Arabic language, terms relevant to Arabic information retrieval systems, and the characteristics of Arabic compared with other languages. These developments include rich, up-to-date information resources that support further research in this area; modern approaches to assessing and measuring Arabic information retrieval systems; relevant theses; research at contemporary universities on the use of TREC in Arabic information retrieval; and work by researchers with no prior knowledge of the Arabic language. The study ends with a review of some studies from Arab universities. Keywords: Retrieval Systems, Arabic Information, Twenty-first century

    Morphological variation of Arabic queries

    Although test-collection-based studies have shown that stemming improves retrieval effectiveness in an information retrieval system, morphological variation among queries on the same topic is less well understood. This work examines the broad morphological variation that searchers of an Arabic retrieval system put into their queries. In this study, 15 native Arabic speakers were asked to generate queries, and morphological variants of query words were collated across users. Queries composed of either the commonest or the rarest variants of each word were submitted to a retrieval system, and the effectiveness of the searches was measured. Queries composed of the more popular morphological variants were more likely to retrieve relevant documents than those composed of less popular

    Arabic stemmers and their effectiveness on the information retrieval system

    Arabic is a Semitic language with a complex morphology, so using a stemming algorithm in an information retrieval system is almost always beneficial. An Arabic stemmer has been implemented and included in the information retrieval system developed at the Information Science Research Institute at the University of Nevada, Las Vegas. The stemmer is written in the Ruby language; it removes affixes and then matches the remaining word against patterns of the same length. The retrieval experiment uses the TREC collection, which consists of over a million documents. We test the effectiveness of the Arabic stemmer using recall/precision measurements and compare the results with those of other stemmers
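
    The two steps named in the abstract above (strip affixes, then match the remainder against patterns of the same length) can be sketched as follows. This is a minimal illustration in Python, not the Ruby stemmer the abstract describes: the affix lists, the three patterns, and the function names are assumptions chosen for the example and are far from complete.

```python
# Illustrative affix and pattern lists (hypothetical, far from complete).
PREFIXES = ["وال", "بال", "ال"]        # tried in this order, longest first
SUFFIXES = ["ون", "ين", "ات", "ها"]
PATTERNS = ["فاعل", "فعال", "مفعول"]    # the letters ف ع ل mark root-letter slots
ROOT_SLOTS = set("فعل")


def strip_affixes(word):
    """Remove at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def match_pattern(word, pattern):
    """Return the root letters if word fits pattern, otherwise None."""
    if len(word) != len(pattern):
        return None  # only patterns of the same length are considered
    root = []
    for letter, slot in zip(word, pattern):
        if slot in ROOT_SLOTS:
            root.append(letter)      # slot position: collect the root letter
        elif letter != slot:
            return None              # fixed pattern letter does not match
    return "".join(root)


def stem(word):
    """Strip affixes, then try each pattern; fall back to the stripped word."""
    stripped = strip_affixes(word)
    for pattern in PATTERNS:
        root = match_pattern(stripped, pattern)
        if root is not None:
            return root
    return stripped


if __name__ == "__main__":
    for w in ["الكاتب", "والمكتوب", "كاتبون", "كتاب"]:
        print(w, stem(w))
```

    In this sketch a word that fits no pattern is returned with only its affixes removed; a real stemmer would need normalization, longer affix lists, and many more patterns.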

    Analyst-Focused Arabic Information Retrieval

    An English-Arabic Cross-Language Information Retrieval environment was created in which the analyst can query an Arabic database in English and retrieve a set of relevant Arabic documents. The retrieved Arabic documents are automatically translated into English to make them readable for the English-only analyst. Proper names of people, places, and organizations are extracted from the retrieved documents and transliterated from Arabic into English. They are presented to the analyst and serve to provide a brief summarization of the retrieved document search query in English. Cross-Language Information Retrieval (CLIR), itself a desideratum in the ARDA workshop, is a special case of Information Retrieval in which retrieval is not restricted to the language of the query: queries in one language retrieve documents in other language(s) (Oard and Diekema, 1998). The Arabic used in the system is Modern Standard Arabic (MSA). MSA is the formal Arabic used throughout the Arab world in news and broadcast media, and it is the lingua franca of the Arab world. MSA has an estimated 200 million speakers living in Iraq, the Arabian Peninsula, the Levant, Egypt, and Northern Africa

    Search Queries in an Information Retrieval System for Arabic-Language Texts

    Information retrieval aims to extract from a large collection of data the subset of information that is relevant to a user’s needs. In this study, we are interested in information retrieval over Arabic-language text documents. We focus on the Arabic language: its morphological features, which potentially affect the implementation and performance of an information retrieval system, and its unique characters, which are absent from the Latin alphabet and require specialized approaches. Specifically, we report on the design, implementation, and evaluation of search functionality using the Vector Space Model with several weighting schemes. Our implementation uses the ISRI stemming algorithms as the underlying stemming technique and the general Arabic stop word list for building inverted indices for Arabic-language documents. We evaluate our implementation on a corpus of selected technical papers published in Arabic-language journals. We use the Open Journal Systems (OJS) from the Public Knowledge Project as the repository for the corpus used in the evaluation. We evaluate the performance of our search implementation using a classic recall/precision approach and compare it with one of the default multilingual search functions supported in OJS. Our experimental analysis suggests that stemming is an effective technique for searching Arabic-language texts and improves the quality of the information retrieval system
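
    To make the Vector Space Model and inverted index mentioned in the abstract above concrete, here is a minimal Python sketch. It assumes one particular weighting scheme (raw term frequency times log inverse document frequency) and cosine similarity; the abstract does not say which schemes the authors used. The stop list, the sample documents, and all function names are hypothetical, and stemming is left out to keep the example short.

```python
import math
from collections import Counter, defaultdict

STOP_WORDS = {"في", "من", "على"}  # tiny illustrative stop list (hypothetical)


def tokenize(text):
    """Split on whitespace and drop stop words (no stemming in this sketch)."""
    return [t for t in text.split() if t not in STOP_WORDS]


def build_index(docs):
    """Return an inverted index (term -> {doc_id: tf}) and an idf table."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    n_docs = len(docs)
    idf = {term: math.log(n_docs / len(postings))
           for term, postings in index.items()}
    return index, idf


def search(query, index, idf):
    """Rank documents by cosine similarity between TF-IDF vectors."""
    q_weights = {t: tf * idf[t]
                 for t, tf in Counter(tokenize(query)).items() if t in idf}
    # Squared length of every document vector, accumulated from the index.
    doc_norms = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            doc_norms[doc_id] += (tf * idf[term]) ** 2
    # Dot products, touching only postings of terms that occur in the query.
    dots = defaultdict(float)
    for term, q_w in q_weights.items():
        for doc_id, tf in index[term].items():
            dots[doc_id] += q_w * tf * idf[term]
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    ranked = []
    for doc_id, dot in dots.items():
        denom = q_norm * math.sqrt(doc_norms[doc_id])
        if denom > 0:
            ranked.append((doc_id, dot / denom))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "استرجاع المعلومات في النصوص العربية",
        "d2": "تحليل الصرف في اللغة العربية",
        "d3": "شبكات الحاسوب",
    }
    index, idf = build_index(docs)
    print(search("استرجاع المعلومات", index, idf))
```

    Swapping in a different weighting scheme only changes how the term weights are computed in `build_index` and `search`; the inverted index structure stays the same.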