
4 research outputs found

Hamshahri: A standard Persian Text Collection

Author: Aleahmad Abolfazl
Amiri Hadi
Oroumchian Farhad
Rahgozar Masoud
Publication venue: 'Sociological Research Online'
Publication date: 14/08/2008

The Persian language is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in nature from languages such as English, the design of information retrieval systems for Persian requires special consideration. However, there are relatively few studies on the retrieval of Persian documents in the literature, and one of the main reasons is the lack of a standard test collection. In this paper we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of newspaper articles according to TREC specifications. Furthermore, statistical information about the documents, the queries, and their relevance judgments is presented in this paper. We believe that this collection is the largest Persian text collection to date.

CiteSeerX

Research Online

Hamshahri: A standard Persian Text Collection

Author: Aleahmad Abolfazl
Amiri Hadi
Oroumchian Farhad
Rahgozar Masoud
Publication venue: 'Sociological Research Online'
Publication date: 01/01/2009

Research Online

Using OWA fuzzy operator to merge retrieval system results

Author: AleAhmad A.
Amiri H.
Lucas C.
Oroumchian Farhad
Rahgozar M.
Publication venue: 'Sociological Research Online'
Publication date: 01/01/2007

With the rapid growth of information sources, it is essential to develop methods that retrieve the most relevant information according to the user's requirements. One way of improving the quality of retrieval is to use more than one retrieval engine, merge the retrieved results, and show a single ranked list to the user. Some studies suggest that combining the results of multiple search engines improves ranking when these engines are treated as independent experts. In this study, we investigated the performance of Persian retrieval by merging four different language modeling methods and two vector space models with Lnu.ltu and Lnc.btc weighting schemes. The experiments were conducted on a large Persian collection of news archives called the Hamshahri Collection. Different variants of the Ordered Weighted Average (OWA) fuzzy operator, namely a quantifier-based OWA operator and a degree-of-importance-based OWA operator, were tested for merging the results. Our experimental results show that the OWA operators produce better precision and ranking in comparison with weaker retrieval methods. In comparison with stronger retrieval models, however, they produce only minimal improvements.

Research Online
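
The merging step described in the abstract above can be illustrated with a minimal Python sketch of quantifier-based OWA fusion. This is not the authors' implementation. The quantifier Q(r) = r^alpha, the parameter `alpha`, the function names, the assumption that scores are already normalized to [0, 1], and the choice to score a missing document as 0 are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def quantifier_weights(n: int, alpha: float = 2.0) -> List[float]:
    """OWA weights w_i = Q(i/n) - Q((i-1)/n), assuming Q(r) = r**alpha."""
    return [(i / n) ** alpha - ((i - 1) / n) ** alpha for i in range(1, n + 1)]


def owa(scores: List[float], weights: List[float]) -> float:
    """Apply the weights to the scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def merge_runs(runs: List[Dict[str, float]], alpha: float = 2.0) -> List[str]:
    """Fuse per-engine {doc_id: score} maps into one ranked list of doc ids.

    Assumption: scores are normalized to [0, 1]; a document that an
    engine did not retrieve is given a score of 0 for that engine.
    """
    weights = quantifier_weights(len(runs), alpha)
    docs = set().union(*runs)
    fused = {d: owa([run.get(d, 0.0) for run in runs], weights) for d in docs}
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical scores from two engines, for illustration only.
runs = [{"d1": 0.9, "d2": 0.4}, {"d2": 0.8, "d3": 0.7}]
ranking = merge_runs(runs, alpha=1.0)
```

With `alpha = 1.0` the weights are uniform and the operator reduces to the arithmetic mean of the engines' scores. Under this quantifier, values of `alpha` below 1 move the operator toward the maximum score, and values above 1 move it toward the minimum.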