    Terms interrelationship query expansion to improve accuracy of Quran search

    Quran retrieval systems are becoming a common instrument for users searching for needed information. Search engines are among the most popular tools that have been successfully implemented for retrieving verses relevant to a query. However, a major challenge for Quran search engines is word ambiguity, specifically lexical ambiguity. Even with the advent of query expansion techniques, Quran retrieval systems still have problems retrieving the information users need. Current semantic techniques still lack precision because they do not consider several semantic dictionaries. Therefore, this study proposes a stemmed terms interrelationship query expansion approach to improve Quran search results. More specifically, related terms were collected from different semantic dictionaries and then reduced to their roots using a stemming algorithm. To assess the performance of the stemmed terms interrelationship query expansion, experiments were conducted using eight Quran datasets from the Tanzil website. Overall, the results indicate that the stemmed terms interrelationship query expansion is superior to the unstemmed terms interrelationship query expansion in Mean Average Precision: Yusuf Ali 68%, Sarawar 67%, Arberry 72%, Malay 65%, Hausa 62%, Urdu 62%, Modern Arabic 60% and Classical Arabic 59%.
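The abstract above reports its results as Mean Average Precision (MAP). As a reminder of how that metric is commonly computed, here is a minimal sketch. It is not the authors' evaluation code; the function names and the toy document identifiers are hypothetical, and it assumes binary relevance judgments and one ranked list per query.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Toy data (hypothetical ids), two queries.
    toy_runs = [
        (["a", "b", "c", "d"], {"a", "c"}),
        (["x", "y"], {"y"}),
    ]
    print(round(mean_average_precision(toy_runs), 4))
```

Dividing by the total number of relevant items, rather than by the number retrieved, penalizes a run that misses relevant verses entirely.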

    Introduction to the special issue on cross-language algorithms and applications

    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing. Postprint (published version)

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology to Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.
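The abstract above summarizes annotation quality with a Kappa value. It does not say which Kappa variant was used, so the sketch below assumes Cohen's kappa for two annotators labeling the same items; the function name and the toy labels are hypothetical and are not taken from EveTAR.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[l] * counts_b[l] for l in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy relevance judgments (1 = relevant, 0 = not relevant).
    annotator_1 = [1, 1, 0, 0]
    annotator_2 = [1, 0, 0, 0]
    print(cohens_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a plain percentage when one label dominates.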

    Glasgow University at TRECVID 2006

    In the first part of this paper we describe our experiments in the automatic and interactive search tasks of TRECVID 2006. We submitted five fully automatic runs, including a text baseline, two runs based on visual features, and two runs that combine textual and visual features in a graph model. For the interactive search, we have implemented a new video search interface with relevance feedback facilities, based on both textual and visual features. The second part is concerned with our approach to the high-level feature extraction task, based on textual information extracted from speech recogniser and machine translation outputs. These outputs were aligned with shots and associated with high-level feature references. A list of significant words was created for each feature, and it was in turn utilised to identify the feature during the evaluation.

    Arabic Information Retrieval: A Relevancy Assessment Survey

    This paper presents research on Arabic Information Retrieval (IR). It surveys the impact of statistical and morphological analysis of Arabic text on improving Arabic IR relevancy. We investigated the contributions of Stemming, Indexing, Query Expansion, Text Summarization (TS), Text Translation, and Named Entity Recognition (NER) to enhancing the relevancy of Arabic IR. Our survey emphasizes the quantitative relevancy measurements provided in the surveyed publications. The paper shows that researchers have achieved significant enhancements, especially in building accurate stemmers, with accuracy reaching 97%, and in measuring the impact of different indexing strategies. Query Expansion and Text Translation showed a positive effect on relevancy. However, other tasks such as NER and TS still need more research to establish their impact on Arabic IR.

    Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

    While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets that are suitable to train ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance. Comment: ECIR 2020 (short

    Evaluating the implicit feedback models for adaptive video retrieval

    Interactive video retrieval systems are becoming popular. On the one hand, these systems try to reduce the effect of the semantic gap, an issue currently being addressed by the multimedia retrieval community. On the other hand, such systems enhance the quality of information seeking for the user by supporting query formulation and reformulation. Interactive systems are very popular in the textual retrieval domain. However, they are relatively unexplored in the case of multimedia retrieval. The main problem in the development of interactive retrieval systems is the evaluation cost. The traditional evaluation methodology, as used in the information retrieval domain, is not applicable. An alternative is to use a user-centred evaluation methodology. However, such schemes are expensive in terms of effort and cost, and are not scalable. This problem is exacerbated by the use of implicit indicators, which are useful and increasingly used in predicting user intentions. In this paper, we explore the effectiveness of a number of interfaces and feedback mechanisms and compare their relative performance using a simulated evaluation methodology. The results show the relatively better performance of a search interface that combines explicit and implicit features.