46 research outputs found

    Assessment of query reweighing, by rocchio method in farsi information retrieval

    Because users often know little about the collections indexed by search engines and retrieval systems in general, they cannot express their information needs appropriately in queries. In other words, they lack the experience to formulate queries that find relevant documents. Query expansion aims to help users improve and correct their queries: using the feedback it receives from the user in the first stage, the retrieval system moves the query in the vector space toward more relevant documents. Different approaches have been used in information retrieval systems; however, the efficacy of query expansion in Farsi information retrieval systems has not been assessed. This paper presents and assesses the basic Rocchio expansion model as a primary model for retrieving Farsi documents. The purpose of the study is to determine the effect of a standard, basic query expansion model on Farsi document retrieval, so that researchers can compare their own query expansion results with the findings of this paper, which showed a straightforward and positive effect on Farsi document retrieval.
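To make the query movement described in this abstract concrete, here is a minimal sketch of the textbook Rocchio update, q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non-relevant). It is not the paper's implementation: the function names, the sparse dictionary representation, the default weights (alpha=1.0, beta=0.75, gamma=0.15) and the toy term weights are all assumptions chosen for illustration.

```python
from typing import Dict, List

# A sparse term-weight vector: term -> weight (hypothetical representation).
Vector = Dict[str, float]


def centroid(docs: List[Vector]) -> Vector:
    """Return the mean term-weight vector of a list of documents."""
    if not docs:
        return {}
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(
    query: Vector,
    relevant: List[Vector],
    non_relevant: List[Vector],
    alpha: float = 1.0,   # assumed weight of the original query
    beta: float = 0.75,   # assumed weight of the relevant centroid
    gamma: float = 0.15,  # assumed weight of the non-relevant centroid
) -> Vector:
    """Move the query toward relevant documents and away from non-relevant ones."""
    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    updated: Vector = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (
            alpha * query.get(term, 0.0)
            + beta * rel_c.get(term, 0.0)
            - gamma * non_c.get(term, 0.0)
        )
        # Negative weights are dropped, as is common practice.
        if weight > 0:
            updated[term] = weight
    return updated


# Toy example with made-up term weights.
expanded = rocchio(
    query={"retrieval": 1.0},
    relevant=[{"retrieval": 1.0, "farsi": 1.0}, {"farsi": 1.0, "query": 1.0}],
    non_relevant=[{"retrieval": 1.0, "sports": 1.0}],
)
```

In the toy example the query gains the terms "farsi" and "query" from the relevant documents, while "sports", which appears only in the non-relevant document, receives a negative weight and is dropped.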

    Firsthand Opiates Abuse on Social Media: Monitoring Geospatial Patterns of Interest Through a Digital Cohort

    In the last decade, drug overdose deaths reached staggering proportions in the US. Besides the raw yearly death count, which is worrisome per se, an alarming picture comes from the steep acceleration of the rate, which increased by 21% from 2015 to 2016. While traditional public health surveillance suffers from its own biases and limitations, digital epidemiology offers a new lens to extract signals from the Web and social media that might be complementary to official statistics. In this paper we present a computational approach to identify a digital cohort that might provide an updated and complementary view on the opioid crisis. We introduce an information retrieval algorithm suitable for identifying relevant subspaces of discussion on social media, for mining data from users showing explicit interest in discussions about opioid consumption on Reddit. Moreover, despite the pseudonymous nature of the user base, almost 1.5 million users were geolocated at the US state level, resembling the census population distribution with good agreement. A measure of prevalence of interest in opiate consumption has been estimated at the state level, producing a novel indicator with information that is not entirely encoded in standard surveillance. Finally, we provide a domain-specific vocabulary containing informal lexicon and street nomenclature extracted from user-generated content that can be used by researchers and practitioners to implement novel digital public health surveillance methodologies for supporting policy makers in fighting the opioid epidemic. Comment: Proceedings of the 2019 World Wide Web Conference (WWW '19)

    Rocchio's Model Based on Vector Space Basis Change for Pseudo Relevance Feedback

    Rocchio's relevance feedback model is a classic query expansion method and has been shown to be effective in boosting information retrieval performance. The main problem with this method is that the relevant and the irrelevant documents overlap in the vector space because they often share the same terms (at least the terms of the query). With respect to the initial vector space basis (index terms), it is difficult to select terms that separate relevant and irrelevant documents. The Vector Space Basis Change is used to separate relevant and irrelevant documents without any modification of the query term weights. In this paper, first, we study how to incorporate Vector Space Basis Change into Rocchio's model. Second, we propose Rocchio's models based on Vector Space Basis Change, called VSBCRoc models. Experimental results on a TREC collection show that our proposed models are effective.

    THE APPLICATION OF SEMANTIC INFORMATION CONTAINED IN RELEVANCE FEEDBACK IN THE ENHANCEMENT OF DOCUMENT RE-RANKING

    Easily accessed publishing channels have resulted in the problem of information overload. Conventional information retrieval models, such as the vector model or the probability model, apply the lexical information contained in relevance feedback to enhance document re-ranking. Improvement is possible by also applying semantic information. Studies have approached this semantic matter through concept extraction and application. So far, a perfect solution remains elusive and research still has new ground to cover. We have therefore proposed and tested a strategic method to form a better understanding of this field of study. The results of formal tests show that the proposed method is more effective than the baseline ranking model.

    Using tag-neighbors for query expansion in medical information retrieval

    In the context of medical document retrieval, users' often under-specified queries lead to undesired search results that do not contain the information they seek, match domain knowledge inadequately, or come from unreliable sources. To overcome the limitations of under-specified queries, we utilize tags to enhance information retrieval capabilities by expanding users' original queries with context-relevant information. We compute a set of significant tag neighbor candidates based on neighbor frequency and weight, and utilize the most frequent and most highly weighted neighbors to expand an entry query that has terms matching tags. The proposed approach is evaluated using the MedWorm medical article collection and standard evaluation methods from the Text REtrieval Conference (TREC). Compared with a baseline Mean Average Precision (MAP) of 0.353, query expansion reaches a MAP of 0.491 (+39%). In-depth analysis shows how this strategy is beneficial when compared across different ranks of the retrieval results. © 2011 IEEE
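As a rough illustration of the evaluation measure quoted in this abstract, the sketch below computes average precision and MAP in their standard form and checks the reported relative gain from the two MAP values. The function names, document identifiers and toy ranking are hypothetical; only the values 0.353 and 0.491 come from the abstract.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Average the per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    scores = [average_precision(runs[q], judgments.get(q, set())) for q in runs]
    return sum(scores) / len(scores)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Toy ranking with made-up document ids: relevant documents at ranks 1 and 3.
toy_ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})

# MAP values reported in the abstract.
gain = relative_gain(0.353, 0.491)
```

With the reported values, (0.491 - 0.353) / 0.353 is about 0.391, which is consistent with the +39% stated in the abstract.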