46 research outputs found
Assessment of query reweighing, by rocchio method in farsi information retrieval
Due to the lack of users knowledge of the collections used by search engines and in general retrieval systems, users can not express their information need appropriately in queries. In other words, they do not have enough experience to formulate their needs to find related documents. The idea of user’s query expansion aims to help users to improve and correct the queries. In fact, retrieval system, regarding the feedback it receives from user at the first stage, moves the query in set space to more related documents. Different approaches in information retrieval systems have been used; however, there has not been any assessment of efficacy of query expansion in Farsi information retrieval systems. In this paper, expansion basic model of Rocchio, assessed as the primary model to retrieve Farsi documents, has been presented. As a matter of fact, the purpose of this study is to determine the effect of a standard and basic model on query expansion to retrieve Farsi documents, so that the researchers can compare their achievements of query expansion with the findings of this paper which showed a straightforward and positive effect on Farsi document retrieval
Firsthand Opiates Abuse on Social Media: Monitoring Geospatial Patterns of Interest Through a Digital Cohort
In the last decade drug overdose deaths reached staggering proportions in the
US. Besides the raw yearly deaths count that is worrisome per se, an alarming
picture comes from the steep acceleration of such rate that increased by 21%
from 2015 to 2016. While traditional public health surveillance suffers from
its own biases and limitations, digital epidemiology offers a new lens to
extract signals from Web and Social Media that might be complementary to
official statistics. In this paper we present a computational approach to
identify a digital cohort that might provide an updated and complementary view
on the opioid crisis. We introduce an information retrieval algorithm suitable
to identify relevant subspaces of discussion on social media, for mining data
from users showing explicit interest in discussions about opioid consumption in
Reddit. Moreover, despite the pseudonymous nature of the user base, almost 1.5
million users were geolocated at the US state level, resembling the census
population distribution with a good agreement. A measure of prevalence of
interest in opiate consumption has been estimated at the state level, producing
a novel indicator with information that is not entirely encoded in the standard
surveillance. Finally, we further provide a domain specific vocabulary
containing informal lexicon and street nomenclature extracted by user-generated
content that can be used by researchers and practitioners to implement novel
digital public health surveillance methodologies for supporting policy makers
in fighting the opioid epidemic.Comment: Proceedings of the 2019 World Wide Web Conference (WWW '19
Recommended from our members
Semi-Automatic Query Expansion Approach to Web- Based Information Retrieval
The query used for Web searching is usually short and may not be able to reflect the intrinsic semantics of the user information need. The purpose of the paper is to take into account user information feedback, and to develop a semi-automatic query expansion approach to improve the effectiveness of Web searching. A search engine has been developed using the vector information retrieval model to validate the semi-automatic query expansion approach. The experiments show that this approach may improve the effectiveness of web searching
Rocchio\u27s Model Based on Vector Space Basis Change for Pseudo Relevance Feedback
Rocchio\u27s relevance feedback model is a classic query expansion method and it has been shown to be effective in boosting information retrieval performance. The main problem with this method is that the relevant and the irrelevant documents overlap in the vector space because they often share same terms (at least the terms of the query). With respect to the initial vector space basis (index terms), it is difficult to select terms that separate relevant and irrelevant documents. The Vector Space Basis Change is used to separate relevant and irrelevant documents without any modification on the query term weights. In this paper, first, we study how to incorporate Vector Space Basis Change into the Rocchio\u27s model. Second, we propose Rocchio\u27s models based on Vector Space Basis Change, called VSBCRoc models. Experimental results on a TREC collection show that our proposed models are effective
THE APPLICATION OF SEMANTIC INFORMATION CONTAINED IN RELEVANCE FEEDBACK IN THE ENHANCEMENT OF DOCUMENT RE-RANKING
Easily accessed publishing channels have resulted in the problem of information overload. Conventional information retrieval models, such as the vector model or the probability model, apply the lexical information contained in relevance feedback in the enhancement of document re-ranking. Improvement is possible considering the application of semantic information. Studies have been taking the approach of concept extraction and application in the dealing with this semantic matter. So far, a perfect solution remains elusive and research still has new ground to cover. As such, we have proposed and tested a strategic method to form a more understanding of this field of study. The results of formal tests show that the proposed method is more effective than the baseline ranking model
Using tag-neighbors for query expansion in medical information retrieval
In the context of medical document retrieval, users often under-specified queries lead to undesired search results that suffer from not containing the information they seek, inadequate domain knowledge matches and unreliable sources. To overcome the limitations of under-specified queries, we utilize tags to enhance information retrieval capabilities by expanding users' original queries with context-relevant information. We compute a set of significant tag neighbor candidates based on the neighbor frequency and weight, and utilize the most frequent and weighted neighbors to expand an entry query that has terms matching tags. The proposed approach is evaluated using MedWorm medical article collection and standard evaluation methods from the text retrieval conference (TREC). We compared the baseline of 0.353 for Mean Average Precision (MAP), reaching a MAP 0.491 (+39%) with the query expansion. In-depth analysis shows how this strategy is beneficial when compared with different ranks of the retrieval results. © 2011 IEEE