339 research outputs found
Reformulation of queries using similarity thesauri
This article deals with information retrieval using similarity thesauri. One of the major problems in information retrieval is the formulation of queries on the part of the user. This entails specifying a set of words or terms that express their informational need. However, it is well known that two people can assign different terms to refer to the same concepts. The techniques that attempt to reduce this problem as much as possible generally start from a first search, and then study how the initial query can be modified to obtain better results. In general, the construction of the new query involves expanding the terms of the initial query and recalculating the importance of each term in the expanded query. Several strategies are distinguished, depending on the technique used to formulate the new query. These strategies are based on the idea that if two terms are similar (with respect to some criterion), the documents in which both terms appear frequently will also be related. The technique we used in this study is known as query expansion using similarity thesauri.
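As a rough illustration of the idea described in this abstract, the following Python sketch expands a query with the terms most similar to it and assigns each added term a recalculated weight. It is not the authors' implementation: the function names (`build_term_vectors`, `similarity`, `expand_query`), the raw-count term vectors, the cosine-style similarity, and the toy documents are all assumptions made for the example.

```python
import math
from collections import defaultdict

def build_term_vectors(docs):
    """Represent each term as a unit-length vector of its counts per document."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            counts[term][doc_id] += 1
    vectors = {}
    for term, vec in counts.items():
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors[term] = {d: v / norm for d, v in vec.items()}
    return vectors

def similarity(u, v):
    """Dot product of two unit-length term vectors (cosine similarity)."""
    return sum(w * v[d] for d, w in u.items() if d in v)

def expand_query(query_weights, vectors, top_k=2):
    """Add the top_k terms most similar to the query as a whole (hypothetical scheme)."""
    scores = {}
    for cand, cand_vec in vectors.items():
        if cand in query_weights:
            continue
        # Similarity of the candidate to the whole query, not to a single term.
        scores[cand] = sum(
            w * similarity(vectors[q], cand_vec)
            for q, w in query_weights.items()
            if q in vectors
        )
    total = sum(query_weights.values())
    best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    expanded = dict(query_weights)
    for term, score in best:
        if score > 0:
            expanded[term] = score / total  # recalculated weight for the new term
    return expanded

# Toy collection, assumed purely for illustration.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "banana fruit salad",
]
expanded = expand_query({"car": 1.0}, build_term_vectors(docs))
```

Here the original term keeps its weight, terms that co-occur with it across documents are added with weights below 1, and unrelated terms such as "banana" are never added because their similarity to the query is zero.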
A Systematic Review of Automated Query Reformulations in Source Code Search
Fixing software bugs and adding new features are two of the major maintenance
tasks. Software bugs and features are reported as change requests. Developers
consult these requests and often choose a few keywords from them as an ad hoc
query. Then they execute the query with a search engine to find the exact
locations within software code that need to be changed. Unfortunately, even
experienced developers often fail to choose appropriate queries, which leads to
costly trials and errors during a code search. Over the years, many studies
have attempted to reformulate developers' ad hoc queries to support them. In
this systematic literature review, we carefully select 70 primary studies on
query reformulations from 2,970 candidate studies, perform an in-depth
qualitative analysis (e.g., Grounded Theory), and then answer seven research
questions with major findings. First, to date, eight major methodologies (e.g.,
term weighting, term co-occurrence analysis, thesaurus lookup) have been
adopted to reformulate queries. Second, the existing studies suffer from
several major limitations (e.g., lack of generalizability, vocabulary mismatch
problem, subjective bias) that might prevent their wide adoption. Finally, we
discuss the best practices and future opportunities to advance the state of
research in search query reformulations.
Comment: 81 pages, accepted at TOSE
Cross-concordances: terminology mapping and its effectiveness for information retrieval
The German Federal Ministry for Education and Research funded a major
terminology mapping initiative, which found its conclusion in 2007. The task of
this terminology mapping initiative was to organize, create and manage
'cross-concordances' between controlled vocabularies (thesauri, classification
systems, subject heading lists) centred around the social sciences but quickly
extending to other subject areas. 64 crosswalks with more than 500,000
relations were established. In the final phase of the project, a major
evaluation effort to test and measure the effectiveness of the vocabulary
mappings in an information system environment was conducted. The paper reports
on the cross-concordance work and evaluation results.
Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200
Improving Retrieval Results with discipline-specific Query Expansion
Choosing the right terms to describe an information need is becoming more
difficult as the amount of available information increases.
Search-Term-Recommendation (STR) systems can help to overcome these problems.
This paper evaluates the benefits that may be gained from the use of STRs in
Query Expansion (QE). We create 17 STRs, 16 based on specific disciplines and
one giving general recommendations, and compare the retrieval performance of
these STRs. The main findings are: (1) QE with specific STRs leads to
significantly better results than QE with a general STR, (2) QE with specific
STRs selected by a heuristic mechanism of topic classification leads to better
results than the general STR; however, (3) selecting the best-matching specific
STR in an automatic way is a major challenge of this process.
Comment: 6 pages; to be published in Proceedings of Theory and Practice of
Digital Libraries 2012 (TPDL 2012)
Consolidated study on query expansion
A typical day for millions of web users all over the world starts with a simple query. The quest for information on a particular topic drives them to search for it, and the terms they supply in their queries vary from person to person depending on the knowledge they have. With a vast collection of documents available on the web, the onus is on the retrieval system to return only those documents that are relevant and satisfy the user’s search requirements. The mismatch between documents and queries is addressed by appending extra terms to the original query, which improves retrieval performance. The added terms tend to narrow the gap between the documents and the queries.
In this thesis, a brief study is made of the reformulation of queries, along with methods of calculating the relevance of candidate terms for query expansion using several ranking algorithms, term weighting algorithms and feedback processes involving evaluations. Comparisons of the various methods based on their efficiencies are also discussed. On the whole, a consolidated report on query expansion in general is given.
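One way to picture the ranking of candidate expansion terms with term weighting and feedback is the sketch below. It is an assumption-laden illustration, not a method taken from the thesis: the function name `rank_candidate_terms`, the smoothed tf-idf score, the choice of feedback documents, and the toy collection are all hypothetical.

```python
import math
from collections import Counter

def rank_candidate_terms(query_terms, feedback_docs, collection, top_k=2):
    """Rank terms from feedback documents by a smoothed tf-idf score."""
    n_docs = len(collection)
    # Document frequency of each term over the whole collection.
    df = Counter()
    for text in collection:
        df.update(set(text.lower().split()))
    # Term frequency over the documents assumed to be relevant.
    tf = Counter()
    for text in feedback_docs:
        tf.update(text.lower().split())
    scores = {
        term: count * math.log((n_docs + 1) / (df[term] + 1))
        for term, count in tf.items()
        if term not in query_terms
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]

# Toy collection; the two feedback documents are assumed to be the top-ranked
# results of an initial search for "python".
collection = [
    "python snake habitat",
    "python code tutorial",
    "python code library",
    "snake venom",
    "garden habitat",
]
candidates = rank_candidate_terms(["python"], collection[1:3], collection)
expanded_query = ["python"] + candidates
```

Terms that are frequent in the feedback documents but rare in the collection as a whole rank highest, so the expanded query leans toward the sense of the query that the feedback documents share.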