
    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources, including those of the World Wide Web. More specifically, the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations of the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. The results of phases 1 and 2 informed the design of the test system, and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.

    DCU@TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval

    This paper describes the first participation of DCU in the TREC Medical Records Track (TRECMed). We performed some initial experiments on the 2011 TRECMed data based on the BM25 retrieval model. Surprisingly, we found that the standard BM25 model with default parameters performs comparably to the best automatic runs submitted to TRECMed 2011 and would have ranked fourth out of 29 participating groups. We expected that some form of domain adaptation would increase performance. However, results on the 2011 data proved otherwise: concept-based query expansion decreased performance, and filtering and reranking by term proximity also decreased performance slightly. We submitted four runs based on the BM25 retrieval model to TRECMed 2012 using standard BM25, standard query expansion, result filtering, and concept-based query expansion. Official results for 2012 confirm that domain-specific knowledge, as applied by us, does not increase performance compared to the BM25 baseline.
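To make the baseline in the abstract above concrete, here is a minimal sketch of BM25 scoring over tokenised documents. The abstract does not state which default parameters were used; `k1=1.2` and `b=0.75` are commonly used values and are assumptions here, as are the function name `bm25_scores`, the idf variant, and the toy documents.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against the tokenised `query`.

    k1 and b are assumed defaults, not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed idf variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy collection, for illustration only.
docs = [
    "patient with chest pain and elevated troponin".split(),
    "knee pain after surgery".split(),
    "routine dental visit".split(),
]
scores = bm25_scores("chest pain".split(), docs)
```

The first document matches both query terms and scores highest; the third matches neither and scores zero.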

    Asymmetric Feature Maps with Application to Sketch Based Retrieval

    We propose a novel concept of asymmetric feature maps (AFM), which allow multiple kernels between a query and database entries to be evaluated without increasing the memory requirements. To demonstrate the advantages of the AFM method, we derive a short vector image representation that, due to asymmetric feature maps, supports efficient scale and translation invariant sketch-based image retrieval. Unlike most short-code based retrieval systems, the proposed method localizes the query in the retrieved image. The efficiency of the search is boosted by approximating a 2D translation search, via a trigonometric polynomial of scores, by 1D projections. The projections are a special case of AFM. An order of magnitude speed-up is achieved compared to traditional trigonometric polynomials. The results are boosted by an image-based average query expansion, significantly exceeding the state of the art on standard benchmarks. Comment: CVPR 201

    Using Dempster-Shafer’s evidence theory for query expansion based on freebase knowledge

    Query expansion is generally a useful technique for improving search performance. However, some expanded query terms obtained by traditional statistical methods (e.g., pseudo-relevance feedback) may not be relevant to the user's information need, while some relevant terms may not be contained in the feedback documents at all. Recent studies utilize external resources to detect terms that are related to the query, and then adopt these terms in query expansion. In this paper, we present a study of the use of Freebase, which is an open source general-purpose ontology, as a source for deriving expansion terms. Freebase provides a graph-based model of human knowledge, from which a rich and multi-step structure of instances related to the query concept can be extracted, as a complement to the traditional statistical approaches to query expansion. We propose a novel method, based on the well-principled Dempster-Shafer (D-S) evidence theory, to measure the certainty of expansion terms from the Freebase structure. The expanded query model is then combined with a state-of-the-art statistical query expansion model, the Relevance Model (RM3). Experiments show that the proposed method achieves significant improvements over RM3.
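The abstract above does not say how masses are assigned from the Freebase structure, so the following is only a sketch of the generic Dempster rule of combination that D-S evidence theory rests on, not the authors' method. The hypothesis names and the mass values are hypothetical.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence accumulates on the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to the empty set measures conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

# Hypothetical evidence about one candidate expansion term.
R, N = frozenset({"relevant"}), frozenset({"not_relevant"})
EITHER = R | N  # mass left uncommitted
m_a = {R: 0.6, EITHER: 0.4}
m_b = {R: 0.5, N: 0.2, EITHER: 0.3}
m = combine(m_a, m_b)
```

Here the two sources conflict with mass 0.12, and after renormalisation the combined belief in `relevant` is 0.68 / 0.88.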

    GRAPHENE: A Precise Biomedical Literature Retrieval Engine with Graph Augmented Deep Learning and External Knowledge Empowerment

    Effective biomedical literature retrieval (BLR) plays a central role in precision medicine informatics. In this paper, we propose GRAPHENE, a deep learning based framework for precise BLR. GRAPHENE consists of three main modules: 1) graph-augmented document representation learning; 2) query expansion and representation learning; and 3) learning to rank biomedical articles. The graph-augmented document representation learning module constructs a document-concept graph containing biomedical concept nodes and document nodes, so that global biomedical concepts from an external knowledge source can be captured; the graph is further connected to a BiLSTM so that both local and global topics can be explored. The query expansion and representation learning module expands the query with abbreviations and alternative names, and then builds a CNN-based model to convolve the expanded query and obtain a vector representation for each query. The learning-to-rank module minimizes a ranking loss between biomedical articles and the query to learn the retrieval function. Experimental results from applying our system to TREC Precision Medicine track data are provided to demonstrate its effectiveness. Comment: CIKM 201

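    Analisis dan Implementasi Concept Based Query Expansion pada Information Retrieval dengan Menggunakan Association Rules [Analysis and Implementation of Concept Based Query Expansion in Information Retrieval Using Association Rules]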

    ABSTRACT: Searches generally use short queries that carry no detailed explanation of what is wanted. A query may also be ambiguous, so that the search results do not match what the user expected. A technique for improving the search process is therefore needed, and query expansion is one such technique. In this final project, concept based query expansion is applied by analysing the query log to find relationships between terms through association rules. All user input to the search engine is stored and reprocessed by analogy with market basket analysis, with the user session as the ID and the input queries as the item set. The Apriori algorithm is then used to compute the relevant association rules, and a graph representing the relationships between terms is built. Based on the concepts that emerge from the graph, the system offers the user the option of expanding the query with terms from those concepts. Testing shows an improvement in search performance, seen as an increase in precision or recall. Precision increased when the expansion used specification concepts, while recall increased when the expansion used synonym concepts. The performance of query expansion therefore cannot be separated from the user's goal and the user's contribution to the search. Keywords: concept based query expansion, query log, association rules
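The query-log analysis described above can be sketched as follows, treating each user session as a basket of query terms. This is a simplification that mines rules between pairs of single terms only; it is not the full Apriori algorithm, and the thresholds, function name, and sample sessions are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Mine rules lhs -> rhs between single terms from session 'baskets'.

    Returns {(lhs, rhs): (support, confidence)}.
    """
    n = len(sessions)
    baskets = [set(s) for s in sessions]
    item_count = Counter(t for b in baskets for t in b)
    # Count each unordered pair once per session in which it occurs.
    pair_count = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = (support, confidence)
    return rules

# Hypothetical query log: one list of query terms per user session.
sessions = [
    ["jaguar", "car"],
    ["jaguar", "car", "price"],
    ["jaguar", "animal"],
    ["car", "price"],
]
rules = pair_rules(sessions)
```

Each surviving rule can serve as a weighted edge in the term graph; for example, `("price", "car")` holds with support 0.5 and confidence 1.0 in this toy log.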

    Proof of Concept of Ontology-based Query Expansion on Financial Domain

    This paper presents the use of an ontology in the financial domain for query expansion, with the goal of improving the results of a financial information retrieval (IR) system. The system is composed of an ontology and a Lucene index that supports retrieval of concepts identified through natural language processing. An initial evaluation with a limited set of queries has been performed, and the results show that ambiguity remains a problem when expanding a query. In some cases, choosing the appropriate entities when expanding the query (filtering by sector, company, etc., for example by selecting only companies or references to markets) helps to resolve that ambiguity. This work has been partially funded by the Trendminer project (EU FP7-ICT287863), the Monnet project (EU FP7-ICT 247176) and MA2VICMR (S2009/TIC-1542). Publicad

    Probabilistic hyperspace analogue to language

    Song and Bruza introduce a framework for Information Retrieval (IR) based on Gärdenfors' three-tiered cognitive model, Conceptual Spaces. They instantiate a conceptual space using Hyperspace Analogue to Language (HAL) to generate higher order concepts, which are later used for ad-hoc retrieval. In this poster, we propose an alternative implementation of the conceptual space using a probabilistic HAL space (pHAL). To evaluate whether converting to such an implementation is beneficial, we have performed an initial investigation comparing the concept combination of HAL against pHAL for the task of query expansion. Our experiments indicate that pHAL outperforms the original HAL method and that better query term selection methods can improve performance on both HAL and pHAL.
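For readers unfamiliar with HAL, a minimal sketch of building a HAL space with a sliding window is given below. It covers only the commonly described HAL construction, in which each word accumulates weights for the words that precede it and closer words weigh more. The abstract does not define pHAL, so no probabilistic variant is attempted; the window size and the sample text are assumptions.

```python
from collections import defaultdict

def hal_space(tokens, window=3):
    """Return hal[word][preceding_word] = accumulated proximity weight."""
    hal = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            # Closer neighbours get larger weights: window, window-1, ..., 1.
            hal[word][tokens[j]] += window - d + 1
    return hal

# Hypothetical sample text, for illustration only.
tokens = "query expansion improves query performance".split()
hal = hal_space(tokens, window=3)
```

With a window of 3, the word immediately before `performance` (`query`) contributes weight 3, `improves` contributes 2, and `expansion` contributes 1.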