9,999 research outputs found
Hybrid Search: Effectively Combining Keywords and Semantic Searches
This paper describes hybrid search, a search method supporting both document and knowledge retrieval via the flexible combination of ontologybased search and keyword-based matching. Hybrid search smoothly copes with
lack of semantic coverage of document content, which is one of the main limitations of current semantic search methods. In this paper we define hybrid search formally, discuss its compatibility with the current semantic trends and present a reference implementation: K-Search. We then show how the method outperforms both keyword-based search and pure semantic search in terms of precision and recall in a set of experiments performed on a collection of about 18.000 technical documents. Experiments carried out with professional users show that users understand the paradigm and consider it very powerful and reliable. K-Search has been ported to two applications released at Rolls-Royce
plc for searching technical documentation about jet engines
Recommended from our members
The quest for information retrieval on the semantic web
Semantic search has been one of the motivations of the Semantic Web since it was envisioned. We propose a model for the exploitation of ontology-based KBs to improve search over large document repositories. The retrieval model is based on an adaptation of the classic vector-space model, including an annotation weighting algorithm, and a ranking algorithm. Semantic search is combined with keyword-based search to achieve tolerance to KB incompleteness. Our proposal has been tested on corpora of significant size, showing promising results with respect to keyword-based search, and providing ground for further analysis and research
Recommended from our members
Using TREC for cross-comparison between classic IR and ontology-based search models at a Web scale
The construction of standard datasets and benchmarks to evaluate ontology-based search approaches and to compare then against baseline IR models is a major open problem in the semantic technologies community. In this paper we propose a novel evaluation benchmark for ontology-based IR models based on an adaptation of the well-known Cranfield paradigm (Cleverdon, 1967) traditionally used by the IR community. The proposed benchmark comprises: 1) a text document collection, 2) a set of queries and their corresponding document relevance judgments and 3) a set of ontologies and Knowledge Bases covering the query topics. The document collection and the set of queries and judgments are taken from one of the most widely used datasets in the IR community, the TREC Web track. As a use case example we apply the proposed benchmark to compare a real ontology-based search model (Fernandez, et al., 2008) against the best IR systems of TREC 9 and TREC 2001 competitions. A deep analysis of the strengths and weaknesses of this benchmark and a discussion of how it can be used to evaluate other ontology-based search systems is also included at the end of the paper
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF
Word-Entity Duet Representations for Document Ranking
This paper presents a word-entity duet framework for utilizing knowledge
bases in ad-hoc retrieval. In this work, the query and documents are modeled by
word-based representations and entity-based representations. Ranking features
are generated by the interactions between the two representations,
incorporating information from the word space, the entity space, and the
cross-space connections through the knowledge graph. To handle the
uncertainties from the automatically constructed entity representations, an
attention-based ranking model AttR-Duet is developed. With back-propagation
from ranking labels, the model learns simultaneously how to demote noisy
entities and how to rank documents with the word-entity duet. Evaluation
results on TREC Web Track ad-hoc task demonstrate that all of the four-way
interactions in the duet are useful, the attention mechanism successfully
steers the model away from noisy entities, and together they significantly
outperform both word-based and entity-based learning to rank systems
Giving order to image queries
Users of image retrieval systems often find it frustrating that the image they are looking for is not ranked near the top of the results they are presented. This paper presents a computational approach for ranking keyworded images in order of relevance to a given keyword. Our approach uses machine learning to attempt to learn what visual features within an image are most related to the keywords, and then provide ranking based on similarity to a visual aggregate. To evaluate the technique, a Web 2.0 application has been developed to obtain a corpus of user-generated ranking information for a given image collection that can be used to evaluate the performance of the ranking algorithm
Cross-concordances: terminology mapping and its effectiveness for information retrieval
The German Federal Ministry for Education and Research funded a major
terminology mapping initiative, which found its conclusion in 2007. The task of
this terminology mapping initiative was to organize, create and manage
'cross-concordances' between controlled vocabularies (thesauri, classification
systems, subject heading lists) centred around the social sciences but quickly
extending to other subject areas. 64 crosswalks with more than 500,000
relations were established. In the final phase of the project, a major
evaluation effort to test and measure the effectiveness of the vocabulary
mappings in an information system environment was conducted. The paper reports
on the cross-concordance work and evaluation results.Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200
- …