
4 research outputs found

Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

Authors: Andoni A.; Beyer K.; Broder A. Z.; Brown P. F.; Fried D.; Le Q.; Mikolov T.; Mu Y.; Muja M.; Petrović S.; Riezler S.; Salton G.; Wang J.; Weber R.; Yang L.; Yao X.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 30/10/2016

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. This approach misses some candidates, e.g., because of vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm whose similarity function can take subtle term associations into account. An exact brute-force k-NN search using this similarity function is slow. However, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
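The contrast between term-based candidate generation and exact brute-force k-NN can be sketched in a few lines. This is a minimal illustration under stated assumptions. The term-association table `ASSOC` and the scoring rule in `similarity` are hypothetical stand-ins for the paper's similarity function. The approximate k-NN algorithm that gives the reported speed-up is not shown. Only the exact baseline appears here, which scores every document.

```python
import heapq

# Hypothetical term-association weights (an assumption for illustration):
# ASSOC[(q, d)] says how strongly query term q relates to document term d.
ASSOC = {("car", "automobile"): 0.8, ("fix", "repair"): 0.7}

def similarity(query, doc):
    """For each query term, take the best match among document terms
    (1.0 for an exact match, else the association weight) and sum them."""
    doc_terms = set(doc)
    score = 0.0
    for q in query:
        best = 0.0
        for d in doc_terms:
            best = max(best, 1.0 if q == d else ASSOC.get((q, d), 0.0))
        score += best
    return score

def term_based_candidates(query, docs):
    """Candidate generation by exact term overlap (prone to vocabulary mismatch)."""
    q = set(query)
    return [i for i, d in enumerate(docs) if q & set(d)]

def brute_force_knn(query, docs, k):
    """Exact k-NN: score every document and keep the k highest, best first."""
    return heapq.nlargest(k, range(len(docs)), key=lambda i: similarity(query, docs[i]))

docs = [
    ["repair", "automobile", "engine"],
    ["bake", "bread"],
    ["fix", "bicycle"],
]
query = ["fix", "car"]
print(term_based_candidates(query, docs))  # [2]
print(brute_force_knn(query, docs, k=2))   # [0, 2]
```

Document 0 never reaches the term-based candidate list because it shares no term with the query. The association-aware k-NN search ranks it first. The cost is that every document must be scored, which is the motivation for approximate search.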

arXiv.org e-Print Archive

Crossref

Scipedia

Instantiation of relations for semantic annotation

Authors: Napoli Amedeo; Polanco Xavier; Tenier Sylvain; Toussaint Yannick
Publication venue: IEEE Computer Society Press
Publication date: 18/12/2006

This paper presents a methodology for the semantic annotation of web pages with individuals of a domain ontology. While most semantic annotation systems can recognize knowledge units, they usually do not establish explicit relations between them. The method presented here identifies, among the whole set of individuals, those that should be related, and codes them as role instances within an OWL ontology. It does so by using a correspondence between the tree structure of a web page and the semantics of the information it contains.

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Yago: a large ontology from Wikipedia and WordNet

Authors: Kasneci G.; Suchanek F.; Weikum G.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2007

This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and combined with taxonomic relations from WordNet. Type-checking techniques help us keep YAGO's precision at 95%, as confirmed by an extensive evaluation study. YAGO is based on a clean logical model with decidable consistency. Furthermore, it can represent n-ary relations naturally while remaining compatible with RDFS. A powerful query model facilitates access to YAGO's data.
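The idea of type checking extracted facts can be illustrated with a small sketch. It assumes that each relation has a (domain, range) signature and that a candidate fact is accepted only if its subject and object carry those types. The entity types, relation signatures, and the helper `type_checks` are hypothetical. The type sets are flat, with no inference over a WordNet-style subclass hierarchy, so this is not YAGO's actual procedure.

```python
# Hypothetical type assignments and relation signatures (illustrative only).
TYPES = {
    "Albert_Einstein": {"person", "scientist"},
    "Ulm": {"city"},
    "Nobel_Prize_in_Physics": {"prize"},
}
SIGNATURES = {
    "bornIn": ("person", "city"),        # (domain, range)
    "hasWonPrize": ("person", "prize"),
}

def type_checks(subject, relation, obj):
    """Accept a fact only if the subject has the relation's domain type
    and the object has its range type; unknown relations are rejected."""
    if relation not in SIGNATURES:
        return False
    domain, range_ = SIGNATURES[relation]
    return domain in TYPES.get(subject, set()) and range_ in TYPES.get(obj, set())

candidates = [
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "bornIn", "Nobel_Prize_in_Physics"),  # wrong range type
    ("Albert_Einstein", "hasWonPrize", "Nobel_Prize_in_Physics"),
]
accepted = [fact for fact in candidates if type_checks(*fact)]
print(accepted)
```

The second candidate is filtered out because a prize is not a city. This is the kind of extraction error that type checking is meant to catch.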

MPG.PuRe

Database-Inspired Search

Authors: David Konopnicki; Oded Shmueli

"W3QL: A Query Language for the WWW", published in 1995, presented a language with several distinctive features. Using existing indexes as access paths, it allowed documents to be selected by conditions on semi-structured documents, and it allowed dynamic views of navigational queries to be maintained. W3QL could automatically fill out forms and navigate through them. Finally, in the SQL tradition, it was a declarative query language that could be the subject of optimization. Ten years later

CiteSeerX