Improving Entity Retrieval on Structured Data

Author: Dietze Stefan
Fetahu Besnik
Gadiraju Ujwal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/03/2017

The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the x-means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billions Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state-of-the-art approaches.
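
The expand-then-re-rank idea described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy data, and in particular the term-overlap score, which stands in for BM25F and for the paper's learned features. It is not the authors' implementation.

```python
from collections import defaultdict


def term_overlap(query, text):
    """Toy relevance score: fraction of query terms found in the text.

    Stand-in for a real retrieval score such as BM25F (assumption).
    """
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def expand_and_rerank(query, initial_ids, entities, cluster_of):
    """Expand an initial result set with cluster neighbours, then re-rank.

    entities:   hypothetical mapping of entity id -> textual description
    cluster_of: hypothetical mapping of entity id -> precomputed cluster id
    """
    # Invert the entity -> cluster mapping once.
    members = defaultdict(set)
    for eid, cid in cluster_of.items():
        members[cid].add(eid)

    # Add every entity that shares a cluster with an initial result.
    candidates = set(initial_ids)
    for eid in initial_ids:
        candidates |= members.get(cluster_of.get(eid), set())

    # Re-rank by descending score; ties are broken by entity id.
    return sorted(candidates, key=lambda e: (-term_overlap(query, entities[e]), e))


entities = {
    "e1": "Berlin capital of Germany",
    "e2": "Berlin city Germany population",
    "e3": "Paris capital of France",
}
cluster_of = {"e1": 0, "e2": 0, "e3": 1}
ranked = expand_and_rerank("berlin population", ["e1"], entities, cluster_of)
```

Here the initial result set contains only `e1`; `e2` is pulled in because it shares a cluster with `e1`, and it is then ranked first because it matches more query terms.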

arXiv.org e-Print Archive

CiteSeerX

Ranking Archived Documents for Structured Queries on Semantic Layers

Author: Arikan Irem
Balog Krisztian
Fafalios P.
Feyznia Azam
Halpin Harry
Latifi Sara
Mulay Kunal
Ngonga Ngomo Axel-Cyrille
Tran Nam Khanh
Publication date: 23/10/2018

Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and, moreover, they all equally match the query. In this paper, we deal with this problem and formalize the task of "ranking archived documents for structured queries on semantic layers". Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.

arXiv.org e-Print Archive

Crossref

Towards Business Intelligence over Unified Structured and Unstructured Data Using XML

Author: Vishu Krishnamurthy
Zhen Hua Liu
Publication venue: 'IntechOpen'
Publication date: 01/02/2012

IntechOpen

Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

Author: Andoni A.
Beyer K.
Broder A. Z.
Brown P. F.
Fried D.
Le Q.
Mikolov T.
Mu Y.
Muja M.
Petrović S.
Riezler S.
Salton G.
Wang J.
Weber R.
Yang L.
Yao X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/10/2016

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
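
For orientation, the exact brute-force k-NN baseline mentioned in this abstract can be sketched as follows. The cosine similarity over small dense vectors and all names are assumptions chosen for brevity; the abstract does not specify the similarity function, and the approximate search it advocates is not shown here.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_brute_force(query_vec, records, k):
    """Return the ids of the k records most similar to the query.

    records: hypothetical mapping of record id -> vector.
    Every record is scored, so the cost grows linearly with the collection;
    this is the slow exact search that approximate methods aim to replace.
    """
    return heapq.nlargest(k, records, key=lambda rid: cosine(query_vec, records[rid]))


records = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
top2 = knn_brute_force([1.0, 0.0], records, k=2)
```

Because the similarity function is passed through a single scoring call, it could be swapped for one that models term associations without changing the search loop.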

arXiv.org e-Print Archive

Crossref

Scipedia

Contextualized ranking of entity types based on knowledge graphs

Author: Alberto Tonon
Balog
Balog
Balog
Bizer
Bollacker
Ciaramita
Cunningham
Demartini
Demartini
Finkel
Finkel
Geurts
Gianluca Demartini
Holmes
Järvelin
Kalyanpur
Karl Aberer
Kumar
Liu
Mendes
Michele Catasta
Nadeau
Nakashole
Philippe Cudré-Mauroux
Pound
Roman Prokofyev
Suchanek
Suchanek
Tonon
Tummarello
Tylenda
Vallet
Whang
Yao
Zaragoza
Publication venue: 'Elsevier BV'
Publication date: 01/04/2015

A large fraction of online queries targets entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on the SERPs and that is instrumental for many applications is the entity type. However, an entity is usually not associated with a single generic type in the background knowledge graph but rather with a set of more specific types, which may be relevant or not given the document context. For example, one can find on the Linked Open Data cloud the fact that Tom Hanks is a person, an actor, and a person from Concord, California. All these types are correct but some may be too general to be interesting (e.g., person), while others may be interesting but already known to the user (e.g., actor), or may be irrelevant given the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods to find the most relevant entity type based on collection statistics and on the knowledge graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections at different levels of granularity (e.g., sentences, paragraphs) and different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to be displayed to the end-user.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

RERO DOC Digital Library

White Rose Research Online

University of Queensland eSpace

VoldemortKG: Mapping schema.org and Web Entities to Linked Open Data

Author: E Oren
G Tummarello
L Otero-Cerdeira
M Bron
R Meusel
Publication date: 09/12/2016

Crossref

RERO DOC Digital Library

Approximating expressive queries on graph-modeled data: The GeX approach

Author: Mandreoli Federica
Martoglia Riccardo
Penzo Wilma
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

We present the GeX (Graph-eXplorer) approach for the approximate matching of complex queries on graph-modeled data. GeX generalizes existing approaches and provides for a highly expressive graph-based query language that supports queries ranging from keyword-based to structured ones. The GeX query answering model gracefully blends label approximation with structural relaxation, under the primary objective of delivering meaningfully approximated results only. GeX implements ad-hoc data structures that are exploited by a top-k retrieval algorithm which enhances the approximate matching of complex queries. An extensive experimental evaluation on real-world datasets demonstrates the efficiency of the GeX query answering.

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Fast Nearest Neighbor Search with Keywords Using IR2-Tree

Author: Mr. Pramod Khandare, Dr. Nilesh Uke
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/06/2016

Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions on objects' geometric properties. Today, many modern applications call for novel forms of queries that aim to find objects satisfying both a spatial predicate and a predicate on their associated texts. For example, instead of considering all the restaurants, a nearest neighbor query would instead ask for the restaurant that is the closest among those whose menus contain "steak, spaghetti, brandy" all at the same time. Currently, the best solution to such queries is based on the IR2-tree, which, as shown in this paper, has several deficiencies that seriously impact its efficiency. Motivated by this, we develop a new access method called the spatial inverted index that extends the conventional inverted index to cope with multidimensional data, and comes with algorithms that can answer nearest neighbor queries with keywords in real time. As verified by experiments, the proposed techniques outperform the IR2-tree in query response time significantly, often by a factor of orders of magnitude.
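
The query type described here (nearest neighbor among objects whose text contains all keywords) can be shown with a naive baseline. This sketch is only an assumption-laden illustration: it uses a plain inverted index followed by a linear distance scan, and it is neither the IR2-tree nor the spatial inverted index the abstract refers to. All names and data are hypothetical.

```python
import math
from collections import defaultdict


def build_inverted_index(objects):
    """Map each keyword to the set of object ids whose text contains it.

    objects: hypothetical mapping of object id -> (point, set of keywords).
    """
    index = defaultdict(set)
    for oid, (_, words) in objects.items():
        for word in words:
            index[word].add(oid)
    return index


def nearest_with_keywords(query_point, keywords, objects, index):
    """Return the id of the closest object containing all keywords, or None."""
    postings = [index.get(word, set()) for word in keywords]
    if not postings:
        return None
    # Keep only objects that appear in every keyword's posting list.
    candidates = set.intersection(*postings)
    if not candidates:
        return None
    # Linear scan over the survivors; ties are broken by object id.
    return min(candidates, key=lambda oid: (math.dist(query_point, objects[oid][0]), oid))


restaurants = {
    "r1": ((0.0, 0.0), {"steak", "spaghetti"}),
    "r2": ((1.0, 1.0), {"steak", "spaghetti", "brandy"}),
    "r3": ((5.0, 5.0), {"steak", "spaghetti", "brandy"}),
}
index = build_inverted_index(restaurants)
best = nearest_with_keywords((0.0, 0.0), ["steak", "spaghetti", "brandy"], restaurants, index)
```

The restaurant `r1` is nearest to the query point but lacks "brandy", so the keyword filter excludes it before any distance is computed.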

International Journal on Recent and Innovation Trends in Computing and Communication

Toward Entity-Aware Search

Author: Cheng Tao
Publication date: 01/12/2010

As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities, by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we obtained so far have shown clear promise of entity-aware search, in its usefulness, effectiveness, efficiency and scalability.

Illinois Digital Environment for Access to Learning and Scholarship Repository