4 research outputs found

Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

Authors: Chakrabarti Kaushik, Chaudhuri Surajit, Ding Bolin, Yang Mohan
Publication date: 03/09/2014

We aim to provide table answers to keyword queries against knowledge bases. For queries referring to multiple entities, like "Washington cities population" and "Mel Gibson movies", it is better to represent each relevant answer as a table which aggregates a set of entities or entity-joins within the same table scheme or pattern. In this paper, we study how to find highly relevant patterns in a knowledge base for user-given keyword queries to compose table answers. A knowledge base can be modeled as a directed graph called a knowledge graph, where nodes represent entities in the knowledge base and edges represent the relationships among them. Each node or edge is labeled with a type and text. A pattern is an aggregation of subtrees which contain all keywords in their texts and have the same structure and the same types on nodes and edges. We propose efficient algorithms to find patterns that are relevant to the query for a class of scoring functions. We show the hardness of the problem in theory, and propose path-based indexes that are affordable in memory. Two query-processing algorithms are proposed: one uses the indexes and is fast in practice for small queries (with small patterns as answers); the other is better in theory, with running time linear in the sizes of the indexes and answers, and handles large queries better. We also conduct an extensive experimental study to compare our approaches with a naive adaptation of known techniques.
Comment: VLDB 201
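The notion of a pattern in the abstract above can be illustrated with a small sketch. This is not the paper's algorithm: it uses brute-force enumeration instead of path-based indexes, applies no scoring function, and only considers subtrees of depth at most one. The graph contents, the substring matching rule, and the names `NODES`, `EDGES`, `match_paths`, and `table_answers` are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import product

# Toy knowledge graph; every entity, type, and value here is illustrative only.
# node id -> (type, text)
NODES = {
    "seattle": ("City", "Seattle"),
    "spokane": ("City", "Spokane"),
    "wa": ("State", "Washington"),
    "p1": ("Number", "737015"),
    "p2": ("Number", "228989"),
}
# (source node, edge type, target node)
EDGES = [
    ("seattle", "located_in", "wa"),
    ("spokane", "located_in", "wa"),
    ("seattle", "population", "p1"),
    ("spokane", "population", "p2"),
]


def match_paths(root, keyword):
    """Yield (type signature, cell value) for each path of length <= 1
    starting at root whose types or texts contain the keyword."""
    rtype, rtext = NODES[root]
    if keyword in rtype.lower() or keyword in rtext.lower():
        yield (rtype,), rtext
    for src, etype, dst in EDGES:
        if src != root:
            continue
        dtype, dtext = NODES[dst]
        labels = (etype, dtype, dtext)
        if any(keyword in label.lower() for label in labels):
            yield (rtype, etype, dtype), dtext


def table_answers(keywords):
    """Group keyword-covering subtrees by their pattern (shared type
    signatures); each group becomes one table, each subtree one row."""
    tables = defaultdict(list)
    for root in NODES:
        per_keyword = [list(match_paths(root, k)) for k in keywords]
        # A subtree picks one matching path per keyword; if any keyword
        # has no match under this root, product() yields nothing.
        for combo in product(*per_keyword):
            pattern = tuple(sig for sig, _ in combo)
            row = tuple(val for _, val in combo)
            tables[pattern].append(row)
    return dict(tables)


if __name__ == "__main__":
    # Keywords are assumed to be already normalized ("cities" -> "city").
    for pattern, rows in table_answers(["washington", "city", "population"]).items():
        print(pattern)
        for row in rows:
            print("  ", row)
```

In this toy graph the two city subtrees share the same type signatures, so they are aggregated into a single table with one row per city, which is the sense in which a pattern acts as a table scheme.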

arXiv.org e-Print Archive

CiteSeerX

Dynamic topic hierarchies and segmented rankings in textual OLAP technology.

Author: Souza Adriano Neves de Paula e
Publication date: 01/01/2017

Programa de Pós-Graduação em Ciência da Computação. Departamento de Ciência da Computação, Instituto de Ciências Exatas e Biológicas, Universidade Federal de Ouro Preto.

OLAP technology emerged 20 years ago and has recently been redesigned so that its dimensions, hierarchies, and measures can support the particularities of textual data. The task of organizing textual data hierarchically can be solved by building topic hierarchies. Currently, the topic hierarchy is defined only once in the data cube, i.e., for the entire lattice of cuboids. However, such a hierarchy is sensitive to the content of the document collection. Cells in the same data cube can therefore have completely different contents, aggregating distinct document collections, which can change the topic hierarchy. Furthermore, the text segment used in the OLAP analysis also directly influences the topics listed by the hierarchy. In this work, we present a textual data cube with multiple, dynamic topic hierarchies: multiple because they are built from different text segments, and dynamic because they are built for each cube cell. Another contribution of this work concerns the response to multidimensional queries. The state of the art normally returns the top-k documents most relevant to a given topic. We go further by returning other text segments, such as the most significant titles, abstracts, and paragraphs. The approach is designed in four complementary steps, each of which further attenuates the impact of building multiple topic hierarchies and segment rankings per cube cell. Experiments using part of the DBLP documents as a document collection reinforce our hypotheses.

REPOSITORIO INSTITUCIONAL DA UFOP