Search CORE

Generating Preview Tables for Entity Graphs

Author: Agarwal R.
Balmin A.
Brin S.
Cohen J.
Nandi A.
Yu C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/05/2016

Users are tapping into massive, heterogeneous entity graphs for many applications. It is challenging to select entity graphs for a particular need, given abundant datasets from many sources and the oftentimes scarce information about them. We propose methods to produce preview tables for compact presentation of important entity types and relationships in entity graphs. The preview tables assist users in attaining a quick and rough preview of the data. They can be shown in a limited display space for a user to browse and explore, before she decides to spend time and resources to fetch and investigate the complete dataset. We formulate several optimization problems that look for previews with the highest scores according to intuitive goodness measures, under various constraints on preview size and distance between preview tables. The optimization problem under the distance constraint is NP-hard. We design a dynamic-programming algorithm and an Apriori-style algorithm for finding optimal previews. Results from experiments, comparison with related work, and user studies demonstrated the accuracy of the scoring measures and the efficiency of the discovery algorithms. Comment: This is the camera-ready version of a SIGMOD16 paper. There might be tiny differences in layout, spacing and line breaking compared with the version in the SIGMOD16 proceedings, since we must submit TeX files and use arXiv to compile the file.

arXiv.org e-Print Archive

Crossref

XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

Author: A. Balmin
A. Chebotko
D. Florescu
D. Kossmann
H. Gupta
H.V. Jagadish
M. Atay
M.R. Garey
R. Chirkova
S. Abiteboul
S. Chaudhuri
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improve query response time is to use reconstruction views: materialized XML subtrees of an XML document whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree $T$, in which every node $i$ has a size $c_i$ and a profit $p_i$, together with a size limit $C$. The target is to find a subset of subtrees rooted at nodes $i_1, \cdots, i_k$, respectively, such that $c_{i_1} + \cdots + c_{i_k} \le C$ and $p_{i_1} + \cdots + p_{i_k}$ is maximal. Furthermore, no two subtrees selected in the solution may overlap. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution.
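To make the problem statement concrete, here is a minimal Python sketch of an exact dynamic program for the case of integer sizes, assuming the tree is given as a child-list dictionary (the names `best_views`, `children`, `size` and `profit` are hypothetical). It is not the paper's FPTAS: its running time is polynomial in C, i.e. pseudo-polynomial. A standard knapsack-style FPTAS would instead round the profits and run a similar DP over profit values.

```python
from typing import Dict, List


def best_views(children: Dict[int, List[int]], size: Dict[int, int],
               profit: Dict[int, int], root: int, C: int) -> int:
    """Max total profit of pairwise non-overlapping subtrees with total size <= C.

    Sketch only: assumes non-negative integer sizes; children[i] lists node i's children.
    """
    def solve(i: int) -> List[int]:
        # dp[b] = best profit from subtrees chosen inside subtree(i) with budget b.
        dp = [0] * (C + 1)
        for ch in children.get(i, []):
            child_dp = solve(ch)
            # Children's subtrees are disjoint, so split the budget between
            # the children processed so far and this child (tree knapsack).
            dp = [max(dp[a] + child_dp[b - a] for a in range(b + 1))
                  for b in range(C + 1)]
        # Alternatively select the whole subtree rooted at i as one view;
        # this excludes every descendant, so no overlap can arise.
        for b in range(size[i], C + 1):
            dp[b] = max(dp[b], profit[i])
        return dp

    return solve(root)[C]
```

For instance, with root 0 (size 10, profit 20) whose children are 1 (size 5, profit 9) and 2 (size 3, profit 4), and where node 1 has leaf children 3 and 4 (size 2, profit 5 each), a budget of 7 selects the subtrees at nodes 2, 3 and 4 for a total profit of 14, while a budget of 10 selects the root alone for 20.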

arXiv.org e-Print Archive

Crossref

Reverse top-k search using random walk with restart

Author: Balmin A.
Benczúr A. A.
Li N.
Ng A. Y.
Tao Y.
Wilkinson J. H.
Publication venue: 'VLDB Endowment'

Crossref

Hypothetical Reasoning via Provenance Abstraction

Author: Assadi S.
Balmin A.
Brezinski C.
Deutch D.
Garey M. R.
Geerts F.
Glavic B.
Ikeda R.
Lee S.
Publication date: 10/07/2020

Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can help make such an analysis more efficient: instead of a costly re-execution of the underlying application, hypothetical scenarios are applied to a precomputed provenance expression. However, storing provenance for complex queries and large-scale data incurs significant overhead, which is often a barrier to adopting provenance-based solutions. To this end, we present a framework that reduces provenance size. Our approach reduces the provenance granularity using user-defined abstraction trees over the provenance variables; the granularity is based on the anticipated hypothetical scenarios. We formalize the tradeoff between provenance size and the supported granularity of hypothetical reasoning, study the complexity of the resulting optimization problem, and provide efficient algorithms for tractable cases and heuristics for the others. We experimentally study the performance of our solution for various queries and abstraction trees. Our study shows that the algorithms generally lead to a substantial speedup of hypothetical reasoning, with a reasonable loss of accuracy.
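The idea of applying hypothetical scenarios to a provenance expression, and of coarsening that expression with an abstraction tree, can be illustrated with a small Python sketch. It assumes provenance is a polynomial over tuple variables (counting semantics), stored as a dict from sorted variable tuples to integer coefficients, and that the abstraction is given as a hypothetical `group` map from each variable to its chosen abstraction-tree node. The paper's actual contribution, choosing which abstraction to apply under a size bound, is not shown.

```python
from collections import Counter
from typing import Dict, Tuple

Monomial = Tuple[str, ...]   # sorted variable names (assumed representation)
Poly = Dict[Monomial, int]   # monomial -> integer coefficient


def abstract(poly: Poly, group: Dict[str, str]) -> Poly:
    """Replace each variable by its abstraction-tree node and merge equal monomials."""
    out: Counter = Counter()
    for mono, coef in poly.items():
        new_mono = tuple(sorted(group.get(v, v) for v in mono))
        out[new_mono] += coef  # identical monomials collapse, shrinking the expression
    return dict(out)


def evaluate(poly: Poly, valuation: Dict[str, float]) -> float:
    """Apply a hypothetical scenario: unassigned variables default to 1 (tuple kept)."""
    total = 0.0
    for mono, coef in poly.items():
        term = float(coef)
        for v in mono:
            term *= valuation.get(v, 1.0)  # e.g. 0 models deleting that tuple
        total += term
    return total
```

Abstracting `a1` and `a2` into `A` shrinks `{("a1","b1"): 1, ("a2","b1"): 1, ("a1","b2"): 2}` to two monomials. A scenario that treats `a1` and `a2` alike, such as deleting `b2` or deleting both, gives the same answer on both versions, whereas deleting only `a1` can no longer be expressed on the abstracted expression; this is the size-versus-granularity tradeoff the abstract describes.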

arXiv.org e-Print Archive

Crossref

Keyword search on external memory data graphs

Author: Balmin Andrey
Bijay Kumar Gaurav
Buchsbaum A. L.
Graupmann J.
Gupta Nitin
Hristidis V.
Kacholia Varun
Raghavan Sriram
Publication venue: 'VLDB Endowment'

Crossref

Workload-Aware Views Materialization for Big Open Linked Data

Author: Arion A.
Auer S.
Balmin A.
Castillo R.
Chaudhuri S.
Chen D.
Dritsou V.
Harinarayan V.
Jiang Y.
Karanasos K.
Kaushik R.
Le W. C.
Liu C.
Lorey J.
Morsey M.
Neumann T.
Raymond J. W.
Roy P.
Schmidt M.
Stocker M.
Suchanek F. M.
Tang N.
Xu X.
Zlamaniec T.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/05/2021

Crossref

Coventry University Pure Portal

Information Discovery on Electronic Health Records Using Authority Flow Techniques

Author: A Balmin
A Singhal
AK Sehgal
CJ McDonald
DL Shepelyansky
F Farfán
H Hwang
J Savoy
JF Fontaine
L Guo
M Brinkmeier
MG Weiner
MI Lieberman
Michael Weiner
Paul Biondich
R Moskovitch
R Motwani
R Varadarajan
Ramakrishna R Varadarajan
RM Podowski
S Agrawal
S Brin
SE Robertson
T Haveliwala
T Matsunaga
V Hristidis
Vagelis Hristidis
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract

Background: As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs.

Methods: We built two ranking systems. The traditional BM25 system exploits the EHRs' content without regard to associations among the entities within them. The Clinical ObjectRank (CO) system exploits the entities' associations in EHRs using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset of the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, sensitivity and specificity were measured by two physicians for a set of 11 queries related to congenital cardiac disease.

Results: Our pilot evaluation showed that CO outperforms BM25 in terms of sensitivity (65% vs. 38%, an average relative improvement of 71%), while maintaining specificity (64% vs. 61%). The evaluation was done by two physicians.

Conclusions: Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
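Authority-flow ranking in the spirit of ObjectRank is commonly computed as a personalized PageRank whose random jumps return only to a base set of entities that match the keyword. The Python sketch below assumes the EHR graph is given as weighted edges `(u, v, w)`, where `w` is a hypothetical fraction of u's authority transferred to v (a node's outgoing weights summing to at most 1), and uses a damping factor of 0.85 with a fixed number of iterations; these are illustrative choices, not the parameters of the Clinical ObjectRank system.

```python
from typing import Dict, List, Tuple


def authority_flow(edges: List[Tuple[str, str, float]], base_set: List[str],
                   d: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Personalized-PageRank-style authority flow from a non-empty keyword base set.

    Sketch only: computes r = d * (flow along weighted edges) + (1 - d) * s,
    where s spreads the restart probability uniformly over base_set.
    """
    nodes = {u for u, _, _ in edges} | {v for _, v, _ in edges} | set(base_set)
    s = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}
    r = dict(s)
    for _ in range(iters):  # power iteration
        nxt = {n: (1 - d) * s[n] for n in nodes}
        for u, v, w in edges:
            nxt[v] += d * w * r[u]  # u passes a share w of its authority to v
        r = nxt
    return r
```

With a single keyword match in a note `N1` that points to patient `P1`, both `P1` and the entities linked from `P1` receive authority even though they do not contain the keyword; this is the kind of association a per-document scorer such as BM25 ignores.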

Crossref

IUPUIScholarWorks

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DigitalCommons@Florida International University

Tractable XML data exchange via relations

Author: A. Balmin
D. Florescu
G. Gottlob
G. Gou
H. V. Jagadish
J. Shanmugasundaram
M. Arenas
N. Klarlundi
P. Barceló
R. Fagin
R. Krishnamurthy
R. Miller
S. Abiteboul
S. Amer-Yahia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with the source information. The problem has primarily been studied in the relational context, in which data-exchange systems have also been built. Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and XML data exchange, which indicates that restrictions have to be imposed on XML schemas and mappings, and on XML shredding schemes, to make the use of relational systems possible. We isolate a set of five requirements that must be met for a relational translation to faithfully represent the XML data-exchange problem. We then demonstrate that these requirements naturally suggest the inlining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and to demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system.
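The inlining idea can be sketched in Python with the standard `xml.etree.ElementTree` module: elements whose tags appear in a hypothetical `table_tags` set (typically the root and elements that may repeat) each get their own relation with `id` and `parent_id` columns, and all other elements are inlined as columns of their nearest table ancestor. This is only a simplified illustration of document shredding under those assumptions; the paper derives the relational schema from the XML schema and also translates mappings and queries, none of which is shown here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count
from typing import Dict, List, Optional, Set


def shred(xml_text: str, table_tags: Set[str]) -> Dict[str, List[dict]]:
    """Shred an XML document into rows, inlining elements not in table_tags.

    Sketch only: assumes the root tag is in table_tags; XML attributes are ignored.
    """
    tables: Dict[str, List[dict]] = defaultdict(list)
    ids = count(1)  # surrogate keys for rows

    def visit(elem: ET.Element, row: Optional[dict], path: str,
              parent_id: Optional[int]) -> None:
        text = (elem.text or "").strip()
        if elem.tag in table_tags:
            # Start a new row in this element's relation, linked to its parent row.
            new_id = next(ids)
            row = {"id": new_id, "parent_id": parent_id}
            if text:
                row["value"] = text
            tables[elem.tag].append(row)
            path, parent_id = "", new_id
        else:
            # Inline this element as a column (named by its path) of the enclosing row.
            path = f"{path}.{elem.tag}" if path else elem.tag
            if text and row is not None:
                row[path] = text
        for child in elem:
            visit(child, row, path, parent_id)

    visit(ET.fromstring(xml_text), None, "", None)
    return dict(tables)
```

On a small `<library>` document with two `<book>` elements, each book becomes a row with an inlined `title` column, and each repeated `<author>` becomes a row of its own that points back to its book through `parent_id`.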

CiteSeerX

Crossref

Edinburgh Research Explorer