Search CORE

119,881 research outputs found

Approximation in Databases

Author: Libkin Leonid
Publication venue: ScholarlyCommons
Publication date: 01/05/1994

One source of partial information in databases is the need to combine information from several databases. Even if each database is complete for some world, the combined databases will not be, and answers to queries against such combined databases can only be approximated. In this paper we describe various situations in which a precise answer cannot be obtained for a query asked against multiple databases. Based on an analysis of these situations, we propose a classification of constructs that can be used to model approximations. One of the main goals is to show that most of these models of approximations possess universality properties. The main motivation for doing this is to apply the data-oriented approach, which turns universality properties into syntax, to obtain languages for approximations. We show that the languages arising from the universality properties have a number of limitations. In an attempt to overcome those limitations, we explain how all the languages can be embedded into a language for conjunctive and disjunctive sets from [21], and demonstrate its usefulness in querying independent databases.
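The idea of approximating an answer over combined databases can be illustrated with a small sketch. Assuming that conflicting facts from different sources are recorded as or-sets (each listing the alternatives for one fact), every choice of one alternative per or-set gives a possible world; the certain answer (true in every world) and the possible answer (true in some world) then act as lower and upper approximations. The functions `possible_worlds` and `approximate_answer` and the employee data are hypothetical illustrations, not the constructs or the language defined in the paper.

```python
from itertools import product

def possible_worlds(orsets):
    """Enumerate worlds: each world picks exactly one alternative from every or-set."""
    return [set(choice) for choice in product(*orsets)]

def approximate_answer(orsets, query):
    """Return (certain, possible) answers, i.e. lower and upper approximations.

    Assumes every or-set is non-empty, so at least one world exists.
    """
    answers = [query(world) for world in possible_worlds(orsets)]
    certain = set.intersection(*answers)   # holds in every world
    possible = set.union(*answers)         # holds in at least one world
    return certain, possible

# Hypothetical example: two sources agree on Ann but disagree on Bob's department.
orsets = [
    [("ann", "sales")],
    [("bob", "sales"), ("bob", "hr")],
]

def in_sales(world):
    return {name for name, dept in world if dept == "sales"}

certain, possible = approximate_answer(orsets, in_sales)
print(sorted(certain), sorted(possible))  # ['ann'] ['ann', 'bob']
```

Enumerating worlds is exponential in the number of or-sets, so this only illustrates the semantics of the approximation; it is not an efficient evaluation strategy.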

CiteSeerX

ScholarlyCommons@Penn

A General Framework for Anytime Approximation in Probabilistic Databases

Author: Gatterbauer Wolfgang
Geerts Floris
Heuvel Maarten Van den
Theobald Martin
Publication date: 03/07/2018

Anytime approximation algorithms that compute the probabilities of queries over probabilistic databases can be of great use for statistical learning tasks. So far, those approaches have been based on either (i) sampling or (ii) branch-and-bound with model-based bounds. Here we present a more general branch-and-bound framework that extends the set of possible bounds by using 'dissociation', which yields tighter bounds. Comment: 3 pages, 2 figures, submitted to StarAI 2018 Workshop
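The branch-and-bound idea can be sketched for the common case where the lineage of a query answer is a monotone DNF over independent tuple variables. This is a minimal illustration under that assumption, not the framework from the paper: each leaf gets a cheap lower bound (its most probable clause) and an upper bound obtained by treating all clauses as independent, which is what fully dissociating the shared variables amounts to and which over-approximates a monotone DNF by the Harris (FKG) inequality. Shannon expansion on the most frequent variable refines the leaf with the widest weighted gap, so the returned interval is valid whenever the loop stops. All function names are hypothetical.

```python
import heapq
from itertools import count

# Lineage is a monotone DNF: a list of clauses, each a frozenset of variable names.
# probs maps every variable to its independent probability of being true.

def clause_prob(clause, probs):
    p = 1.0
    for v in clause:
        p *= probs[v]
    return p

def bounds(dnf, probs):
    """Cheap (lower, upper) bounds on P(dnf) for a monotone DNF."""
    if not dnf:
        return 0.0, 0.0                  # empty disjunction: false
    if any(not c for c in dnf):
        return 1.0, 1.0                  # an empty clause is true
    cps = [clause_prob(c, probs) for c in dnf]
    miss = 1.0
    for p in cps:
        miss *= 1.0 - p
    # Lower: one clause alone. Upper: clauses treated as independent, as if every
    # shared variable were dissociated into fresh copies (valid for monotone DNF).
    return max(cps), 1.0 - miss

def cofactor(dnf, var, value):
    """Shannon cofactor of a monotone DNF with var fixed to value."""
    if value:
        return [c - {var} for c in dnf]
    return [c for c in dnf if var not in c]

def anytime_prob(dnf, probs, eps=1e-3):
    """Branch-and-bound: repeatedly split the leaf with the widest weighted gap."""
    tie = count()                        # tie-breaker so the heap never compares lists
    lo, hi = bounds(dnf, probs)
    heap = [(-(hi - lo), next(tie), 1.0, dnf, lo, hi)]
    total_lo, total_hi = lo, hi
    while heap and total_hi - total_lo > eps:
        _, _, w, f, flo, fhi = heapq.heappop(heap)
        total_lo -= w * flo              # replace this leaf by its two children
        total_hi -= w * fhi
        freq = {}
        for c in f:
            for v in c:
                freq[v] = freq.get(v, 0) + 1
        var = max(freq, key=freq.get)    # branch on the most frequent variable
        for value, pv in ((True, probs[var]), (False, 1.0 - probs[var])):
            g = cofactor(f, var, value)
            glo, ghi = bounds(g, probs)
            cw = w * pv
            total_lo += cw * glo
            total_hi += cw * ghi
            if ghi > glo:                # exact leaves need no further work
                heapq.heappush(heap, (-cw * (ghi - glo), next(tie), cw, g, glo, ghi))
    return total_lo, total_hi

# Hypothetical lineage "at least two of a, b, c", each true with probability 0.5.
lineage = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
probs = {"a": 0.5, "b": 0.5, "c": 0.5}
print(anytime_prob(lineage, probs, eps=0.01))
```

A larger `eps` returns earlier with a wider interval; with `eps=0` the loop keeps branching until the bounds meet. Tighter leaf bounds, such as the dissociation bounds the abstract refers to, shrink the tree that must be explored.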

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

Author: A. Balmin
A. Chebotko
D. Florescu
D. Kossmann
H. Gupta
H.V. Jagadish
M. Atay
M.R. Garey
R. Chirkova
S. Abiteboul
S. Chaudhuri
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improving query response time is to use reconstruction views: materialized XML subtrees of an XML document whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree $T$ in which every node $i$ has a size $c_i$ and a profit $p_i$, together with a size limit $C$. The goal is to find a subset of subtrees rooted at nodes $i_1, \cdots, i_k$, respectively, such that $c_{i_1} + \cdots + c_{i_k} \le C$ and $p_{i_1} + \cdots + p_{i_k}$ is maximal. Furthermore, no two subtrees selected in the solution may overlap. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution.
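The problem statement above can be made concrete with a standard profit-scaling dynamic program. The abstract does not describe the authors' FPTAS, so this is a hedged sketch of the general technique, not their construction. Non-overlap means the selected nodes form an antichain (no selected node is an ancestor of another). Profits are scaled by $K = \varepsilon \cdot p_{\max} / n$, where $p_{\max}$ is the largest profit of any node that fits within $C$ on its own; for each node the DP keeps the cheapest selection for every scaled profit, so tables stay polynomial in $n$ and $1/\varepsilon$, and rounding loses less than $nK = \varepsilon p_{\max} \le \varepsilon \cdot \mathrm{OPT}$. The names `select_views`, `children`, `cost` and `profit` are hypothetical, and profits are assumed non-negative.

```python
def select_views(children, cost, profit, root, C, eps):
    """Pick nodes whose subtrees are pairwise disjoint, with total cost <= C.

    Hypothetical inputs: children maps a node to its child list, cost[v] = c_v,
    profit[v] = p_v (non-negative). The result's profit is >= (1 - eps) * OPT.
    """
    feasible = [profit[v] for v in cost if cost[v] <= C]
    if not feasible or max(feasible) <= 0:
        return []
    K = eps * max(feasible) / len(cost)          # scaling factor
    scaled = {v: int(profit[v] / K) for v in cost}

    def solve(v):
        # table[q] = (min cost, chosen nodes) over disjoint selections in v's
        # subtree whose scaled profit is exactly q.
        table = {0: (0, ())}
        for ch in children.get(v, []):
            sub = solve(ch)
            merged = {}
            for q1, (c1, s1) in table.items():
                for q2, (c2, s2) in sub.items():
                    c, q = c1 + c2, q1 + q2
                    if c <= C and (q not in merged or c < merged[q][0]):
                        merged[q] = (c, s1 + s2)
            table = merged
        # Alternatively select v itself, which excludes every descendant.
        if cost[v] <= C:
            q = scaled[v]
            if q not in table or cost[v] < table[q][0]:
                table[q] = (cost[v], (v,))
        return table

    best = solve(root)
    return list(best[max(best)][1])

# Hypothetical tree: 0 -> {1, 2}, 1 -> {3, 4}
children = {0: [1, 2], 1: [3, 4]}
cost = {0: 10, 1: 6, 2: 4, 3: 3, 4: 3}
profit = {0: 10, 1: 7, 2: 5, 3: 4, 4: 4}
views = select_views(children, cost, profit, root=0, C=10, eps=0.1)
print(sorted(views))  # [2, 3, 4]
```

Here selecting the root alone (profit 10) or nodes 1 and 2 (profit 12) is feasible, but the three leaves 2, 3 and 4 fit the same budget with profit 13.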

arXiv.org e-Print Archive

Crossref

Rough clustering for web transactions

Author: Yanto Iwan Tri Riyadi
Publication date: 01/01/2011

Grouping web transactions into clusters is important for better understanding users' behavior. Currently, the rough approximation-based clustering technique has been used to group web transactions into clusters. It is based on the similarity of the upper approximations of transactions under a given threshold. However, processing time is still an issue because of the high complexity of computing the similarity of the upper approximations of a transaction, which is used to merge two or more clusters. In this study, an alternative technique for grouping web transactions using rough set theory is proposed. It is based on two similarity classes that have a nonvoid intersection. The technique is implemented in MATLAB® version 7.6.0.324 (R2008a). Two UCI benchmark datasets, taken from http://kdd.ics.uci.edu/databases/msnbc/msnbc.html and http://kdd.ics.uci.edu/databases/Microsoft/microsoft.html, are used in the simulations. The simulations show that the proposed technique reduces response time by up to 62.69% and 66.82%, respectively, compared with rough approximation-based clustering. It also improves cluster purity by up to 2.5% and 14.47%, respectively.
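The abstract does not define its similarity measure or merge rule precisely, so the following is only a hedged sketch of one plausible reading, in Python rather than MATLAB. Each transaction is treated as a set of visited page categories; two transactions are assumed similar when their Jaccard similarity reaches a threshold `theta` (an assumption, not taken from the paper); the similarity class of a transaction is the set of transactions similar to it; and transactions whose similarity classes have a nonvoid intersection are merged with a union-find structure. All names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of visited page categories (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def similarity_classes(transactions, theta):
    """S(t) = indices of all transactions at least theta-similar to t (t included)."""
    n = len(transactions)
    return [
        {j for j in range(n) if jaccard(transactions[i], transactions[j]) >= theta}
        for i in range(n)
    ]

def cluster(transactions, theta):
    """Merge transactions whose similarity classes have a nonvoid intersection."""
    classes = similarity_classes(transactions, theta)
    parent = list(range(len(transactions)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(transactions)):
        for j in range(i + 1, len(transactions)):
            if classes[i] & classes[j]:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(transactions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical transactions (sets of page-category ids).
txns = [{1, 2}, {1, 2, 3}, {7, 8}, {7, 9}]
print(cluster(txns, 0.5))  # [[0, 1], [2], [3]]
```

Because every similarity class contains its own transaction, directly similar transactions always share a cluster; transactions 2 and 3 stay apart here because their Jaccard similarity (1/3) is below the assumed threshold.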

UTHM Institutional Repository

Approximation in databases

Author: A. Romanowska
C. Gunter
C. Zaniolo
G. Grahne
J. Biskup
L. Colby
P. Buneman
T. Imielinski
W. Lipski
Publication venue: Springer Science and Business Media LLC

Crossref

HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

Author: Arora Akhil
Bhattacharya Arnab
Kumar Piyush
Sinha Sakshi
Publication venue: VLDB Endowment
Publication date: 23/04/2018

Nearest neighbor searching of large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. Some form of approximation is therefore necessary to solve the nearest neighbor search problem in practice. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to answer approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to the triangle inequality, we also use the Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion-scale) high-dimensional (up to 1000+ dimensions) datasets show that HD-Index is effective, efficient, and scalable. Comment: PVLDB 11(8):906-919, 2018
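The distance-filter pruning can be illustrated with a short sketch that assumes Euclidean distance and a small set of reference objects (pivots). It computes the triangle-inequality bound $|d(q,p) - d(o,p)|$ and the Ptolemaic bound $|d(q,p_i)\,d(o,p_j) - d(q,p_j)\,d(o,p_i)| / d(p_i,p_j)$ from object-to-pivot distances and skips any object whose bound already reaches the current $k$-th best distance. It deliberately omits RDB-trees and Hilbert keys, so it is an exact linear-scan filter rather than HD-Index itself; the function names and data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import heapq
import math
from itertools import combinations

def lower_bound(dq, do, pivot_pairs):
    """Lower bound on d(q, o) from distances to pivots (assumes >= 1 pivot)."""
    # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p.
    lb = max(abs(a - b) for a, b in zip(dq, do))
    # Ptolemy's inequality (holds in Euclidean space), for every pivot pair (i, j):
    # d(q, o) >= |d(q, p_i) d(o, p_j) - d(q, p_j) d(o, p_i)| / d(p_i, p_j).
    for (i, j), dij in pivot_pairs.items():
        if dij > 0:
            lb = max(lb, abs(dq[i] * do[j] - dq[j] * do[i]) / dij)
    return lb

def knn_with_filters(query, data, pivots, k):
    """Exact k-NN by linear scan, skipping objects the lower bound rules out."""
    pivot_pairs = {
        (i, j): math.dist(pivots[i], pivots[j])
        for i, j in combinations(range(len(pivots)), 2)
    }
    # In a real index these object-to-pivot distances would be stored ahead of time.
    dpo = [[math.dist(o, p) for p in pivots] for o in data]
    dq = [math.dist(query, p) for p in pivots]
    heap = []          # max-heap of (-distance, index) holding the k best so far
    computed = 0       # number of full query-to-object distance computations
    for idx, o in enumerate(data):
        if len(heap) == k and lower_bound(dq, dpo[idx], pivot_pairs) >= -heap[0][0]:
            continue   # cannot beat the current k-th nearest neighbor
        d = math.dist(query, o)
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, idx))
    return sorted((-nd, idx) for nd, idx in heap), computed

# Hypothetical 2-D data and three pivots.
data = [(float(i), float(i % 3)) for i in range(20)]
pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 5.0)]
result, computed = knn_with_filters((5.2, 1.1), data, pivots, k=3)
print([idx for _, idx in result])  # [5, 4, 6]
```

The Ptolemaic bound uses pairs of pivots, so it can prune objects for which every single-pivot triangle bound is loose; `computed` reports how many full distance computations the filters did not avoid.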

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne