119,881 research outputs found

    Approximation in Databases

    Get PDF
    One source of partial information in databases is the need to combine information from several databases. Even if each database is complete for some world , the combined databases will not be, and answers to queries against such combined databases can only be approximated. In this paper we describe various situations in which a precise answer cannot be obtained for a query asked against multiple databases. Based on an analysis of these situations, we propose a classification of constructs that can be used to model approximations. One of the main goals is to show that most of these models of approximations possess universality properties. The main motivation for doing this is applying the data-oriented approach, which turns universality properties into syntax, to obtain languages for approximations. We show that the languages arising from the universality properties have a number of limitations. In an attempt to overcome those limitations, we explain how all the languages can be embedded into a language for conjunctive and disjunctive sets from [21], and demonstrate its usefulness in querying independent databases

    A General Framework for Anytime Approximation in Probabilistic Databases

    Get PDF
    Anytime approximation algorithms that compute the probabilities of queries over probabilistic databases can be of great use to statistical learning tasks. Those approaches have been based so far on either (i) sampling or (ii) branch-and-bound with model-based bounds. We present here a more general branch-and-bound framework that extends the possible bounds by using 'dissociation', which yields tighter bounds.Comment: 3 pages, 2 figures, submitted to StarAI 2018 Worksho

    XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

    Full text link
    Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improve query response time is to use reconstruction views - materialized XML subtrees of an XML document, whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree TT, in which every node ii has a size cic_i and profit pip_i, and the size limitation CC. The target is to find a subset of subtrees rooted at nodes i1,⋯ ,iki_1,\cdots, i_k respectively such that ci1+⋯+cik≤Cc_{i_1}+\cdots +c_{i_k}\le C, and pi1+⋯+pikp_{i_1}+\cdots +p_{i_k} is maximal. Furthermore, there is no overlap between any two subtrees selected in the solution. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution

    Rough clustering for web transactions

    Get PDF
    Grouping web transactions into clusters is important in order to obtain better understanding of user's behavior. Currently, the rough approximation-based clustering technique has been used to group web transactions into clusters. It is based on the similarity of upper approximations of transactions by given threshold. However, the processing time is still an issue due to the high complexity for finding the similarity of upper approximations of a transaction which used to merge between two or more clusters. In this study, an alternative technique for grouping web transactions using rough set theory is proposed. It is based on the two similarity classes which is nonvoid intersection. The technique is implemented in MATLAB ® version 7.6.0.324 (R2008a). The two UCI benchmark datasets taken from: http:/kdd.ics.uci.edu/ databases/msnbc/msnbc.html and http:/kdd.ics.uci.edu/databases/ Microsoft / microsoft.html are opted in the simulation processes. The simulation reveals that the proposed technique significantly requires lower response time up to 62.69 % and 66.82 % as compared to the rough approximation-based clustering, severally. Meanwhile, for cluster purity it performs better until 2.5 % and 14.47%, respectively

    HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

    Full text link
    Nearest neighbor searching of large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. A flavor of approximation is, therefore, necessary to practically solve the problem of nearest neighbor search. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to solve the problem of approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to triangular inequality, we also use Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion scale) high-dimensional (up to 1000+) datasets show that HD-Index is effective, efficient, and scalable.Comment: PVLDB 11(8):906-919, 201
    • …
    corecore