69,269 research outputs found
Structure-Aware Sampling: Flexible and Accurate Summarization
In processing large quantities of data, a fundamental problem is to obtain a
summary which supports approximate query answering. Random sampling yields
flexible summaries which naturally support subset-sum queries with unbiased
estimators and well-understood confidence bounds.
Classic sample-based summaries, however, are designed for arbitrary subset
queries and are oblivious to the structure in the set of keys. The particular
structure, such as hierarchy, order, or product space (multi-dimensional),
makes range queries much more relevant for most analysis of the data.
Dedicated summarization algorithms for range-sum queries have also been
extensively studied. They can outperform existing sampling schemes in terms of
accuracy on range queries per summary size. Their accuracy, however, rapidly
degrades when, as is often the case, the query spans multiple ranges. They are
also less flexible - being targeted for range sum queries alone - and are often
quite costly to build and use.
In this paper we propose and evaluate variance optimal sampling schemes that
are structure-aware. These summaries improve over the accuracy of existing
structure-oblivious sampling schemes on range queries while retaining the
benefits of sample-based summaries: flexible summaries, with high accuracy on
both range queries and arbitrary subset queries
Linear and Range Counting under Metric-based Local Differential Privacy
Local differential privacy (LDP) enables private data sharing and analytics
without the need for a trusted data collector. Error-optimal primitives (for,
e.g., estimating means and item frequencies) under LDP have been well studied.
For analytical tasks such as range queries, however, the best known error bound
is dependent on the domain size of private data, which is potentially
prohibitive. This deficiency is inherent as LDP protects the same level of
indistinguishability between any pair of private data values for each data
downer.
In this paper, we utilize an extension of -LDP called Metric-LDP or
-LDP, where a metric defines heterogeneous privacy guarantees for
different pairs of private data values and thus provides a more flexible knob
than does to relax LDP and tune utility-privacy trade-offs. We show
that, under such privacy relaxations, for analytical workloads such as linear
counting, multi-dimensional range counting queries, and quantile queries, we
can achieve significant gains in utility. In particular, for range queries
under -LDP where the metric is the -distance function scaled by
, we design mechanisms with errors independent on the domain sizes;
instead, their errors depend on the metric , which specifies in what
granularity the private data is protected. We believe that the primitives we
design for -LDP will be useful in developing mechanisms for other analytical
tasks, and encourage the adoption of LDP in practice
The XML Query Language Xcerpt: Design Principles, Examples, and Semantics
Most query and transformation languages developed since the mid 90es for XML and semistructured data—e.g. XQuery [1], the precursors of XQuery [2], and XSLT [3]—build upon a path-oriented node selection: A node in a data item is specified in terms of a root-to-node path in the manner of the file selection languages of operating systems. Constructs inspired from the regular expression constructs , +, ?, and “wildcards” give rise to a flexible node retrieval from incompletely specified data items.
This paper further introduces into Xcerpt, a query and transformation language further developing an alternative approach to querying XML and semistructured data first introduced with the language UnQL [4]. A metaphor for this approach views queries as patterns, answers as data items matching the queries. Formally, an answer to a query is defined as a simulation [5] of an instance of the query in a data item
Data Querying with Ciphertext Policy Attribute Based Encryption
Data encryption limits the power and efficiency of queries. Direct processing
of encrypted data should ideally be possible to avoid the need for data
decryption, processing, and re-encryption. It is vital to keep the data
searchable and sortable. That is, some information is intentionally leaked.
This intentional leakage technology is known as "querying over encrypted data
schemes", which offer confidentiality as well as querying over encrypted data,
but it is not meant to provide flexible access control. This paper suggests the
use of Ciphertext Policy Attributes Based Encryption (CP-ABE) to address three
security requirements, namely: confidentiality, queries over encrypted data,
and flexible access control. By combining flexible access control and data
confidentiality, CP-ABE can authenticate who can access data and possess the
secret key. Thus, this paper identifies how much data leakage there is in order
to figure out what kinds of operations are allowed when data is encrypted by
CP-ABE
- …