29,133 research outputs found
A Framework for the Study of Query Decomposition for Heterogeneous Distributed Database Management Systems
This paper presents a framework for the study of the query decomposition translation for heterogeneous record -oriented database management systems. This framework is based on the applied database logic representation of relational, hierarchical and network databases. The input to the query decomposition translation is the query graph which is derived from the complex to basic, external to conceptual and logical optimization translations. Once the query graph is obtained the objective of the query decomposition translation is to break up a query expressed in terms of the actual or conceptual databases into its component parts or subqueries and find a strategy indicating the sequence of primitive or fundamental operations and their corresponding processing sites in the network necessary to answer the query. The query processing strategy is usually chosen so as to satisfy some performance criterion such as response time reduction. Contingent on after each primitive operation. The prequery decomposition translation, the query decomposition translation and the size estimation issues are presented through an example based on the current implementation of the Distributed Access View Integration Database (DAVID) currently being built at NASA's Goddard Space Flight Center (GSFC). The choice of a query processing strategy is the successful estimation of intermediate result
An Ontology Connected to Several Data Repositories: Query Processing Steps
The great expansion of communication networks has made avail- able to users a huge number of heterogeneous and autonomous data repositories that present different structures/organizations, query languages and data semantics. In that context it is clear that new information retrieval techniques with a strategy that focuses on in- formation content and semantics are needed. We propose to use domain specific Ontologies to capture the information content of such repositories whenever available. We describe such Ontolo- gies using a system based on Description Logics. In this paper we present all the stages of the processing of a query formulated over an Ontology when the answer must be found in the underly- ing data repositories. Those stages make up a subpart of the global query processing strategy defined for a set of loosely-coupled On- tologies. We show first how the query is transformed into a seman- tically equivalent one and how inconsistent queries are detected. Then, we explain the test to verify if the query can be answered from the cache memory. Next, we present a set of heuristics used during the query decomposition process. Later on, we show how to optimize plans associated to subqueries that access the underlying data repositories and finally we illustrate how the answers retrieved from the repositories are correlated in order to generate the query answer
A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs
Cyber security is one of the most significant technical challenges in current
times. Detecting adversarial activities, prevention of theft of intellectual
properties and customer data is a high priority for corporations and government
agencies around the world. Cyber defenders need to analyze massive-scale,
high-resolution network flows to identify, categorize, and mitigate attacks
involving networks spanning institutional and national boundaries. Many of the
cyber attacks can be described as subgraph patterns, with prominent examples
being insider infiltrations (path queries), denial of service (parallel paths)
and malicious spreads (tree queries). This motivates us to explore subgraph
matching on streaming graphs in a continuous setting. The novelty of our work
lies in using the subgraph distributional statistics collected from the
streaming graph to determine the query processing strategy. We introduce a
"Lazy Search" algorithm where the search strategy is decided on a
vertex-to-vertex basis depending on the likelihood of a match in the vertex
neighborhood. We also propose a metric named "Relative Selectivity" that is
used to select between different query processing strategies. Our experiments
performed on real online news, network traffic stream and a synthetic social
network benchmark demonstrate 10-100x speedups over selectivity agnostic
approaches.Comment: in 18th International Conference on Extending Database Technology
(EDBT) (2015
Query processing of geometric objects with free form boundarie sin spatial databases
The increasing demand for the use of database systems as an integrating
factor in CAD/CAM applications has necessitated the development of database
systems with appropriate modelling and retrieval capabilities. One essential
problem is the treatment of geometric data which has led to the development of
spatial databases. Unfortunately, most proposals only deal with simple geometric
objects like multidimensional points and rectangles. On the other hand, there has
been a rapid development in the field of representing geometric objects with free
form curves or surfaces, initiated by engineering applications such as mechanical
engineering, aviation or astronautics. Therefore, we propose a concept for the realization
of spatial retrieval operations on geometric objects with free form
boundaries, such as B-spline or Bezier curves, which can easily be integrated in
a database management system. The key concept is the encapsulation of geometric
operations in a so-called query processor. First, this enables the definition of
an interface allowing the integration into the data model and the definition of the
query language of a database system for complex objects. Second, the approach
allows the use of an arbitrary representation of the geometric objects. After a
short description of the query processor, we propose some representations for free
form objects determined by B-spline or Bezier curves. The goal of efficient query
processing in a database environment is achieved using a combination of decomposition
techniques and spatial access methods. Finally, we present some experimental
results indicating that the performance of decomposition techniques is
clearly superior to traditional query processing strategies for geometric objects
with free form boundaries
Efficient Subgraph Matching on Billion Node Graphs
The ability to handle large scale graph data is crucial to an increasing
number of applications. Much work has been dedicated to supporting basic graph
operations such as subgraph matching, reachability, regular expression
matching, etc. In many cases, graph indices are employed to speed up query
processing. Typically, most indices require either super-linear indexing time
or super-linear indexing space. Unfortunately, for very large graphs,
super-linear approaches are almost always infeasible. In this paper, we study
the problem of subgraph matching on billion-node graphs. We present a novel
algorithm that supports efficient subgraph matching for graphs deployed on a
distributed memory store. Instead of relying on super-linear indices, we use
efficient graph exploration and massive parallel computing for query
processing. Our experimental results demonstrate the feasibility of performing
subgraph matching on web-scale graph data.Comment: VLDB201
Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy
Differential privacy is a promising privacy-preserving paradigm for
statistical query processing over sensitive data. It works by injecting random
noise into each query result, such that it is provably hard for the adversary
to infer the presence or absence of any individual record from the published
noisy results. The main objective in differentially private query processing is
to maximize the accuracy of the query results, while satisfying the privacy
guarantees. Previous work, notably \cite{LHR+10}, has suggested that with an
appropriate strategy, processing a batch of correlated queries as a whole
achieves considerably higher accuracy than answering them individually.
However, to our knowledge there is currently no practical solution to find such
a strategy for an arbitrary query batch; existing methods either return
strategies of poor quality (often worse than naive methods) or require
prohibitively expensive computations for even moderately large domains.
Motivated by this, we propose low-rank mechanism (LRM), the first practical
differentially private technique for answering batch linear queries with high
accuracy. LRM works for both exact (i.e., -) and approximate (i.e.,
(, )-) differential privacy definitions. We derive the
utility guarantees of LRM, and provide guidance on how to set the privacy
parameters given the user's utility expectation. Extensive experiments using
real data demonstrate that our proposed method consistently outperforms
state-of-the-art query processing solutions under differential privacy, by
large margins.Comment: ACM Transactions on Database Systems (ACM TODS). arXiv admin note:
text overlap with arXiv:1212.230
- …