Protecting Scattered Database by Enforcing Data Preservation Using Data Protection Facilitator
This paper incorporates data preservation into a scattered database structure, that is, a method of preserving data in a scattered database while providing secure access to it. We examine data preservation and propose a solution for this setting. The paper summarizes documentation, authorization, access control, and encryption, which are the main points to consider for data preservation in a scattered database structure. We propose a new method for secure access based on a service provider that comprises a security application. The model provides for safe search over the server-user relation. We use a heuristic approach to preservation in scattered database systems because secure access is becoming increasingly important across scattered domains; in this way we enhance database security in a scattered database environment.
Performance Guarantees for Distributed Reachability Queries
In the real world a graph is often fragmented and distributed across
different sites. This highlights the need for evaluating queries on distributed
graphs. This paper proposes distributed evaluation algorithms for three classes
of queries: reachability for determining whether one node can reach another,
bounded reachability for deciding whether there exists a path of a bounded
length between a pair of nodes, and regular reachability for checking whether
there exists a path connecting two nodes such that the node labels on the path
form a string in a given regular expression. We develop these algorithms based
on partial evaluation, to explore parallel computation. When evaluating a query
Q on a distributed graph G, we show that these algorithms possess the following
performance guarantees, no matter how G is fragmented and distributed: (1) each
site is visited only once; (2) the total network traffic is determined by the
size of Q and the fragmentation of G, independent of the size of G; and (3) the
response time is decided by the largest fragment of G rather than the entire G.
In addition, we show that these algorithms can be readily implemented in the
MapReduce framework. Using synthetic and real-life data, we experimentally
verify that these algorithms are scalable on large graphs, regardless of how
the graphs are distributed. Comment: VLDB201
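
The plain reachability case above can be illustrated with a small sketch. It is not the paper's algorithm, only a minimal instance of partial evaluation under stated assumptions: each fragment is a pair (nodes, edges), every edge is stored with the fragment that owns its start node, and the names local_eval, assemble, and reachable are hypothetical. Each site is evaluated once and returns, per entry node, whether the target was found locally and which remote nodes were reached; the coordinator then works only on these boundary summaries.

```python
from collections import deque

def local_eval(nodes, edges, entries, target):
    """Partial evaluation at one site (illustrative only).

    For every entry node, report whether the target is reached inside this
    fragment and which nodes stored at other sites are reached.
    """
    partial = {}
    for entry in entries:
        seen, queue = {entry}, deque([entry])
        hits_target, remote = False, set()
        while queue:
            v = queue.popleft()
            if v == target:
                hits_target = True
            for w in edges.get(v, ()):
                if w not in nodes:
                    remote.add(w)  # stored elsewhere: leave unresolved
                elif w not in seen:
                    seen.add(w)
                    queue.append(w)
        partial[entry] = (hits_target, remote)
    return partial

def assemble(partials, source, target):
    """Coordinator: search the small graph formed by the partial answers."""
    merged = {}
    for p in partials:
        merged.update(p)
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        hits_target, remote = merged.get(v, (False, set()))
        if hits_target:
            return True
        for w in remote:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

def reachable(fragments, source, target):
    partials = []
    for nodes, edges in fragments:
        # Entry nodes: the source (if local) plus targets of cross-fragment
        # edges. They are found by scanning here for brevity; a real site
        # would be assumed to know its own entry nodes.
        entries = {source} if source in nodes else set()
        for other_nodes, other_edges in fragments:
            if other_nodes is nodes:
                continue
            for successors in other_edges.values():
                entries.update(w for w in successors if w in nodes)
        partials.append(local_eval(nodes, edges, entries, target))
    return assemble(partials, source, target)

fragments = [
    ({"a", "b"}, {"a": ["b"], "b": ["c"]}),
    ({"c", "d"}, {"c": ["d"], "d": ["e"]}),
    ({"e", "f"}, {"f": ["a"]}),
]
print(reachable(fragments, "a", "e"))
```

In this sketch the data sent to the coordinator grows with the number of boundary nodes rather than with the size of the whole graph, which is the intuition behind guarantee (2).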
Boosting XML Filtering with a Scalable FPGA-based Architecture
The growing amount of XML encoded data exchanged over the Internet increases
the importance of XML based publish-subscribe (pub-sub) and content based
routing systems. The input in such systems typically consists of a stream of
XML documents and a set of user subscriptions expressed as XML queries. The
pub-sub system then filters the published documents and passes them to the
subscribers. Pub-sub systems are characterized by very high input ratios,
therefore the processing time is critical. In this paper we propose a "pure
hardware" based solution, which utilizes XPath query blocks on FPGA to solve
the filtering problem. By utilizing the high throughput that an FPGA provides
for parallel processing, our approach achieves drastically better throughput
than the existing software or mixed (hardware/software) architectures. The
XPath queries (subscriptions) are translated to regular expressions which are
then mapped to FPGA devices. By introducing stacks within the FPGA we are able
to express and process a wide range of path queries very efficiently, on a
scalable environment. Moreover, because the parser and the filter
processing are performed on the same FPGA chip, the expensive
communication costs that a multi-core system would incur are eliminated, thus enabling very
fast and efficient pipelining. Our experimental evaluation reveals more than
one order of magnitude improvement compared to traditional pub/sub systems. Comment: CIDR 200
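
The translation of path subscriptions into regular expressions can be sketched in software. This is not the FPGA design, only an illustration under assumptions: subscriptions use just the child axis (/), the descendant axis (//), and the * wildcard, and the function names xpath_to_regex and matching_subscriptions are hypothetical. A stack of open tags tracks the current root-to-element path while the document streams through the parser.

```python
import io
import re
import xml.etree.ElementTree as ET

def xpath_to_regex(xpath):
    """Translate a path query using only /, // and * into a regex over tag paths."""
    parts = []
    for token in re.findall(r"//|/|[^/]+", xpath):
        if token == "//":
            parts.append(r"/(?:[^/]+/)*")  # any number of intermediate tags
        elif token == "/":
            parts.append("/")
        elif token == "*":
            parts.append(r"[^/]+")         # exactly one tag of any name
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))

def matching_subscriptions(xml_text, subscriptions):
    """Return the subscriptions matched by at least one element of the document."""
    compiled = {q: xpath_to_regex(q) for q in subscriptions}
    matched, stack = set(), []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            path = "/" + "/".join(stack)
            for query, regex in compiled.items():
                if query not in matched and regex.fullmatch(path):
                    matched.add(query)
        else:
            stack.pop()
    return matched

doc = "<a><b><c/></b><d/></a>"
print(sorted(matching_subscriptions(doc, ["/a/b/c", "//d", "/a/c"])))
```

Each subscription is checked independently against the current path, so the checks could run in parallel, which is the property a hardware implementation would exploit.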
Processing SPARQL Queries Over Distributed RDF Graphs
We propose techniques for processing SPARQL queries over a large RDF graph in
a distributed environment. We adopt a "partial evaluation and assembly"
framework. Answering a SPARQL query Q is equivalent to finding subgraph matches
of the query graph Q over RDF graph G. Based on properties of subgraph matching
over a distributed graph, we introduce local partial match as partial answers
in each fragment of RDF graph G. For assembly, we propose two methods:
centralized and distributed assembly. We analyze our algorithms both
theoretically and experimentally. Extensive experiments over both real and
benchmark RDF repositories of billions of triples confirm that our method is
superior to the state-of-the-art methods in both the system's performance and
scalability. Comment: 30 page
A Study on XSLT Execution Methods for Distributed XML
Thesis (Master of Information Science)--University of Tsukuba, no. 34307, 2015.3.25201
Graph pattern matching on social network analysis
Graph pattern matching is fundamental to social network analysis. Its effectiveness
for identifying social communities and social positions, making recommendations and
so on has been repeatedly demonstrated. However, social network analysis raises
new challenges to graph pattern matching. As real-life social graphs are typically
large, it is often prohibitively expensive to conduct graph pattern matching over such
large graphs, e.g., NP-complete for subgraph isomorphism, cubic time for bounded
simulation, and quadratic time for simulation. These costs hinder the applicability of graph
pattern matching to social network analysis. In response to these challenges, the thesis
presents a series of effective techniques for querying large, dynamic, and distributively
stored social networks.
First of all, we propose a notion of query preserving graph compression, to compress
large social graphs relative to a class Q of queries. We then develop both batch
and incremental compression strategies for two commonly used pattern queries. Via
both theoretical analysis and experimental studies, we show that (1) using compressed
graphs Gr benefits graph pattern matching dramatically; and (2) the computation of Gr
as well as its maintenance can be processed efficiently.
Secondly, we investigate the distributed graph pattern matching problem, and explore
parallel computation for graph pattern matching. We show that our techniques
possess the following performance guarantees: (1) each site is visited only once; (2) the total
network traffic is independent of the size of G; and (3) the response time is decided
by the size of largest fragment of G rather than the size of entire G. Furthermore, we
show how these distributed algorithms can be implemented in the MapReduce framework.
Thirdly, we study the problem of answering graph pattern matching using views,
since view-based techniques have proven effective for speeding up query
evaluation. We propose a notion of pattern containment to characterise graph pattern
matching using views, and introduce efficient algorithms to answer graph pattern
matching using views. Moreover, we identify three problems related to graph pattern
containment, and provide efficient algorithms for containment checking (approximation
when the problem is intractable).
Fourthly, we revise graph pattern matching by supporting a designated output node,
which we treat as the "query focus". We then introduce algorithms for computing the top-k
relevant matches w.r.t. the output node for both acyclic and cyclic pattern graphs, respectively,
with early termination property. Furthermore, we investigate the diversified
top-k matching problem, and develop an approximation algorithm with performance
guarantee and a heuristic algorithm with early termination property.
Finally, we introduce an expert search system, called ExpFinder, for large and dynamic
social networks. ExpFinder identifies top-k experts in social networks by graph
pattern matching, and copes with the sheer size of real-life social networks by integrating
incremental graph pattern matching, query preserving compression and top-k
matching computation. In particular, we also introduce bounded (resp. unbounded)
incremental algorithms to maintain the weighted landmark vectors which are used for
incremental maintenance of cached results.
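
Graph simulation, one of the matching notions named in this abstract, can be sketched as a fixpoint computation. The version below is a naive refinement loop for illustration, not the optimized quadratic-time algorithm and not the thesis's implementation; the function name graph_simulation and the input layout (label dictionaries plus edge lists) are assumptions.

```python
def graph_simulation(p_labels, p_edges, g_labels, g_edges):
    """Compute the maximum simulation of a pattern in a data graph.

    p_labels / g_labels: dict node -> label
    p_edges / g_edges: iterable of (from_node, to_node) pairs
    Returns dict pattern_node -> set of matching data nodes, or {} if
    some pattern node has no match.
    """
    successors = {}
    for v, w in g_edges:
        successors.setdefault(v, set()).add(w)

    # Start from all label-compatible candidates.
    sim = {
        u: {v for v, label in g_labels.items() if label == p_labels[u]}
        for u in p_labels
    }

    # Remove candidates until every pattern edge (u, u2) is witnessed:
    # a candidate v for u needs a successor that is a candidate for u2.
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            keep = {v for v in sim[u] if successors.get(v, set()) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    if any(not candidates for candidates in sim.values()):
        return {}
    return sim

pattern_labels = {"pm": "PM", "dev": "DEV"}
pattern_edges = [("pm", "dev")]
graph_labels = {1: "PM", 2: "DEV", 3: "PM", 4: "QA"}
graph_edges = [(1, 2), (3, 4)]
print(graph_simulation(pattern_labels, pattern_edges, graph_labels, graph_edges))
```

Unlike subgraph isomorphism, simulation does not require a one-to-one mapping, which is why it stays polynomial.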
GRAPE: Parallel Graph Query Engine
The need for graph computations is evident in a multitude of use cases. To support
computations on large-scale graphs, several parallel systems have been developed.
However, existing graph systems require users to recast algorithms into new models,
which makes parallel graph computation a privilege of experienced users only.
Moreover, real world applications often require much more complex graph processing
workflows than previously evaluated. In response to these challenges, the thesis
presents GRAPE, a distributed graph computation system, shipped with various applications
for social network analysis, social media marketing and functional dependencies
on graphs.
Firstly, the thesis presents the foundation of GRAPE. The principled approach of
GRAPE is based on partial evaluation and incremental computation. Sequential graph
algorithms can be plugged into GRAPE with minor changes, and get parallelized as a
whole. The termination and correctness are guaranteed under a monotonic condition.
Secondly, as an application on GRAPE, the thesis proposes graph-pattern association
rules (GPARs) for social media marketing. GPARs help users discover regularities
between entities in social graphs and identify potential customers by exploring social
influence. The thesis studies the problem of discovering top-k diversified GPARs and
the problem of identifying potential customers with GPARs. Although both are NP-hard,
parallel scalable algorithms on GRAPE are developed, which guarantee a polynomial
speedup over sequential algorithms with the increase of processors.
Thirdly, the thesis proposes quantified graph patterns (QGPs), an extension of
graph patterns by supporting simple counting quantifiers on edges. QGPs naturally express
universal and existential quantification, numeric and ratio aggregates, as well as
negation. The thesis proves that the matching problem of QGPs remains NP-complete
in the absence of negation, and is DP-complete for general QGPs. In addition, the
thesis introduces quantified graph association rules defined with QGPs, to identify potential
customers in social media marketing.
Finally, to address the issue of data consistency, the thesis proposes a class of functional
dependencies for graphs, referred to as GFDs. GFDs capture both attribute-value
dependencies and topological structures of entities. The satisfiability and implication
problems for GFDs are studied and proved to be coNP-complete and NP-complete,
respectively. The thesis also proves that the validation problem for GFDs is coNP-complete.
The parallel algorithms developed on GRAPE verify that GFDs provide an
effective approach to detecting inconsistencies in knowledge and social graphs.
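
The idea of plugging a sequential algorithm into a partial-evaluation and incremental-computation loop can be sketched with connected components. This is not GRAPE's API; it is a single-process illustration under assumptions: the graph is undirected, node identifiers are integers, each fragment maps its own nodes to their neighbours (which may live in other fragments), and the names local_fixpoint and connected_components are hypothetical. Labels only ever decrease, which is the kind of monotonic condition that makes termination easy to argue.

```python
def local_fixpoint(adjacency, label):
    """Sequential min-label propagation restricted to one fragment.

    `label` holds this fragment's nodes plus any known copies of remote
    neighbours; only local nodes are updated.
    """
    changed = True
    while changed:
        changed = False
        for v, neighbours in adjacency.items():
            best = min([label[v]] + [label[w] for w in neighbours if w in label])
            if best < label[v]:
                label[v] = best
                changed = True

def connected_components(fragments):
    owner = {v: i for i, adj in enumerate(fragments) for v in adj}
    labels = [{v: v for v in adj} for adj in fragments]

    # Partial evaluation: run the sequential algorithm on each fragment.
    for adj, label in zip(fragments, labels):
        local_fixpoint(adj, label)

    while True:
        # Each fragment sends the labels of its border nodes to the
        # fragments that own their remote neighbours.
        inbox = [dict() for _ in fragments]
        for i, adj in enumerate(fragments):
            for v, neighbours in adj.items():
                for w in neighbours:
                    if owner[w] != i:
                        inbox[owner[w]][v] = labels[i][v]

        # Incremental step: apply only messages that lower a known label.
        progressed = False
        for j, adj in enumerate(fragments):
            label = labels[j]
            updates = {
                v: l for v, l in inbox[j].items()
                if v not in label or l < label[v]
            }
            if updates:
                label.update(updates)
                local_fixpoint(adj, label)
                progressed = True
        if not progressed:
            break

    return {v: labels[i][v] for i, adj in enumerate(fragments) for v in adj}

fragments = [
    {1: [2], 2: [1, 3]},
    {3: [2, 4], 4: [3], 5: [6]},
    {6: [5], 7: []},
]
print(connected_components(fragments))
```

The sketch resends every border label in each round for simplicity; sending only changed labels would reduce the message volume.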
Distributed XML Query Processing
While centralized query processing over collections of XML data stored at a single site is a well understood problem,
centralized query evaluation techniques are inherently limited in their scalability when presented
with large collections (or a single, large document) and heavy query workloads.
In the context of relational query processing,
similar scalability challenges have been overcome by partitioning data collections,
distributing them across the sites of a distributed system, and then
evaluating queries in a distributed fashion, usually in a way that ensures locality between
(sub-)queries and their relevant data.
This thesis presents a suite of query evaluation techniques for XML data that follow a similar
approach to address the scalability problems encountered by XML query evaluation.
Due to the significant differences in data and query models between relational and XML query
processing, it is not possible to directly apply distributed query evaluation techniques designed
for relational data to the XML scenario.
Instead, new distributed query evaluation
techniques need to be developed.
Thus, in this thesis, an end-to-end solution to the scalability problems encountered by XML query
processing is proposed.
Based on a data partitioning model that supports both horizontal and vertical
fragmentation steps (or any combination of the two), XML collections are fragmented and distributed
across the sites of a distributed system.
Then, a suite of distributed query evaluation strategies is
proposed. These query evaluation techniques ensure locality between each fragment of the collection and
the parts of the query corresponding to the data in this fragment. Special attention is paid to
scalability and query performance, which is achieved by ensuring a high degree of parallelism
during distributed query evaluation and by avoiding access to irrelevant portions of the data.
For maximum flexibility, the suite of distributed query evaluation techniques proposed in this thesis provides
several alternative approaches
for evaluating a given query over a given distributed collection. Thus, to achieve the best performance, it is
necessary to predict and compare the expected performance of each of these alternatives. In this
work, this is accomplished through a query optimization technique based on a
distribution-aware cost model. The same cost model is also used to fine-tune the way a collection is
fragmented to the demands of the query workload evaluated over this collection.
To evaluate the performance impact of the distributed query evaluation techniques proposed in this
thesis, the techniques were implemented within
a production-quality XML database system. Based on this implementation, a
thorough experimental evaluation was performed. The results of this evaluation confirm that the distributed query evaluation
techniques introduced here lead to significant improvements in query performance and scalability
both when compared to centralized techniques and when compared to existing distributed query
evaluation techniques.