TAPER: query-aware, partition-enhancement for large, heterogeneous graphs
Graph partitioning has long been seen as a viable approach to address Graph
DBMS scalability. A partitioning, however, may introduce extra query processing
latency unless it is sensitive to a specific query workload, and optimised to
minimise inter-partition traversals for that workload. Additionally, it should
also be possible to incrementally adjust the partitioning in reaction to
changes in the graph topology, the query workload, or both. Because of their
complexity, current partitioning algorithms fall short of one or both of these
requirements, as they are designed for offline use and as one-off operations.
The TAPER system aims to address both requirements, whilst leveraging existing
partitioning algorithms. TAPER takes any given initial partitioning as a
starting point, and iteratively adjusts it by swapping chosen vertices across
partitions, heuristically reducing the probability of inter-partition
traversals for a given workload of pattern matching queries. Iterations are
inexpensive thanks to time and space optimisations in the underlying support
data structures. We evaluate TAPER on two different large test graphs and over
realistic query workloads. Our results indicate that, given a hash-based
partitioning, TAPER reduces the number of inter-partition traversals by around
80%; given an unweighted METIS partitioning, by around 30%. These reductions
are achieved within 8 iterations and with the additional advantage of being
workload-aware and usable online.
Comment: 12 pages, 11 figures, unpublished.
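To make the iterative adjustment concrete, here is a minimal sketch (illustrative only: the names are invented and the heuristic is far simpler than TAPER's) that greedily moves vertices towards the partitions holding most of their workload-weighted neighbours, subject to a balance constraint, and reports the expected number of inter-partition traversals before and after refinement.

    # Illustrative sketch only; not TAPER's actual implementation.
    from collections import defaultdict

    def expected_ipt(partition, edge_freq):
        # Expected inter-partition traversals: total workload frequency of edges
        # whose endpoints currently sit in different partitions.
        return sum(f for (u, v), f in edge_freq.items() if partition[u] != partition[v])

    def refine(partition, edge_freq, max_skew=1.5, iterations=8):
        # One cheap greedy pass per iteration: move a vertex to the partition that
        # holds most of its frequency-weighted neighbours, provided the move helps
        # and the receiving partition stays within a skew bound.
        target = len(partition) / len(set(partition.values()))
        sizes = defaultdict(int)
        for p in partition.values():
            sizes[p] += 1
        neighbours = defaultdict(list)
        for (u, v), f in edge_freq.items():
            neighbours[u].append((v, f))
            neighbours[v].append((u, f))
        for _ in range(iterations):
            moved = False
            for v, p_old in list(partition.items()):
                pull = defaultdict(float)   # frequency-weighted pull of each partition on v
                for u, f in neighbours[v]:
                    pull[partition[u]] += f
                p_new = max(pull, key=pull.get, default=p_old)
                if p_new != p_old and pull[p_new] > pull[p_old] and sizes[p_new] + 1 <= max_skew * target:
                    partition[v] = p_new
                    sizes[p_old] -= 1
                    sizes[p_new] += 1
                    moved = True
            if not moved:
                break
        return partition

    # Toy example: six vertices split hash-style; the workload traverses the
    # chains a-b-c and d-e-f heavily but the bridging edge c-d only rarely.
    edge_freq = {("a", "b"): 10, ("b", "c"): 10, ("c", "d"): 1, ("d", "e"): 10, ("e", "f"): 10}
    partition = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
    print(expected_ipt(partition, edge_freq))   # before refinement
    refine(partition, edge_freq)
    print(expected_ipt(partition, edge_freq))   # after refinement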
Loom: Query-aware Partitioning of Online Graphs
As with general graph processing systems, partitioning data over a cluster of
machines improves the scalability of graph database management systems.
However, these systems will incur additional network cost during the execution
of a query workload, due to inter-partition traversals. Workload-agnostic
partitioning algorithms typically minimise the likelihood of any edge crossing
partition boundaries. However, these partitioners are sub-optimal with respect
to many workloads, especially queries which may require more frequent
traversal of specific subsets of inter-partition edges. Furthermore, they are
largely unsuited to operating incrementally on dynamic, growing graphs.
We present a new graph partitioning algorithm, Loom, that operates on a
stream of graph updates and continuously allocates the new vertices and edges
to partitions, taking into account a query workload of graph pattern
expressions along with their relative frequencies.
First we capture the most common patterns of edge traversals which occur when
executing queries. We then compare sub-graphs, which present themselves
incrementally in the graph update stream, against these common patterns.
Finally we attempt to allocate each match to a single partition, reducing the
number of inter-partition edges within frequently traversed sub-graphs and
improving average query performance.
Loom is extensively evaluated over several large test graphs with realistic
query workloads and various orderings of the graph updates. We demonstrate
that, given a workload, our prototype produces partitionings of significantly
better quality than existing streaming graph partitioning algorithms Fennel and
LDG.
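As an illustration of the streaming setting Loom targets, the sketch below (assumed, simplified conditions; not Loom's actual algorithm) assigns each newly seen vertex from a stream of weighted edges to the partition with the highest workload-weighted affinity minus a size penalty, in the spirit of Fennel/LDG-style heuristics; the edge weights stand in for how often the query workload traverses each edge.

    # Illustrative sketch only; not Loom's actual algorithm.
    from collections import defaultdict

    def place(weighted_neighbours, partition, sizes, k, alpha=1.0):
        # Score each candidate partition by workload-weighted affinity to already
        # placed neighbours, minus a simple size penalty to keep partitions balanced.
        def score(p):
            affinity = sum(w for u, w in weighted_neighbours if partition.get(u) == p)
            return affinity - alpha * sizes[p]
        return max(range(k), key=score)

    def partition_stream(edge_stream, k):
        # edge_stream yields (u, v, weight) triples; the weight estimates how often
        # the query workload traverses that edge.
        partition, sizes = {}, defaultdict(int)
        neighbours = defaultdict(list)
        for u, v, w in edge_stream:
            neighbours[u].append((v, w))
            neighbours[v].append((u, w))
            for x in (u, v):
                if x not in partition:
                    p = place(neighbours[x], partition, sizes, k)
                    partition[x] = p
                    sizes[p] += 1
        return partition

    # Toy stream: a frequently traversed triangle a-b-c, then a rarely traversed edge c-d.
    stream = [("a", "b", 5.0), ("b", "c", 5.0), ("c", "a", 5.0), ("c", "d", 0.5)]
    print(partition_stream(stream, k=2))   # the triangle stays together; only c-d crosses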
Workload-sensitive approaches to improving graph data partitioning online
PhD Thesis
Many modern applications, from social networks to network security tools, rely upon
the graph data model, using it as part of an offline analytics pipeline or, increasingly,
for storing and querying data online, e.g. in a graph database management system
(GDBMS). Unfortunately, effective horizontal scaling of this graph data reduces to
the NP-Hard problem of “k-way balanced graph partitioning”.
Owing to the problem’s importance, several practical approaches exist, producing quality graph partitionings. However, these existing systems are unsuitable for partitioning
online graphs, either introducing unnecessary network latency during query processing, being unable to efficiently adapt to changing data and query workloads, or both.
In this thesis we propose partitioning techniques which are efficient, sensitive to
a given query workload, and suitable for application to online graphs and evolving
query workloads.
To incrementally adapt partitionings in response to workload change, we propose
TAPER: a graph repartitioner. TAPER uses novel datastructures to compute the
probability of expensive inter-partition traversals (ipt) from each vertex, given the
current workload of path queries. Subsequently, it iteratively adjusts an initial partitioning by swapping selected vertices amongst partitions, heuristically maintaining low
ipt and high partition quality with respect to that workload. Iterations are inexpensive
thanks to time and space optimisations in the underlying datastructures.
To incrementally create partitionings in response to graph growth, we propose Loom:
a streaming graph partitioner. Loom uses another novel datastructure to detect common patterns of edge traversals when executing a given workload of pattern matching
queries. Subsequently, it employs a probabilistic graph isomorphism method to incrementally and efficiently compare sub-graphs in the stream of graph updates, to
these common patterns. Matches are assigned within individual partitions if possible,
thereby also reducing ipt and increasing partitioning quality w.r.t. the given workload.
Both partitioner and repartitioner are extensively evaluated with real/synthetic graph
datasets and query workloads. The headline results include that TAPER can reduce
ipt by up to 80% over a naive existing partitioning and can maintain this reduction in
the event of workload change, through additional iterations. Meanwhile, Loom reduces
ipt by up to 40% over a state-of-the-art streaming graph partitioner.
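The per-vertex quantity referred to above as ipt probability can be illustrated with a small, hypothetical sketch: given a vertex's outgoing labelled edges, the relative frequency with which the workload's path queries follow each edge label, and the current partition map, estimate the probability that a workload-driven traversal leaving that vertex crosses a partition boundary. TAPER's actual datastructures are considerably more elaborate than this.

    # Hypothetical illustration of a per-vertex ipt probability estimate.
    def ipt_probability(vertex, out_edges, label_freq, partition):
        # out_edges: list of (label, target) pairs leaving `vertex`.
        # label_freq: relative frequency with which the workload follows each label.
        total = sum(label_freq.get(lbl, 0.0) for lbl, _ in out_edges)
        if total == 0.0:
            return 0.0
        crossing = sum(label_freq.get(lbl, 0.0)
                       for lbl, tgt in out_edges
                       if partition[tgt] != partition[vertex])
        return crossing / total

    # Toy example: 'follows' edges dominate the workload; 'likes' edges are rare.
    partition = {"v1": 0, "v2": 1, "v3": 0}
    out_edges = [("follows", "v2"), ("likes", "v3")]
    label_freq = {"follows": 0.9, "likes": 0.1}
    print(ipt_probability("v1", out_edges, label_freq, partition))   # 0.9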
Occupational stress and user satisfaction with primary health care in Portugal
The Portuguese primary healthcare sector has undergone changes due to a reform along the lines of the conceptual framework referred to by some authors as "New Public Management." These changes may be generating higher levels of occupational stress, with a negative impact at individual and organizational levels. This study examines the experience of stress in 305 health professionals (physicians, nurses and clinical secretaries) and the satisfaction of 392 users with the services provided to them. The population under scrutiny is drawn from 10 type A and 10 type B Family Health Units (FHU). The results show that 84.2% of professionals report moderate to high levels of occupational stress, with nurses reporting the highest levels. Users reported good levels of satisfaction, especially with the nursing services. There were no differences in stress levels between type A and type B FHUs, though user satisfaction did differ, with type B FHU users showing higher levels of satisfaction. Dimensions of user satisfaction were found to be affected by stress related to excess work.
Urban coral reefs: Degradation and resilience of hard coral assemblages in coastal cities of East and Southeast Asia
Given predicted increases in urbanization in tropical and subtropical regions, understanding the processes shaping urban coral reefs may be essential for anticipating future conservation challenges. We used a case study approach to identify unifying patterns of urban coral reefs and clarify the effects of urbanization on hard coral assemblages. Data were compiled from 11 cities throughout East and Southeast Asia, with particular focus on Singapore, Jakarta, Hong Kong, and Naha (Okinawa). Our review highlights several key characteristics of urban coral reefs, including "reef compression" (a decline in bathymetric range with increasing turbidity and decreasing water clarity over time and relative to shore), dominance by domed coral growth forms and low reef complexity, variable city-specific inshore-offshore gradients, early declines in coral cover with recent fluctuating periods of acute impacts and rapid recovery, and colonization of urban infrastructure by hard corals. We present hypotheses for urban reef community dynamics and discuss the potential of ecological engineering for corals in urban areas.
TAPER: query-aware, partition-enhancement for large, heterogeneous graphs
Graph partitioning has long been seen as a viable approach to addressing Graph DBMS scalability. A partitioning, however, may introduce extra query processing latency unless it is sensitive to a specific query workload, and optimised to minimise inter-partition traversals for that workload. Additionally, it should also be possible to incrementally adjust the partitioning in reaction to changes in the graph topology, the query workload, or both. Because of their complexity, current partitioning algorithms fall short of one or both of these requirements, as they are designed for offline use and as one-off operations. The TAPER system aims to address both requirements, whilst leveraging existing partitioning algorithms. TAPER takes any given initial partitioning as a starting point, and iteratively adjusts it by swapping chosen vertices across partitions, heuristically reducing the probability of inter-partition traversals for a given workload of path queries. Iterations are inexpensive thanks to time and space optimisations in the underlying support data structures. We evaluate TAPER on two different large test graphs and over realistic query workloads. Our results indicate that, given a hash-based partitioning, TAPER reduces the number of inter-partition traversals by ∼80%; given an unweighted Metis partitioning, by ∼30%. These reductions are achieved within eight iterations and with the additional advantage of being workload-aware and usable online.
Workload-aware streaming graph partitioning
Partitioning large graphs, in order to balance storage and processing costs across multiple physical machines, is becoming increasingly necessary as the typical scale of graph data continues to increase. A partitioning, however, may introduce query processing latency due to inter-partition communication overhead, especially if the query workload exhibits skew, frequently traversing a limited subset of graph edges. Existing partitioners are typically workload agnostic and susceptible to such skew; they minimise the likelihood of any edge crossing partition boundaries. We present our progress on LOOM: a streaming graph partitioner based upon efficient existing heuristics, which reduces inter-partition traversals when executing a stream of sub-graph pattern matching queries Q. We are able to continuously summarise the traversal patterns caused by queries within a window over Q. We do this using a generalisation over a trie data structure, which we call TPSTry++, to compactly encode frequent sub-graphs, or motifs, common to many query graphs in Q. When the graph-stream being partitioned contains a match for a motif, LOOM uses graph-stream pattern matching to capture it, and place it wholly within partition boundaries. This increases the likelihood that a random query q ∈ Q may be answered within a single partition, with no inter-partition communication to introduce additional latency. Finally, we discuss the potential pitfalls and drawbacks which exist with our approach, and detail the work yet to be completed.
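The TPSTry++ idea can be loosely illustrated with a plain trie over edge-label sequences that counts how often each prefix is traversed across a window of queries; the real structure generalises this to sub-graph (motif) shapes rather than simple label paths, so the sketch below should be read as an analogy, not the actual datastructure.

    # Analogy only: a label-path trie, far simpler than TPSTry++.
    from collections import defaultdict

    class TrieNode:
        def __init__(self):
            self.children = defaultdict(TrieNode)
            self.count = 0

    class TraversalTrie:
        def __init__(self):
            self.root = TrieNode()

        def add_query(self, label_path):
            # Record one query's traversal pattern, counting every prefix.
            node = self.root
            for label in label_path:
                node = node.children[label]
                node.count += 1

        def frequent_motifs(self, min_count):
            # Return the label prefixes traversed at least `min_count` times.
            out = []
            def walk(node, prefix):
                for label, child in node.children.items():
                    if child.count >= min_count:
                        out.append((tuple(prefix + [label]), child.count))
                        walk(child, prefix + [label])
            walk(self.root, [])
            return out

    trie = TraversalTrie()
    for q in [["follows", "likes"], ["follows", "likes", "authored"], ["follows"]]:
        trie.add_query(q)
    print(trie.frequent_motifs(min_count=2))
    # [(('follows',), 3), (('follows', 'likes'), 2)]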
ProvGen: Generating synthetic PROV graphs with predictable structure
This paper introduces provGen, a generator aimed at producing large synthetic provenance graphs with predictable properties and of arbitrary size. Synthetic provenance graphs serve two main purposes. Firstly, they provide a variety of controlled workloads that can be used to test storage and query capabilities of provenance management systems at scale. Secondly, they provide challenging testbeds for experimenting with graph algorithms for provenance analytics, an area of increasing research interest. provGen produces PROV graphs and stores them in a graph DBMS (Neo4j). A key feature is to let users control the relationship makeup and topological features of the graph, by providing a seed provenance pattern along with a set of constraints, expressed using a custom Domain Specific Language. We also propose a simple method for evaluating the quality of the generated graphs, by measuring how realistically they simulate the structure of real-world patterns.
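The seed-replication idea behind provGen can be sketched as follows (hypothetical code: provGen itself drives generation with a custom DSL for constraints and writes the result to Neo4j, none of which is reproduced here). The sketch simply replicates a small seed provenance pattern and chains the copies together.

    # Hypothetical sketch of chaining copies of a seed PROV pattern.
    def generate_prov_chain(n_copies):
        # Each copy adds an activity a_i that used the previous entity and
        # generated a new entity e_i; triples are (source, PROV relation, target).
        edges = []
        prev_entity = "e0"
        for i in range(1, n_copies + 1):
            activity, entity = f"a{i}", f"e{i}"
            edges.append((entity, "prov:wasGeneratedBy", activity))
            edges.append((activity, "prov:used", prev_entity))
            edges.append((entity, "prov:wasDerivedFrom", prev_entity))
            prev_entity = entity
        return edges

    for triple in generate_prov_chain(2):
        print(triple)
    # e.g. ('e1', 'prov:wasGeneratedBy', 'a1'), ('a1', 'prov:used', 'e0'), ...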