High performance subgraph mining in molecular compounds
Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup on a network of workstations.
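As a rough illustration of the load-balancing idea (a minimal sketch under simplifying assumptions, not the paper's implementation), the fragment below models workers expanding an irregular search tree; an idle worker pulls work from a randomly chosen peer, i.e., balancing is receiver-initiated:

```python
# Illustrative sketch of receiver-initiated load balancing over a dynamically
# partitioned search tree. Work items are depth counters standing in for
# candidate subgraphs; branching is random to mimic an irregular tree.
import random
from collections import deque

class Worker:
    def __init__(self, peers):
        self.peers = peers          # all workers reachable via the P2P layer
        self.local_work = deque()   # unexplored search-tree nodes
        self.expanded = 0

    def expand(self, depth):
        self.expanded += 1
        if depth < 6:               # irregular branching: 0-3 children per node
            self.local_work.extend([depth + 1] * random.randint(0, 3))

    def step(self):
        if self.local_work:
            self.expand(self.local_work.popleft())
            return True
        # Receiver-initiated: the idle worker asks a random peer for work
        # instead of waiting for work to be pushed to it.
        donor = random.choice(self.peers)
        stolen = donor.donate()
        self.local_work.extend(stolen)
        return bool(stolen)

    def donate(self):
        half = len(self.local_work) // 2   # give away roughly half the queue
        return [self.local_work.pop() for _ in range(half)]

workers = []
for _ in range(4):
    workers.append(Worker(peers=workers))  # shared peer list, filled below
workers[0].local_work.append(0)            # the search-tree root starts on one worker

while any([w.step() for w in workers]):    # run until no worker has or finds work
    pass
print([w.expanded for w in workers])       # nodes expanded per worker
```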
How Long It Takes for an Ordinary Node with an Ordinary ID to Output?
In the context of distributed synchronous computing, processors perform in
rounds, and the time-complexity of a distributed algorithm is classically
defined as the number of rounds before all computing nodes have output. Hence,
this complexity measure captures the running time of the slowest node(s). In
this paper, we are interested in the running time of the ordinary nodes, to be
compared with the running time of the slowest nodes. The node-averaged
time-complexity of a distributed algorithm on a given instance is defined as
the average, taken over all nodes of the instance, of the number of rounds
before that node outputs. We compare the node-averaged time-complexity with the
classical one in the standard LOCAL model for distributed network computing. We
show that there can be an exponential gap between the node-averaged
time-complexity and the classical time-complexity, as witnessed by, e.g.,
leader election. Our first main result is a positive one, stating that, in
fact, the two time-complexities behave the same for a large class of problems
on very sparse graphs. In particular, we show that, for LCL problems on cycles,
the node-averaged time-complexity is of the same order of magnitude as the
slowest-node time-complexity.
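To make the distinction concrete, here is a toy numerical illustration (the per-node round counts are invented for illustration, not taken from the paper): the classical time-complexity is the maximum over nodes, while the node-averaged time-complexity is the mean, and the two can differ dramatically when most nodes output early.

```python
# Hypothetical per-node output rounds on an n = 100 instance where most
# nodes stop quickly but a few must wait much longer.
rounds = [1] * 96 + [25, 50, 75, 100]

classical = max(rounds)                     # worst case over nodes: 100
node_averaged = sum(rounds) / len(rounds)   # average over nodes: 3.46
print(classical, node_averaged)
```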
In addition, in the LOCAL model, the time-complexity is computed as a worst
case over all possible identity assignments to the nodes of the network. In
this paper, we also investigate the ID-averaged time-complexity, when the
number of rounds is averaged over all possible identity assignments. Our second
main result is that the ID-averaged time-complexity is essentially the same as
the expected time-complexity of randomized algorithms (where the expectation is
taken over all possible random bits used by the nodes, and the number of rounds
is measured for the worst-case identity assignment).
Finally, we study the node-averaged ID-averaged time-complexity.
Formal analysis techniques for gossiping protocols
We give a survey of formal verification techniques that can be used to corroborate existing experimental results for gossiping protocols in a rigorous manner. We present properties of interest for gossiping protocols and discuss how various formal evaluation techniques can be employed to predict them.
Leader Election in Anonymous Rings: Franklin Goes Probabilistic
We present a probabilistic leader election algorithm for anonymous, bidirectional, asynchronous rings. It is based on Franklin's algorithm, augmented with random identity selection, hop counters to detect identity clashes, and round numbers modulo 2. As a result, the algorithm is finite-state, so various model checking techniques can be employed to verify its correctness, that is, that eventually a unique leader is elected with probability one. We also sketch a formal correctness proof of the algorithm for rings of arbitrary size.
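The following fragment is a simplified synchronous simulation of the core idea (random identities plus Franklin-style comparison with the nearest active neighbours on both sides). It is only a sketch: it does not model the asynchronous message passing, the hop counters, or the modulo-2 round numbers of the actual algorithm.

```python
# Toy synchronous simulation: active nodes draw random identities, each
# compares with its nearest active neighbours in both directions, and only
# strict local maxima stay active; clashing identities are redrawn.
import random

def elect(n, id_space=8):
    active = list(range(n))                       # positions of active nodes on the ring
    ids = {p: random.randrange(id_space) for p in active}
    rounds = 0
    while len(active) > 1:
        rounds += 1
        k = len(active)
        survivors = []
        for i, p in enumerate(active):
            left = ids[active[(i - 1) % k]]
            right = ids[active[(i + 1) % k]]
            if ids[p] > left and ids[p] > right:
                survivors.append(p)               # strict local maximum stays active
        if not survivors:                         # all remaining identities clashed
            survivors = active
        active = survivors
        ids = {p: random.randrange(id_space) for p in active}  # fresh random identities
    return active[0], rounds

print(elect(16))  # (position of the elected leader, number of rounds)
```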
Robustness of Randomized Rumour Spreading
In this work we consider three well-studied broadcast protocols: Push, Pull
and Push&Pull. A key property of all these models, which is also an important
reason for their popularity, is that they are presumed to be very robust, since
they are simple, randomized, and, crucially, do not explicitly use the
global structure of the underlying graph. While sporadic results exist, there
has been no systematic theoretical treatment quantifying the robustness of
these models. Here we investigate this question with respect to two orthogonal
aspects: (adversarial) modifications of the underlying graph and message
transmission failures.
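For reference, the three protocols can be stated in a few lines each; the sketch below is a toy synchronous simulation on an adjacency-list graph (illustrative only, not the analysis machinery of this paper):

```python
# One synchronous round of Push, Pull, and Push&Pull on an undirected graph
# given as an adjacency list (dict: node -> list of neighbours).
import random

def push_round(adj, informed):
    new = set(informed)
    for u in informed:                      # every informed node pushes the rumour
        new.add(random.choice(adj[u]))      # to one uniformly random neighbour
    return new

def pull_round(adj, informed):
    new = set(informed)
    for u in adj:                           # every uninformed node asks a random
        if u not in informed and random.choice(adj[u]) in informed:
            new.add(u)                      # neighbour and pulls the rumour if it has it
    return new

def push_pull_round(adj, informed):
    return push_round(adj, informed) | pull_round(adj, informed)

def broadcast_time(adj, round_fn, source=0):
    informed, rounds = {source}, 0
    while len(informed) < len(adj):
        informed = round_fn(adj, informed)
        rounds += 1
    return rounds

# Example: complete graph on 64 vertices.
K = {u: [v for v in range(64) if v != u] for u in range(64)}
print(broadcast_time(K, push_round),
      broadcast_time(K, pull_round),
      broadcast_time(K, push_pull_round))
```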
In particular, we explore the following notion of Local Resilience: starting
from a graph, we ask what fraction of the edges at each vertex an adversary
must be allowed to delete before the protocols need significantly more rounds
to broadcast the information. Our main findings
establish a separation among the three models. It turns out that Pull is robust
with respect to all parameters that we consider. On the other hand, Push may
slow down significantly, even if the adversary is allowed to modify the degrees
of the vertices by an arbitrarily small positive fraction only. Finally,
Push&Pull is robust when no message transmission failures are considered,
otherwise it may be slowed down.
On the technical side, we develop two novel methods for the analysis of
randomized rumour spreading protocols. First, we exploit the notion of
self-bounding functions to significantly facilitate the round-based analysis:
we show that for any graph the variance of the growth of informed vertices is
bounded by its expectation, so that concentration results follow immediately.
Second, in order to control adversarial modifications of the graph we make use
of a powerful tool from extremal graph theory, namely Szemerédi's Regularity
Lemma.
Simulating Wide-area Replication
We describe our experiences with simulating replication algorithms for use in far-flung distributed systems. The algorithms under scrutiny mimic epidemics. Epidemic algorithms seem to scale well and to adapt well to change (such as varying replica sets). The loose consistency guarantees they make seem most useful in applications where availability strongly outweighs correctness, e.g., a distributed name service.
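As a minimal illustration of what epidemic-style replication means here (an assumed toy model, not the simulator described above), each replica periodically performs anti-entropy with a randomly chosen peer, reconciling key-value entries by timestamp:

```python
# Tiny anti-entropy sketch: replicas hold key -> (timestamp, value) maps and
# reconcile pairwise, keeping the newer entry for every key (last-writer-wins).
import random

class Replica:
    def __init__(self):
        self.store = {}                     # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.store[key] = (ts, value)

    def anti_entropy(self, peer):
        for key in set(self.store) | set(peer.store):
            a = self.store.get(key, (0, None))
            b = peer.store.get(key, (0, None))
            newer = max(a, b)               # resolve conflicts by timestamp
            self.store[key] = peer.store[key] = newer

replicas = [Replica() for _ in range(10)]
replicas[0].write("name:alice", "host-17", ts=1)   # a single update at one replica

for _ in range(6):                          # a few gossip rounds spread the update
    for r in replicas:
        r.anti_entropy(random.choice(replicas))

print(sum("name:alice" in r.store for r in replicas), "of", len(replicas), "replicas updated")
```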
Cost-effectiveness analysis of 3-D computerized tomography colonography versus optical colonoscopy for imaging symptomatic gastroenterology patients.
BACKGROUND: When symptomatic gastroenterology patients have an indication for colonic imaging, clinicians have a choice between optical colonoscopy (OC) and computerized tomography colonography with three-dimensional reconstruction (3-D CTC). 3-D CTC provides a minimally invasive and rapid evaluation of the entire colon, and it can be an efficient modality for diagnosing symptoms. It allows for a more targeted use of OC, which is associated with a higher risk of major adverse events and higher procedural costs. A case can be made for 3-D CTC as a primary test for colonic imaging followed if necessary by targeted therapeutic OC; however, the relative long-term costs and benefits of introducing 3-D CTC as a first-line investigation are unknown. AIM: The aim of this study was to assess the cost-effectiveness of 3-D CTC versus OC for colonic imaging of symptomatic gastroenterology patients in the UK NHS. METHODS: We used a Markov model to follow a cohort of 100,000 symptomatic gastroenterology patients, aged 50 years or older, and estimate the expected lifetime outcomes, life years (LYs) and quality-adjusted life years (QALYs), and costs (£, 2010-2011) associated with 3-D CTC and OC. Sensitivity analyses were performed to assess the robustness of the base-case cost-effectiveness results to variation in input parameters and methodological assumptions. RESULTS: 3-D CTC provided a similar number of LYs (7.737 vs 7.739) and QALYs (7.013 vs 7.018) per individual compared with OC, and it was associated with substantially lower mean costs per patient (£467 vs £583), leading to a positive incremental net benefit. After accounting for the overall uncertainty, the probability of 3-D CTC being cost-effective was around 60%, at typical willingness-to-pay values of £20,000-£30,000 per QALY gained. CONCLUSION: 3-D CTC is a cost-saving and cost-effective option for colonic imaging of symptomatic gastroenterology patients compared with OC.
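The reported point estimates make the net-benefit calculation easy to reproduce approximately (the published QALY and cost figures are rounded, so this is only indicative, and the paper's probabilistic sensitivity analysis is not reproduced here):

```python
# Incremental net monetary benefit of 3-D CTC vs OC from the rounded point
# estimates quoted above.
qaly_ctc, qaly_oc = 7.013, 7.018     # mean QALYs per patient
cost_ctc, cost_oc = 467.0, 583.0     # mean costs per patient (GBP)

wtp = 20_000                          # willingness to pay per QALY gained (GBP)
inb = wtp * (qaly_ctc - qaly_oc) - (cost_ctc - cost_oc)
print(f"INB at £{wtp:,}/QALY: £{inb:.0f} per patient")  # ≈ +£16, i.e. positive
```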