
    High performance subgraph mining in molecular compounds

    Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
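
    As a toy illustration of the receiver-initiated scheme described above (a minimal sketch under our own assumptions; the class and function names are hypothetical, not taken from the paper), an idle worker polls peers in random order and asks a busy one to donate roughly half of its pending search-tree roots:

        # Minimal sketch of receiver-initiated load balancing over an irregular
        # search tree. Illustrative only: Worker, split_queue and request_work
        # are hypothetical names, not the paper's implementation.
        import random
        from collections import deque

        class Worker:
            def __init__(self, wid, roots=None):
                self.wid = wid
                self.queue = deque(roots or [])    # unexplored search-tree nodes

            def split_queue(self):
                """Donate roughly half of the pending subtrees to a requester."""
                half = len(self.queue) // 2
                return [self.queue.pop() for _ in range(half)]

            def request_work(self, peers):
                """Receiver-initiated: an idle worker polls peers in random order."""
                for peer in random.sample(peers, len(peers)):
                    donated = peer.split_queue()
                    if donated:
                        self.queue.extend(donated)
                        return True
                return False    # no peer had surplus work

        # Toy usage: worker 1 starts idle and steals half of worker 0's roots.
        w0 = Worker(0, roots=[f"subtree-{i}" for i in range(8)])
        w1 = Worker(1)
        w1.request_work([w0])
        print(len(w0.queue), len(w1.queue))    # -> 4 4

    In a real peer-to-peer deployment the request and the donation would be messages rather than method calls, but the splitting logic is what drives the dynamic partitioning of the search space.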

    How Long It Takes for an Ordinary Node with an Ordinary ID to Output?

    In the context of distributed synchronous computing, processors perform in rounds, and the time-complexity of a distributed algorithm is classically defined as the number of rounds before all computing nodes have output. Hence, this complexity measure captures the running time of the slowest node(s). In this paper, we are interested in the running time of the ordinary nodes, to be compared with the running time of the slowest nodes. The node-averaged time-complexity of a distributed algorithm on a given instance is defined as the average, taken over every node of the instance, of the number of rounds before that node outputs. We compare the node-averaged time-complexity with the classical one in the standard LOCAL model for distributed network computing. We show that there can be an exponential gap between the node-averaged time-complexity and the classical time-complexity, as witnessed by, e.g., leader election. Our first main result is a positive one, stating that, in fact, the two time-complexities behave the same for a large class of problems on very sparse graphs. In particular, we show that, for LCL problems on cycles, the node-averaged time-complexity is of the same order of magnitude as the slowest-node time-complexity. In addition, in the LOCAL model, the time-complexity is computed as a worst case over all possible identity assignments to the nodes of the network. In this paper, we also investigate the ID-averaged time-complexity, where the number of rounds is averaged over all possible identity assignments. Our second main result is that the ID-averaged time-complexity is essentially the same as the expected time-complexity of randomized algorithms (where the expectation is taken over all possible random bits used by the nodes, and the number of rounds is measured for the worst-case identity assignment). Finally, we study the node-averaged ID-averaged time-complexity.
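
    In symbols (the notation below is ours, not the paper's): for an algorithm A running on an instance G with identity assignment id, let T_A(v, G, id) denote the round at which node v produces its output. The classical and node-averaged time-complexities on that instance are then, respectively,

        \[
        T_A(G,\mathrm{id}) \;=\; \max_{v \in V(G)} T_A(v,G,\mathrm{id}),
        \qquad
        \overline{T}_A(G,\mathrm{id}) \;=\; \frac{1}{|V(G)|} \sum_{v \in V(G)} T_A(v,G,\mathrm{id}).
        \]

    The LOCAL model takes the worst case of the former over all identity assignments, whereas the ID-averaged variants discussed in the abstract replace that worst case by an average over id.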

    Formal analysis techniques for gossiping protocols

    We give a survey of formal verification techniques that can be used to corroborate existing experimental results for gossiping protocols in a rigorous manner. We present properties of interest for gossiping protocols and discuss how various formal evaluation techniques can be employed to predict them.

    Leader Election in Anonymous Rings: Franklin Goes Probabilistic

    We present a probabilistic leader election algorithm for anonymous, bidirectional, asynchronous rings. It is based on Franklin's algorithm, augmented with random identity selection, hop counters to detect identity clashes, and round numbers modulo 2. As a result, the algorithm is finite-state, so that various model checking techniques can be employed to verify its correctness, that is, that eventually a unique leader is elected with probability one. We also sketch a formal correctness proof of the algorithm for rings of arbitrary size.
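
    The following is a simplified, synchronous simulation in the spirit of the algorithm above (a sketch under our own assumptions; the asynchronous message passing, hop counters and round numbers modulo 2 of the actual algorithm are abstracted away): in each round every still-active node draws a fresh random identity and stays active only if it strictly beats both of its nearest active neighbours, so identity clashes simply trigger another round.

        # Simplified synchronous sketch of probabilistic Franklin-style leader
        # election on an anonymous ring; illustrative only, not the paper's algorithm.
        import random

        def elect_leader(n, id_space=8, rng=random.Random(42)):
            active = list(range(n))                  # positions of still-active nodes
            while len(active) > 1:
                ids = {v: rng.randrange(id_space) for v in active}   # fresh random ids
                k = len(active)
                survivors = []
                for i, v in enumerate(active):
                    left, right = active[(i - 1) % k], active[(i + 1) % k]
                    # Stay active only when strictly larger than both active
                    # neighbours; a clash forces another round with fresh ids.
                    if ids[v] > ids[left] and ids[v] > ids[right]:
                        survivors.append(v)
                if survivors:                        # else every comparison tied; retry
                    active = survivors
            return active[0]                         # position of the unique leader

        print(elect_leader(10))

    Because strict local maxima are never adjacent, at most half of the active nodes survive a successful round, so a unique leader is elected with probability one.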

    Robustness of Randomized Rumour Spreading

    In this work we consider three well-studied broadcast protocols: Push, Pull and Push&Pull. A key property of all these models, which is also an important reason for their popularity, is that they are presumed to be very robust, since they are simple, randomized, and, crucially, do not explicitly use the global structure of the underlying graph. While sporadic results exist, there has been no systematic theoretical treatment quantifying the robustness of these models. Here we investigate this question with respect to two orthogonal aspects: (adversarial) modifications of the underlying graph and message transmission failures. In particular, we explore the following notion of Local Resilience: beginning with a graph, we investigate up to which fraction of the edges an adversary must be allowed to delete at each vertex before the protocols need significantly more rounds to broadcast the information. Our main findings establish a separation among the three models. It turns out that Pull is robust with respect to all parameters that we consider. On the other hand, Push may slow down significantly, even if the adversary is only allowed to modify the degrees of the vertices by an arbitrarily small positive fraction. Finally, Push&Pull is robust when no message transmission failures are considered, but otherwise it may be slowed down. On the technical side, we develop two novel methods for the analysis of randomized rumour spreading protocols. First, we exploit the notion of self-bounding functions to significantly simplify the round-based analysis: we show that for any graph the variance of the growth of informed vertices is bounded by its expectation, so that concentration results follow immediately. Second, in order to control adversarial modifications of the graph, we make use of a powerful tool from extremal graph theory, namely Szemerédi's Regularity Lemma.
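
    For concreteness, a minimal round-based simulation of the three protocols on an arbitrary graph looks as follows (an illustrative sketch under our own assumptions, not the analysis in the paper): in every round each informed node pushes the rumour to a uniformly random neighbour, and/or each node pulls it from a uniformly random neighbour.

        # Round-based sketch of Push, Pull and Push&Pull rumour spreading on a
        # graph given as an adjacency-list dict; illustrative only.
        import random

        def spread(adj, start, mode="push-pull", rng=random.Random(0)):
            informed = {start}
            rounds = 0
            while len(informed) < len(adj):
                rounds += 1
                newly = set()
                for v in adj:
                    u = rng.choice(adj[v])                  # uniform random neighbour
                    if "push" in mode and v in informed:    # informed v pushes to u
                        newly.add(u)
                    if "pull" in mode and u in informed:    # v pulls from informed u
                        newly.add(v)
                informed |= newly
            return rounds

        # Toy usage on the 4-cycle 0-1-2-3-0.
        cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
        print(spread(cycle, 0, "push"), spread(cycle, 0, "pull"), spread(cycle, 0, "push-pull"))

    Adversarial edge deletions or message transmission failures can be modelled on top of this loop by pruning adj or by dropping individual push/pull attempts with some probability.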

    Simulating Wide-area Replication

    We describe our experiences with simulating replication algorithms for use in far-flung distributed systems. The algorithms under scrutiny mimic epidemics. Epidemic algorithms seem to scale and adapt well to change (such as varying replica sets). The loose consistency guarantees they make seem most useful in applications where availability strongly outweighs correctness, e.g., a distributed name service.
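
    A minimal sketch of the epidemic (anti-entropy) style of replication discussed above, under our own assumptions (last-writer-wins timestamps; the names are illustrative, not taken from the paper): each replica periodically reconciles its key/value store with one randomly chosen peer, so updates spread through the group like an infection.

        # Anti-entropy gossip replication with last-writer-wins reconciliation;
        # illustrative sketch only.
        import random

        class Replica:
            def __init__(self, name):
                self.name = name
                self.store = {}                       # key -> (timestamp, value)

            def write(self, key, value, ts):
                self.store[key] = (ts, value)

            def anti_entropy(self, peer):
                """Exchange states with a peer and keep the newer version of every key."""
                for key in set(self.store) | set(peer.store):
                    mine = self.store.get(key, (-1, None))
                    theirs = peer.store.get(key, (-1, None))
                    newest = max(mine, theirs)        # compares timestamps first
                    self.store[key] = peer.store[key] = newest

        # Toy usage: one write, then random pairwise gossip until all replicas agree.
        replicas = [Replica(f"r{i}") for i in range(5)]
        replicas[0].write("name", "alpha", ts=1)
        rng = random.Random(7)
        exchanges = 0
        while not all(r.store.get("name") == (1, "alpha") for r in replicas):
            a, b = rng.sample(replicas, 2)
            a.anti_entropy(b)
            exchanges += 1
        print("converged after", exchanges, "gossip exchanges")

    The loose (eventual) consistency is visible here: between exchanges, different replicas may return different answers for the same key.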

    Cost-effectiveness analysis of 3-D computerized tomography colonography versus optical colonoscopy for imaging symptomatic gastroenterology patients.

    BACKGROUND: When symptomatic gastroenterology patients have an indication for colonic imaging, clinicians have a choice between optical colonoscopy (OC) and computerized tomography colonography with three-dimensional reconstruction (3-D CTC). 3-D CTC provides a minimally invasive and rapid evaluation of the entire colon, and it can be an efficient modality for diagnosing symptoms. It allows for a more targeted use of OC, which is associated with a higher risk of major adverse events and higher procedural costs. A case can be made for 3-D CTC as a primary test for colonic imaging, followed if necessary by targeted therapeutic OC; however, the relative long-term costs and benefits of introducing 3-D CTC as a first-line investigation are unknown. AIM: The aim of this study was to assess the cost effectiveness of 3-D CTC versus OC for colonic imaging of symptomatic gastroenterology patients in the UK NHS. METHODS: We used a Markov model to follow a cohort of 100,000 symptomatic gastroenterology patients, aged 50 years or older, and to estimate the expected lifetime outcomes, life years (LYs) and quality-adjusted life years (QALYs), and costs (£, 2010-2011) associated with 3-D CTC and OC. Sensitivity analyses were performed to assess the robustness of the base-case cost-effectiveness results to variation in input parameters and methodological assumptions. RESULTS: 3-D CTC provided a similar number of LYs (7.737 vs 7.739) and QALYs (7.013 vs 7.018) per individual compared with OC, and it was associated with substantially lower mean costs per patient (£467 vs £583), leading to a positive incremental net benefit. After accounting for the overall uncertainty, the probability of 3-D CTC being cost effective was around 60%, at typical willingness-to-pay values of £20,000-£30,000 per QALY gained. CONCLUSION: 3-D CTC is a cost-saving and cost-effective option for colonic imaging of symptomatic gastroenterology patients compared with OC.
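
    As a rough back-of-envelope check of the direction of this result (our own arithmetic on the rounded point estimates quoted above, not a figure reported in the paper), the incremental net monetary benefit of 3-D CTC versus OC at a willingness-to-pay of £20,000 per QALY is

        \[
        \mathrm{INB}(\lambda) \;=\; \lambda \,\Delta\mathrm{QALY} - \Delta\mathrm{Cost}
        \;=\; 20{,}000 \times (7.013 - 7.018) - (467 - 583)
        \;=\; -100 + 116 \;=\; +\pounds 16,
        \]

    which is positive, in line with the abstract's conclusion; the roughly 60% probability of cost effectiveness reflects the uncertainty around these point estimates.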