77,246 research outputs found

    Resilient Optimistic Termination Detection for the Async-Finish Model

    Get PDF
    International audienceDriven by increasing core count and decreasing mean-time-to-failure in supercomputers, HPC runtime systems must improve support for dynamic task-parallel execution and resilience to failures. The async-finish task model, adapted for distributed systems as the asynchronous partitioned global address space programming model, provides a simple way to decompose a computation into nested task groups, each managed by a ‘finish’ that signals the termination of all tasks within the group.For distributed termination detection, maintaining a consistent view of task state across multiple unreliable processes requires additional book-keeping when creating or completing tasks and finish-scopes. Runtime systems which perform this book-keeping pessimistically, i.e. synchronously with task state changes, add a high communication overhead compared to non-resilient protocols. In this paper, we propose optimistic finish, the first message-optimal resilient termination detection protocol for the async-finish model. By avoiding the communication of certain task and finish events, this protocol allows uncertainty about the global structure of the computation which can be resolved correctly at failure time, thereby reducing the overhead for failure-free execution.Performance results using micro-benchmarks and the LULESH hydrodynamics proxy application show significant reductions in resilience overhead with optimistic finish compared to pessimistic finish. Our optimistic finish protocol is applicable to any task-based runtime system offering automatic termination detection for dynamic graphs of non-migratable tasks

    Shortest, Fastest, and Foremost Broadcast in Dynamic Networks

    Full text link
    Highly dynamic networks rarely offer end-to-end connectivity at a given time. Yet, connectivity in these networks can be established over time and space, based on temporal analogues of multi-hop paths (also called {\em journeys}). Attempting to optimize the selection of the journeys in these networks naturally leads to the study of three cases: shortest (minimum hop), fastest (minimum duration), and foremost (earliest arrival) journeys. Efficient centralized algorithms exists to compute all cases, when the full knowledge of the network evolution is given. In this paper, we study the {\em distributed} counterparts of these problems, i.e. shortest, fastest, and foremost broadcast with termination detection (TDB), with minimal knowledge on the topology. We show that the feasibility of each of these problems requires distinct features on the evolution, through identifying three classes of dynamic graphs wherein the problems become gradually feasible: graphs in which the re-appearance of edges is {\em recurrent} (class R), {\em bounded-recurrent} (B), or {\em periodic} (P), together with specific knowledge that are respectively nn (the number of nodes), Δ\Delta (a bound on the recurrence time), and pp (the period). In these classes it is not required that all pairs of nodes get in contact -- only that the overall {\em footprint} of the graph is connected over time. Our results, together with the strict inclusion between PP, BB, and RR, implies a feasibility order among the three variants of the problem, i.e. TDB[foremost] requires weaker assumptions on the topology dynamics than TDB[shortest], which itself requires less than TDB[fastest]. Reversely, these differences in feasibility imply that the computational powers of RnR_n, BΔB_\Delta, and PpP_p also form a strict hierarchy

    In-Network Outlier Detection in Wireless Sensor Networks

    Full text link
    To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy usage,(3) only uses single hop communication thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance using simulation with real sensor data streams. Our results demonstrate that our approach is accurate and imposes a reasonable communication load and level of power consumption.Comment: Extended version of a paper appearing in the Int'l Conference on Distributed Computing Systems 200

    Gathering in Dynamic Rings

    Full text link
    The gathering problem requires a set of mobile agents, arbitrarily positioned at different nodes of a network to group within finite time at the same location, not fixed in advanced. The extensive existing literature on this problem shares the same fundamental assumption: the topological structure does not change during the rendezvous or the gathering; this is true also for those investigations that consider faulty nodes. In other words, they only consider static graphs. In this paper we start the investigation of gathering in dynamic graphs, that is networks where the topology changes continuously and at unpredictable locations. We study the feasibility of gathering mobile agents, identical and without explicit communication capabilities, in a dynamic ring of anonymous nodes; the class of dynamics we consider is the classic 1-interval-connectivity. We focus on the impact that factors such as chirality (i.e., a common sense of orientation) and cross detection (i.e., the ability to detect, when traversing an edge, whether some agent is traversing it in the other direction), have on the solvability of the problem. We provide a complete characterization of the classes of initial configurations from which the gathering problem is solvable in presence and in absence of cross detection and of chirality. The feasibility results of the characterization are all constructive: we provide distributed algorithms that allow the agents to gather. In particular, the protocols for gathering with cross detection are time optimal. We also show that cross detection is a powerful computational element. We prove that, without chirality, knowledge of the ring size is strictly more powerful than knowledge of the number of agents; on the other hand, with chirality, knowledge of n can be substituted by knowledge of k, yielding the same classes of feasible initial configurations
    corecore