
    Object-oriented Tools for Distributed Computing

    Distributed computing systems are proliferating, owing to the availability of powerful, affordable microcomputers and inexpensive communication networks. A critical problem in developing such systems is getting application programs to interact with one another across a computer network. Remote interprogram connectivity is particularly challenging across heterogeneous environments, where applications run on different kinds of computers and operating systems. NetWorks! (trademark) is an innovative software product that provides an object-oriented messaging solution to these problems. This paper describes the design and functionality of NetWorks! and illustrates how it is being used to build complex distributed applications for NASA and in the commercial sector.

    Dynamic Control Flow in Large-Scale Machine Learning

    Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. For performance, scalability, and expressiveness, a machine learning system must support dynamic control flow in distributed and heterogeneous environments. This paper presents a programming model for distributed machine learning that supports dynamic control flow. We describe the design of the programming model, and its implementation in TensorFlow, a distributed machine learning system. Our approach extends the use of dataflow graphs to represent machine learning models, offering several distinctive features. First, the branches of conditionals and bodies of loops can be partitioned across many machines to run on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs. Second, programs written in our model support automatic differentiation and distributed gradient computations, which are necessary for training machine learning models that use control flow. Third, our choice of non-strict semantics enables multiple loop iterations to execute in parallel across machines, and to overlap compute and I/O operations. We have done our work in the context of TensorFlow, and it has been used extensively in research and production. We evaluate it using several real-world applications, and demonstrate its performance and scalability. Comment: Appeared in EuroSys 2018. 14 pages, 16 figures

    Limits and dynamics of stochastic neuronal networks with random heterogeneous delays

    Realistic networks display heterogeneous transmission delays. We analyze here the limits of large stochastic multi-population networks with stochastic coupling and random interconnection delays. We show that, depending on the nature of the delay distributions, a quenched or averaged propagation of chaos takes place in these networks, and that the network equations converge towards a delayed McKean-Vlasov equation with distributed delays. Our approach is mostly fitted to neuroscience applications. We instantiate in particular a classical neuronal model, the Wilson and Cowan system, and show that the obtained limit equations have Gaussian solutions whose mean and standard deviation satisfy a closed set of coupled delay differential equations in which the distribution of delays and the noise levels appear as parameters. This allows us to uncover precisely the effects of noise, delays and coupling on the dynamics of such heterogeneous networks, in particular their role in the emergence of synchronized oscillations. We show in several examples that not only the averaged delay, but also the dispersion, governs the dynamics of such networks. Comment: Corrected misprint (useless stopping time) in proof of Lemma 1 and clarified a regularity hypothesis (remark 1)

    TensorFlow Doing HPC

    TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML) applications, it in fact aims at supporting the development of a much broader range of applications that are outside the ML domain and can possibly include HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this gap by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG) solver and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find an approximately 2x, 1.7x and 1.8x performance improvement when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG and FFT applications respectively. All our performance results demonstrate that TensorFlow has high potential to emerge also as an HPC programming framework for heterogeneous supercomputers. Comment: Accepted for publication at The Ninth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES'19)

    A component-based middleware framework for configurable and reconfigurable Grid computing

    Significant progress has been made in the design and development of Grid middleware which, in its present form, is founded on Web services technologies. However, we argue that present-day Grid middleware is severely limited in supporting projected next-generation applications which will involve pervasive and heterogeneous networked infrastructures, and advanced services such as collaborative distributed visualization. In this paper we discuss a new Grid middleware framework that features (i) support for advanced network services based on the novel concept of pluggable overlay networks, and (ii) an architectural framework for constructing bespoke Grid middleware platforms in terms of 'middleware domains' such as extensible interaction types and resource discovery. We believe that such features will become increasingly essential with the emergence of next-generation e-Science applications. Copyright (c) 2005 John Wiley & Sons, Ltd.

    Symbiot: Congestion-driven Multi-resource Fairness for Multi-User Sensor Networks

    © 2015 IEEE. In this paper, we study the problem of multi-resource fairness in multi-user sensor networks with heterogeneous and time-varying resources. In particular, we focus on data gathering applications run on Wireless Sensor Networks (WSNs) or the Internet of Things (IoT), in which users need to run a series of sensing operations with various resource requirements. We consider both the resource demands of sensing tasks and those of the data forwarding tasks needed to establish multi-hop relay communications. By exploiting graph theory, queueing theory and the notion of dominant resource shares, we develop Symbiot, a light-weight, distributed algorithm that ensures multi-resource fairness between these users. With Symbiot, nodes can independently schedule their resources while maintaining network-level resource fairness through observing traffic congestion levels. Large-scale simulations based on Contiki OS and the Cooja network emulator show the effectiveness of Symbiot in adaptively utilizing available resources and reducing average completion times.

    Consensus-based Networked Tracking in Presence of Heterogeneous Time-Delays

    We propose a distributed (single) target tracking scheme based on networked estimation and consensus algorithms over static sensor networks. The tracking part is based on the linear time-difference-of-arrival (TDOA) measurement proposed in our previous works. This paper, in particular, develops delay-tolerant distributed filtering solutions over sparse data-transmission networks. We assume general arbitrary heterogeneous delays at different links. This may occur in many realistic large-scale applications where the data-sharing between different nodes is subject to latency due to communication-resource constraints or large spatially distributed sensor networks. The solution we propose in this work shows improved performance (verified by both theory and simulations) in such scenarios. Another advantage of such distributed schemes is the possibility of adding localized fault-detection and isolation (FDI) strategies along with survivable graph-theoretic design, which opens many follow-up avenues for this research. To the best of our knowledge, no such delay-tolerant distributed linear algorithm is given in the existing distributed tracking literature. Comment: ICRoM2