
    Methods and design issues for next generation network-aware applications

    Networks are becoming an essential component of modern cyberinfrastructure, and this work describes methods for designing distributed applications that exploit high-speed networks to improve scalability, performance, and capabilities. As the amount of data generated by scientific applications continues to grow, applications must be designed to use parallel, distributed resources and high-speed networks to handle and process it. For scalable application design, developers should move away from the current component-based approach and instead implement an integrated, non-layered architecture in which applications can use specialized low-level interfaces. The main focus of this research is interactive, collaborative visualization of large datasets. This work describes how a visualization application can be improved by using distributed resources and high-speed network links to interactively visualize tens of gigabytes of data and handle terabyte datasets while maintaining high quality. The application supports interactive frame rates, high resolution, and collaborative visualization, and it sustains remote I/O bandwidths of several Gbps (up to 30 times faster than local I/O). Motivated by the distributed visualization application, this work also investigates remote data access systems. Because wide-area networks may have high latency, the remote I/O system uses an architecture that effectively hides latency. Five remote data access architectures are analyzed, and the results show that an architecture combining bulk and pipeline processing is the best solution for high-throughput remote data access. The resulting system, which also supports high-speed transport protocols and configurable remote operations, is up to 400 times faster than a comparable existing remote data access system. Transport protocols are compared to determine which can best utilize high-speed network connections, concluding that a rate-based protocol is the best solution, being 8 times faster than standard TCP. An experiment with an HD-based remote teaching application illustrates the potential of network-aware applications in a production environment. Future research areas are presented, with emphasis on network-aware optimization, execution, and deployment scenarios.
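    As a rough illustration of the latency-hiding idea described above, the sketch below keeps several bulk block requests in flight at once so that transfer time overlaps wide-area latency. It is a minimal Python sketch under stated assumptions: fetch_block is a hypothetical placeholder for the system's actual transport, and the block size and pipeline depth are illustrative values, not those used in this work.

        # Hypothetical sketch: hide WAN latency by pipelining bulk reads.
        # fetch_block() is an assumed placeholder, not the system's real API.
        import concurrent.futures

        BLOCK_SIZE = 64 * 1024 * 1024   # bulk requests amortize per-request latency
        PIPELINE_DEPTH = 8              # number of requests kept in flight

        def fetch_block(remote_path, offset, size):
            """Placeholder for one remote read over a high-speed transport."""
            raise NotImplementedError

        def pipelined_read(remote_path, total_size):
            """Yield blocks in order while keeping PIPELINE_DEPTH requests in flight."""
            with concurrent.futures.ThreadPoolExecutor(PIPELINE_DEPTH) as pool:
                futures = [pool.submit(fetch_block, remote_path, off,
                                       min(BLOCK_SIZE, total_size - off))
                           for off in range(0, total_size, BLOCK_SIZE)]
                for fut in futures:   # consume in order; later fetches overlap
                    yield fut.result()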

    NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion

    We develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. One of its key features is that ownership of a variable is transferred asynchronously between processors in a decentralized fashion; as a consequence, it is a lock-free parallel algorithm. Despite being asynchronous, the variable updates of NOMAD are serializable, that is, there is an equivalent update ordering in a serial implementation. NOMAD outperforms synchronous algorithms, which require explicit bulk synchronization after every iteration: our extensive empirical evaluation shows that our algorithm not only performs well in a distributed setting on commodity hardware, but also outperforms state-of-the-art algorithms on an HPC cluster in both multi-core and distributed-memory settings.
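    At the core of matrix completion by stochastic gradient descent is a simple per-rating update; NOMAD's contribution lies in how ownership of the item factors migrates asynchronously between workers while this update runs lock-free. The serial sketch below shows only that per-rating step, with the factor matrices U and V, step size, and regularizer as illustrative assumptions rather than the authors' implementation.

        # Serial sketch of the SGD update underlying matrix completion; NOMAD
        # applies this update while ownership of V's columns migrates
        # asynchronously between workers. Names and constants are assumed.
        import numpy as np

        def sgd_update(U, V, i, j, rating, lr=0.01, reg=0.05):
            """One stochastic update for observed entry (i, j) of the rating matrix."""
            err = rating - U[i] @ V[j]
            old_Ui = U[i].copy()           # V's update must use the pre-update U[i]
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * old_Ui - reg * V[j])

        # Usage: rank-16 factors for a 1000 x 500 rating matrix
        rng = np.random.default_rng(0)
        U = rng.normal(scale=0.1, size=(1000, 16))
        V = rng.normal(scale=0.1, size=(500, 16))
        sgd_update(U, V, i=3, j=42, rating=4.0)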

    Divide-and-Conquer Distributed Learning: Privacy-Preserving Offloading of Neural Network Computations

    Machine learning has become a widely used technology for decision making on high-dimensional data. As dataset sizes have grown, so too have the neural networks needed to learn the complex patterns hidden within them. This growth has reached the point where training a model on a single device may be infeasible due to the computational or memory limitations of the underlying hardware. Purpose-built computing clusters for training large models are commonplace, yet access to networks of heterogeneous devices is typically more attainable. With the rise of 5G networks and edge computation becoming more commonplace, and inspired by the success of the Folding@home project in harnessing crowdsourced computation, we consider the scenario of crowdsourcing the computation required to train a neural network particularly appealing. Distributed learning promises to bridge the widening gap between single-device performance and the computational requirements of large-scale models, but unfortunately, current distributed learning techniques do not preserve the privacy of both the model and the input without an accuracy or computational tradeoff. In response, we present Divide and Conquer Learning (DCL), an approach that enables quantifiable privacy guarantees while offloading the computational burden of training to a network of devices. A user can divide the training computation of its neural network into neuron-sized computation tasks and distribute them to devices based on their available resources. The results are returned to the user and aggregated in an iterative process to obtain the final neural network model. To protect the privacy of the user's data and model, both the data and the neural network model are shuffled before a computation task is distributed to devices. Our strict adherence to the order of operations allows a user to verify the correctness of performed computations by assigning a task to multiple devices and cross-validating their results, which protects against network churn and detects faulty or misbehaving devices. A sketch of this decomposition follows.
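    To make the neuron-sized decomposition and redundancy-based verification concrete, the sketch below splits one dense layer's forward pass into per-neuron tasks and runs each task with redundancy, flagging disagreement between workers. This is a simplified local simulation under assumed names (neuron_task, offload_layer); it omits DCL's shuffling of data and model and is not the paper's implementation.

        # Illustrative sketch: a dense layer offloaded neuron by neuron,
        # with redundant assignment to detect faulty or misbehaving devices.
        # In DCL the tasks would run on remote devices; here they run locally.
        import numpy as np

        def neuron_task(w_row, bias, x):
            """One neuron-sized task: a single dot product plus bias."""
            return float(w_row @ x + bias)

        def offload_layer(W, b, x, redundancy=2):
            """Dispatch each neuron to `redundancy` workers and cross-check results."""
            out = np.empty(W.shape[0])
            for n in range(W.shape[0]):
                results = [neuron_task(W[n], b[n], x) for _ in range(redundancy)]
                if max(results) - min(results) > 1e-6:   # disagreement => bad device
                    raise RuntimeError(f"neuron {n}: inconsistent results {results}")
                out[n] = results[0]
            return out

        W, b, x = np.ones((4, 8)), np.zeros(4), np.arange(8.0)
        print(offload_layer(W, b, x))   # -> [28. 28. 28. 28.]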