498 research outputs found

    Learning Scheduling Algorithms for Data Processing Clusters

    Full text link
    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load

    A Survey of Monte Carlo Tree Search Methods

    Get PDF
    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work

    Many-Task Computing and Blue Waters

    Full text link
    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

    On distributed ledger technology for the internet of things: design and applications

    Get PDF
    Distributed ledger technology (DLT) can used to store information in such a way that no individual or organisation can compromise its veracity, contrary to a traditional centralised ledger. This nascent technology has received a great deal of attention from both researchers and practitioners in recent years due to the vast array of open questions related to its design and the assortment novel applications it unlocks. In this thesis, we are especially interested in the design of DLTs suitable for application in the domain of the internet of things (IoT), where factors such as efficiency, performance and scalability are of paramount importance. This work confronts the challenges of designing IoT-oriented distributed ledgers through analysis of ledger properties, development of design tools and the design of a number of core protocol components. We begin by introducing a class of DLTs whose data structures consist of directed acyclic graphs (DAGs) and which possess properties that make them particularly well suited to IoT applications. With a focus on the DAG structure, we then present analysis through mathematical modelling and simulations which provides new insights to the properties of this class of ledgers and allows us to propose novel security enhancements. Next, we shift our focus away from the DAG structure itself to another open problem for DAG-based distributed ledgers, that of access control. Specifically, we present a networking approach which removes the need for an expensive and inefficient mechanism known as Proof of Work, solving an open problem for IoT-oriented distributed ledgers. We then draw upon our analysis of the DAG structure to integrate and test our new access control with other core components of the DLT. Finally, we present a mechanism for orchestrating the interaction between users of a DLT and its operators, seeking to improves the usability of DLTs for IoT applications. In the appendix, we present two projects also carried out during this PhD which showcase applications of this technology in the IoT domain.Open Acces
    • …
    corecore