498 research outputs found
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters systems, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware
On distributed ledger technology for the internet of things: design and applications
Distributed ledger technology (DLT) can used to store information in such a way that no individual or organisation can compromise its veracity, contrary to a traditional centralised ledger. This nascent technology has received a great deal of attention from both researchers and practitioners in recent years due to the vast array of open questions related to its design and the assortment novel applications it unlocks. In this thesis, we are especially interested in the design of DLTs suitable for application in the domain of the internet of things (IoT), where factors such as efficiency, performance and scalability are of paramount importance. This work confronts the challenges of designing IoT-oriented distributed ledgers through analysis of ledger properties, development of design tools and the design of a number of core protocol components. We begin by introducing a class of DLTs whose data structures consist of directed acyclic graphs (DAGs) and which possess properties that make them particularly well suited to IoT applications. With a focus on the DAG structure, we then present analysis through mathematical modelling and simulations which provides new insights to the properties of this class of ledgers and allows us to propose novel security enhancements. Next, we shift our focus away from the DAG structure itself to another open problem for DAG-based distributed ledgers, that of access control. Specifically, we present a networking approach which removes the need for an expensive and inefficient mechanism known as Proof of Work, solving an open problem for IoT-oriented distributed ledgers. We then draw upon our analysis of the DAG structure to integrate and test our new access control with other core components of the DLT. Finally, we present a mechanism for orchestrating the interaction between users of a DLT and its operators, seeking to improves the usability of DLTs for IoT applications. In the appendix, we present two projects also carried out during this PhD which showcase applications of this technology in the IoT domain.Open Acces
- …