Asynchronous Execution of Python Code on Task Based Runtime Systems

Author: Amini Parsa
Brandt Steven
Diehl Patrick
Huck Kevin
Isaacs Kate
Kaiser Hartmut
Kheirkhahan Alireza
Serio Adrian
Shirzad Shahrzad
Tohid R.
Wagle Bibek
Williams Katy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/10/2018

Despite advancements in parallel and distributed computing, the complexity of programming High Performance Computing (HPC) resources has deterred many domain experts, especially in machine learning and artificial intelligence (AI), from exploiting the performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and the cost of acquiring the skills needed to program at that level. In recent years, Python, with the support of linear algebra libraries like NumPy, has gained popularity despite limitations that prevent such code from running in distributed settings. Here we present a solution which maintains both high-level programming abstractions and parallel and distributed efficiency. Phylanx is an asynchronous array processing toolkit which transforms Python and NumPy operations into code that can be executed in parallel on HPC resources by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general-purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementation of widely used machine learning algorithms to accepted NumPy standards.
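
The idea of turning an expression into a dependency tree whose nodes run as asynchronous tasks can be sketched with the Python standard library alone. This is an illustration only: it uses neither Phylanx nor HPX, and the names `submit_tree` and `evaluate` are hypothetical helpers introduced here. It assumes a small subset of arithmetic operators and a thread pool standing in for a task-based runtime.

```python
import ast
import operator
from concurrent.futures import ThreadPoolExecutor

# Illustrative subset of supported binary operators (an assumption of this sketch).
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def submit_tree(node, variables, pool):
    """Submit one task per syntax-tree node, children first, and return
    the future that will hold this node's value."""
    if isinstance(node, ast.Constant):
        return pool.submit(lambda: node.value)
    if isinstance(node, ast.Name):
        return pool.submit(variables.__getitem__, node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        # Children are queued before the parent, so a parent task only ever
        # waits on tasks that were picked up earlier from the FIFO queue.
        left = submit_tree(node.left, variables, pool)
        right = submit_tree(node.right, variables, pool)
        op = OPS[type(node.op)]
        return pool.submit(lambda: op(left.result(), right.result()))
    raise ValueError(f"unsupported expression node: {ast.dump(node)}")


def evaluate(expr, variables, workers=4):
    """Parse `expr` into a tree and evaluate it as a graph of dependent tasks."""
    tree = ast.parse(expr, mode="eval").body
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return submit_tree(tree, variables, pool).result()


if __name__ == "__main__":
    print(evaluate("(a + b) * (c - 2)", {"a": 1, "b": 2, "c": 5}))
```

Independent subtrees such as `(a + b)` and `(c - 2)` have no edge between them, so a runtime is free to execute them concurrently; only the multiplication has to wait for both.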

arXiv.org e-Print Archive

Crossref

The University of Arizona

Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques

Author: Kaiser Hartmut
Khatami Zahra
Ramanujam J.
Publication date: 27/03/2017

The level of parallelism in an application can be maximized by minimizing the overheads caused by load imbalance and the waiting time caused by memory latency. Compiler optimization is one of the most effective ways to tackle this problem. The compiler can detect data dependencies in an application and analyze specific sections of code for parallelization potential. However, these techniques are usually applied at compile time, so they rely on static analysis, which is insufficient for achieving maximum parallelism and the desired application scalability. One way to address this challenge is to use runtime methods, delaying a certain amount of code analysis until runtime. In this research, we improve the performance of parallel applications generated by the OP2 compiler by leveraging HPX, a C++ runtime system, to provide runtime optimizations. These optimizations include asynchronous tasking, loop interleaving, dynamic chunk sizing, and data prefetching. The results were evaluated using an Airfoil application, which showed a 40-50% improvement in parallel performance.
Comment: 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017)
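
To make "asynchronous tasking" and "dynamic chunk sizing" concrete, here is a minimal Python sketch. It is not the OP2 or HPX implementation: it swaps in a guided self-scheduling rule (each chunk takes roughly `remaining / workers` items), computes the chunk sizes up front rather than adapting to measured timings, and uses the hypothetical helper names `guided_chunks` and `parallel_map`.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def guided_chunks(n_items, workers, min_chunk=1):
    """Yield (start, stop) ranges whose size shrinks as the remaining work
    shrinks, so late chunks are small and help balance the load."""
    start = 0
    while start < n_items:
        remaining = n_items - start
        size = max(min_chunk, ceil(remaining / workers))
        stop = min(n_items, start + size)
        yield start, stop
        start = stop


def parallel_map(func, data, workers=4):
    """Apply `func` to every element of `data`, one asynchronous task per chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            # Default arguments bind the current chunk bounds to each task.
            pool.submit(lambda lo=lo, hi=hi: [func(x) for x in data[lo:hi]])
            for lo, hi in guided_chunks(len(data), workers)
        ]
        results = []
        for future in futures:  # collect in submission order to keep ordering
            results.extend(future.result())
        return results


if __name__ == "__main__":
    print(list(guided_chunks(10, 4)))
    print(parallel_map(lambda x: x * x, list(range(10))))
```

A runtime system would typically choose chunk sizes from observed execution times instead of a fixed rule; the fixed rule is used here only to keep the sketch self-contained.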

arXiv.org e-Print Archive

Crossref

Dynamic Control Flow in Large-Scale Machine Learning

Author: Abadi Martín
Barham Paul
Brevdo Eugene
Burrows Mike
Davis Andy
Dean Jeff
Ghemawat Sanjay
Harley Tim
Hawkins Peter
Isard Michael
Kudlur Manjunath
Monga Rajat
Murray Derek
Yu Yuan
Zheng Xiaoqiang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/05/2018

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. For performance, scalability, and expressiveness, a machine learning system must support dynamic control flow in distributed and heterogeneous environments. This paper presents a programming model for distributed machine learning that supports dynamic control flow. We describe the design of the programming model and its implementation in TensorFlow, a distributed machine learning system. Our approach extends the use of dataflow graphs to represent machine learning models, offering several distinctive features. First, the branches of conditionals and bodies of loops can be partitioned across many machines to run on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs. Second, programs written in our model support automatic differentiation and distributed gradient computations, which are necessary for training machine learning models that use control flow. Third, our choice of non-strict semantics enables multiple loop iterations to execute in parallel across machines, and to overlap compute and I/O operations. We have done our work in the context of TensorFlow, and it has been used extensively in research and production. We evaluate it using several real-world applications, and demonstrate its performance and scalability.
Comment: Appeared in EuroSys 2018. 14 pages, 16 figures

arXiv.org e-Print Archive

Crossref

RootJS: Node.js Bindings for ROOT 6

Author: Beffart Theo
Früh Maximilian
Haas Christoph
Rajgopal Sachin
Schwabe Jonas
Szuba Marek
Wolff Christoph
Publication venue: 'IOP Publishing'
Publication date: 28/03/2017

We present rootJS, an interface that makes it possible to seamlessly integrate ROOT 6 into applications written for Node.js, the JavaScript runtime platform increasingly used to create high-performance Web applications. ROOT features can be called both directly from Node.js code and by JIT-compiling C++ macros. All rootJS methods are invoked asynchronously and support callback functions, allowing non-blocking operation of the Node.js applications that use them. Last but not least, our bindings have been designed to be platform-independent and should therefore work on all systems supporting both ROOT 6 and Node.js. Thanks to rootJS it is now possible to create ROOT-aware Web applications that take full advantage of the high performance and extensive capabilities of Node.js. Examples include platforms for the quality assurance of acquired, reconstructed or simulated data, book-keeping and e-log systems, and even Web browser-based data visualisation and analysis.
Comment: 7 pages, 1 figure. To appear in the Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016)

arXiv.org e-Print Archive

KITopen