2 research outputs found

    Integrating stream parallelism and task parallelism in a dataflow programming model

    As multicore computing becomes the norm, exploiting parallelism becomes a requirement for all software. Many applications exhibit several kinds of parallelism, but most parallel programming languages are biased towards a specific paradigm, two common ones being task parallelism and streaming parallelism. This creates a dilemma for programmers who would prefer to use the same language to exploit different paradigms for different applications. Our thesis is that stream-parallel and task-parallel paradigms can be integrated in a single language with high programmability and high resource efficiency when a general dataflow programming model is used as the foundation. The dataflow model used in this thesis is Intel's Concurrent Collections (CnC). While CnC is general enough to express both task-parallel and stream-parallel paradigms, all current implementations of CnC use task-based runtime systems that do not deliver the resource efficiency expected from stream-parallel programs; for streaming programs, a task-based runtime wastes computing cycles and makes memory management more difficult than it needs to be. We propose Streaming Concurrent Collections (SCnC), a streaming system that can execute a subset of the applications supported by CnC, a general macro-dataflow coordination language. Integrating the streaming and task models lets application developers benefit from the efficiency of stream parallelism as well as the generality of task parallelism, all in the context of an easy-to-use, general dataflow programming model. To achieve this integration, we formally define streaming access patterns that, if respected, allow task-based CnC applications to be executed using the streaming model, and we specify the conditions under which an application runs safely on the streaming runtime, meaning with identical results and without deadlocks. We propose a static analysis that verifies whether an application respects these patterns, and we describe algorithmic transformations that bring a larger set of CnC applications into a form that can run on the streaming runtime. To take advantage of dynamic parallelism opportunities inside streaming applications, which have traditionally been assumed to have fixed parallelism, we propose a simple tuning annotation. Our dynamic parallelism construct, the dynamic splitter, allows fission of stateful filters with little guidance from the programmer and is based on the idea of places across which computations are distributed. Finally, performance results show that moving from the task-parallel runtime to the streaming runtime increases throughput by up to 40×. In summary, this thesis shows that the stream-parallel and task-parallel paradigms can be integrated in a single language, with high programmability and high resource efficiency, when a dataflow model is used as the foundation.
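    To make the streaming access pattern concrete, here is a minimal, hypothetical sketch in plain Java (bounded queues and threads; not the SCnC runtime or the CnC API): each step consumes items in strictly increasing tag order from a bounded buffer, which is the kind of pattern that lets a task-graph execute as a stream with a fixed memory footprint.

        import java.util.concurrent.ArrayBlockingQueue;
        import java.util.concurrent.BlockingQueue;

        // Hypothetical sketch: a two-step chain executed in streaming style.
        public class StreamedSteps {
            public static void main(String[] args) throws InterruptedException {
                BlockingQueue<Integer> in  = new ArrayBlockingQueue<>(8);  // bounded buffers keep
                BlockingQueue<Integer> out = new ArrayBlockingQueue<>(8);  // memory use fixed

                Thread producer = new Thread(() -> {
                    try {
                        for (int tag = 0; tag < 100; tag++) in.put(tag); // items put in tag order
                        in.put(-1);                                      // end-of-stream marker
                    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                });

                Thread step = new Thread(() -> {                         // consumes in tag order
                    try {
                        for (int v; (v = in.take()) != -1; ) out.put(v * v);
                        out.put(-1);
                    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                });

                producer.start(); step.start();
                for (int v; (v = out.take()) != -1; ) System.out.println(v);
            }
        }

    A task-based runtime would instead create one task per item, which is more general but keeps items live until their tasks are scheduled; the bounded buffers above are what give a streaming runtime its fixed memory footprint.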

    Efficient Evaluation of Data-intensive Batch-queries in Open Simulation Laboratories

    Better instruments, faster and bigger supercomputers, and easier collaboration and sharing of data in the sciences have introduced the need to manage increasingly large datasets. Advances in high-performance computing (HPC) have empowered the computational branches of many scientific disciplines. However, many scientists lack access to HPC facilities or the sophistication needed to develop and run HPC codes, so the benefits of testing new theories and experimenting with large numerical simulations have been restricted to a few top users. In this dissertation, I describe the "remote immersive analysis" approach to computational science and present new techniques and methods for the efficient evaluation of scientific analysis tasks in analysis cluster environments. I discuss several techniques developed for the efficient evaluation of data-intensive batch queries in large numerical simulation databases. An I/O streaming method for the evaluation of decomposable kernel computations uses partial sums to evaluate a batch query in a single sequential pass over the data. Spatial filtering computations that use a box filter share not only data but also computation, and can be evaluated over an intermediate summed-volumes dataset derived from the original data; for certain workloads this is more efficient even when the intermediate dataset is computed dynamically. Threshold queries have immense data requirements and potentially operate over entire time-steps of the simulation; an efficient and scalable data-parallel approach evaluates threshold queries of fields derived from the raw simulation data and stores their results in an application-aware semantic cache for fast subsequent retrieval. Finally, three approaches to evaluating particle tracking queries are compared and examined: synchronization at a mediator, task-parallel, and data-parallel. These techniques are developed, deployed, and evaluated in the Johns Hopkins Turbulence Databases (JHTDB), an open simulation laboratory for turbulence research. The JHTDB stores the output of world-class numerical simulations of turbulence and provides public access to their complete space-time history, along with the means to explore it. The techniques discussed implement core scientific analysis routines and significantly increase the utility of the service; they also improve the performance of these routines by up to an order of magnitude or more compared with direct implementations or implementations adapted from the simulation code.
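    The summed-volumes idea is the three-dimensional analogue of a summed-area table built from prefix sums. Here is a minimal, hypothetical 1-D sketch in Java (toy data, not JHTDB code): one sequential pass builds the intermediate prefix-sum array, after which any box-filter mean costs two lookups and a division, independent of the filter width.

        // Hypothetical 1-D sketch of the summed-volumes technique.
        public class SummedVolumeSketch {
            public static void main(String[] args) {
                double[] field = {1, 3, 2, 5, 4, 6, 0, 2};   // toy stand-in for simulation data

                // One sequential pass builds the intermediate dataset (prefix sums).
                double[] prefix = new double[field.length + 1];
                for (int i = 0; i < field.length; i++)
                    prefix[i + 1] = prefix[i] + field[i];

                // A box-filter mean over [lo, hi) is now two lookups and a division.
                int lo = 2, hi = 6;
                double boxMean = (prefix[hi] - prefix[lo]) / (hi - lo);
                System.out.println("box mean over [" + lo + "," + hi + ") = " + boxMean); // 4.25
            }
        }

    The same trade-off motivates the dynamically computed intermediate dataset described above: building the prefix sums costs one pass over the data, a cost that is amortized when many filter evaluations share it.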