13 research outputs found

A Graph based approach for Co-scheduling jobs on Multi-core computers

Author: He Ligang
Zhu Huanzhou
Publication venue: OASIcs - OpenAccess Series in Informatics. 2013 Imperial College Computing Student Workshop
Publication date: 01/01/2013

In a multicore processor system, running multiple applications on different cores of the same chip can cause resource contention, which degrades performance. Recent studies have shown that job co-scheduling can effectively reduce this contention. However, most existing co-schedulers do not aim to find the optimal co-scheduling solution. Knowing the optimal co-scheduling performance is useful because it tells system and scheduler designers how much room remains for further performance improvement. Moreover, most co-schedulers consider only serial jobs and do not take parallel jobs into account. This paper tackles both issues. We first present a new approach to modelling the problem of co-scheduling both parallel and serial jobs, and then develop a method to find the optimal co-scheduling solutions. Simulation results show that, compared with a method that considers only serial jobs, our method for co-scheduling parallel jobs improves performance by 31% on average.

Dagstuhl Research Online Publication Server
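To make "optimal co-scheduling" concrete: in the simplest serial-only case on dual-core chips, an optimal schedule is the pairing of jobs that minimizes total co-run degradation. The Python sketch below finds it by exhaustive enumeration. The function names and the degradation table are hypothetical, and this brute-force search is not the method developed in the paper; it only illustrates what "optimal" means and why exhaustive search stops scaling.

```python
def pairings(jobs):
    """Yield every way to split a sorted, even-length list of jobs into pairs."""
    if not jobs:
        yield []
        return
    first, rest = jobs[0], jobs[1:]
    for i, partner in enumerate(rest):
        # Pair the smallest remaining job with each candidate, then recurse.
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def optimal_coschedule(jobs, degradation):
    """Exhaustively find the pairing with the smallest total degradation.

    degradation[(a, b)] (a < b) is a hypothetical measured slowdown, in percent,
    when jobs a and b share a dual-core chip.
    """
    best = min(pairings(sorted(jobs)),
               key=lambda pairs: sum(degradation[p] for p in pairs))
    return sum(degradation[p] for p in best), best


if __name__ == "__main__":
    deg = {(0, 1): 50, (0, 2): 20, (0, 3): 30,
           (1, 2): 40, (1, 3): 10, (2, 3): 60}
    print(optimal_coschedule([0, 1, 2, 3], deg))  # (30, [(0, 2), (1, 3)])
```

The number of pairings of n jobs is (n-1)!! (15 for six jobs, over two million for 16 jobs), which is why exhaustive search is only practical for very small job sets.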

Developing graph-based co-scheduling algorithms on multicore computers

Author: He Ligang
Jarvis Stephen A.
Zhu Huanzhou
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2016

Multiple cores commonly reside on the same chip and share the on-chip cache. As a result, resource sharing can degrade the performance of co-running jobs. Job co-scheduling is a technique that can effectively alleviate this contention, and many co-schedulers have been reported in the literature. However, most solutions do not aim to find the optimal co-scheduling solution, even though being able to determine the optimal solution is critical for evaluating co-scheduling systems. Moreover, most co-schedulers consider only serial jobs, whereas real-world systems often run both parallel and serial jobs. In this paper, a graph-based method is developed to find the optimal co-scheduling solution for serial jobs; the method is then extended to incorporate parallel jobs, including multi-process and multithreaded parallel jobs. A number of optimization measures are also developed to accelerate the solving process. In addition, a flexible approximation technique is proposed to strike a balance between solving speed and solution quality. Extensive experiments are conducted to evaluate the effectiveness of the proposed co-scheduling algorithms. The results show that the proposed algorithms can find the optimal co-scheduling solution for both serial and parallel jobs, and that the approximation technique is flexible in the sense that the solving speed can be controlled by setting a requirement on solution quality.

Crossref

University of Birmingham Research Portal

Warwick Research Archives Portal Repository
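A graph-based formulation of co-scheduling can be sketched as a shortest-path search: each node records which jobs have already been placed, each edge adds one co-run group of u jobs weighted by its degradation, and the cheapest path from "no jobs placed" to "all jobs placed" is an optimal schedule. The sketch below uses Dijkstra's algorithm over bitmask states. `optimal_groups` and `group_cost` are hypothetical names, the graph construction is a simplification, and it should not be read as the paper's exact co-scheduling graph or its optimization measures.

```python
import heapq
from itertools import combinations


def optimal_groups(n_jobs, u, group_cost):
    """Dijkstra over 'which jobs are already placed' states (bitmasks).

    An edge adds one co-run group of u jobs that always includes the
    lowest-numbered unplaced job, weighted by group_cost(group).
    The cheapest path from the empty mask to the full mask is optimal.
    """
    assert n_jobs % u == 0, "every chip is assumed to be fully occupied"
    full = (1 << n_jobs) - 1
    dist, parent = {0: 0}, {}
    heap = [(0, 0)]
    while heap:
        d, mask = heapq.heappop(heap)
        if mask == full:
            break
        if d > dist[mask]:
            continue  # stale heap entry
        free = [j for j in range(n_jobs) if ((mask >> j) & 1) == 0]
        first = free[0]
        for others in combinations(free[1:], u - 1):
            group = (first,) + others
            new_mask = mask
            for j in group:
                new_mask |= 1 << j
            nd = d + group_cost(group)
            if nd < dist.get(new_mask, float("inf")):
                dist[new_mask] = nd
                parent[new_mask] = (mask, group)
                heapq.heappush(heap, (nd, new_mask))
    # Walk parent links back from the full mask to recover the groups.
    groups, mask = [], full
    while mask:
        mask, group = parent[mask]
        groups.append(group)
    return dist[full], groups[::-1]
```

Always extending the group that contains the lowest-numbered unplaced job keeps each partition from being reached through several orderings. Replacing Dijkstra with a weighted A* search is one standard way to trade solution quality for speed, though the paper's own approximation technique may work differently.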

WolfGraph : the edge-centric graph processing on GPU

Author: He Ligang
Leeke Matthew
Mao Rui
Zhu Huanzhou
Publication venue: 'Elsevier BV'
Publication date: 01/10/2020

There is significant interest in developing frameworks for parallelizing the processing of large graphs, such as social networks and web graphs. Work has been proposed to parallelize graph processing on clusters (distributed memory), multicore machines (shared memory) and GPU devices. Most existing research on GPU-based graph processing employs the vertex-centric processing model and the Compressed Sparse Row (CSR) format to store and process a graph. However, these approaches suffer from irregular memory access and load imbalance on the GPU, which prevents full exploitation of GPU performance. In this paper, we present WolfGraph, a GPU-based graph processing framework that addresses these problems. WolfGraph adopts edge-centric processing, which iterates over edges rather than vertices. The data structure and graph partitioning in WolfGraph are carefully designed to minimize graph preprocessing and allow coalesced memory access. WolfGraph fully utilizes the GPU by processing all edges in parallel. We also develop a new method, called Concatenated Edge List (CEL), to process graphs larger than the GPU's global memory. WolfGraph allows users to define their own graph-processing methods and plug them into the framework. Our experiments show that WolfGraph achieves a 7-8x speedup over GraphChi and X-Stream when processing large graphs, and a 65% performance improvement over existing GPU-based, vertex-centric graph processing frameworks such as Gunrock.

Warwick Research Archives Portal Repository
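The edge-centric model can be illustrated with single-source shortest paths: each iteration sweeps the whole edge list and relaxes every edge independently, which is the kind of loop a GPU can run with one edge per thread. The NumPy sketch below emulates that sweep on the CPU. `edge_centric_sssp` is a hypothetical name, and `chunk_size` is a hypothetical knob standing in for streaming edge blocks when a graph exceeds device memory; the sketch does not reproduce WolfGraph's actual data layout or its CEL method.

```python
import numpy as np


def edge_centric_sssp(n_vertices, src, dst, weight, source, chunk_size=None):
    """Bellman-Ford-style SSSP that sweeps a COO edge list each iteration.

    Every edge in a sweep is relaxed independently (the edge-centric
    pattern). chunk_size processes the edge list in pieces, mimicking
    streaming edges that do not all fit in device memory at once.
    """
    dist = np.full(n_vertices, np.inf)
    dist[source] = 0.0
    m = len(src)
    step = chunk_size or max(m, 1)
    for _ in range(max(n_vertices - 1, 0)):
        before = dist.copy()
        for start in range(0, m, step):
            s = src[start:start + step]
            d = dst[start:start + step]
            w = weight[start:start + step]
            # Relax every edge in the chunk; minimum.at handles repeated targets.
            np.minimum.at(dist, d, dist[s] + w)
        if np.array_equal(before, dist):
            break  # no distance changed in this sweep: converged
    return dist
```

Storing `src`, `dst` and `weight` as three flat arrays keeps the edge data itself contiguous, which is the property that makes coalesced loads possible when the same sweep runs on a GPU.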

Developing Graph-Based Co-Scheduling Algorithms on Multicore Computers

Author: Huanzhou Zhu
Ligang He
Stephen A. Jarvis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

MSRL: Distributed Reinforcement Learning with Dataflow Fragments

Author: Chen Gang
Chen Lei
Chen Weifeng
Chen Yijie
Pietzuch Peter
Shi Liang
Yang Yaodong
Zhao Bo
Zhu Huanzhou
Publication date: 28/10/2022

Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. Different RL training algorithms offer different opportunities for distributing and parallelising the computation. Yet current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e.g. policy network updates) on GPU workers. Fundamentally, current systems lack abstractions that decouple RL algorithms from their execution. We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training computation is parallelised and distributed on cluster resources, without requiring changes to the algorithm implementation. MSRL introduces the new abstraction of a fragmented dataflow graph, which maps Python functions from an RL algorithm's training loop to parallel computational fragments. Fragments are executed on different devices by translating them to low-level dataflow representations, e.g. computational graphs as supported by deep learning engines, CUDA implementations or multi-threaded CPU processes. We show that MSRL subsumes the distribution strategies of existing systems, while scaling RL training to 64 GPUs.

arXiv.org e-Print Archive

Aaltodoc Publication Archive

Spiral - Imperial College Digital Repository
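The idea of decoupling an RL algorithm from its distribution can be illustrated with a toy example: the algorithm is written as plain functions (fragments), and a separate policy decides how many parallel replicas run each fragment. Everything below (`Fragment`, `run_iteration`, the policy dict, and the toy `collect` and `update` functions) is hypothetical and is not MSRL's API; threads stand in for the devices and workers that MSRL targets.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fragment:
    """A hypothetical, independently placeable piece of the training loop."""
    name: str
    fn: Callable


def run_iteration(actor, learner, policy, params, n_episodes):
    """Run one training iteration under a distribution policy.

    policy maps a fragment name to its number of parallel replicas;
    changing it changes placement only, not the algorithm code.
    """
    replicas = max(1, policy.get(actor.name, 1))
    # Shard the episodes across actor replicas.
    shards = [list(range(i, n_episodes, replicas)) for i in range(replicas)]
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        batches = list(pool.map(lambda ids: actor.fn(params, ids), shards))
    # The learner fragment consumes every batch and returns new params.
    return learner.fn(params, batches)


def collect(params, episode_ids):
    # Toy stand-in for environment interaction: one "reward" per episode.
    return [params + e for e in episode_ids]


def update(params, batches):
    # Toy learner: the new parameter is the mean reward over all batches.
    rewards = [r for batch in batches for r in batch]
    return sum(rewards) / len(rewards)


actor = Fragment("actor", collect)
learner = Fragment("learner", update)
```

Running with `{"actor": 1}` or `{"actor": 4}` yields the same learner output (3.5 for eight episodes starting from `params = 0.0`), because only the placement of the actor fragment changes, not the algorithm.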

Developing graph-based co-scheduling algorithms with GPU acceleration

Author: Zhu Huanzhou

On-chip cache is often shared between processes that run concurrently on different cores of the same processor. This type of resource contention degrades the performance of the co-running processes. Contention-aware co-scheduling refers to the class of scheduling techniques that reduce this degradation. Most existing contention-aware co-schedulers consider only serial jobs, yet computing systems often run a mix of parallel and serial jobs. This thesis aims to tackle these issues. We start by modelling the problem of co-scheduling a mix of serial and parallel jobs as an Integer Programming (IP) problem. We then construct a co-scheduling graph to model the problem and develop a set of algorithms to find both optimal and near-optimal solutions. The results show that the proposed algorithms can find the optimal co-scheduling solution and that the proposed approximation technique can find near-optimal solutions. To improve the scalability of the algorithms, we use the GPU to accelerate the solving process. A graph processing framework called WolfPath is proposed in this thesis; by taking advantage of the co-scheduling graph, WolfPath achieves significant performance improvement. Because WolfPath has a long preprocessing time, we developed WolfGraph, a GPU-based graph processing framework that features minimal preprocessing time and uses the hard disk as a memory extension to process large-scale graphs on a single machine equipped with a GPU. Compared with existing GPU-based graph processing frameworks, WolfGraph achieves similar execution time with minimal preprocessing time.

Warwick Research Archives Portal Repository
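One generic way to write co-scheduling of serial jobs as an Integer Program is a set-partitioning model, shown below as an assumption-laden sketch rather than the thesis's actual formulation. Here J is the set of jobs, u is the number of cores per chip, \mathcal{G} is the set of candidate co-run groups of u jobs, d_S is the (assumed known) total degradation of the jobs in group S, and the binary variable x_S indicates whether group S is used.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{S \in \mathcal{G}} d_S \, x_S \\
\text{s.t.}\quad & \sum_{S \in \mathcal{G} \,:\, j \in S} x_S = 1 \qquad \forall j \in J, \\
& x_S \in \{0, 1\} \qquad \forall S \in \mathcal{G},
\qquad \mathcal{G} \subseteq \{\, S \subseteq J : |S| = u \,\}.
\end{aligned}
```

Each feasible 0/1 assignment selects groups that cover every job exactly once; parallel jobs would need extra constraints or cost terms, which this sketch omits.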

Optimizing job scheduling on multicore computers

Author: He Ligang
Jarvis Stephen A.
Zhu Huanzhou
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2014

Crossref

University of Birmingham Research Portal

WolfPath : accelerating iterative traversing-based graph processing algorithms on GPU

Author: Fu Songling
Fu Zhangjie
Han Xie
He Ligang
Hu Yongjian
Li Chang-Tsun
Li Rui
Zhu Huanzhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/11/2017

There is significant interest in developing frameworks for parallelizing the processing of large graphs such as social networks and Web graphs. Most parallel graph processing frameworks employ an iterative processing model. However, by benchmarking state-of-the-art GPU-based graph processing frameworks, we observed that the performance of iterative traversing-based graph algorithms (such as Breadth-First Search and Single Source Shortest Path) on the GPU is limited by frequent data exchange between the host and the GPU. To tackle this problem, we develop a GPU-based graph framework called WolfPath to accelerate iterative traversing-based graph processing algorithms. In WolfPath, the iterative process is guided by the graph diameter, which eliminates the frequent data exchange between host and GPU. To accomplish this, WolfPath proposes a data structure called the Layered Edge list to represent the graph, from which the graph diameter is known before graph processing starts. To broaden the applicability of WolfPath, a graph preprocessing algorithm is also developed to convert any graph into the Layered Edge list format. We conducted extensive experiments to verify the effectiveness of WolfPath. The results show that WolfPath achieves significant speedup over state-of-the-art GPU-based in-memory and out-of-memory graph processing frameworks.

Crossref

Warwick Research Archives Portal Repository
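Why a known diameter removes host-GPU traffic can be sketched with breadth-first search: if the diameter D is known in advance, a traversal from any source can run exactly D synchronous sweeps, so the host never has to copy back a "did anything change?" flag after each iteration. In the sketch below, `hop_diameter` computes D by a BFS from every vertex, a hypothetical stand-in for reading it off WolfPath's Layered Edge list, and `fixed_iteration_bfs` is likewise a hypothetical name.

```python
from collections import deque


def hop_diameter(n, adj):
    """Largest finite BFS distance over all (source, target) pairs.

    Host-side preprocessing; adj[u] lists the out-neighbours of u.
    """
    best = 0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist))
    return best


def fixed_iteration_bfs(n, edges, source, diameter):
    """Edge-centric BFS that runs exactly `diameter` sweeps, no convergence test."""
    level = [float("inf")] * n
    level[source] = 0
    for _ in range(diameter):
        # Read the old levels and write new ones: one synchronous step.
        new_level = level[:]
        for u, v in edges:
            if level[u] + 1 < new_level[v]:
                new_level[v] = level[u] + 1
        level = new_level
    return level
```

After k sweeps every vertex within k hops of the source has its final level, so D sweeps settle all reachable vertices; unreachable ones stay at infinity.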

Ras-ERK1/2 signaling contributes to the development of colorectal cancer via regulating H3K9ac

Author: Chao Zhang
Huanzhou Xue
Peng Tian
Peng Zhang
Xinyu Guo
Yanfei Zhu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2018

Background/aims: Ras is a control switch of the ERK1/2 pathway, and hyperactivation of Ras-ERK1/2 signaling appears frequently in human cancers. However, the molecular regulation following Ras-ERK1/2 activation is still unclear. This work aimed to reveal whether Ras-ERK1/2 promotes the development of colorectal cancer by regulating H3K9ac. Methods: A vector expressing K-Ras mutated at G12V and T35S was transfected into SW48 cells, and H3K9 acetylation was measured by Western blot analysis. MTT assay, colony formation assay, transwell assay, chromatin immunoprecipitation and RT-qPCR were performed to determine whether H3K9ac contributed to K-Ras-mediated cell growth and migration. Furthermore, whether HDAC2 and PCAF are involved in modifying H3K9ac following Ras-ERK1/2 activation was studied. Results: K-Ras mutated at G12V and T35S induced significant activation of ERK1/2 signaling and significant down-regulation of H3K9ac. Restoring H3K9 acetylation using an expression vector mimicking H3K9ac attenuated the promoting effects of Ras-ERK1/2 on tumor cell growth and migration. In addition, H3K9ac can be deacetylated through HDAC2 and through MDM2-dependent degradation of PCAF. Conclusion: H3K9ac is a specific target of the Ras-ERK1/2 signaling pathway. H3K9 acetylation can be modulated by HDAC2 and by MDM2-dependent degradation of PCAF. The revealed regulation provides a better understanding of Ras-ERK1/2 signaling in tumorigenesis.

Directory of Open Access Journals