MLGOPerf: An ML Guided Inliner to Optimize Performance
For the past 25 years, we have witnessed extensive application of Machine
Learning to the compiler space, notably for the optimization-selection and
phase-ordering problems. However, few of these works have been upstreamed into
state-of-the-art compilers such as LLVM, where ML would be seamlessly
integrated into the optimization pipeline and readily deployable by users. MLGO
was among the first such projects, and it strives only to reduce the code size
of a binary with an ML-based inliner trained using Reinforcement Learning.
This paper presents MLGOPerf, the first end-to-end framework capable of
optimizing performance using LLVM's ML-Inliner. It employs a secondary ML model
to generate the rewards used for training a retargeted Reinforcement Learning
agent, previously used as the primary model by MLGO. The secondary model
predicts the post-inlining speedup of a function under analysis, enabling a
fast training framework for the primary model that would otherwise be
impractical.
The experimental results show MLGOPerf is able to gain up to 1.8% and 2.2% with
respect to LLVM's optimization at O3 when trained for performance on the SPEC
CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach
provides up to 26% more opportunities to autotune code regions in our
benchmarks, which translates into an additional 3.7% speedup.
Comment: Version 2: Added the missing Table 6. The short version of this work
is accepted at ACM/IEEE CASES 202
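The core idea described above, a secondary model supplying rewards so the RL inlining policy never has to compile and run the program during training, can be sketched as follows. Everything here is illustrative: the feature set, the linear "speedup model", and the toy REINFORCE-style policy are invented stand-ins, not MLGOPerf's actual models, which live inside LLVM's ML-Inliner infrastructure.

```python
import math
import random

def speedup_model(features):
    # Stand-in for MLGOPerf's secondary model: it predicts the
    # post-inlining speedup of a function from call-site features.
    # This linear form and the (size, hotness) features are invented
    # for the sketch; the real model is a trained predictor.
    size, hotness = features          # both normalized to [0, 1]
    return 1.0 + 0.2 * hotness - 0.3 * size

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_inline_policy(call_sites, epochs=200, lr=0.05):
    # Primary model: a toy logistic policy trained with a
    # REINFORCE-style update. The reward comes from the secondary
    # model instead of from compiling and running the program --
    # the trick that makes training fast enough to be practical.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for feats in call_sites:
            p = sigmoid(w[0] * feats[0] + w[1] * feats[1] + b)
            inline = random.random() < p
            reward = speedup_model(feats) - 1.0 if inline else 0.0
            grad = (1 - p) if inline else -p   # d/dz of log pi(action)
            w[0] += lr * reward * grad * feats[0]
            w[1] += lr * reward * grad * feats[1]
            b += lr * reward * grad
    return w, b

random.seed(0)
sites = [(0.1, 0.9), (0.9, 0.1), (0.5, 0.5), (0.2, 0.8)]
w, b = train_inline_policy(sites)
# the trained policy should favour inlining small, hot call sites
p_hot_small = sigmoid(w[0] * 0.1 + w[1] * 0.9 + b)
p_cold_large = sigmoid(w[0] * 0.9 + w[1] * 0.1 + b)
```

Because the reward is a model prediction rather than a measurement, each policy update costs one inference instead of one benchmark run, which is what makes end-to-end training tractable.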
ACPO: AI-Enabled Compiler-Driven Program Optimization
The key to performance optimization of a program is deciding correctly when
a certain transformation should be applied by a compiler. This is an ideal
opportunity to apply machine-learning models to speed up the tuning process;
while this realization has been around since the late 90s, only recent
advancements in ML have enabled a practical, end-to-end application of ML to
compilers.
This paper presents ACPO (AI-Enabled Compiler-driven Program Optimization), a
novel framework that provides LLVM with simple and comprehensive tools for
employing ML models in different optimization passes. We first showcase the
high-level view, class hierarchy, and functionalities of ACPO and subsequently
demonstrate a couple of use cases by ML-enabling the Loop Unroll and Function
Inlining passes, describing how ACPO can be leveraged to optimize other passes
as well. Experimental results reveal that the ACPO model for Loop Unroll is
able to gain on average 4% compared to LLVM's O3 optimization when deployed on
Polybench. Furthermore, by adding the Inliner model as well, ACPO is able to
provide up to 4.5% and 2.4% on Polybench and Cbench, respectively, compared
with LLVM's O3 optimization.
Comment: Preprint version of ACPO (12 pages
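The framework shape the abstract describes, a pass collecting features and delegating the decision to a pluggable model, can be sketched roughly as below. The class names, feature names, and the hard-coded "model" are all invented for illustration; ACPO's real interface is C++ inside LLVM and its models are trained, not rule-based.

```python
# Hypothetical sketch of an ACPO-style flow: an optimization pass
# gathers features, hands them to a generic model interface, and
# receives a decision (here, a loop-unroll factor). All names are
# illustrative, not ACPO's actual API.

class ACPOModel:
    """Generic model interface a pass talks to (illustrative)."""
    def __init__(self, predict_fn):
        self.predict_fn = predict_fn

    def get_advice(self, features):
        return self.predict_fn(features)

def unroll_model(features):
    # Stand-in for a trained model: fully unroll tiny loops, unroll
    # small bodies by 4, otherwise leave the loop alone.
    trip_count = features["trip_count"]
    body_size = features["body_size"]
    if trip_count <= 4:
        return trip_count      # full unroll
    if body_size <= 20:
        return 4               # partial unroll
    return 1                   # no unrolling

class LoopUnrollPass:
    def __init__(self, model):
        self.model = model

    def run(self, loop):
        # The pass only knows how to extract features and apply the
        # advice; the decision logic lives entirely in the model.
        features = {"trip_count": loop["trip_count"],
                    "body_size": loop["body_size"]}
        return self.model.get_advice(features)

unroll_pass = LoopUnrollPass(ACPOModel(unroll_model))
factor_small = unroll_pass.run({"trip_count": 3, "body_size": 50})    # → 3
factor_large = unroll_pass.run({"trip_count": 100, "body_size": 10})  # → 4
```

Separating the pass from the model behind one narrow interface is what lets the same machinery ML-enable Loop Unroll, Function Inlining, and, in principle, other passes.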
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
advise the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared-memory and one for non-shared-memory concurrency. From our experimental
results, we derived a list of requirements for a fully fledged experimental
environment for further research.
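The proposal of putting concurrency operations directly into a VM instruction set can be illustrated with a toy bytecode interpreter whose instruction set includes SEND and RECV for non-shared-memory (message-passing) concurrency. The opcodes, their encoding, and the interpreter structure are all invented for this sketch; the paper's actual extensions target high-level language VMs.

```python
import queue
import threading

def run(program, out):
    # A tiny stack-based interpreter. SEND and RECV are first-class
    # instructions, so the concurrency model is part of the VM's
    # instruction set rather than bolted on via libraries.
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "SEND":       # concurrency op: send top of stack to a mailbox
            arg.put(stack.pop())
        elif op == "RECV":       # concurrency op: block until a message arrives
            stack.append(arg.get())
        elif op == "HALT":
            if stack:
                out.append(stack.pop())
            return

mbox = queue.Queue()             # the only channel between the two VMs
results = []
producer = [("PUSH", 20), ("PUSH", 22), ("ADD", None),
            ("SEND", mbox), ("HALT", None)]
consumer = [("RECV", mbox), ("HALT", None)]
t1 = threading.Thread(target=run, args=(producer, results))
t2 = threading.Thread(target=run, args=(consumer, results))
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the two interpreter instances share no stack or heap, all communication flows through the mailbox instructions, which is the non-shared-memory model the abstract contrasts with the shared-memory extension.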
Speculative Parallelism Improves Search?
The extreme efficiency of sequential search, and the natural tendency of tree-pruning systems to produce wide variations in workload, partly explains why it is proving difficult to achieve more than 30-50% efficiency for massively parallel implementations of the αβ algorithm. Here we introduce typical enhanced sequential algorithms and address the major issues of parallel game-tree searching under conditions of severe pruning. It is this pruning that makes the parallelization difficult. After examining previous work on parallel αβ algorithms, we present a new method called Dynamic Multiple Principal Variation Splitting (DM-PVSplit) and implement it on the AP1000. In this algorithm, high performance is achieved using several novel approaches. Parallel speculative search of candidate principal variations is used to reduce re-search delay and thereby obtain a better estimate of the subtree value more quickly; this is achieved by configuring a flat processor arrangement as a dynamically changeable tree structure. Also, with the aid of a group-based scheduling strategy, the game tree is split dynamically at different levels, which provides better load balance and exploits more parallelism. Preliminary experiments show that the scalability of the DM-PVSplit algorithm is good for massively parallel machines.
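The baseline that DM-PVSplit elaborates on can be sketched as plain PV splitting: search the first (principal-variation) child sequentially to establish a bound, then search the remaining siblings concurrently against that bound. This is a deliberate simplification; it omits the speculative PV search, dynamically reshaped processor trees, and multi-level splitting that the paper adds.

```python
from concurrent.futures import ThreadPoolExecutor

def alphabeta(node, alpha, beta):
    # Negamax alpha-beta: leaves are numbers scored for the side to
    # move there; interior nodes are lists of children.
    if isinstance(node, (int, float)):
        return node
    value = float("-inf")
    for child in node:
        value = max(value, -alphabeta(child, -beta, -alpha))
        alpha = max(alpha, value)
        if alpha >= beta:          # cutoff: remaining siblings are pruned
            break
    return value

def pv_split(root):
    # Search the PV child first, sequentially, to get a good bound...
    pv, *rest = root
    alpha = -alphabeta(pv, float("-inf"), float("inf"))
    # ...then search the remaining children in parallel with that
    # bound. This is speculative: a sibling may do work a sequential
    # search would have pruned, which is the efficiency loss the
    # abstract's 30-50% figure refers to.
    with ThreadPoolExecutor() as pool:
        scores = pool.map(lambda c: -alphabeta(c, float("-inf"), -alpha), rest)
    return max([alpha, *scores])

tree = [[3, 5], [2, 9], [1, 4]]    # leaves scored for the side to move there
best = pv_split(tree)              # agrees with sequential alpha-beta
```

The quality of the bound from the PV child governs how much speculative waste the parallel siblings incur, which is why the paper invests in getting a better subtree estimate more quickly.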
Multithreaded Pruned Tree Search In Distributed Systems
Although efficient support for data-parallel applications is relatively well established, it remains open how best to support irregular and dynamic problems, where there are no regular data structures or communication patterns. Tree search is central to solving a variety of problems in artificial intelligence and is an important subset of the irregular applications in which tasks are frequently created and terminated. In this paper, we introduce the design of a multithreaded distributed runtime system whose two primary goals are efficiency and ease of parallel programming. In our system, multithreading is used to express the asynchronous behavior in parallel game-tree search, and dynamic load balancing is employed for efficient performance.
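The dynamic load-balancing idea for irregular tree tasks can be sketched with worker threads pulling dynamically created tasks from a shared queue: because subtrees of wildly different sizes are broken into tasks as they are discovered, work spreads itself across the workers. The tree representation and task structure are invented for this sketch; the paper's runtime is distributed and layers game-tree search on top.

```python
import queue
import threading

def parallel_leaf_sum(tree, workers=4):
    """Sum the leaves of an irregular nested-list tree using a pool of
    threads that pull dynamically created tasks from a shared queue."""
    tasks = queue.Queue()
    results, lock = [], threading.Lock()
    tasks.put(tree)

    def worker():
        while True:
            node = tasks.get()
            try:
                if node is None:           # sentinel: shut this worker down
                    return
                if isinstance(node, int):  # leaf task: record its value
                    with lock:
                        results.append(node)
                else:                      # interior task: spawn child tasks
                    for child in node:     # (tasks are created dynamically,
                        tasks.put(child)   #  mirroring irregular tree search)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    tasks.join()                           # wait for all spawned tasks
    for _ in threads:
        tasks.put(None)                    # release the workers
    for t in threads:
        t.join()
    return sum(results)
```

Whichever worker is idle takes the next task, so a deep, bushy subtree is automatically shared instead of being stuck on one thread; that is the load-balance property the abstract claims for irregular workloads.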
IoP System Dependability Evaluation Method Based on AADL
The Internet of People (IoP) is characterized by a complex architecture and massive, changing data, which adds to the difficulty of analyzing the dependability of IoP-based systems. Currently, there is still no robust dependability modelling and analysis method for IoP systems. This paper proposes an Architecture Analysis and Design Language (AADL)-based dependability evaluation method for IoP systems. Using AADL and its annex languages, the dependability of IoP systems is modeled to support qualitative analysis of the causes of system failures and risks. Furthermore, by combining the Ocarina model transformation technology, a quantitative evaluation algorithm based on the Continuous-Time Markov Chain (CTMC) is proposed. The algorithm transforms the AADL dependability model into a CTMC model, so that the dynamic and real-time attributes of IoP systems can be evaluated quantitatively. On this basis, a general IoP system model is designed to demonstrate the feasibility of the proposed method. The experimental results show that the proposed method can model IoP systems and perform dependability analysis automatically and accurately, demonstrating high practical value.
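The quantitative step at the end of such a pipeline reduces to standard CTMC analysis: once the AADL model has been transformed into a generator matrix Q, measures such as steady-state availability come from solving the balance equations πQ = 0 with Σπ = 1. The two-state model and rates below are invented for illustration; the paper derives its CTMCs automatically via Ocarina, not by hand.

```python
def steady_state(Q):
    """Solve pi @ Q = 0 with sum(pi) = 1 by Gaussian elimination
    (one balance equation is replaced by the normalization)."""
    n = len(Q)
    # Rows of A are Q's columns (the balance equations); the last
    # row is replaced by the normalization sum(pi) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # plain Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / A[r][r]
    return x

# Two-state repairable component: "up" fails at rate lam, "down" is
# repaired at rate mu (rates invented for the example).
lam, mu = 0.001, 0.1
Q = [[-lam, lam],
     [mu, -mu]]
pi = steady_state(Q)
# steady-state availability = probability of the "up" state,
# which for this chain equals mu / (lam + mu)
availability = pi[0]
```

Larger AADL-derived chains are evaluated the same way, just with more states; the model transformation's job is producing Q, after which the numerics are routine.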