4,906 research outputs found
Simple and Effective Type Check Removal through Lazy Basic Block Versioning
Dynamically typed programming languages such as JavaScript and Python defer
type checking to run time. In order to maximize performance, dynamic language
VM implementations must attempt to eliminate redundant dynamic type checks.
However, type inference analyses are often costly and involve tradeoffs between
compilation time and resulting precision. This has lead to the creation of
increasingly complex multi-tiered VM architectures.
This paper introduces lazy basic block versioning, a simple JIT compilation
technique which effectively removes redundant type checks from critical code
paths. This novel approach lazily generates type-specialized versions of basic
blocks on-the-fly while propagating context-dependent type information. This
does not require the use of costly program analyses, is not restricted by the
precision limitations of traditional type analyses and avoids the
implementation complexity of speculative optimization techniques.
We have implemented intraprocedural lazy basic block versioning in a
JavaScript JIT compiler. This approach is compared with a classical flow-based
type analysis. Lazy basic block versioning performs as well or better on all
benchmarks. On average, 71% of type tests are eliminated, yielding speedups of
up to 50%. We also show that our implementation generates more efficient
machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on
several benchmarks. The combination of implementation simplicity, low
algorithmic complexity and good run time performance makes basic block
versioning attractive for baseline JIT compilers
Enlarging instruction streams
The stream fetch engine is a high-performance fetch architecture based on the concept of an instruction stream. We call a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks, a stream. The long length of instruction streams makes it possible for the stream fetch engine to provide a high fetch bandwidth and to hide the branch predictor access latency, leading to performance results close to a trace cache at a lower implementation cost and complexity. Therefore, enlarging instruction streams is an excellent way to improve the stream fetch engine. In this paper, we present several hardware and software mechanisms focused on enlarging those streams that finalize at particular branch types. However, our results point out that focusing on particular branch types is not a good strategy due to Amdahl's law. Consequently, we propose the multiple-stream predictor, a novel mechanism that deals with all branch types by combining single streams into long virtual streams. This proposal tolerates the prediction table access latency without requiring the complexity caused by additional hardware mechanisms like prediction overriding. Moreover, it provides high-performance results which are comparable to state-of-the-art fetch architectures but with a simpler design that consumes less energy.Peer ReviewedPostprint (published version
Symbolic QED Pre-silicon Verification for Automotive Microcontroller Cores: Industrial Case Study
We present an industrial case study that demonstrates the practicality and
effectiveness of Symbolic Quick Error Detection (Symbolic QED) in detecting
logic design flaws (logic bugs) during pre-silicon verification. Our study
focuses on several microcontroller core designs (~1,800 flip-flops, ~70,000
logic gates) that have been extensively verified using an industrial
verification flow and used for various commercial automotive products. The
results of our study are as follows: 1. Symbolic QED detected all logic bugs in
the designs that were detected by the industrial verification flow (which
includes various flavors of simulation-based verification and formal
verification). 2. Symbolic QED detected additional logic bugs that were not
recorded as detected by the industrial verification flow. (These additional
bugs were also perhaps detected by the industrial verification flow.) 3.
Symbolic QED enables significant design productivity improvements: (a) 8X
improved (i.e., reduced) verification effort for a new design (8 person-weeks
for Symbolic QED vs. 17 person-months using the industrial verification flow).
(b) 60X improved verification effort for subsequent designs (2 person-days for
Symbolic QED vs. 4-7 person-months using the industrial verification flow). (c)
Quick bug detection (runtime of 20 seconds or less), together with short
counterexamples (10 or fewer instructions) for quick debug, using Symbolic QED
Interprocedural Type Specialization of JavaScript Programs Without Type Analysis
Dynamically typed programming languages such as Python and JavaScript defer
type checking to run time. VM implementations can improve performance by
eliminating redundant dynamic type checks. However, type inference analyses are
often costly and involve tradeoffs between compilation time and resulting
precision. This has lead to the creation of increasingly complex multi-tiered
VM architectures.
Lazy basic block versioning is a simple JIT compilation technique which
effectively removes redundant type checks from critical code paths. This novel
approach lazily generates type-specialized versions of basic blocks on-the-fly
while propagating context-dependent type information. This approach does not
require the use of costly program analyses, is not restricted by the precision
limitations of traditional type analyses.
This paper extends lazy basic block versioning to propagate type information
interprocedurally, across function call boundaries. Our implementation in a
JavaScript JIT compiler shows that across 26 benchmarks, interprocedural basic
block versioning eliminates more type tag tests on average than what is
achievable with static type analysis without resorting to code transformations.
On average, 94.3% of type tag tests are eliminated, yielding speedups of up to
56%. We also show that our implementation is able to outperform Truffle/JS on
several benchmarks, both in terms of execution time and compilation time.Comment: 10 pages, 10 figures, submitted to CGO 201
Improvement of Data-Intensive Applications Running on Cloud Computing Clusters
MapReduce, designed by Google, is widely used as the most popular distributed programming model in cloud environments. Hadoop, an open-source implementation of MapReduce, is a data management framework on large cluster of commodity machines to handle data-intensive applications. Many famous enterprises including Facebook, Twitter, and Adobe have been using Hadoop for their data-intensive processing needs. Task stragglers in MapReduce jobs dramatically impede job execution on massive datasets in cloud computing systems. This impedance is due to the uneven distribution of input data and computation load among cluster nodes, heterogeneous data nodes, data skew in reduce phase, resource contention situations, and network configurations. All these reasons may cause delay failure and the violation of job completion time. One of the key issues that can significantly affect the performance of cloud computing is the computation load balancing among cluster nodes. Replica placement in Hadoop distributed file system plays a significant role in data availability and the balanced utilization of clusters. In the current replica placement policy (RPP) of Hadoop distributed file system (HDFS), the replicas of data blocks cannot be evenly distributed across cluster\u27s nodes. The current HDFS must rely on a load balancing utility for balancing the distribution of replicas, which results in extra overhead for time and resources. This dissertation addresses data load balancing problem and presents an innovative replica placement policy for HDFS. It can perfectly balance the data load among cluster\u27s nodes. The heterogeneity of cluster nodes exacerbates the issue of computational load balancing; therefore, another replica placement algorithm has been proposed in this dissertation for heterogeneous cluster environments. The timing of identifying the straggler map task is very important for straggler mitigation in data-intensive cloud computing. To mitigate the straggler map task, Present progress and Feedback based Speculative Execution (PFSE) algorithm has been proposed in this dissertation. PFSE is a new straggler identification scheme to identify the straggler map tasks based on the feedback information received from completed tasks beside the progress of the current running task. Straggler reduce task aggravates the violation of MapReduce job completion time. Straggler reduce task is typically the result of bad data partitioning during the reduce phase. The Hash partitioner employed by Hadoop may cause intermediate data skew, which results in straggler reduce task. In this dissertation a new partitioning scheme, named Balanced Data Clusters Partitioner (BDCP), is proposed to mitigate straggler reduce tasks. BDCP is based on sampling of input data and feedback information about the current processing task. BDCP can assist in straggler mitigation during the reduce phase and minimize the job completion time in MapReduce jobs. The results of extensive experiments corroborate that the algorithms and policies proposed in this dissertation can improve the performance of data-intensive applications running on cloud platforms
- …