2 research outputs found

    Rapid Design-Space Exploration for Low-Power Manycores under Process Variation utilizing Machine Learning

    No full text
    Design-space exploration for low-power manycore design is a daunting, time-consuming task that requires complex tools and frameworks. In the presence of process variation, the problem becomes even more challenging, particularly the time spent on trial-and-error selection of tool options to reach optimal power dissipation. The key contribution of this work is the novel use of machine learning to speed up the design process by embedding the tool expertise needed for low-power design-space exploration for manycores into a trained neural network. To enable this, we first generate a large volume of data for 36,000 benchmark applications by running them under all possible configurations to find the optimal one in terms of power. This is done using our own tool, LVSiM, a holistic manycore optimization program that models process variation. A neural network is trained on this information to build in the expertise. A second contribution of this work is a new set of features, relevant to power and performance optimization, for training the neural network. At design time, the trained neural network selects the proper options on behalf of the user based on the features of any new application. One problem with this approach is that the database constructed for machine learning has many outliers, due to the randomness associated with process variation, which complicates classification, the supervised learning task performed by the neural network. The third key contribution of this work is a novel data coercion algorithm used as a corrective measure to handle these outliers. The proposed data coercion scheme produces results within 3.9% of the optimal power consumption, compared to 7% without it. Furthermore, the proposed method is about an order of magnitude faster than a heuristic approach and two orders of magnitude faster than a brute-force approach to design-space exploration.
    Computer Engineering · Quantum & Computer Engineering
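
    To make the pipeline concrete, below is a minimal, hypothetical Python sketch: a neural-network classifier maps application features to the lowest-power configuration, after a simple coercion pass relabels near-tied outliers before training. The feature set, the tolerance, and the coercion rule (vote for the globally most popular near-optimal configuration) are illustrative assumptions, not the paper's exact algorithm, and random data stands in for LVSiM output.

        # Hypothetical sketch, not the paper's implementation.
        import numpy as np
        from sklearn.neural_network import MLPClassifier

        def coerce_labels(power, tol=0.02):
            """power: (n_apps, n_configs) simulated power per configuration.
            Relabel each app to the most globally popular configuration among
            those within `tol` of its minimum, damping variation-induced noise."""
            best = power.min(axis=1, keepdims=True)
            near_optimal = power <= best * (1.0 + tol)   # near-tied configs per app
            votes = near_optimal.sum(axis=0)             # global popularity of each config
            labels = np.empty(power.shape[0], dtype=int)
            for i, mask in enumerate(near_optimal):
                candidates = np.flatnonzero(mask)
                labels[i] = candidates[np.argmax(votes[candidates])]
            return labels

        rng = np.random.default_rng(0)
        n_apps, n_configs, n_features = 1000, 16, 8
        X = rng.normal(size=(n_apps, n_features))               # stand-in app features
        power = rng.uniform(0.5, 1.5, (n_apps, n_configs))      # stand-in simulator output
        y = coerce_labels(power)                                # coerced training labels

        clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
        clf.fit(X, y)
        print("chosen configuration for a new app:", clf.predict(X[:1])[0])

    At design time only the cheap forward pass runs, which is where the order-of-magnitude speedup over heuristic search would come from.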

    Parallelization of variable rate decompression through metadata

    No full text
    Data movement has long been identified as the biggest challenge facing designers of modern computer systems. To tackle this challenge, many novel data compression algorithms have been developed. Variable-rate compression algorithms are often favored over fixed-rate ones; however, variable-rate decompression is difficult to parallelize. Most existing algorithms adopt a single parallelization strategy suited to a particular hardware platform, an approach that fails to harness the parallelism found in diverse modern hardware architectures. We propose a parallelization method for tiled variable-rate compression algorithms that consists of multiple strategies that can be applied interchangeably, allowing an algorithm to apply the strategy most suitable for a specific hardware platform. Our strategies are based on generating metadata during encoding, which is then used to parallelize the decoding process. To demonstrate their effectiveness, we implement them in a state-of-the-art compression algorithm called ZFP. We show that the strategies suited to multicore CPUs differ from those suited to GPUs. On a CPU, we achieve near-optimal decoding speedup with a metadata overhead consistently below 0.04% of the compressed data size. On a GPU, we achieve average decoding rates of up to 100 GiB/s. Our strategies let the user trade off decoding throughput against metadata size overhead.
    Computer Engineering
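
    The core difficulty is that a variable-rate stream gives a decoder no way to know where tile N begins without decoding tiles 0..N-1. The following minimal Python sketch illustrates the metadata idea under simple assumptions: the encoder records each tile's start as a prefix-sum offset table, and decoding workers use those offsets to jump straight to their tiles and decompress them independently. zlib stands in for a tiled variable-rate codec like ZFP, and the offset layout is an assumption, not ZFP's actual format.

        # Hypothetical sketch of offset-table metadata for parallel decode.
        import zlib
        from concurrent.futures import ThreadPoolExecutor

        def encode(tiles):
            """Compress each tile; return the packed stream plus offset metadata."""
            blobs = [zlib.compress(t) for t in tiles]
            offsets = [0]
            for b in blobs:                      # prefix sums = start of each tile
                offsets.append(offsets[-1] + len(b))
            return b"".join(blobs), offsets

        def decode(stream, offsets, workers=4):
            """Decompress all tiles in parallel using the offset metadata."""
            def decode_tile(i):
                return zlib.decompress(stream[offsets[i]:offsets[i + 1]])
            with ThreadPoolExecutor(max_workers=workers) as pool:
                return list(pool.map(decode_tile, range(len(offsets) - 1)))

        tiles = [bytes([i]) * 4096 for i in range(16)]
        stream, offsets = encode(tiles)
        assert decode(stream, offsets) == tiles
        print(f"metadata: {len(offsets) * 8} bytes for "
              f"{len(stream)} compressed bytes")

    Storing one offset per tile (as here) favors throughput; coarser metadata, such as one offset per group of tiles, shrinks the overhead at the cost of less available parallelism, which is the throughput-versus-overhead trade-off the abstract describes.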