2,103 research outputs found
Synthesizing benchmarks for predictive modeling
Predictive modeling using machine learning is an effective method for building compiler heuristics, but there is a shortage of benchmarks. Typical machine learning experiments outside of the compilation field train over thousands or millions of examples. In machine learning for compilers, however, there are typically only a few dozen common benchmarks available. This limits the quality of learned models, as they have very sparse training data for what are often high-dimensional feature spaces. What is needed is a way to generate an unbounded number of training programs that finely cover the feature space. At the same time the generated programs must be similar to the types of programs that human developers actually write, otherwise the learning will target the wrong parts of the feature space.
We mine open source repositories for program fragments and apply deep learning techniques to automatically construct models for how humans write programs. We sample these models to generate an unbounded number of runnable training programs. The quality of the programs is such that even human developers struggle to distinguish our generated programs from hand-written code.
We use our generator for OpenCL programs, CLgen, to automatically synthesize thousands of programs and show that learning over these improves the performance of a state of the art predictive model by 1.27Ă—. In addition, the fine covering of the feature space automatically exposes weaknesses in the feature design which are invisible with the sparse training examples from existing benchmark suites. Correcting these weaknesses further increases performance by 4.30Ă—
Adaptive optimization for OpenCL programs on embedded heterogeneous systems
Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in today’s embedded systems. These architectures offer potential for energy efficient computing if the application task is mapped to the right core. Realizing such potential is challenging due to the complex and evolving nature of hardware and applications. This paper presents an automatic approach to map OpenCL kernels onto heterogeneous multi-cores for a given optimization criterion – whether it is faster runtime, lower energy consumption or a trade-off between them. This is achieved by developing a machine learning based approach to predict which processor to use to run the OpenCL kernel and the host program, and at what frequency the processor should operate. Instead of hand-tuning a model for each optimization metric, we use machine learning to develop a unified framework that first automatically learns the optimization heuristic for each metric off-line, then uses the learned knowledge to schedule OpenCL kernels at runtime based on code and runtime information of the program. We apply our approach to a set of representative OpenCL benchmarks and evaluate it on an ARM big.LITTLE mobile platform. Our approach achieves over 93% of the performance delivered by a perfect predictor.We obtain, on average, 1.2x, 1.6x, and 1.8x improvement respectively for runtime, energy consumption and the energy delay product when compared to a comparative heterogeneous-aware OpenCL task mapping scheme
Correlated Resource Models of Internet End Hosts
Understanding and modelling resources of Internet end hosts is essential for
the design of desktop software and Internet-distributed applications. In this
paper we develop a correlated resource model of Internet end hosts based on
real trace data taken from the SETI@home project. This data covers a 5-year
period with statistics for 2.7 million hosts. The resource model is based on
statistical analysis of host computational power, memory, and storage as well
as how these resources change over time and the correlations between them. We
find that resources with few discrete values (core count, memory) are well
modeled by exponential laws governing the change of relative resource
quantities over time. Resources with a continuous range of values are well
modeled with either correlated normal distributions (processor speed for
integer operations and floating point operations) or log-normal distributions
(available disk space). We validate and show the utility of the models by
applying them to a resource allocation problem for Internet-distributed
applications, and demonstrate their value over other models. We also make our
trace data and tool for automatically generating realistic Internet end hosts
publicly available
Approximate Inference in Continuous Determinantal Point Processes
Determinantal point processes (DPPs) are random point processes well-suited
for modeling repulsion. In machine learning, the focus of DPP-based models has
been on diverse subset selection from a discrete and finite base set. This
discrete setting admits an efficient sampling algorithm based on the
eigendecomposition of the defining kernel matrix. Recently, there has been
growing interest in using DPPs defined on continuous spaces. While the
discrete-DPP sampler extends formally to the continuous case, computationally,
the steps required are not tractable in general. In this paper, we present two
efficient DPP sampling schemes that apply to a wide range of kernel functions:
one based on low rank approximations via Nystrom and random Fourier feature
techniques and another based on Gibbs sampling. We demonstrate the utility of
continuous DPPs in repulsive mixture modeling and synthesizing human poses
spanning activity spaces
2016 International Land Model Benchmarking (ILAMB) Workshop Report
As earth system models (ESMs) become increasingly complex, there is a growing need for comprehensive and multi-faceted evaluation of model projections. To advance understanding of terrestrial biogeochemical processes and their interactions with hydrology and climate under conditions of increasing atmospheric carbon dioxide, new analysis methods are required that use observations to constrain model predictions, inform model development, and identify needed measurements and field experiments. Better representations of biogeochemistryclimate feedbacks and ecosystem processes in these models are essential for reducing the acknowledged substantial uncertainties in 21st century climate change projections
Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels
Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energy consumption while maximizing performance. This article focuses on modeling the energy consumption and speedup of GPU applications while using different frequency configurations. The task is not straightforward, because of the large set of possible and uniformly distributed configurations and because of the multi-objective nature of the problem, which minimizes energy consumption and maximizes performance. This article proposes a machine learning-based method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. The method is based on two models for speedup and normalized energy predictions over the default frequency configuration. Those are later combined into a multi-objective approach that predicts a Pareto-set of frequency configurations. Results show that our approach is very accurate at predicting extema and the Pareto set, and finds frequency configurations that dominate the default configuration in either energy or performance.DFG, 360291326, CELERITY: Innovative Modellierung für Skalierbare Verteilte Laufzeitsystem
- …