27,032 research outputs found
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach
The definition of a concise and effective testbed for Genetic Programming
(GP) is a recurrent matter in the research community. This paper takes a new
step in this direction, proposing a different approach to measure the quality
of the symbolic regression benchmarks quantitatively. The proposed approach is
based on meta-learning and uses a set of dataset meta-features---such as the
number of examples or output skewness---to describe the datasets. Our idea is
to correlate these meta-features with the errors obtained by a GP method. These
meta-features define a space of benchmarks that should, ideally, have datasets
(points) covering different regions of the space. An initial analysis of 63
datasets showed that current benchmarks are concentrated in a small region of
this benchmark space. We also found out that number of instances and output
skewness are the most relevant meta-features to GP output error. Both
conclusions can help define which datasets should compose an effective testbed
for symbolic regression methods.Comment: 8 pages, 3 Figures, Proceedings of Genetic and Evolutionary
Computation Conference Companion, Kyoto, Japa
Recommended from our members
On evolution of relatively large combinational logic circuits
Evolvable hardware (EHW) (Yao and Higuchi, 1999) is a technique introduced to automatically design circuits where the circuit configuration is carried out by evolutionary algorithms. One of the main difficulties in using EHW to solve real-world problems is the scalability. Until now, several strategies have been proposed to avoid this problem, but none of them completely tackle the issue. In this paper three different methods for evolving the most complex circuits have been tested for their scalability. These methods are bi-directional incremental evolution (SO-BIE); generalised disjunction decomposition (GD-BIE) and evolutionary strategies (ES) with dynamic mutation rate. In order to achieve the generalised conclusions the chosen approaches were tested using multipliers, traditionally used in EHW, but also logic circuits taken from MCNC (Yang, 1991) benchmark library and randomly generated circuits. The analysis of the approaches demonstrated that PLA-based ES is capable of evolving logic circuits of up to 12 inputs. The use of SO-BIE allows the generation of fully functional circuits of 14 inputs and GD-BIE is estimated to be able to evolve circuits of 21 inputs
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
The selection, development, or comparison of machine learning methods in data
mining can be a difficult task based on the target problem and goals of a
particular study. Numerous publicly available real-world and simulated
benchmark datasets have emerged from different sources, but their organization
and adoption as standards have been inconsistent. As such, selecting and
curating specific benchmarks remains an unnecessary burden on machine learning
practitioners and data scientists. The present study introduces an accessible,
curated, and developing public benchmark resource to facilitate identification
of the strengths and weaknesses of different machine learning methodologies. We
compare meta-features among the current set of benchmark datasets in this
resource to characterize the diversity of available data. Finally, we apply a
number of established machine learning methods to the entire benchmark suite
and analyze how datasets and algorithms cluster in terms of performance. This
work is an important first step towards understanding the limitations of
popular benchmarking suites and developing a resource that connects existing
benchmarking standards to more diverse and efficient standards in the future.Comment: 14 pages, 5 figures, submitted for review to JML
Combining Static and Dynamic Analysis for Vulnerability Detection
In this paper, we present a hybrid approach for buffer overflow detection in
C code. The approach makes use of static and dynamic analysis of the
application under investigation. The static part consists in calculating taint
dependency sequences (TDS) between user controlled inputs and vulnerable
statements. This process is akin to program slice of interest to calculate
tainted data- and control-flow path which exhibits the dependence between
tainted program inputs and vulnerable statements in the code. The dynamic part
consists of executing the program along TDSs to trigger the vulnerability by
generating suitable inputs. We use genetic algorithm to generate inputs. We
propose a fitness function that approximates the program behavior (control
flow) based on the frequencies of the statements along TDSs. This runtime
aspect makes the approach faster and accurate. We provide experimental results
on the Verisec benchmark to validate our approach.Comment: There are 15 pages with 1 figur
- …