Warmstarting of Model-based Algorithm Configuration
The performance of many hard combinatorial problem solvers depends strongly
on their parameter settings, and since manual parameter tuning is both tedious
and suboptimal, the AI community has recently developed several algorithm
configuration (AC) methods to automatically address this problem. While all
existing AC methods start the configuration process of an algorithm A from
scratch for each new type of benchmark instances, here we propose to exploit
information about A's performance on previous benchmarks in order to warmstart
its configuration on new types of benchmarks. We introduce two complementary
ways in which we can exploit this information to warmstart AC methods based on
a predictive model. Experiments for optimizing a very flexible modern SAT
solver on twelve different instance sets show that our methods often yield
substantial speedups over existing AC methods (up to 165-fold) and can also
find substantially better configurations given the same compute budget.
Comment: Preprint of AAAI'18 paper
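As a rough illustration of the warmstarting idea in the abstract above, the sketch below seeds a new configuration run with the settings that performed best on earlier benchmarks. It is not the paper's method: the function name `warmstart_candidates`, the shape of `history`, and the cutoff `k` are all assumptions made for this example.

```python
from typing import Dict, Hashable, List


def warmstart_candidates(
    history: Dict[str, Dict[Hashable, float]],
    default: Hashable,
    k: int = 2,
) -> List[Hashable]:
    """Build an initial candidate list from results on previous benchmarks.

    history maps a benchmark name to {configuration: mean runtime},
    where lower runtime is better (hypothetical data layout).
    """
    picked: List[Hashable] = [default]  # always keep the solver default
    for results in history.values():
        # Take the k fastest configurations seen on this earlier benchmark.
        for config in sorted(results, key=results.get)[:k]:
            if config not in picked:  # skip duplicates across benchmarks
                picked.append(config)
    return picked


# Toy history with made-up runtimes, for illustration only.
history = {
    "bench_a": {"c1": 3.0, "c2": 1.0, "c3": 2.0},
    "bench_b": {"c2": 5.0, "c4": 0.5},
}
initial = warmstart_candidates(history, default="default", k=1)
```

A configurator would then evaluate `initial` on the new benchmark before continuing its usual search, instead of starting from the default alone.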
Predicting Good Configurations for GitHub and Stack Overflow Topic Models
Software repositories contain large amounts of textual data, ranging from
source code comments and issue descriptions to questions, answers, and comments
on Stack Overflow. To make sense of this textual data, topic modelling is
frequently used as a text-mining tool for the discovery of hidden semantic
structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used
topic model that aims to explain the structure of a corpus by grouping texts.
LDA requires multiple parameters to work well, and there are only rough and
sometimes conflicting guidelines available on how these parameters should be
set. In this paper, we contribute (i) a broad study of parameters to arrive at
good local optima for GitHub and Stack Overflow text corpora, (ii) an
a-posteriori characterisation of text corpora related to eight programming
languages, and (iii) an analysis of corpus feature importance via per-corpus
LDA configuration. We find that (1) popular rules of thumb for topic modelling
parameter configuration are not applicable to the corpora used in our
experiments, (2) corpora sampled from GitHub and Stack Overflow have different
characteristics and require different configurations to achieve good model fit,
and (3) we can predict good configurations for unseen corpora reliably. These
findings support researchers and practitioners in efficiently determining
suitable configurations for topic modelling when analysing textual data
contained in software repositories.
Comment: to appear as full paper at MSR 2019, the 16th International
Conference on Mining Software Repositories
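To make the notion of an LDA configuration concrete, the sketch below enumerates candidate settings over the usual LDA parameters (number of topics and the two Dirichlet priors, commonly called alpha and beta) and picks the one with the best score. The abstract does not name the parameters studied, so the parameter names, the value ranges, and the `score` callback are assumptions for illustration, not the authors' procedure.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

LdaConfig = Dict[str, float]


def lda_config_grid(
    topic_counts: Sequence[int],
    alphas: Sequence[float],
    betas: Sequence[float],
) -> List[LdaConfig]:
    """Enumerate every combination of the given parameter values."""
    return [
        {"num_topics": k, "alpha": a, "beta": b}
        for k, a, b in product(topic_counts, alphas, betas)
    ]


def best_config(
    configs: Sequence[LdaConfig],
    score: Callable[[LdaConfig], float],
) -> LdaConfig:
    """Return the configuration with the lowest score.

    score is a hypothetical callback, e.g. held-out perplexity of a
    model trained with that configuration (lower is better).
    """
    return min(configs, key=score)


# Made-up value ranges; a real study would choose these per corpus.
grid = lda_config_grid([10, 20, 50], [0.1, 1.0], [0.01, 0.1])


def toy_score(cfg: LdaConfig) -> float:
    # Stand-in for a real model-fit measure.
    return abs(cfg["num_topics"] - 20) + cfg["alpha"] + cfg["beta"]


chosen = best_config(grid, toy_score)
```

In practice `toy_score` would be replaced by training and evaluating a topic model on the corpus at hand, which is the expensive step that predicting a good configuration aims to avoid.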