21,063 research outputs found
Learning based Methods for Code Runtime Complexity Prediction
Predicting the runtime complexity of a programming code is an arduous task.
In fact, even for humans, it requires a subtle analysis and comprehensive
knowledge of algorithms to predict time complexity with high fidelity, given
any code. As per Turing's Halting problem proof, estimating code complexity is
mathematically impossible. Nevertheless, an approximate solution to such a task
can help developers to get real-time feedback for the efficiency of their code.
In this work, we model this problem as a machine learning task and check its
feasibility with thorough analysis. Due to the lack of any open source dataset
for this task, we propose our own annotated dataset CoRCoD: Code Runtime
Complexity Dataset, extracted from online judges. We establish baselines using
two different approaches: feature engineering and code embeddings, to achieve
state of the art results and compare their performances. Such solutions can be
widely useful in potential applications like automatically grading coding
assignments, IDE-integrated tools for static code analysis, and others.Comment: 14 pages, 2 figures, 8 table
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally allows to trade-off model complexity with
goodness of fit, by which overfitting and the need for hyperparameter tuning
are effectively avoided. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion
Easy over Hard: A Case Study on Deep Learning
While deep learning is an exciting new technique, the benefits of this method
need to be assessed with respect to its computational cost. This is
particularly important for deep learning since these learners need hours (to
weeks) to train the model. Such long training time limits the ability of (a)~a
researcher to test the stability of their conclusion via repeated runs with
different random seeds; and (b)~other researchers to repeat, improve, or even
refute that original work.
For example, recently, deep learning was used to find which questions in the
Stack Overflow programmer discussion forum can be linked together. That deep
learning system took 14 hours to execute. We show here that applying a very
simple optimizer called DE to fine tune SVM, it can achieve similar (and
sometimes better) results. The DE approach terminated in 10 minutes; i.e. 84
times faster hours than deep learning method.
We offer these results as a cautionary tale to the software analytics
community and suggest that not every new innovation should be applied without
critical analysis. If researchers deploy some new and expensive process, that
work should be baselined against some simpler and faster alternatives.Comment: 12 pages, 6 figures, accepted at FSE201
A Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training Structural SVMs with a Costly max-Oracle
Structural support vector machines (SSVMs) are amongst the best performing
models for structured computer vision tasks, such as semantic image
segmentation or human pose estimation. Training SSVMs, however, is
computationally costly, because it requires repeated calls to a structured
prediction subroutine (called \emph{max-oracle}), which has to solve an
optimization problem itself, e.g. a graph cut.
In this work, we introduce a new algorithm for SSVM training that is more
efficient than earlier techniques when the max-oracle is computationally
expensive, as it is frequently the case in computer vision tasks. The main idea
is to (i) combine the recent stochastic Block-Coordinate Frank-Wolfe algorithm
with efficient hyperplane caching, and (ii) use an automatic selection rule for
deciding whether to call the exact max-oracle or to rely on an approximate one
based on the cached hyperplanes.
We show experimentally that this strategy leads to faster convergence to the
optimum with respect to the number of requires oracle calls, and that this
translates into faster convergence with respect to the total runtime when the
max-oracle is slow compared to the other steps of the algorithm.
A publicly available C++ implementation is provided at
http://pub.ist.ac.at/~vnk/papers/SVM.html
Is One Hyperparameter Optimizer Enough?
Hyperparameter tuning is the black art of automatically finding a good
combination of control parameters for a data miner. While widely applied in
empirical Software Engineering, there has not been much discussion on which
hyperparameter tuner is best for software analytics. To address this gap in the
literature, this paper applied a range of hyperparameter optimizers (grid
search, random search, differential evolution, and Bayesian optimization) to
defect prediction problem. Surprisingly, no hyperparameter optimizer was
observed to be `best' and, for one of the two evaluation measures studied here
(F-measure), hyperparameter optimization, in 50\% cases, was no better than
using default configurations.
We conclude that hyperparameter optimization is more nuanced than previously
believed. While such optimization can certainly lead to large improvements in
the performance of classifiers used in software analytics, it remains to be
seen which specific optimizers should be applied to a new dataset.Comment: 7 pages, 2 columns, accepted for SWAN1
- …