Search CORE

13 research outputs found

Which Surrogate Works for Empirical Performance Modelling? A Case Study with Differential Evolution

Author: Li Ke
Tan Kay Chen
Xiang Zilin
Publication venue
Publication date: 30/01/2019
Field of study

It is not uncommon that meta-heuristic algorithms contain some intrinsic parameters, the optimal configuration of which is crucial for achieving their peak performance. However, evaluating the effectiveness of a configuration is expensive, as it involves many costly runs of the target algorithm. Perhaps surprisingly, it is possible to build a cheap-to-evaluate surrogate that models the algorithm's empirical performance as a function of its parameters. Such surrogates constitute an important building block for understanding algorithm performance, algorithm portfolio/selection, and the automatic algorithm configuration. In principle, many off-the-shelf machine learning techniques can be used to build surrogates. In this paper, we take the differential evolution (DE) as the baseline algorithm for proof-of-concept study. Regression models are trained to model the DE's empirical performance given a parameter configuration. In particular, we evaluate and compare four popular regression algorithms both in terms of how well they predict the empirical performance with respect to a particular parameter configuration, and also how well they approximate the parameter versus the empirical performance landscapes

arXiv.org e-Print Archive

Crossref

DeepSQLi: Deep Semantic Learning for Testing SQL Injection

Author: Anna Huang Cheng-Zhi
Appelt Dennis
Ariu Davide
Choudhary William GJ
Diederik
Dong Linhao
Doshi Rohan
Guthrie David
Kiezun Adam
Mnih Volodymyr
Raychev Veselin
Sinha Sanjib
Tian Wei
Vaswani Ashish
Vinyals Oriol
William G.
Publication venue
Publication date: 24/05/2020
Field of study

Security is unarguably the most serious concern for Web applications, to which SQL injection (SQLi) attack is one of the most devastating attacks. Automatically testing SQLi vulnerabilities is of ultimate importance, yet is unfortunately far from trivial to implement. This is because the existence of a huge, or potentially infinite, number of variants and semantic possibilities of SQL leading to SQLi attacks on various Web applications. In this paper, we propose a deep natural language processing based tool, dubbed DeepSQLi, to generate test cases for detecting SQLi vulnerabilities. Through adopting deep learning based neural language model and sequence of words prediction, DeepSQLi is equipped with the ability to learn the semantic knowledge embedded in SQLi attacks, allowing it to translate user inputs (or a test case) into a new test case, which is semantically related and potentially more sophisticated. Experiments are conducted to compare DeepSQLi with SQLmap, a state-of-the-art SQLi testing automation tool, on six real-world Web applications that are of different scales, characteristics and domains. Empirical results demonstrate the effectiveness and the remarkable superiority of DeepSQLi over SQLmap, such that more SQLi vulnerabilities can be identified by using a less number of test cases, whilst running much faster

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Which Surrogate Works for Empirical Performance Modelling? A Case Study with Differential Evolution

Author: Li K
Tan KC
Xiang Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/03/2019
Field of study

This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record.It is not uncommon that meta-heuristic algorithms contain some intrinsic parameters, the optimal configuration of which is crucial for achieving their peak performance. However, evaluating the effectiveness of a configuration is expensive, as it involves many costly runs of the target algorithm. Perhaps surprisingly, it is possible to build a cheap-to-evaluate surrogate that models the algorithm's empirical performance as a function of its parameters. Such surrogates constitute an important building block for understanding algorithm performance, algorithm portfolio/selection, and the automatic algorithm configuration. In principle, many off-the-shelf machine learning techniques can be used to build surrogates. In this paper, we take the differential evolution (DE) as the baseline algorithm for proof-of-concept study. Regression models are trained to model the DE's empirical performance given a parameter configuration. In particular, we evaluate and compare four popular regression algorithms both in terms of how well they predict the empirical performance with respect to a particular parameter configuration, and also how well they approximate the parameter versus the empirical performance landscapes.Royal Societ

Open Research Exeter

BiLO-CPDP: Bi-Level Programming for Automated Model Discovery in Cross-Project Defect Prediction

Author: Chen Tao
Li Ke
Tan Kay Chen
Xiang Zilin
Publication venue
Publication date: 31/08/2020
Field of study

Cross-Project Defect Prediction (CPDP), which borrows data from similar projects by combining a transfer learner with a classifier, have emerged as a promising way to predict software defects when the available data about the target project is insufficient. How-ever, developing such a model is challenge because it is difficult to determine the right combination of transfer learner and classifier along with their optimal hyper-parameter settings. In this paper, we propose a tool, dubbedBiLO-CPDP, which is the first of its kind to formulate the automated CPDP model discovery from the perspective of bi-level programming. In particular, the bi-level programming proceeds the optimization with two nested levels in a hierarchical manner. Specifically, the upper-level optimization routine is designed to search for the right combination of transfer learner and classifier while the nested lower-level optimization routine aims to optimize the corresponding hyper-parameter settings.To evaluateBiLO-CPDP, we conduct experiments on 20 projects to compare it with a total of 21 existing CPDP techniques, along with its single-level optimization variant and Auto-Sklearn, a state-of-the-art automated machine learning tool. Empirical results show that BiLO-CPDP champions better prediction performance than all other 21 existing CPDP techniques on 70% of the projects, while be-ing overwhelmingly superior to Auto-Sklearn and its single-level optimization variant on all cases. Furthermore, the unique bi-level formalization inBiLO-CPDP also permits to allocate more budget to the upper-level, which significantly boosts the performance

arXiv.org e-Print Archive

University of Birmingham Research Portal

Interactive Decomposition Multi-Objective Optimization via Progressively Learned Value Functions

Author: Chen Renzhi
Li Ke
Savic Dragan
Yao Xin
Publication venue
Publication date: 30/09/2018
Field of study

Decomposition has become an increasingly popular technique for evolutionary multi-objective optimization (EMO). A decomposition-based EMO algorithm is usually designed to approximate a whole Pareto-optimal front (PF). However, in practice, the decision maker (DM) might only be interested in her/his region of interest (ROI), i.e., a part of the PF. Solutions outside that might be useless or even noisy to the decision-making procedure. Furthermore, there is no guarantee to find the preferred solutions when tackling many-objective problems. This paper develops an interactive framework for the decomposition-based EMO algorithm to lead a DM to the preferred solutions of her/his choice. It consists of three modules, i.e., consultation, preference elicitation and optimization. Specifically, after every several generations, the DM is asked to score a few candidate solutions in a consultation session. Thereafter, an approximated value function, which models the DM's preference information, is progressively learned from the DM's behavior. In the preference elicitation session, the preference information learned in the consultation module is translated into the form that can be used in a decomposition-based EMO algorithm, i.e., a set of reference points that are biased toward to the ROI. The optimization module, which can be any decomposition-based EMO algorithm in principle, utilizes the biased reference points to direct its search process. Extensive experiments on benchmark problems with three to ten objectives fully demonstrate the effectiveness of our proposed method for finding the DM's preferred solutions.Comment: 25 pages, 18 figures, 3 table

arXiv.org e-Print Archive

Open Research Exeter