Essential guidelines for computational method benchmarking

Author: Boulesteix Anne-Laure
Cannoodt Robrecht
Gardner Paul P.
Hapfelmeier Alexander
Robinson Mark D.
Saelens Wouter
Saeys Yvan
Soneson Charlotte
Weber Lukas M.
Publication date: 01/01/2019

In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

arXiv.org e-Print Archive

You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems

Author: Bissyandé Tegawendé F.
Kim Dongsun
Klein Jacques
Koyuncu Anil
Liu Kui
Traon Yves Le
Publication date: 15/02/2019

Properly benchmarking Automated Program Repair (APR) systems should contribute to the development and adoption of the research outputs by practitioners. To that end, the research community must ensure that it reaches significant milestones by reliably comparing state-of-the-art tools for a better understanding of their strengths and weaknesses. In this work, we identify and investigate a practical bias caused by the fault localization (FL) step in a repair pipeline. We propose to highlight the different fault localization configurations used in the literature, and their impact on APR systems when applied to the Defects4J benchmark. Then, we explore the performance variations that can be achieved by 'tweaking' the FL step. Eventually, we expect to create a new momentum for (1) full disclosure of APR experimental procedures with respect to FL, (2) realistic expectations of repairing bugs in Defects4J, as well as (3) reliable performance comparison among the state-of-the-art APR systems, and against the baseline performance results of our thoroughly assessed kPAR repair tool. Our main findings include: (a) only a subset of Defects4J bugs can be currently localized by commonly-used FL techniques; (b) current practice of comparing state-of-the-art APR systems (i.e., counting the number of fixed bugs) is potentially misleading due to the bias of FL configurations; and (c) APR authors do not properly qualify their performance achievement with respect to the different tuning parameters implemented in APR systems. Comment: Accepted by ICST 201

arXiv.org e-Print Archive

Evaluating the resilience and security of boundaryless, evolving socio-technical Systems of Systems

Author: Bloomfield R. E.
DSTL
Gashi I.
Publication venue: Centre for Software Reliability, City University London
Publication date: 01/01/2008

City Research Online

Towards Automated Performance Bug Identification in Python

Author: Mazzawi Elie
Miranskyy Andriy
Tsakiltsidis Sokratis
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 28/07/2016

Context: Software performance is a critical non-functional requirement, appearing in many fields such as mission-critical applications, financial, and real-time systems. In this work we focused on early detection of performance bugs; our software under study was a real-time system used in the advertisement/marketing domain. Goal: Find a simple, easy-to-implement solution for predicting performance bugs. Method: We built several models using four machine learning methods commonly used for defect prediction: C4.5 Decision Trees, Naïve Bayes, Bayesian Networks, and Logistic Regression. Results: Our empirical results show that a C4.5 model, using lines of code changed, file's age and size as explanatory variables, can be used to predict performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that reducing the number of changes delivered in a commit can decrease the chance of performance bug injection. Conclusions: We believe that our approach can help practitioners to eliminate performance bugs early in the development cycle. Our results are also of interest to theoreticians, establishing a link between functional bugs and (non-functional) performance bugs, and explicitly showing that attributes used for prediction of functional bugs can be used for prediction of performance bugs.

arXiv.org e-Print Archive

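Software como um Serviço: uma plataforma eficaz para oferta de sistemas holísticos de gestão da performance [Software as a Service: an effective platform for offering holistic performance management systems]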

Author: António Nuno
Serra Francisco
Publication venue: School of Management, Hospitality and Tourism, University of the Algarve
Publication date: 01/01/2018

The main objective of this study was to assess the viability of developing a Performance Management (PM) system, delivered in the form of Software as a Service (SaaS), specific to the hospitality industry, and to evaluate the benefits of its use. Software deployed in the cloud, delivered and licensed as a service, is becoming increasingly common and accepted in a business context. Although Business Intelligence (BI) solutions are not usually distributed in the SaaS model, there are some examples suggesting that this is changing. To achieve the study objective, design science research methodology was employed in the development of a prototype. This prototype was deployed in four hotels and its results were evaluated. Evaluation of the prototype focused on both the system's technical characteristics and its business benefits. Results showed that the hotels were very satisfied with the system and that building a prototype and making it available in the form of SaaS is a good way to assess the contribution of BI systems to improving management performance.

An extensible benchmark and tooling for comparing reverse engineering approaches

Author: Cutting David
Noppen Joost
Publication date: 30/06/2015

Various tools exist to reverse engineer software source code and generate design information, such as UML projections. Each has specific strengths and weaknesses; however, no standardised benchmark exists that can be used to evaluate and compare their performance and effectiveness in a systematic manner. To facilitate such comparison, in this paper we introduce the Reverse Engineering to Design Benchmark (RED-BM), which consists of a comprehensive set of Java-based targets for reverse engineering and a formal set of performance measures with which tools and approaches can be analysed and ranked. When used to evaluate 12 industry standard tools, performance figures range from 8.82% to 100%, demonstrating the ability of the benchmark to differentiate between tools. To aid the comparison, analysis and further use of reverse engineering XMI output, we have developed a parser which can interpret the XMI output format of the most commonly used reverse engineering applications, and is used in a number of tools.

Towards critical event monitoring, detection and prediction for self-adaptive future Internet applications

Author: Boniface M.J.
Engen Vegard
Metzger A.
Phillips Stephen
Zlatev Zlatko
Publication date: 28/10/2011

The Future Internet (FI) will be composed of a multitude of diverse types of services that offer flexible, remote access to software features, content, computing resources, and middleware solutions through different cloud delivery models, such as IaaS, PaaS and SaaS. Ultimately, this means that loosely coupled Internet services will form a comprehensive base for developing value-added applications in an agile way. Unlike traditional application development, which uses computing resources and software components under local administrative control, FI applications will thus strongly depend on third-party services. To maintain their quality of service, those applications therefore need to dynamically and autonomously adapt to an unprecedented level of changes that may occur during runtime. In this paper, we present our recent experiences in monitoring, detection, and prediction of critical events for both software services and multimedia applications. Based on these findings, we introduce potential directions for future research on self-adaptive FI applications, bringing together those research directions.

Southampton (e-Prints Soton)