Essential guidelines for computational method benchmarking
In computational biology and other sciences, researchers are frequently faced
with a choice between several computational methods for performing data
analyses. Benchmarking studies aim to rigorously compare the performance of
different methods using well-characterized benchmark datasets, to determine the
strengths of each method or to provide recommendations regarding suitable
choices of methods for an analysis. However, benchmarking studies must be
carefully designed and implemented to provide accurate, unbiased, and
informative results. Here, we summarize key practical guidelines and
recommendations for performing high-quality benchmarking analyses, based on our
experiences in computational biology.
Comment: Minor update
You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems
Properly benchmarking Automated Program Repair (APR) systems should
contribute to the development and adoption of the research outputs by
practitioners. To that end, the research community must ensure that it reaches
significant milestones by reliably comparing state-of-the-art tools for a
better understanding of their strengths and weaknesses. In this work, we
identify and investigate a practical bias caused by the fault localization (FL)
step in a repair pipeline. We propose to highlight the different fault
localization configurations used in the literature, and their impact on APR
systems when applied to the Defects4J benchmark. Then, we explore the
performance variations that can be achieved by 'tweaking' the FL step.
Eventually, we expect to create a new momentum for (1) full disclosure of APR
experimental procedures with respect to FL, (2) realistic expectations of
repairing bugs in Defects4J, as well as (3) reliable performance comparison
among the state-of-the-art APR systems, and against the baseline performance
results of our thoroughly assessed kPAR repair tool. Our main findings include:
(a) only a subset of Defects4J bugs can be currently localized by commonly-used
FL techniques; (b) current practice of comparing state-of-the-art APR systems
(i.e., counting the number of fixed bugs) is potentially misleading due to the
bias of FL configurations; and (c) APR authors do not properly qualify their
performance achievement with respect to the different tuning parameters
implemented in APR systems.
Comment: Accepted by ICST 201
Evaluating the resilience and security of boundaryless, evolving socio-technical Systems of Systems
Towards Automated Performance Bug Identification in Python
Context: Software performance is a critical non-functional requirement,
appearing in many fields such as mission critical applications, financial, and
real time systems. In this work we focused on early detection of performance
bugs; our software under study was a real time system used in the
advertisement/marketing domain.
Goal: Find a simple and easy to implement solution, predicting performance
bugs.
Method: We built several models using four machine learning methods, commonly
used for defect prediction: C4.5 Decision Trees, Naïve Bayes, Bayesian
Networks, and Logistic Regression.
Results: Our empirical results show that a C4.5 model, using lines of code
changed, file's age and size as explanatory variables, can be used to predict
performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that
reducing the number of changes delivered in a commit can decrease the chance
of performance bug injection.
Conclusions: We believe that our approach can help practitioners to eliminate
performance bugs early in the development cycle. Our results are also of
interest to theoreticians, establishing a link between functional bugs and
(non-functional) performance bugs, and explicitly showing that attributes used
for prediction of functional bugs can be used for prediction of performance
bugs.
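The recall, accuracy, and precision figures quoted in this abstract follow the standard confusion-matrix definitions. The sketch below shows how such figures are computed; the function name `classification_metrics` and the counts passed to it are hypothetical illustrations, not data or code from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts.

    tp: performance bugs correctly flagged
    fp: clean changes wrongly flagged
    fn: performance bugs missed
    tn: clean changes correctly passed
    """
    # Guard against division by zero when a class is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts chosen only for illustration (not from the study).
precision, recall, accuracy = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

High precision with lower recall, as reported in the abstract, would mean that flagged changes are rarely false alarms while some performance bugs still go undetected.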
Software como um Serviço: uma plataforma eficaz para oferta de sistemas holísticos de gestão da performance
The main objective of this study was to assess the viability of developing a Performance Management (PM) system, delivered as Software as a Service (SaaS), specifically for the hospitality industry, and to evaluate the benefits of its use. Software deployed in the cloud, delivered and licensed as a service, is becoming increasingly common and accepted in a business context. Although Business Intelligence (BI) solutions are not usually distributed in the SaaS model, there are some examples suggesting that this is changing. To achieve the study objective, the design science research methodology was employed in the development of a prototype. This prototype was deployed in four hotels and its results were evaluated. The evaluation focused on both the technical characteristics of the system and its business benefits. The results showed that the hotels were very satisfied with the system, and that building a prototype and making it available as SaaS is a good way to assess the contribution of BI systems to improving management performance.
An extensible benchmark and tooling for comparing reverse engineering approaches
Various tools exist to reverse engineer software source code and generate design information, such as UML projections. Each has specific strengths and weaknesses; however, no standardised benchmark exists that can be used to evaluate and compare their performance and effectiveness in a systematic manner. To facilitate such comparison, in this paper we introduce the Reverse Engineering to Design Benchmark (RED-BM), which consists of a comprehensive set of Java-based targets for reverse engineering and a formal set of performance measures with which tools and approaches can be analysed and ranked. When used to evaluate 12 industry standard tools, performance figures range from 8.82% to 100%, demonstrating the ability of the benchmark to differentiate between tools. To aid the comparison, analysis and further use of reverse engineering XMI output, we have developed a parser which can interpret the XMI output format of the most commonly used reverse engineering applications, and is used in a number of tools.
Towards critical event monitoring, detection and prediction for self-adaptive future Internet applications
The Future Internet (FI) will be composed of a multitude of diverse types of services that offer flexible, remote access to software features, content, computing resources, and middleware solutions through different cloud delivery models, such as IaaS, PaaS and SaaS. Ultimately, this means that loosely coupled Internet services will form a comprehensive base for developing value-added applications in an agile way. Unlike traditional application development, which uses computing resources and software components under local administrative control, FI applications will thus strongly depend on third-party services. To maintain their quality of service, those applications therefore need to dynamically and autonomously adapt to an unprecedented level of changes that may occur during runtime. In this paper, we present our recent experiences with monitoring, detection, and prediction of critical events for both software services and multimedia applications. Based on these findings, we introduce potential directions for future research on self-adaptive FI applications, bringing together those research directions.