38 research outputs found
Computer Games Are Serious Business and so is their Quality: Particularities of Software Testing in Game Development from the Perspective of Practitioners
Over the last several decades, computer games started to have a significant impact on society. However, although a computer game is a type of software, the process to conceptualize, produce and deliver a game could involve unusual features. In software testing, for instance, studies demonstrated the hesitance of professionals to use automated testing techniques with games, due to the constant changes in requirements and design, and pointed out the need for creating testing tools that take into account the flexibility required for the game development process. Goal. This study aims to improve the current body of knowledge regarding software testing in game development and point out the existing particularities observed in software testing considering the development of a computer game. Method. A mixed-method approach based on a case study and an opinion survey was applied to collect quantitative and qualitative data from software professionals regarding the particularities of software testing in game development. Results. We analyzed over 70 messages posted on three well-established network of question-and-answer communities related to software engineering, software testing and game development and received answers of 38 professionals discussing differences between testing a computer game and a general software, and identified important aspects to be observed by practitioners in the process of planning, performing and reporting tests in this context. Conclusion. Considering computer games, software testing must focus not only on the common aspects of a general software, but also, track and investigate issues that could be related to game balance, game physics and entertainment related-aspects to guarantee the quality of computer games and a successful testing process
Vulnerable Open Source Dependencies: Counting Those That Matter
BACKGROUND: Vulnerable dependencies are a known problem in today's
open-source software ecosystems because OSS libraries are highly interconnected
and developers do not always update their dependencies. AIMS: In this paper we
aim to present a precise methodology, that combines the code-based analysis of
patches with information on build, test, update dates, and group extracted from
the very code repository, and therefore, caters to the needs of industrial
practice for correct allocation of development and audit resources. METHOD: To
understand the industrial impact of the proposed methodology, we considered the
200 most popular OSS Java libraries used by SAP in its own software. Our
analysis included 10905 distinct GAVs (group, artifact, version) when
considering all the library versions. RESULTS: We found that about 20% of the
dependencies affected by a known vulnerability are not deployed, and therefore,
they do not represent a danger to the analyzed library because they cannot be
exploited in practice. Developers of the analyzed libraries are able to fix
(and actually responsible for) 82% of the deployed vulnerable dependencies. The
vast majority (81%) of vulnerable dependencies may be fixed by simply updating
to a new version, while 1% of the vulnerable dependencies in our sample are
halted, and therefore, potentially require a costly mitigation strategy.
CONCLUSIONS: Our case study shows that the correct counting allows software
development companies to receive actionable information about their library
dependencies, and therefore, correctly allocate costly development and audit
resources, which is spent inefficiently in case of distorted measurements.Comment: This is a pre-print of the paper that appears, with the same title,
in the proceedings of the 12th International Symposium on Empirical Software
Engineering and Measurement, 201
An Empirical Analysis of Vulnerabilities in Python Packages for Web Applications
This paper examines software vulnerabilities in common Python packages used
particularly for web development. The empirical dataset is based on the PyPI
package repository and the so-called Safety DB used to track vulnerabilities in
selected packages within the repository. The methodological approach builds on
a release-based time series analysis of the conditional probabilities for the
releases of the packages to be vulnerable. According to the results, many of
the Python vulnerabilities observed seem to be only modestly severe; input
validation and cross-site scripting have been the most typical vulnerabilities.
In terms of the time series analysis based on the release histories, only the
recent past is observed to be relevant for statistical predictions; the
classical Markov property holds.Comment: Forthcoming in: Proceedings of the 9th International Workshop on
Empirical Software Engineering in Practice (IWESEP 2018), Nara, IEE
20-MAD -- 20 Years of Issues and Commits of Mozilla and Apache Development
Data of long-lived and high profile projects is valuable for research on
successful software engineering in the wild. Having a dataset with different
linked software repositories of such projects, enables deeper diving
investigations. This paper presents 20-MAD, a dataset linking the commit and
issue data of Mozilla and Apache projects. It includes over 20 years of
information about 765 projects, 3.4M commits, 2.3M issues, and 17.3M issue
comments, and its compressed size is over 6 GB. The data contains all the
typical information about source code commits (e.g., lines added and removed,
message and commit time) and issues (status, severity, votes, and summary). The
issue comments have been pre-processed for natural language processing and
sentiment analysis. This includes emoticons and valence and arousal scores.
Linking code repository and issue tracker information, allows studying
individuals in two types of repositories and provide more accurate time zone
information for issue trackers as well. To our knowledge, this the largest
linked dataset in size and in project lifetime that is not based on GitHub.Comment: 17th International Conference on Mining Software Repositories, 202
Quality measurement in agile and rapid software development: A systematic mapping
Context: In despite of agile and rapid software development (ARSD) being researched and applied extensively, managing quality requirements (QRs) are still challenging. As ARSD processes produce a large amount of data, measurement has become a strategy to facilitate QR management. Objective: This study aims to survey the literature related to QR management through metrics in ARSD, focusing on: bibliometrics, QR metrics, and quality-related indicators used in quality management. Method: The study design includes the definition of research questions, selection criteria, and snowballing as search strategy. Results: We selected 61 primary studies (2001-2019). Despite a large body of knowledge and standards, there is no consensus regarding QR measurement. Terminology is varying as are the measuring models. However, seemingly different measurement models do contain similarities. Conclusion: The industrial relevance of the primary studies shows that practitioners have a need to improve quality measurement. Our collection of measures and data sources can serve as a starting point for practitioners to include quality measurement into their decision-making processes. Researchers could benefit from the identified similarities to start building a common framework for quality measurement. In addition, this could help researchers identify what quality aspects need more focus, e.g., security and usability with few metrics reported.This work has been funded by the European Union’s Horizon 2020 research and innovation program through the Q-Rapids project (grant no. 732253). This research was also partially supported by the Spanish Ministerio de Economía, Industria y Competitividad through the DOGO4ML project (grant PID2020-117191RB-I00). Silverio Martínez-Fernández worked in Fraunhofer IESE before January 2020.Peer ReviewedPostprint (published version
Snoring: A Noise Defect Prediction Datasets
Defect prediction aims at identifying software artifacts that are likely to exhibit a defect. The main purpose of defect prediction is to reduce the cost of testing and code review, by letting developers focus on specific artifacts. Several researchers have worked on improving the accuracy of defect estimation models using techniques such as tuning, re-balancing, or feature selection. Ultimately, the reliability of a prediction model depends on the quality of the dataset. Therefore effort has been spent in identifying sources of noise in the datasets, and how to deal with them, including defect misclassification and defect origin. A key component of defect prediction approaches is the attribution of a defect to a projects release. Although developers might be able to attribute a defect to a specific release, in most cases a defect is attributed to the release after which the defect has been discovered. However, in many circumstances, it can happen that a defect is only discovered several releases after its introduction. This might introduce a bias in the dataset, i.e., treating the intermediate releases as defect-free and the latter as defect-prone. We call this phenomenon a “sleeping defect”. We call “snoring” the phenomenon in which classes are affected by sleeping defects only, that would be treated as defect-free until the defect is discovered. In this work, we analyze, on data from more than 4,000 bugs and 600 releases of 20 open source projects from the Apache ecosystem for investigating: 1)the magnitude of the sleeping defects, 2) the magnitude of the snoring classes, 3)if snoring impacts the evaluation of classifiers, 4)if snoring impacts classifier accuracy, and 5)if removing the last releases of data is beneficial in reducing the negative impact of the snoring noise on classifiers accuracy. Our results show that, on average across projects: 1)most of the defects in a project slept for more than 19% of the existing releases, 2)the missing rate is more than 50% unless we remove more than 20% of the releases, 3) the relative error in measuring the classifier accuracy achieved by using a dataset with snoring is about 100% in all accuracy metrics other than AUC, 4) the presence of snoring decreases the accuracy in each of the 15 classifiers, in each of the 6 accuracy metrics. For instance, Recall, F1, Kappa and Matthews decreases by about 80%, and 5) removing one release of data is better than removing no data in all accuracy metrics. For instance, Recall, F1, Kappa and Matthews increase by about 30%