Selection of third party software in Off-The-Shelf-based software development: an interview study with industrial practitioners
The success of software development using third party components highly depends on the ability to select a suitable component for the intended application. The evidence shows that there is limited knowledge about current industrial OTS selection practices. As a result, there is often a gap between theory and practice, and the proposed methods for supporting selection are rarely adopted in industrial practice. This paper's goal is to investigate the actual industrial practice of component selection in order to provide an initial empirical basis that allows the reconciliation of research and industrial endeavors. The study consisted of semi-structured interviews with 23 employees from 20 different software-intensive companies that mostly develop web information system applications. It provides qualitative information that helps to further understand these practices, and emphasizes some aspects that have been overlooked by researchers. For instance, although the literature claims that component repositories are important for locating reusable components, these are hardly used in industrial practice. Instead, other resources that have not received considerable attention are used for this purpose. Practices and potential market niches for software-intensive companies have also been identified. The results are valuable from both the research and the industrial perspectives as they provide a basis for formulating well-substantiated hypotheses and more effective improvement strategies. Peer Reviewed. Postprint (author's final draft
Test Set Diameter: Quantifying the Diversity of Sets of Test Cases
A common and natural intuition among software testers is that test cases need
to differ if a software system is to be tested properly and its quality
ensured. Consequently, much research has gone into formulating distance
measures for how test cases, their inputs and/or their outputs differ. However,
common to these proposals is that they are data type specific and/or calculate
the diversity only between pairs of test inputs, traces or outputs.
We propose a new metric to measure the diversity of sets of tests: the test
set diameter (TSDm). It extends our earlier, pairwise test diversity metrics
based on recent advances in information theory regarding the calculation of the
normalized compression distance (NCD) for multisets. An advantage is that TSDm
can be applied regardless of data type and on any test-related information, not
only the test inputs. A downside is the increased computational time compared
to competing approaches.
Our experiments on four different systems show that the test set diameter can
help select test sets with higher structural and fault coverage than random
selection even when only applied to test inputs. This can enable early test
design and selection, prior to even having a software system to test, and
complement other types of test automation and analysis. We argue that this
quantification of test set diversity creates a number of opportunities to
better understand software quality and provides practical ways to increase it.
Comment: In submissio
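The idea of scoring a whole set of tests with a compression distance can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the multiset form NCD1(X) = (C(X) - min C(x)) / max C(X \ {x}) as recalled from the multiset NCD literature, uses zlib as a stand-in compressor, and simply concatenates the test inputs in the given order. The names `compressed_len` and `ncd1` are hypothetical, and the paper's exact definition of TSDm may differ.

```python
import random
import zlib


def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed data (assumed stand-in for C(.))."""
    return len(zlib.compress(data, 9))


def ncd1(items: list[bytes]) -> float:
    """Multiset compression distance of a set of test inputs.

    Assumed form: (C(X) - min_x C(x)) / max_x C(X without x),
    where C(X) compresses the concatenation of all items.
    """
    if len(items) < 2:
        raise ValueError("need at least two items")
    whole = compressed_len(b"".join(items))
    smallest = min(compressed_len(x) for x in items)
    largest_rest = max(
        compressed_len(b"".join(items[:i] + items[i + 1:]))
        for i in range(len(items))
    )
    return (whole - smallest) / largest_rest


# Hypothetical test inputs: three identical inputs versus three unrelated ones.
rng = random.Random(0)
similar = [b"abcabcabc" * 20] * 3
diverse = [bytes(rng.getrandbits(8) for _ in range(200)) for _ in range(3)]

print("similar set:", ncd1(similar))
print("diverse set:", ncd1(diverse))
```

A set of near-duplicate inputs compresses almost as well as a single input, so its score stays low, while unrelated inputs push the score up. Each score needs one compression per left-out item on top of the full set, which is one way to see the extra computational cost the abstract mentions.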
On the Effectiveness of Log Representation for Log-based Anomaly Detection
Logs are an essential source of information for people to understand the
running status of a software system. Due to the evolving modern software
architecture and maintenance methods, more research efforts have been devoted
to automated log analysis. In particular, machine learning (ML) has been widely
used in log analysis tasks. In ML-based log analysis tasks, converting textual
log data into numerical feature vectors is a critical and indispensable step.
However, the impact of using different log representation techniques on the
performance of the downstream models is not clear, which limits the ability of
researchers and practitioners to choose the optimal log representation
techniques in their automated log analysis workflows. Therefore, this work
investigates and compares the commonly adopted log representation techniques
from previous log analysis research. Particularly, we select six log
representation techniques and evaluate them with seven ML models and four
public log datasets (i.e., HDFS, BGL, Spirit and Thunderbird) in the context of
log-based anomaly detection. We also examine the impacts of the log parsing
process and the different feature aggregation approaches when they are employed
with log representation techniques. From the experiments, we provide some
heuristic guidelines for future researchers and developers to follow when
designing an automated log analysis workflow. We believe our comprehensive
comparison of log representation techniques can help researchers and
practitioners better understand the characteristics of different log
representation techniques and provide them with guidance for selecting the most
suitable ones for their ML-based log analysis workflow.
Comment: Accepted by Journal of Empirical Software Engineering (EMSE
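To make the step of turning textual logs into numerical feature vectors concrete, here is a small sketch of one simple representation: counting how often each log template occurs in a session. The abstract does not name the six techniques it compares, so this choice is an assumption made for illustration only. The regex masking in `to_template` is a crude stand-in for a real log parser, and `to_template` and `count_vectors` are hypothetical names.

```python
import re
from collections import Counter


def to_template(line: str) -> str:
    """Crude stand-in for log parsing: mask hex ids and numbers with <*>."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<*>", line)
    return re.sub(r"\d+", "<*>", line)


def count_vectors(sessions: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Aggregate each session (a list of log lines) into a template-count vector."""
    templates = sorted({to_template(line) for s in sessions for line in s})
    vectors = []
    for s in sessions:
        counts = Counter(to_template(line) for line in s)
        vectors.append([counts.get(t, 0) for t in templates])
    return templates, vectors


# Hypothetical log sessions, loosely styled after block-level storage logs.
sessions = [
    ["Received block 12 of size 300", "Received block 13 of size 301"],
    ["Deleting block 12"],
]
templates, vectors = count_vectors(sessions)
for t in templates:
    print(t)
print(vectors)
```

The resulting vectors can be fed to any downstream classifier. Swapping the masking step or the per-session aggregation changes the features, which is the kind of choice the study evaluates.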
An Empirical Study of the Effectiveness of 'Forcing Diversity' Based on a Large Population of Diverse Programs
Use of diverse software components is a viable defence against common-mode failures in redundant software-based systems. Various forms of "Diversity-Seeking Decisions" ("DSDs") can be applied to the process of developing, or procuring, redundant components, to improve the chances of the resulting components not failing on the same demands. An open question is how effective these decisions, and their combinations, are for achieving large enough reliability gains. Using a large population of software programs, we studied experimentally the effectiveness of specific DSDs (and their combinations) mandating differences between redundant components. Some of these combinations produced much better improvements in system probability of failure per demand (PFD) than "uncontrolled" diversity did. Yet, our findings suggest that the gains from such DSDs vary significantly between them and between the application problems studied. The relationship between DSDs and system PFD is complex and does not allow for simple universal rules (e.g. "the more diversity the better") to apply.
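The comparison between "uncontrolled" and forced diversity can be illustrated with a toy calculation. This sketch rests on assumptions that are not in the abstract: a two-version system that fails on a demand only when both versions fail, a uniform demand profile, and a made-up population in which the forced difference is the implementation language. The names `pair_pfd` and `mean_pair_pfd` are hypothetical.

```python
from itertools import combinations


def pair_pfd(fail_a: set[int], fail_b: set[int], n_demands: int) -> float:
    """PFD of a two-version system: fraction of demands on which both versions fail."""
    return len(fail_a & fail_b) / n_demands


def mean_pair_pfd(programs: list[dict], n_demands: int, allowed=lambda a, b: True) -> float:
    """Average pair PFD over all program pairs permitted by `allowed`."""
    pairs = [(a, b) for a, b in combinations(programs, 2) if allowed(a, b)]
    if not pairs:
        raise ValueError("no program pair satisfies the constraint")
    return sum(pair_pfd(a["fails"], b["fails"], n_demands) for a, b in pairs) / len(pairs)


# Hypothetical population: each program lists the demands it fails on.
programs = [
    {"lang": "C", "fails": {1, 2}},
    {"lang": "C", "fails": {2, 3}},
    {"lang": "Pascal", "fails": {3}},
]
N_DEMANDS = 10

uncontrolled = mean_pair_pfd(programs, N_DEMANDS)
forced = mean_pair_pfd(programs, N_DEMANDS, allowed=lambda a, b: a["lang"] != b["lang"])
print("uncontrolled diversity:", uncontrolled)
print("forced language diversity:", forced)
```

With a different made-up population the forced constraint could just as easily give no gain, or a worse average, which is in line with the finding that no simple universal rule applies.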