15,096 research outputs found
Bug propagation and debugging in asymmetric software structures
Software dependence networks are shown to be scale-free and asymmetric. We
then study how software components are affected by the failure of one of them,
and the inverse problem of locating the faulty component. Software at all
levels is fragile with respect to the failure of a random single component.
Locating a faulty component is easy if the failures only affect their nearest
neighbors, while it is hard if the failures propagate further.Comment: 4 pages, 4 figure
An Empirical Analysis of Vulnerabilities in Python Packages for Web Applications
This paper examines software vulnerabilities in common Python packages used
particularly for web development. The empirical dataset is based on the PyPI
package repository and the so-called Safety DB used to track vulnerabilities in
selected packages within the repository. The methodological approach builds on
a release-based time series analysis of the conditional probabilities for the
releases of the packages to be vulnerable. According to the results, many of
the Python vulnerabilities observed seem to be only modestly severe; input
validation and cross-site scripting have been the most typical vulnerabilities.
In terms of the time series analysis based on the release histories, only the
recent past is observed to be relevant for statistical predictions; the
classical Markov property holds.Comment: Forthcoming in: Proceedings of the 9th International Workshop on
Empirical Software Engineering in Practice (IWESEP 2018), Nara, IEE
Rehearsal: A Configuration Verification Tool for Puppet
Large-scale data centers and cloud computing have turned system configuration
into a challenging problem. Several widely-publicized outages have been blamed
not on software bugs, but on configuration bugs. To cope, thousands of
organizations use system configuration languages to manage their computing
infrastructure. Of these, Puppet is the most widely used with thousands of
paying customers and many more open-source users. The heart of Puppet is a
domain-specific language that describes the state of a system. Puppet already
performs some basic static checks, but they only prevent a narrow range of
errors. Furthermore, testing is ineffective because many errors are only
triggered under specific machine states that are difficult to predict and
reproduce. With several examples, we show that a key problem with Puppet is
that configurations can be non-deterministic.
This paper presents Rehearsal, a verification tool for Puppet configurations.
Rehearsal implements a sound, complete, and scalable determinacy analysis for
Puppet. To develop it, we (1) present a formal semantics for Puppet, (2) use
several analyses to shrink our models to a tractable size, and (3) frame
determinism-checking as decidable formulas for an SMT solver. Rehearsal then
leverages the determinacy analysis to check other important properties, such as
idempotency. Finally, we apply Rehearsal to several real-world Puppet
configurations.Comment: In proceedings of ACM SIGPLAN Conference on Programming Language
Design and Implementation (PLDI) 201
Analysis and Detection of Information Types of Open Source Software Issue Discussions
Most modern Issue Tracking Systems (ITSs) for open source software (OSS)
projects allow users to add comments to issues. Over time, these comments
accumulate into discussion threads embedded with rich information about the
software project, which can potentially satisfy the diverse needs of OSS
stakeholders. However, discovering and retrieving relevant information from the
discussion threads is a challenging task, especially when the discussions are
lengthy and the number of issues in ITSs are vast. In this paper, we address
this challenge by identifying the information types presented in OSS issue
discussions. Through qualitative content analysis of 15 complex issue threads
across three projects hosted on GitHub, we uncovered 16 information types and
created a labeled corpus containing 4656 sentences. Our investigation of
supervised, automated classification techniques indicated that, when prior
knowledge about the issue is available, Random Forest can effectively detect
most sentence types using conversational features such as the sentence length
and its position. When classifying sentences from new issues, Logistic
Regression can yield satisfactory performance using textual features for
certain information types, while falling short on others. Our work represents a
nontrivial first step towards tools and techniques for identifying and
obtaining the rich information recorded in the ITSs to support various software
engineering activities and to satisfy the diverse needs of OSS stakeholders.Comment: 41st ACM/IEEE International Conference on Software Engineering
(ICSE2019
A Longitudinal Study of Identifying and Paying Down Architectural Debt
Architectural debt is a form of technical debt that derives from the gap
between the architectural design of the system as it "should be" compared to
"as it is". We measured architecture debt in two ways: 1) in terms of
system-wide coupling measures, and 2) in terms of the number and severity of
architectural flaws. In recent work it was shown that the amount of
architectural debt has a huge impact on software maintainability and evolution.
Consequently, detecting and reducing the debt is expected to make software more
amenable to change. This paper reports on a longitudinal study of a healthcare
communications product created by Brightsquid Secure Communications Corp. This
start-up company is facing the typical trade-off problem of desiring
responsiveness to change requests, but wanting to avoid the ever-increasing
effort that the accumulation of quick-and-dirty changes eventually incurs. In
the first stage of the study, we analyzed the status of the "before" system,
which indicated the impacts of change requests. This initial study motivated a
more in-depth analysis of architectural debt. The results of this analysis were
used to motivate a comprehensive refactoring of the software system. The third
phase of the study was a follow-on architectural debt analysis which quantified
the improvements made. Using this quantitative evidence, augmented by
qualitative evidence gathered from in-depth interviews with Brightsquid's
architects, we present lessons learned about the costs and benefits of paying
down architecture debt in practice.Comment: Submitted to ICSE-SEIP 201
Y2K Interruption: Can the Doomsday Scenario Be Averted?
The management philosophy until recent years has been to replace the workers with computers, which are available 24 hours a day, need no benefits, no insurance and never complain. But as the year 2000 approached, along with it came the fear of the millennium bug, generally known as Y2K, and the computers threatened to strike!!!! Y2K, though an abbreviation of year 2000, generally refers to the computer glitches which are associated with the year 2000. Computer companies, in order to save memory and money, adopted a voluntary standard in the beginning of the computer era that all computers automatically convert any year designated by two numbers such as 99 into 1999 by adding the digits 19. This saved enormous amount of memory, and thus money, because large databases containing birth dates or other dates only needed to contain the last two digits such as 65 or 86. But it also created a built in flaw that could make the computers inoperable from January 2000. The problem is that most of these old computers are programmed to convert 00 (for the year 2000) into 1900 and not 2000. The trouble could therefore, arise when the systems had to deal with dates outside the 1900s. In 2000, for example a programme that calculates the age of a person born in 1965 will subtract 65 from 00 and get -65. The problem is most acute in mainframe systems, but that does not mean PCs, UNIX and other computing environments are trouble free. Any computer system that relies on date calculations must be tested because the Y2K or the millennium bug arises because of a potential for “date discontinuity” which occurs when the time expressed by a system, or any of its components, does not move in consonance with real time. Though attention has been focused on the potential problems linked with change from 1999 to 2000, date discontinuity may occur at other times in and around this period.
- …