Filling the gaps of development logs and bug issue data
It has been suggested that the data in bug repositories is not always in sync or complete compared with the logs detailing the actions of developers on source code. In this paper, we trace two sources of information relative to software bugs: the change logs of the actions of developers and the issues reported as bugs. The aim is to identify and quantify the discrepancies between the two sources in recording and storing the developer logs relative to bugs. Focussing on the databases produced by two mining software repository tools, CVSAnalY and Bicho, we use part of the SZZ algorithm to identify bugs and to compare how the "defect-fixing changes" are recorded in the two databases. We use a working example to show how to do so. The results indicate that a significant amount of information is not in sync when tracing bugs across the two databases. We therefore propose an automatic approach to re-align the two databases, so that the collected information is mirrored and in sync.
Towards an automation of the traceability of bugs from development logs: A study based on open source software
Context: Information and tracking of defects can be severely incomplete in almost every Open Source project, resulting in reduced traceability of defects into the development logs (i.e., version control commit logs). In particular, defect data often appears out of sync with what developers logged as their actions. Synchronizing or completing the missing data of the bug repositories with the logs detailing the actions of developers would benefit various branches of empirical software engineering research: prediction of software faults, software reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug fixing.
Objective: To design a framework that automates the process of synchronizing and filling the gaps of the development logs and bug issue data for open source software projects.
Method: We instantiate the framework with a sample of OSS projects from GitHub, parsing, linking and filling the gaps found in their bug issue data and development logs. UML diagrams show the relevant modules used to merge, link and connect the bug issue data with the development data.
Results: Analysing a sample of over 300 OSS projects, we observed that around half of the bug-related data is present in either the development logs or the issue tracker logs: the rest of the data is missing from one or the other source. We designed an automated approach that fills the gaps in either source by making use of the available data, and we successfully mapped all the missing data of the analysed projects when using one heuristic for annotating bugs. Other heuristics need to be investigated and implemented.
Conclusion: In this paper, a framework to synchronise the development logs and bug data used in empirical software engineering was designed to automatically fill the missing parts of development logs and bug issue data.
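The gap-filling idea described above can be illustrated with a minimal sketch, assuming bug IDs have already been extracted from each source; the function name and data shapes are illustrative assumptions, not the paper's actual implementation. A bug ID present in only one source is copied into the other so that both end up mirrored.

```python
def fill_gaps(vc_bugs, bt_bugs):
    """Given the sets of bug IDs found in the VC logs and in the bug
    tracker, return the synchronised union both sources would hold,
    plus the IDs that were missing from each side."""
    missing_from_bt = vc_bugs - bt_bugs   # bugs only the commit logs mention
    missing_from_vc = bt_bugs - vc_bugs   # bugs only the tracker mentions
    synced = vc_bugs | bt_bugs            # mirrored, in-sync view
    return synced, missing_from_vc, missing_from_bt
```

After synchronisation, either database can be queried alone and still reflect the complete set of traced bugs.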
Towards an Automation of the Traceability of Bugs from Development Logs
Analysing the Resolution of Security Bugs in Software Maintenance
Security bugs in software systems are often reported after incidents of malicious attacks. Developers often need to resolve these bugs quickly in order to maintain the security of such systems. Bug resolution includes two kinds of activities: triaging confirms that the bugs are indeed security problems, after which fixing involves making changes to the code.
It is reported in the literature that, statistically, security bugs are reopened more often than others, which poses two new research questions: (a) Are developers "rushing" to triage security bugs too soon under the pressure of deadlines? (b) Do developers need to spend more time fixing security bugs to avoid frequent reopening?
This thesis explores these questions in order to determine whether security bug fixing should take a higher priority than other bug fixing, so that malicious attackers cannot exploit vulnerabilities before the problems are fixed.
In this thesis a quantitative approach has been adopted by conducting statistical empirical studies to observe the behaviour of software developers engaged in dealing with security bugs.
Firstly, the concept of "rush" has been borrowed from the time management literature to refer to the behaviour of people delivering work under the pressure of deadlines. By observing how developers deliver bug resolution before the deadline of releases, the degree of rush has been measured as the ratio between the actual time spent by developers during triaging and the theoretical time the developers have by delaying the fixes until the next regular release.
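The ratio described above can be sketched as follows; the function name, the time unit, and the reading of low values as "rush" are assumptions for illustration, not the thesis's actual formulation.

```python
def degree_of_rush(actual_triage_days, days_until_next_release):
    """Ratio of the time actually spent triaging a bug to the
    theoretical time available by delaying the fix until the next
    regular release. Under this sketch, values well below 1 would
    indicate triage delivered in a rush."""
    return actual_triage_days / days_until_next_release

# e.g. a bug triaged in 2 days when 10 days remained before the release
```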
The results suggest that delaying bug assignment helps find the right developer and gives the developer more time to prepare for the same workload under more relaxed planning constraints. Secondly, to analyse the complexity of security bug fixes, the fan-in complexity of functions relevant to security bugs has been measured, rather than simply the time spent by software developers on fixing such bugs.
The first null hypothesis is tested using the Mann-Whitney method on five software case studies: Samba, Mozilla Firefox, RedHat, FreeBSD and Mozilla. The second null hypothesis is tested by comparing the results of fixing security and non-security bugs from the Samba and Mozilla Firefox case studies.
Statistically significant results suggest that security bugs are triaged in a rush compared to non-security bugs for RedHat, FreeBSD and Mozilla.
In terms of fan-in, the results of the Samba and Mozilla Firefox case studies suggest that security bugs are more complex to fix than non-security bugs.
Improving the quality of bug data in software repositories
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Context: Researchers have increasingly recognised the benefit of mining software repositories to extract information. Integrating a version control tool (VC tool) and a bug tracking tool (BT tool) in mining software repositories, as well as synchronising missing bug tracking data (BT data) and version control logs (VC logs), is therefore of paramount importance for improving the quality of bug data in software repositories. In this way, researchers can produce good-quality research for the benefit of software projects, especially open source software projects, where information is limited by distributed development and shared data for tracking project issues is uncommon. BT data often appears not to be mirrored in what developers logged as their actions, resulting in reduced traceability of defects in the development logs (VC logs). Version control system (VC system) data can be enhanced with data from the bug tracking system (BT system), because VC logs report on past software development activities. When these VC logs and BT data are used together, researchers can have a more complete picture of a bug's life cycle, evolution and maintenance. However, current BT systems and VC systems provide insufficient support for cross-analysis of both VC logs and BT data for researchers in empirical software engineering research: prediction of software faults, software reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug fixing.
Aims and objectives: The aim of the thesis is to design and implement a tool chain to
support the integration of a VC tool and a BT tool, as well as to synchronise the missing VC
logs and BT data of open-source software projects automatically. The syncing process, using
Bicho (BT tool) and CVSAnalY (VC tool), will be demonstrated and evaluated on a sample
of 344 open source software (OSS) projects.
Method: The tool chain was implemented and its performance evaluated semi-automatically.
The SZZ algorithm approach was used to detect and trace BT data and VC logs. In its formulation, the algorithm looks for the terms "bug" or "fixed" (case-insensitive), along with the "#" sign that marks the ID of a bug, in the VC system and BT system respectively. In addition, the SZZ algorithm was dissected in its formulation, and precision and recall were analysed for the use of "fix", "bug" or "# + digit" (e.g., #1234) when tracking possible bug IDs from the VC logs of the sample OSS projects.
Results: The results of this analysis indicate that the use of "# + digit" (e.g., #1234) is more precise for bug traceability than the use of the "bug" and "fix" keywords. Such keywords are indeed present in the VC logs, but they are less useful when trying to connect development actions with bug traces: their recall is high, but their precision is low. Overall, the results indicate that VC logs and BT data retrieved and stored by automatic tools can be tracked and recovered with better accuracy using only a part of the SZZ algorithm. In addition, the results indicate that 80-95% of all the missing BT data and VC logs for the 344 OSS projects have been synchronised into the Bicho and CVSAnalY databases respectively.
Conclusion: The presented tool chain will eliminate repetitive activities in traceability tasks, as well as in software maintenance and evolution. This thesis provides a solution towards the automation of the traceability of BT data of software projects (in particular, OSS projects), using VC logs to complement and track missing bug data. Synchronising involves completing the missing data of bug repositories with the logs detailing the actions of developers. Synchronising benefits various branches of empirical software engineering research: prediction of software faults, software reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug fixing.
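The keyword- and ID-based matching described in the thesis's Method can be sketched as two simple checks over commit messages. This is a minimal illustrative sketch: the regular expressions and function names are assumptions, not the thesis's actual implementation of SZZ.

```python
import re

# "# + digit" pattern, e.g. "#1234" -> candidate bug ID 1234
BUG_ID_RE = re.compile(r"#(\d+)")

# Case-insensitive keyword match on "bug" / "fix" / "fixes" / "fixed"
KEYWORD_RE = re.compile(r"\b(bug|fix(e[sd])?)\b", re.IGNORECASE)

def extract_bug_ids(commit_message):
    """Return the bug IDs referenced via '#<digits>' in a commit message."""
    return [int(m) for m in BUG_ID_RE.findall(commit_message)]

def looks_like_fix(commit_message):
    """True if the message contains a bug/fix keyword (high recall,
    lower precision than the explicit '#<digits>' references)."""
    return KEYWORD_RE.search(commit_message) is not None
```

A message such as "Fixed #1234: null pointer in parser" would match both checks, while "Update documentation" would match neither.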
Discovering Loners and Phantoms in Commit and Issue Data
The interlinking of commit and issue data has become a de-facto standard in software development. Modern issue tracking systems, such as JIRA, automatically interlink commits and issues by the extraction of identifiers (e.g., issue key) from commit messages. However, the conventions for the use of interlinking methodologies vary between software projects. For example, some projects enforce the use of identifiers for every commit while others have less restrictive conventions. In this work, we introduce a model called PaLiMod to enable the analysis of interlinking characteristics in commit and issue data. We surveyed 15 Apache projects to investigate differences and commonalities between linked and non-linked commits and issues. Based on the gathered information, we created a set of heuristics to interlink the residual of non-linked commits and issues. We present the characteristics of Loners and Phantoms in commit and issue data. The results of our evaluation indicate that the proposed PaLiMod model and heuristics enable an automatic interlinking and can indeed reduce the residual of non-linked commits and issues in software projects
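The identifier extraction that the abstract describes can be illustrated with a short sketch. JIRA issue keys take the form "<PROJECT KEY>-<number>" (e.g., HADOOP-1234); the key pattern, function name, and input shape below are assumptions for illustration, not the PaLiMod model itself.

```python
import re

# JIRA-style issue key: uppercase project key, a hyphen, then digits.
ISSUE_KEY_RE = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")

def link_commits_to_issues(commits):
    """Map each commit id to the issue keys found in its message.

    Commits whose messages contain no key form the non-linked
    residual that interlinking heuristics try to reduce."""
    links, residual = {}, []
    for commit_id, message in commits:
        keys = ISSUE_KEY_RE.findall(message)
        if keys:
            links[commit_id] = keys
        else:
            residual.append(commit_id)
    return links, residual
```

In a project that enforces issue keys in every commit message, the residual would be empty; the surveyed projects differ precisely in how large that residual is.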
Software Runtime Data: Visualization and Integration with Development Data – A Case Study
Software quality is one of the key aspects of the software development process. Although software development and usage (runtime) processes produce different types of data, there is little support for companies to obtain insightful and actionable information from data at the right time. Practitioners face a challenge in identifying software problems during the early software development stages. The goal of the master's thesis was to provide actionable real-time information about runtime errors and crashes during the usage of software systems and to explore its integration with development data. This work has been done within the Q-Rapids project at Fraunhofer IESE (Institute for Experimental Software Engineering). The selected case is the internal smart village project, Digitale Dörfer (DD).
The main contributions of the thesis are: a) collecting available runtime data from the DD project; b) creating dashboards to support decisions during sprint planning; c) applying the CRISP-DM method to the integration of software runtime and development data. The provided connectors and integration scripts are reusable. Reported challenges and lessons learned from the integration of software runtime and development data may be used for further research.
Adopting Free/Libre/Open Source Software Practices, Techniques and Methods for Industrial Use
Today's software companies face the challenges of highly distributed development projects and constantly changing requirements. This paper proposes the adoption of relevant Free/Libre/Open Source Software (FLOSS) practices in order to improve software development projects in industry. Many FLOSS projects have proven to be very successful, producing high quality products with steady and frequent releases. This study aims to identify FLOSS practices that can be adapted for the corporate environment. To achieve this goal, a framework to compare FLOSS and industrial development methodologies was created. Three successful FLOSS projects were selected as study targets (the Linux Kernel, the FreeBSD operating system, and the JBoss application server), as well as two projects from Ericsson, a large telecommunications company. Based on an analysis of these projects, FLOSS best practices were tailored to fit industrial development environments. The final results consisted of a set of key adoption opportunities that aimed to improve software quality and overall development productivity by importing best practices from the FLOSS environment. The adoption opportunities were then validated at three large corporations.