Towards an automation of the traceability of bugs from development logs: A study based on open source software
Context: Information on and tracking of defects can be severely incomplete in almost every Open Source project, resulting in reduced traceability of defects into the development logs (i.e., version control commit logs). In particular, defect data is often out of sync with the actions developers logged. Synchronizing or completing the missing data of the bug repositories with the logs detailing the developers' actions would benefit various branches of empirical software engineering research: prediction of software faults, software reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug fixing.
Objective: To design a framework that automates the process of synchronizing and filling the gaps of the development logs and bug issue data for open source software projects.
Method: We instantiate the framework with a sample of OSS projects from GitHub by parsing, linking, and filling the gaps found in their bug issue data and development logs. UML diagrams show the relevant modules that will be used to merge, link, and connect the bug issue data with the development data.
Results: Analysing a sample of over 300 OSS projects, we observed that around half of the bug-related data is present in either the development logs or the issue tracker logs; the rest is missing from one source or the other. We designed an automated approach that fills the gaps in either source using the available data, and we successfully mapped all the missing data of the analysed projects when using one heuristic for annotating bugs. Other heuristics remain to be investigated and implemented.
Conclusion: We designed a framework to synchronise the development logs and bug data used in empirical software engineering, automatically filling in the missing parts of development logs and bug issue data.
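As an illustration of the kind of linking such a framework automates, the sketch below matches commit messages against issue IDs using one simple annotation heuristic. The regex pattern, data layout, and function name are assumptions for illustration, not the framework's actual implementation:

```python
import re

# Hypothetical heuristic: a commit is linked to a bug if its message
# references an issue ID such as "fixes #123" or "closes #45".
LINK_PATTERN = re.compile(
    r"(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)

def link_commits_to_issues(commits, issue_ids):
    """Map each issue ID to the commits whose messages reference it."""
    links = {issue_id: [] for issue_id in issue_ids}
    for sha, message in commits:
        for match in LINK_PATTERN.finditer(message):
            issue_id = int(match.group(1))
            if issue_id in links:
                links[issue_id].append(sha)
    return links

# Hypothetical commit log and issue tracker data
commits = [
    ("a1b2c3", "Fixes #7: null pointer in parser"),
    ("d4e5f6", "Refactor logging module"),
    ("0718af", "closes #12 and tidies tests"),
]
links = link_commits_to_issues(commits, issue_ids={7, 12, 99})
# Issue 99 has no matching commit: that is the kind of gap left for
# the other heuristics to fill.
```

Issues with no matching commit (and commits with no matching issue) are exactly the roughly-half of bug-related data the study found missing from one source or the other.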
TREEOME: A framework for epigenetic and transcriptomic data integration to explore regulatory interactions controlling transcription
Motivation: Predictive modelling of gene expression is a powerful framework
for the in silico exploration of transcriptional regulatory interactions
through the integration of high-throughput -omics data. A major limitation of
previous approaches is their inability to handle conditional and synergistic
interactions that emerge when collectively analysing genes subject to different
regulatory mechanisms. This limitation reduces overall predictive power and
thus the reliability of downstream biological inference.
Results: We introduce an analytical modelling framework (TREEOME: tree of
models of expression) that integrates epigenetic and transcriptomic data by
separating genes into putative regulatory classes. Current predictive modelling
approaches have found both DNA methylation and histone modification epigenetic
data to provide little or no improvement in accuracy of prediction of
transcript abundance despite, for example, distinct anti-correlation between
mRNA levels and promoter-localised DNA methylation. To improve on this, in
TREEOME we evaluate four possible methods of formulating gene-level DNA
methylation metrics, which provide a foundation for identifying gene-level
methylation events and subsequent differential analysis, whereas most previous
techniques operate at the level of individual CpG dinucleotides. We demonstrate
TREEOME by integrating gene-level DNA methylation (bisulfite-seq) and histone
modification (ChIP-seq) data to accurately predict genome-wide mRNA transcript
abundance (RNA-seq) for H1-hESC and GM12878 cell lines.
Availability: TREEOME is implemented using open-source software and made
available as a pre-configured bootable reference environment. All scripts and
data presented in this study are available online at
http://sourceforge.net/projects/budden2015treeome/. Comment: 14 pages, 6 figures
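As an illustration of what a gene-level DNA methylation metric might look like, the sketch below averages methylation fractions over CpGs near a transcription start site. The window size and the metric itself are illustrative assumptions, not necessarily among the four metric formulations TREEOME evaluates:

```python
def promoter_methylation(cpg_sites, tss, window=1000):
    """Mean methylation fraction over CpGs within `window` bp of a
    transcription start site (one plausible gene-level metric; the
    four formulations evaluated by TREEOME may differ)."""
    in_window = [beta for pos, beta in cpg_sites if abs(pos - tss) <= window]
    if not in_window:
        return None  # no CpG coverage near this gene
    return sum(in_window) / len(in_window)

# (position, methylation fraction) pairs from hypothetical bisulfite-seq calls
cpgs = [(980, 0.9), (1500, 0.8), (2400, 0.1), (5000, 0.95)]
m = promoter_methylation(cpgs, tss=1500, window=1000)  # uses sites 980, 1500, 2400
```

Working at this gene level, rather than per CpG dinucleotide, is what lets such a metric feed directly into a per-gene predictive model of transcript abundance.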
An Empirical Analysis of Open Source Software Defect Data through Software Reliability Growth Models
The purpose of this study is to analyse the reliability growth of Open Source Software (OSS) using Software Reliability Growth Models (SRGMs). The study uses defect data from twenty-five releases of five OSS projects. For each release of the selected projects, two types of datasets were created: datasets built with respect to the defect creation date (created-date DS) and datasets built with respect to the defect update date (updated-date DS). These defect datasets are modelled by eight SRGMs: Musa-Okumoto, Inflection S-Shaped, Goel-Okumoto, Delayed S-Shaped, Logistic, Gompertz, Yamada Exponential, and the Generalized Goel Model, chosen for their widespread use in the literature. The SRGMs are fitted to both types of defect datasets for each project, and their fitting and prediction capabilities are analysed in order to study OSS reliability growth with respect to defect creation and defect update times, since defect analysis can serve as a constructive reliability predictor. Results show that SRGM fitting capabilities and prediction quality improve when the defect creation date is used to build OSS defect datasets. Hence, OSS reliability growth is better characterized by SRGMs if the defect creation date, rather than the defect update (fixing) date, is used when building OSS defect datasets for reliability modelling.
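The Goel-Okumoto model listed above has the mean value function m(t) = a(1 - e^(-bt)), where a is the expected total number of defects and b the detection rate. A minimal sketch of fitting it to cumulative defect counts follows; the grid search stands in for the nonlinear optimisers normally used, and the defect data are synthetic:

```python
import math

def goel_okumoto(t, a, b):
    """Mean cumulative defects by time t under the Goel-Okumoto SRGM."""
    return a * (1.0 - math.exp(-b * t))

def fit_goel_okumoto(times, counts, a_grid, b_grid):
    """Least-squares fit over a parameter grid (a crude stand-in for the
    nonlinear optimisers normally used to fit SRGMs)."""
    best = None
    for a in a_grid:
        for b in b_grid:
            sse = sum((goel_okumoto(t, a, b) - c) ** 2
                      for t, c in zip(times, counts))
            if best is None or sse < best[0]:
                best = (sse, a, b)
    return best[1], best[2]

# Hypothetical cumulative defect counts per month, indexed by creation date;
# here generated synthetically with a=100, b=0.5 so the fit is checkable.
times = [1, 2, 3, 4, 5, 6]
counts = [goel_okumoto(t, 100, 0.5) for t in times]
a_hat, b_hat = fit_goel_okumoto(times, counts,
                                a_grid=range(80, 121, 5),
                                b_grid=[x / 10 for x in range(1, 11)])
```

Comparing the fitted curve's sum of squared errors on created-date versus updated-date datasets is, in essence, how the study judges which dating convention characterizes reliability growth better.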
Clang and Coccinelle: Synergising program analysis tools for CERT C Secure Coding Standard certification
Writing correct C programs is well known to be hard, not least due to the many language features intrinsic to C. Writing secure C programs is even harder and, at times, seemingly impossible. To improve on this situation, the US CERT has developed and published a set of coding standards, the "CERT C Secure Coding Standard", that (in the current version) enumerates 118 rules and 182 recommendations with the aim of making C programs (more) secure. The large number of rules and recommendations makes automated tool support essential for certifying that a given system is in compliance with the standard.
In this paper we report on ongoing work on integrating two state-of-the-art analysis tools, Clang and Coccinelle, into a combined tool well suited to analysing and certifying C programs according to, e.g., the CERT C Secure Coding Standard or the MISRA (Motor Industry Software Reliability Association) C standard. We further argue that such a tool must be highly adaptable and customisable to each software project, as well as to the certification rules required by a given standard.
Clang is the C frontend of the LLVM compiler/virtual machine project and includes a comprehensive set of static analyses and code checkers. Coccinelle is a program transformation tool and bug finder originally developed for the Linux kernel, but it has also been used successfully to find bugs in other Open Source projects such as WINE and OpenSSL.
Understanding construction delay analysis and the role of pre-construction programming
Copyright © 2013, American Society of Civil Engineers. This is the author's accepted manuscript. Modern construction projects commonly suffer from delays to their completion. The resolution of the time and cost claims that consequently flow from such delays remains a difficult undertaking for all project parties. A common approach relied on by contractors and their employers (or their representatives) to resolve this matter involves applying various delay analysis techniques, all of which are based on the construction programs originally developed for managing the project. However, evidence from the literature suggests that the reliability of these techniques in ensuring successful claims resolution is often undermined by the nature and quality of the underlying program used. As part of wider research carried out on delay and disruption analysis in practice, this paper reports on an aspect of the study aimed at exploring preconstruction-stage programming issues that affect delay claims resolution. This aspect is based on in-depth interviews with experienced construction planning engineers in the United Kingdom, conducted after an initial large-scale survey on the usage of delay and disruption techniques. Key findings and conclusions include: (1) most contractors prefer a linked bar chart format for their baseline programs over conventional critical path method (CPM) networks; (2) baseline programs are developed using planning software packages, some of which pose difficulties when employed for most delay analysis techniques, except for the simpler ones; (3) manpower loading graphs are not commonly developed as part of the main deliverables during preconstruction-stage planning. As a result, most programs are not subjected to resource loading and leveling, and so do not accurately reflect planned resource usage on site.
This practice has detrimental effects on the reliability of baseline programs when used for resolving delay claims; and (4) baseline program development involves many different experts within construction organizations, as expected, but with very little involvement of the employer or its representative. Active client involvement is, however, quite important, as it would facilitate quick program approval/acceptance before construction, a necessary requirement for early delay claims settlement; such claims otherwise often remain unresolved long after the delaying events, with the potential of degenerating into expensive disputes. The study results provide a better understanding of the key issues that need attention if improvements are to be made in delay claims resolution. Additional research testing these results with a much larger sample and rigorous statistical analysis for generalization purposes would help advance the limited knowledge of this subject matter.
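The critical path method networks mentioned in finding (1) determine a project's duration as the longest path through the activity precedence graph. A minimal sketch, with hypothetical activities and durations:

```python
def critical_path_length(durations, predecessors):
    """Earliest project finish time in a CPM network: the longest path
    through the activity graph (activities as nodes, durations on nodes)."""
    finish = {}

    def earliest_finish(act):
        if act not in finish:
            # An activity starts once all its predecessors have finished.
            start = max((earliest_finish(p) for p in predecessors.get(act, [])),
                        default=0)
            finish[act] = start + durations[act]
        return finish[act]

    return max(earliest_finish(a) for a in durations)

# Hypothetical activities: durations in days and precedence links
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
predecessors = {"C": ["A", "B"], "D": ["C"]}
project_days = critical_path_length(durations, predecessors)
```

Delay analysis techniques essentially re-run this computation on an impacted or as-built program and compare the resulting completion dates, which is why the quality of the baseline program matters so much.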
How can SMEs benefit from big data? Challenges and a path forward
Big data is big news, and large companies in all sectors are making significant advances in their customer relations, product selection and development and consequent profitability through using this valuable commodity. Small and medium enterprises (SMEs) have proved themselves to be slow adopters of the new technology of big data analytics and are in danger of being left behind. In Europe, SMEs are a vital part of the economy, and the challenges they encounter need to be addressed as a matter of urgency. This paper identifies barriers to SME uptake of big data analytics and recognises their complex challenge to all stakeholders, including national and international policy makers, IT, business management and data science communities.
The paper proposes a big data maturity model for SMEs as a first step towards an SME roadmap to data analytics. It considers the "state of the art" of IT with respect to usability and usefulness for SMEs and discusses how SMEs can overcome the barriers preventing them from adopting existing solutions. The paper then considers management perspectives and the role of maturity models in enhancing and structuring the adoption of data analytics in an organisation. The history of total quality management is reviewed to inform the core aspects of implanting a new paradigm. The paper concludes with recommendations to help SMEs develop their big data capability and enable them to continue as the engines of European industrial and business success.
Copyright © 2016 John Wiley & Sons, Ltd. Peer reviewed. Postprint (author's final draft).
Measuring the Quality of Machine Learning and Optimization Frameworks
Software frameworks are used daily and extensively in research, both for fundamental studies and for applications. Researchers usually trust the quality of these frameworks without any evidence that they are correctly built; indeed, they could contain defects that potentially affect thousands of already published and future papers. Considering the important role of these frameworks in the current state of the art in research, their quality should be quantified to show the weaknesses and strengths of each software package.
In this paper we study the main static quality properties, defined in the product quality model proposed by the ISO 25010 standard, of ten well-known frameworks. We provide a quality rating for each characteristic depending on the severity of the issues detected in the analysis. In addition, we propose an overall quality rating on a 12-level scale (ranging from A+ to D-) that combines the ratings of all characteristics. As a result, we have evidence that the analysed frameworks are not in good shape: the best overall rating is just a C+, for the Mahout framework, i.e., all packages need a revision of the analysed features. Focusing on the characteristics individually, maintainability is by far the one requiring the greatest effort to fix the defects found. On the other hand, performance obtains the best average rating, a result that matches our expectations, as framework authors tend to take care of how fast their software runs.
University of Malaga. Campus de Excelencia Internacional Andalucía Tech.
We would like to thank all authors of these frameworks, who make research easier for all of us. This research has been partially funded by CELTIC C2017/2-2 in collaboration with the companies EMERGYA and SECMOTIC under contracts #8.06/5.47.4997 and #8.06/5.47.4996. It has also been funded by the Spanish Ministry of Science and Innovation and Junta de Andalucía/FEDER under contracts TIN2014-57341-R and TIN2017-88213-R, and the network of smart cities CI-RTI (TIN2016-81766-REDT).
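The 12-level overall rating described above could be aggregated along the following lines; the averaging rule is an illustrative assumption, not the authors' exact formula, and the characteristic ratings are made up:

```python
# 12-level scale from A+ down to D-, mirroring the paper's overall rating.
GRADES = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-"]

def overall_rating(characteristic_ratings):
    """Aggregate per-characteristic grades into one overall grade by
    averaging their positions on the 12-level scale (an assumed rule)."""
    positions = [GRADES.index(g) for g in characteristic_ratings]
    mean = round(sum(positions) / len(positions))
    return GRADES[mean]

# Hypothetical per-characteristic grades for one framework
ratings = {"maintainability": "D", "performance": "A", "reliability": "B-"}
grade = overall_rating(ratings.values())
```

A weighted or worst-case aggregation would be equally plausible; the point is only that a discrete grade per ISO 25010 characteristic can be folded into a single comparable rating per framework.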
The Co-Evolution of Test Maintenance and Code Maintenance through the lens of Fine-Grained Semantic Changes
Automatic testing is a widely adopted technique for improving software
quality. Software developers add, remove and update test methods and test
classes as part of the software development process as well as during the
evolution phase, following the initial release. In this work we conduct a large
scale study of 61 popular open source projects and report the relationships we
have established between test maintenance, production code maintenance, and
semantic changes (e.g., statement added, method removed) performed in developers' commits.
We build predictive models, and show that the number of tests in a software
project can be well predicted by employing code maintenance profiles (i.e., how
many commits were performed in each of the maintenance activities: corrective,
perfective, adaptive). Our findings also reveal that more often than not,
developers perform code fixes without performing complementary test maintenance
in the same commit (e.g., update an existing test or add a new one). When
developers do perform test maintenance, it is likely to be affected by the
semantic changes they perform as part of their commit.
Our work is based on studying 61 popular open source projects, comprised of
over 240,000 commits consisting of over 16,000,000 semantic change type
instances, performed by over 4,000 software engineers. Comment: postprint, ICSME 201
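The predictive models described above relate code maintenance profiles to test counts. The sketch below fits a single-predictor least-squares line as a minimal stand-in: the data are synthetic, and using only perfective commits as the feature is an assumption (the study's models use the full corrective/perfective/adaptive profile):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a minimal stand-in for the
    richer predictive models built in the study)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical per-project data: perfective commits vs. number of tests;
# constructed here so that tests = 12 * perfective commits exactly.
perfective_commits = [10, 40, 25, 60]
test_counts = [120, 480, 300, 720]
a, b = fit_line(perfective_commits, test_counts)
predicted = a + b * 50  # expected test count for a project with 50 such commits
```

A fit like this, evaluated across many projects, is how one can claim that the number of tests is "well predicted" by the maintenance profile.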
- …