
    Explanatory and Causality Analysis in Software Engineering

    Software fault proneness and software development effort are two key areas of software engineering. Improving them will significantly reduce cost and promote good planning and practice in developing and managing software projects. Traditionally, studies of software fault proneness and software development effort focused on analysis and prediction, which can help answer questions like 'when' and 'where'. The focus of this dissertation is on explanatory and causality studies that address questions like 'why' and 'how'.

    First, we applied a case-control study to explain software fault proneness. We found that Bugfixes (prerelease bugs), Developers, Code Churn, and the Age of a file are the main contributors to postrelease bugs in some of the open-source projects studied. In terms of interactions, we found that the interaction between Bugfixes and Developers reduced the risk of postrelease software faults. The explanatory models were tested for prediction, and their performance was comparable to or better than the top-performing classifiers used in related studies. Our results indicate that software project practitioners should pay more attention to the prerelease bug-fixing process and the number of Developers assigned, as well as their interaction. They also need to pay more attention to new files (less than one year old), which contributed significantly more to postrelease bugs than old files.

    Second, we built a model that explains and predicts multiple levels of software development effort and measured the effects of several metrics and their interactions using categorical regression models. The final models for the three data sets used were statistically fit, and their performance was comparable to related studies. We found that project size, duration, the existence of any type of fault, the use of first- or second-generation programming languages, and team size significantly increased software development effort. On the other hand, the interactions between duration and defective project, and between duration and team size, reduced software development effort. These results suggest that software practitioners should pay extra attention to project duration and the team size assigned to every task, because increasing them from a low to a higher level significantly increased the software development effort.

    Third, a structural equation modeling method was applied for causality analysis of software fault proneness. The method combined statistical and regression analysis to find the direct and indirect causes of software faults using partial least squares path modeling. We found direct and indirect paths from the measurement models that led to software postrelease bugs. Specifically, the highest direct effect came from change requests, while changing the code had a minor impact on software faults. The highest impact on code change resulted from change requests (either for bug fixing or refactoring). Interestingly, the indirect impact from code characteristics on software fault proneness was higher than the direct impact. We found a similar level of direct and indirect impact from code characteristics on code change.
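    To make the first study's style of modeling concrete, here is a minimal sketch of a logistic regression with a Bugfixes x Developers interaction term, of the kind a case-control analysis of fault proneness might use. It relies on statsmodels; all column names, coefficients, and data below are synthetic illustrations, not the dissertation's variables or results.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Synthetic per-file data; the column names mirror the metrics named
        # in the abstract but the values and coefficients are invented.
        rng = np.random.default_rng(0)
        n = 500
        df = pd.DataFrame({
            "bugfixes":   rng.poisson(2, n),        # prerelease bug fixes
            "developers": rng.poisson(3, n) + 1,    # distinct developers
            "code_churn": rng.gamma(2.0, 50.0, n),  # churned lines
            "new_file":   rng.integers(0, 2, n),    # 1 if file < 1 year old
        })
        logit = (-2.0 + 0.3 * df.bugfixes + 0.2 * df.developers
                 - 0.05 * df.bugfixes * df.developers
                 + 0.004 * df.code_churn + 0.8 * df.new_file)
        df["postrelease_bug"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

        # 'bugfixes * developers' expands to both main effects plus their
        # interaction; a negative interaction coefficient corresponds to the
        # "interaction reduces risk" style of finding described above.
        model = smf.logit(
            "postrelease_bug ~ bugfixes * developers + code_churn + new_file",
            data=df,
        ).fit()
        print(model.summary())
        print(np.exp(model.params))  # odds ratios

    Exponentiating the coefficients turns them into odds ratios, which is the usual way to read off how much each metric (or interaction) raises or lowers the odds of a postrelease bug.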

    Quality Assessment and Prediction in Software Product Lines

    At the heart of product line development is the assumption that, through structured reuse, later products will be of higher quality and require less time and effort to develop and test. This thesis presents empirical results from two case studies aimed at assessing the quality aspect of this claim and exploring fault prediction in the context of software product lines. The first case study examines pre-release faults and change proneness of four products in PolyFlow, a medium-sized industrial software product line; the second case study analyzes post-release faults using pre-release data over seven releases of four products in Eclipse, a very large open-source software product line.

    The goals of our research are (1) to determine the association between various software metrics, as well as their correlation with the number of faults at the component/package level; (2) to characterize the fault and change proneness of components/packages at various levels of reuse; (3) to explore the benefits of the structured reuse found in software product lines; and (4) to evaluate the effectiveness of predictive models, built on a variety of products in a software product line, at making accurate predictions of pre-release software faults (in the case of PolyFlow) and post-release software faults (in the case of Eclipse).

    The results of both studies confirm, in a software product line setting, the findings of others that faults (both pre- and post-release) are more highly correlated with change metrics than with static code metrics, and are mostly contained in a small set of components/packages. The longitudinal aspect of our research indicates that new products do benefit from the development and testing of previous products. The results also indicate that pre-existing components/packages, including the common components/packages, undergo continuous change but tend to sustain low fault densities. However, this is not always true for newly developed components/packages. Finally, the results show that pre-release faults in the case of PolyFlow and post-release faults in the case of Eclipse can be predicted accurately from pre-release data and, furthermore, that these predictions benefit from information about additional products in the software product line.
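    As a rough illustration of the cross-product prediction setting described above, the sketch below trains a regressor on pre-release change metrics from one synthetic "product" and predicts post-release faults for another. It uses scikit-learn; the features and the data generator are invented placeholders, not the PolyFlow or Eclipse metrics.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.metrics import mean_absolute_error

        rng = np.random.default_rng(1)

        def make_product(n):
            """Synthetic package-level data for one product of the line."""
            X = np.column_stack([
                rng.poisson(5, n),    # revisions (pre-release)
                rng.poisson(2, n),    # pre-release bug fixes
                rng.gamma(2, 40, n),  # churned lines
            ])
            y = 0.4 * X[:, 1] + 0.01 * X[:, 2] + rng.poisson(1, n)
            return X, y

        # Train on an "earlier" product, predict post-release faults for a
        # "newer" one: the cross-product setting the thesis evaluates.
        X_train, y_train = make_product(300)
        X_test, y_test = make_product(100)
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X_train, y_train)
        print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))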

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Software quality assurance aims to ensure that the applications developed are failure free. Some modern systems are intricate due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since it is a mechanism that correctly predicts the defect proneness of modules and classifies modules, which saves resources, time and developers' effort. In this study, a model that selects relevant features for defect prediction was proposed. The literature was reviewed, and it revealed that process metrics are better predictors of defects in versioning systems and are based on historical source code over time. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions in the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted on open-source software (OSS) product lines (SPLs), hence process metrics were chosen. Data sets that are used in defect prediction may contain non-significant and redundant attributes that may affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, features that are significant to the defect prediction process are utilised. In machine learning, feature selection techniques are applied to identify the relevant data. Feature selection is a pre-processing step that helps to reduce the dimensionality of the data. Feature selection techniques include information-theoretic methods that are based on the entropy concept. This study experimentally evaluated the efficiency of these feature selection techniques and found that software defect prediction using significant attributes improves prediction accuracy. A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine-learning algorithms were then run to predict software defects. MICFastCR achieved the highest prediction accuracy as reported by various performance measures.
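    Below is a minimal sketch of the FCBF-style "predominant correlation" idea with MIC as both the relevance and the redundancy measure. It assumes the minepy package for computing MIC; it reproduces only the general select-by-relevance, drop-by-redundancy scheme on synthetic data, not the published MICFastCR algorithm.

        import numpy as np
        from minepy import MINE  # assumed dependency providing MIC

        def mic(x, y):
            m = MINE(alpha=0.6, c=15)
            m.compute_score(x, y)
            return m.mic()

        def micfastcr_select(X, y, names):
            """FCBF-style selection with MIC as relevance/redundancy measure.

            A sketch of the idea behind MICFastCR, not the published algorithm.
            """
            relevance = {f: mic(X[:, i], y) for i, f in enumerate(names)}
            ranked = sorted(relevance, key=relevance.get, reverse=True)
            selected = []
            for f in ranked:
                i = names.index(f)
                # Drop f if it is more strongly related to an already-selected
                # feature than to the target (predominant-correlation rule).
                redundant = any(
                    mic(X[:, i], X[:, names.index(s)]) >= relevance[f]
                    for s in selected
                )
                if not redundant:
                    selected.append(f)
            return selected

        rng = np.random.default_rng(2)
        n = 200
        adds = rng.poisson(5, n).astype(float)        # lines added
        dels = adds + rng.normal(0, 0.5, n)           # nearly duplicates adds
        committers = rng.poisson(3, n).astype(float)  # distinct committers
        y = (adds + committers + rng.normal(0, 1, n) > 9).astype(float)
        X = np.column_stack([adds, dels, committers])
        print(micfastcr_select(X, y, ["additions", "deletions", "committers"]))

    In this toy run the near-duplicate "deletions" feature should be dropped as redundant with "additions", which is exactly the kind of pruning a redundancy filter is meant to perform before the classifiers are trained.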

    Model-based risk assessment

    In this research effort, we focus on model-based risk assessment. Risk assessment is essential in any plan intended to manage the software development or maintenance process. Subjective techniques are human-intensive and error-prone; risk assessment should instead be based on architectural attributes that we can quantitatively measure using architectural-level metrics. Software architectures are emerging as an important concept in the study and practice of software engineering, due to their emphasis on large-scale composition of software products and their support for emerging software engineering paradigms such as product line engineering, component-based software engineering, and software evolution.

    In this dissertation, we generalize our earlier work on reliability-based risk assessment. We introduce error propagation probability into the assessment methodology to account for the dependency among the system components. We also generalize the reliability-based risk assessment to account for inherent functional dependencies.

    Furthermore, we develop a generic framework for maintainability-based risk assessment which can accommodate different types of software maintenance. First, we introduce and define maintainability-based risk assessment for software architecture. Within our assessment framework, we investigate the maintainability-based risk of the components of the system and the effect of performing maintenance tasks on these components. We propose a methodology for estimating the maintainability-based risk when considering different types of maintenance. As a proof of concept, we apply the proposed methodology to several case studies. Moreover, we automate the proposed maintainability-based risk assessment methodology.
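    As a toy numerical illustration of the two ingredients named above, the sketch below computes a heuristic per-component risk factor and then adjusts it with an error propagation matrix. All components, numbers, and the exact combination rule are invented for illustration; they are not the dissertation's methodology.

        import numpy as np

        # Illustrative three-component architecture; every number is invented.
        names = ["UI", "Core", "Storage"]
        complexity = np.array([0.3, 0.9, 0.6])  # normalised complexity metrics
        severity = np.array([0.2, 0.8, 0.5])    # failure severity per component

        # Heuristic per-component risk factor in the spirit of
        # reliability-based risk assessment: risk ~ complexity x severity.
        risk = complexity * severity

        # P[i, j]: probability that an error arising in component j
        # propagates to component i (dependency among the components).
        P = np.array([
            [0.0, 0.6, 0.1],
            [0.2, 0.0, 0.7],
            [0.0, 0.4, 0.0],
        ])
        adjusted = risk + P @ risk  # add risk inherited through propagation
        for comp, r, a in zip(names, risk, adjusted):
            print(f"{comp:8s} intrinsic={r:.2f} with-propagation={a:.2f}")

    The point of the propagation term is that a component with a modest intrinsic risk can still be risky overall if errors from its dependencies are likely to reach it.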

    Are Multi-language Design Smells Fault-prone? An Empirical Study

    Nowadays, modern applications are developed using components written in different programming languages. Such systems offer several advantages; however, as the number of languages increases, so do the challenges of developing and maintaining them. In these situations, developers may introduce design smells (i.e., anti-patterns and code smells): poor design and coding choices that can negatively impact the quality of a software program despite satisfying functional requirements. Studies on mono-language systems suggest that the presence of design smells affects code comprehension, thus making systems harder to maintain. However, those studies target only mono-language systems and do not consider the interaction between different programming languages. In this paper, we present an approach to detect multi-language design smells in the context of JNI systems, and we investigate the prevalence of those design smells. Specifically, we detect 15 design smells in 98 releases of nine open-source JNI projects. Our results show that these design smells are prevalent in the selected projects and persist throughout the releases of the systems. In the analyzed systems, 33.95% of the files involving communication between Java and C/C++ contain occurrences of multi-language design smells. Some kinds of smells are more prevalent than others, e.g., Unused Parameters, Too Much Scattering, and Unused Method Declaration. Our results suggest that files with multi-language design smells are often more associated with bugs than files without these smells, and that specific smells are more correlated with fault-proneness than others.
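    To give a flavour of what detecting one such smell could look like, here is a hypothetical regex-based sketch that flags Java native declarations with no matching JNI implementation in the C sources, loosely in the spirit of the Unused Method Declaration smell. The paths, patterns, and matching rule are illustrative assumptions, not the authors' detector.

        import re
        from pathlib import Path

        # Java native declarations: capture the method name.
        NATIVE_DECL = re.compile(r"\bnative\b[^;(]*\b(\w+)\s*\(")
        # JNI implementations: capture the trailing method-name component of
        # Java_<mangled class>_<method>. Ignores JNI underscore escaping.
        JNI_IMPL = re.compile(r"\bJNIEXPORT\b.*?\bJava_\w+_(\w+)\s*\(", re.S)

        def unused_native_declarations(java_dir, c_dir):
            """Flag 'native' declarations with no matching JNI implementation."""
            declared, implemented = set(), set()
            for f in Path(java_dir).rglob("*.java"):
                declared.update(NATIVE_DECL.findall(f.read_text(errors="ignore")))
            for f in Path(c_dir).rglob("*.c"):
                implemented.update(JNI_IMPL.findall(f.read_text(errors="ignore")))
            return sorted(declared - implemented)

        if __name__ == "__main__":
            print(unused_native_declarations("src/main/java", "src/main/native"))

    A real detector would parse both languages properly; the sketch only shows why the smell is inherently multi-language: neither side's toolchain alone can see the mismatch.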

    Defining linguistic antipatterns towards the improvement of source code quality

    Previous studies have shown that the linguistic aspect of source code is a valuable source of information that can help improve program comprehension. The proposed research work focuses on supporting quality improvement of source code by identifying, specifying, and studying common negative practices with respect to linguistic information (i.e., linguistic antipatterns). We expect the definition of linguistic antipatterns to increase awareness of the existence of such bad practices and to discourage their use. We also propose to study the relation between negative practices in linguistic information (i.e., linguistic antipatterns) and negative practices in structural information (i.e., design antipatterns) with respect to comprehension effort and fault/change proneness. We discuss the proposed methodology and some preliminary results.
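    As a small illustration of what a linguistic antipattern check might look like, the sketch below scans Java sources for two name/behaviour mismatches: a "get" method declared void, and an "is" method that does not return a boolean. The regex and the two checks are simplified assumptions for illustration, not the catalogue or detector proposed in this work.

        import re
        from pathlib import Path

        # Approximate, regex-based method matcher: return type then name.
        # Deliberately simplistic; a real tool would use a Java parser.
        METHOD = re.compile(r"\b(\w[\w<>\[\]]*)\s+(\w+)\s*\(")

        def linguistic_antipatterns(root):
            smells = []
            for path in Path(root).rglob("*.java"):
                for ret, name in METHOD.findall(path.read_text(errors="ignore")):
                    if name.startswith("get") and ret == "void":
                        smells.append(f"{path}: {name} getter returns void")
                    if name.startswith("is") and ret not in ("boolean", "Boolean"):
                        smells.append(f"{path}: {name} predicate is not boolean")
            return smells

        if __name__ == "__main__":
            print(linguistic_antipatterns("src"))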