
    Explanatory and Causality Analysis in Software Engineering

    Software fault proneness and software development effort are two key areas of software engineering. Improving them will significantly reduce costs and promote good planning and practice in developing and managing software projects. Traditionally, studies of software fault proneness and software development effort focused on analysis and prediction, which can help answer questions like 'when' and 'where'. The focus of this dissertation is on explanatory and causality studies that address questions like 'why' and 'how'.

    First, we applied a case-control study to explain software fault proneness. We found that Bugfixes (prerelease bugs), Developers, Code Churn, and the Age of a file are the main contributors to postrelease bugs in some of the open-source projects studied. In terms of interactions, we found that Bugfixes and Developers together reduced the risk of postrelease software faults. The explanatory models were tested for prediction, and their performance was comparable to or better than the top-performing classifiers used in related studies. Our results indicate that software project practitioners should pay more attention to the prerelease bug-fixing process and the number of Developers assigned, as well as their interaction. They also need to pay more attention to new files (less than one year old), which contributed significantly more to postrelease bugs than old files.

    Second, we built a model that explains and predicts multiple levels of software development effort and measured the effects of several metrics and their interactions using categorical regression models. The final models for the three data sets used fit the data well, and their performance was comparable to related studies. We found that project size, duration, the existence of any type of fault, the use of first- or second-generation programming languages, and team size significantly increased software development effort. On the other hand, the interactions between duration and defective projects, and between duration and team size, reduced software development effort. These results suggest that software practitioners should pay extra attention to project duration and the team size assigned to every task, because increasing them from a low to a higher level significantly increases software development effort.

    Third, a structural equation modeling method was applied for causality analysis of software fault proneness. The method combined statistical and regression analysis to find the direct and indirect causes of software faults using partial least squares path modeling. We found direct and indirect paths from the measurement models that led to postrelease software bugs. Specifically, the highest direct effect came from change requests, while changing the code had a minor impact on software faults. The highest impact on code change resulted from change requests (either for bug fixing or refactoring). Interestingly, the indirect impact from code characteristics on software fault proneness was higher than the direct impact. We found a similar level of direct and indirect impact from code characteristics on code change.
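
    As a concrete illustration of the kind of explanatory model with interactions described above, the sketch below fits a logistic regression with main effects and an interaction term. It is a hypothetical reconstruction, not the dissertation's code: the DataFrame and its columns (bugfixes, developers, code_churn, file_age, postrelease) are assumed stand-ins for the metrics named in the abstract.

        # A minimal sketch of an explanatory fault-proneness model with an
        # interaction term. Assumes a pandas DataFrame `files` with hypothetical
        # columns: bugfixes, developers, code_churn, file_age, postrelease (0/1).
        import pandas as pd
        import statsmodels.formula.api as smf

        def fit_explanatory_model(files: pd.DataFrame):
            # "bugfixes * developers" expands to both main effects plus their
            # interaction; a negative interaction coefficient would mirror the
            # finding that bug fixing and developer count jointly reduce risk.
            model = smf.logit(
                "postrelease ~ bugfixes * developers + code_churn + file_age",
                data=files,
            ).fit()
            print(model.summary())  # coefficient signs and sizes carry the explanation
            return model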

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Software quality assurance ensures that the applications developed are failure-free. Some modern systems are intricate due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since it is a mechanism that correctly predicts the defect proneness of modules and classifies modules, which saves resources, time, and developers' effort. In this study, a model that selects relevant features for use in defect prediction was proposed. The literature review revealed that process metrics are better predictors of defects in version systems and are based on historic source code over time. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions in the source code, the number of distinct committers, and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product lines (SPL); hence, process metrics were chosen. Data sets used in defect prediction may contain non-significant and redundant attributes that affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, only features that are significant to the defect prediction process are utilised. In machine learning, feature selection techniques are applied to identify relevant data. Feature selection is a pre-processing step that helps reduce the dimensionality of data. Feature selection techniques include information-theoretic methods based on the entropy concept. This study experimentally evaluated the efficiency of these feature selection techniques and found that software defect prediction using significant attributes improves prediction accuracy. A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. MICFastCR achieved the highest prediction accuracy as reported by various performance measures.
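
    The two-stage idea behind MICFastCR can be sketched as follows; this is an outside reconstruction under stated assumptions, not the thesis's implementation. It assumes the minepy package for computing MIC, uses symmetrical uncertainty for the FCBF-style redundancy check, and the relevance threshold is illustrative.

        # A minimal sketch of MIC-based relevance ranking followed by an
        # FCBF-style redundancy filter. Assumes X is a NumPy feature matrix and
        # y the defect labels; `minepy` is assumed installed (pip install minepy).
        import numpy as np
        from minepy import MINE
        from sklearn.metrics import mutual_info_score

        def symmetrical_uncertainty(x, y, bins=10):
            """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) on discretized values."""
            xd = np.digitize(x, np.histogram_bin_edges(x, bins))
            yd = np.digitize(y, np.histogram_bin_edges(y, bins))
            i_xy = mutual_info_score(xd, yd)
            h_x = mutual_info_score(xd, xd)  # I(X; X) equals H(X)
            h_y = mutual_info_score(yd, yd)
            return 2.0 * i_xy / (h_x + h_y) if (h_x + h_y) > 0 else 0.0

        def mic_fcbf_select(X, y, mic_threshold=0.2):
            mine = MINE()
            relevance = []
            for j in range(X.shape[1]):
                mine.compute_score(X[:, j], y)
                relevance.append(mine.mic())
            # Keep features whose MIC with the label clears the threshold,
            # strongest first.
            order = [j for j in np.argsort(relevance)[::-1]
                     if relevance[j] >= mic_threshold]
            selected = []
            for j in order:
                # FCBF criterion: drop feature j if a kept feature k "covers"
                # it, i.e. SU(j, k) >= SU(j, label).
                su_jy = symmetrical_uncertainty(X[:, j], y)
                if all(symmetrical_uncertainty(X[:, j], X[:, k]) < su_jy
                       for k in selected):
                    selected.append(j)
            return selected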

    Machine Learning based prediction of the effect of lay-up defects in the automated fiber placement

    The use of Automated Fiber Placement is becoming widespread in the aerospace industry. The need to manufacture large and complex structural composite components makes this technology much more efficient than conventional hand lay-up manufacturing. However, these components are still inspected manually, and the effect of the defects found is calculated with simulation software. The scope of this thesis is to create a Machine Learning model able to calculate the effect on the effective stiffness of different defect configurations. The model is provided with the geometrical characteristics of the defects in the laminate and has to predict, with a high level of accuracy, the effective stiffness of the laminate. Training the model with a large number of different defect configurations creates the need for a parametrized FE model of a composite laminate at the coupon level. The results show that a Multilayer Perceptron architecture with two hidden layers, the first with 281 nodes and the second with 76, is able to predict the effective stiffness of a defective laminate coupon with an accuracy of 0.1 GPa.
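
    The architecture reported above maps directly onto a standard library implementation. The sketch below uses scikit-learn's MLPRegressor with the stated layer sizes (281 and 76 nodes); the feature matrix X (geometrical defect descriptors) and target y (FE-computed effective stiffness in GPa) are assumed placeholders, and hyperparameters other than the layer sizes are illustrative defaults, not taken from the thesis.

        # A sketch of the two-hidden-layer MLP described above. Assumes X holds
        # geometrical defect features and y the FE-computed effective stiffness.
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPRegressor
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        def fit_stiffness_surrogate(X, y):
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
            model = make_pipeline(
                StandardScaler(),  # MLPs are sensitive to feature scale
                MLPRegressor(hidden_layer_sizes=(281, 76), max_iter=2000,
                             random_state=0),
            )
            model.fit(X_tr, y_tr)
            # Mean absolute error in GPa, comparable to the 0.1 GPa figure above.
            mae = abs(model.predict(X_te) - y_te).mean()
            print(f"MAE: {mae:.3f} GPa")
            return model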

    Multiscale Machine Learning and Numerical Investigation of Ageing in Infrastructures

    Infrastructure is a critical component of a country's economic growth. Interaction with extreme service environments can adversely affect the long-term performance of infrastructure and accelerate ageing. This research focuses on using machine learning to improve the efficiency of analysing the multiscale ageing impact on infrastructure. First, a data-driven campaign is developed to analyse the condition of ageing infrastructure. A machine learning-based framework is proposed to predict the state of various assets across a railway system. The ageing of the bond in fibre-reinforced polymer (FRP)-strengthened concrete elements is investigated using machine learning: different machine learning models are developed to characterise the long-term performance of the bond. The environmental ageing of composite materials is investigated using a micromechanics-based machine learning model. A mathematical framework is developed to automatically generate microstructures, which are analysed by the finite element (FE) method. The generated data is used to develop a machine learning model to study the degradation of the transverse performance of composites under humid conditions. Finally, a multiscale FE and machine learning framework is developed to expand the understanding of composite material ageing. A moisture diffusion analysis is performed to simulate the water uptake of composites under water immersion conditions. The results are downscaled to obtain micromodel stress fields, and numerical homogenisation is used to obtain the composite transverse behaviour. A machine learning model is developed based on the multiscale simulation results to model the ageing process of composites under water immersion. The frameworks developed in this thesis demonstrate how machine learning improves the analysis of ageing across multiple scales of infrastructure. The resulting understanding can help develop more efficient strategies for the rehabilitation of ageing infrastructure.
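
    The final surrogate-modelling step described above can be sketched as follows; the inputs (fibre volume fraction, moisture content, immersion time) and target (homogenised transverse modulus) are assumed placeholders rather than the thesis's actual dataset or model choice.

        # A minimal sketch of fitting a surrogate on FE-generated samples so the
        # ageing response can be queried without rerunning the multiscale
        # simulation. Input columns and units are illustrative assumptions.
        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        def fit_ageing_surrogate(fe_inputs: np.ndarray,
                                 transverse_modulus: np.ndarray):
            surrogate = GradientBoostingRegressor(random_state=0)
            surrogate.fit(fe_inputs, transverse_modulus)
            return surrogate

        # Usage: the trained surrogate replaces the FE homogenisation loop, e.g.
        # surrogate.predict([[0.55, 0.012, 3600.0]]) for one ageing state.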

    Semi-supervised and Active Learning Models for Software Fault Prediction

    As software continues to insinuate itself into nearly every aspect of our lives, software quality has become an extremely important issue. Software Quality Assurance (SQA) is a process that ensures the development of high-quality software. It concerns the important problem of maintaining, monitoring, and developing quality software. Accurate detection of fault-prone components in software projects is one of the most commonly practiced techniques, offering a path to high-quality products without excessive assurance expenditures. This type of quality modeling requires the availability of software modules with known fault content developed in a similar environment. However, collecting fault data at the module level, particularly in new projects, is expensive and time-consuming. Semi-supervised learning and active learning offer solutions to this problem by learning from limited labeled data and utilizing inexpensive unlabeled data.

    In this dissertation, we investigate semi-supervised learning and active learning approaches to the software fault prediction problem. The role of the base learner in semi-supervised learning is discussed using several state-of-the-art supervised learners. Our results showed that semi-supervised learning with an appropriate base learner leads to better fault proneness prediction than supervised learning. In addition, incorporating a pre-processing technique prior to semi-supervised learning is a promising direction for further improving prediction performance. Active learning, which shares with semi-supervised learning the idea of utilizing unlabeled data, requires human effort to label fault proneness during its learning process. Empirical results showed that active learning supplemented by a dimensionality reduction technique performs better than supervised learning on release-based data sets.
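
    A minimal sketch of the semi-supervised setup described above is given below, using scikit-learn's self-training wrapper. The random-forest base learner is one plausible choice among the supervised learners the dissertation compares, and X, y are assumed placeholders for module metrics and fault labels.

        # Self-training for fault prediction. Assumes unlabeled modules are
        # marked with -1 in y, which is scikit-learn's convention.
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.semi_supervised import SelfTrainingClassifier

        def fit_semi_supervised(X, y_with_unlabeled):
            base = RandomForestClassifier(random_state=0)
            # The wrapper iteratively pseudo-labels modules the base learner is
            # confident about and retrains on the enlarged labeled set.
            model = SelfTrainingClassifier(base, threshold=0.9)
            model.fit(X, y_with_unlabeled)
            return model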

    Supplier Selection and Relationship Management: An Application of Machine Learning Techniques

    Managing supply chains is an extremely challenging task due to globalization, short product life cycles, and recent advancements in information technology. These changes increase the importance of managing relationships with suppliers. However, the supplier selection literature mainly focuses on selecting suppliers based on past performance and environmental and social criteria, and largely ignores supplier relationship management. Moreover, although the explosion of data and the capabilities of machine learning techniques in handling dynamic, fast-changing environments have shown promising results in customer relationship management, especially in customer lifetime value, this area remains untouched on the upstream side of supply chains. This research attempts to address this gap by proposing a framework to predict supplier future value, incorporating contract history data, relationship value, and supply network properties. The proposed model is empirically tested on suppliers of Public Works and Government Services Canada. Methodologically, this thesis demonstrates the application of machine learning techniques to supplier selection and to developing effective strategies for managing relationships. Practically, the proposed framework equips supply chain managers with a proactive and forward-looking approach to managing supplier relationships.
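
    As a hedged sketch of the framework's shape: combine contract-history aggregates with supply-network features, then regress future value. The column names, the networkx-based centrality feature, and the regressor choice are illustrative assumptions, not the thesis's actual variables or model.

        # Build supplier features from contract history and network position,
        # then fit a regressor for future supplier value.
        import networkx as nx
        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor

        def build_features(contracts: pd.DataFrame,
                           network: nx.Graph) -> pd.DataFrame:
            history = contracts.groupby("supplier_id").agg(
                total_value=("value", "sum"),
                n_contracts=("value", "count"),
            )
            centrality = pd.Series(nx.degree_centrality(network),
                                   name="centrality")
            return history.join(centrality)  # align on supplier id

        def fit_future_value(features: pd.DataFrame, future_value: pd.Series):
            model = RandomForestRegressor(random_state=0)
            model.fit(features, future_value)
            return model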

    Machine Learning Approach for Risk-Based Inspection Screening Assessment

    Risk-based inspection (RBI) screening assessment is used to identify equipment that makes a significant contribution to a system's total risk of failure (RoF), so that the detailed RBI assessment can focus on analyzing higher-risk equipment. Due to its qualitative nature and high dependency on sound engineering judgment, screening assessment is vulnerable to human biases and errors, and is thus subject to output variability that threatens the integrity of the assets. This paper attempts to tackle these challenges by utilizing a machine learning approach to conduct the screening assessment. A case study using a dataset of RBI assessments for oil and gas production and processing units illustrates the development of an intelligent system, based on a machine learning model, for performing RBI screening assessment. The best-performing model achieves accuracy and precision of 92.33% and 84.58%, respectively. A comparative analysis between the performance of the intelligent system and the conventional assessment examines the benefits of applying the machine learning approach to RBI screening assessment. The result shows that the machine learning approach can improve the quality of conventional RBI screening output by reducing output variability and increasing accuracy and precision.
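
    A minimal sketch of training and scoring such a screening classifier follows; the feature matrix X (equipment and process attributes) and binary label y (higher-risk vs lower-risk) are assumed placeholders, and the classifier choice is illustrative since the paper's model family is not restated here.

        # Train a screening classifier and report the same measures quoted
        # above (accuracy and precision).
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score, precision_score
        from sklearn.model_selection import train_test_split

        def evaluate_screening_model(X, y):
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                                      random_state=0)
            clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
            pred = clf.predict(X_te)
            print(f"accuracy:  {accuracy_score(y_te, pred):.2%}")
            print(f"precision: {precision_score(y_te, pred):.2%}")
            return clf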