
    An Empirical Study of Classifier Combination on Cross-Project Defect Prediction

    Abstract—To help developers better allocate testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the history of buggy classes. They work well as long as a sufficient amount of data is available to train a prediction model. However, there is rarely enough training data for new software projects. To deal with this problem, cross-project defect prediction, which transfers a prediction model trained on data from one project to another project, has been proposed and is regarded as a new challenge in defect prediction. So far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this work we investigate 7 composite algorithms, which integrate multiple machine learning classifiers, to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we perform experiments on 10 open source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compare the composite algorithms with CODEPLogistic, the latest cross-project defect prediction algorithm proposed by Panichella et al. [1], in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEPLogistic: Max performs best in terms of F-measure, and its average F-measure outperforms that of CODEPLogistic by 36.88%; BaggingJ48 performs best in terms of cost effectiveness, and its average cost effectiveness outperforms that of CODEPLogistic by 15.34%.
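    The "Max" rule referenced above combines base classifiers by taking, for each instance, the highest predicted probability of the defective class across the classifiers. Below is a minimal, illustrative sketch of that idea in a cross-project setting, assuming scikit-learn and synthetic data; the feature matrices, the 0.5 decision threshold, and the particular base classifiers (a bagged decision tree standing in for BaggingJ48) are assumptions for illustration, not the paper's exact setup.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score

def max_rule_predict(classifiers, X, threshold=0.5):
    """Combine classifiers by taking, per instance, the maximum predicted
    probability of the defective class across all base classifiers."""
    probs = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
    return (probs.max(axis=1) >= threshold).astype(int)

# Stand-ins for real data: metrics and labels from a historical source project,
# and from the new target project we want predictions for.
rng = np.random.default_rng(0)
X_source, y_source = rng.normal(size=(300, 20)), rng.integers(0, 2, 300)
X_target, y_target = rng.normal(size=(100, 20)), rng.integers(0, 2, 100)

base_classifiers = [
    LogisticRegression(max_iter=1000),
    BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),  # rough analogue of BaggingJ48
    RandomForestClassifier(n_estimators=100),
]
for clf in base_classifiers:
    clf.fit(X_source, y_source)  # train only on the source project

y_pred = max_rule_predict(base_classifiers, X_target)
print("F-measure on target project:", f1_score(y_target, y_pred))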

    Data Mining

    Data mining is a branch of computer science used to automatically extract meaningful, useful knowledge and previously unknown, hidden, interesting patterns from large amounts of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining, and brings together many successful data mining studies in areas such as health, banking, education, software engineering, animal science, and the environment.

    Detection and Analysis of Molten Aluminium Cleanliness Using a Pulsed Ultrasound System

    This document presents the development of a solution for the analysis and detection of molten metal quality deviations. The data is generated by an MV20/20, an ultrasound sensor that detects inclusions (molten metal defects that affect the quality of the product). The data is then labelled by assessing the sample using metallography, which provides the sample outcome and the dominant inclusion. The business objectives for the project include the real-time classification of anomalous events by means of a supervised classifier for the metal quality outcome, and a classifier for the inclusion type responsible for low quality. The adopted methodology involves descriptive, diagnostic and predictive analytics. Once the data is statistically profiled, it is standardised and scaled to unit variance in order to compensate for the different units of the descriptors. Principal component analysis is applied as a dimensionality reduction technique, and it is found that the first three components account for 99.6% of the variance of the dataset. In order for the system to have predictive ability, two modelling approaches are considered, namely Response Surface Methodology and supervised machine learning. Supervised machine learning is preferred as it offers more flexibility than a polynomial approximator, and it is more accurate. Four classifiers are built, namely logistic regression, a support vector machine, a multi-layer perceptron and a radial basis function network. The hyperparameters are tuned using 10-fold repeated cross-validation. The multi-layer perceptron offers the best performance in all cases. For determining the quality outcome of a cast (passed or failed), all the models perform according to business targets for accuracy, precision, sensitivity and specificity. For the inclusion type classification, the multi-layer perceptron performs within 5% of the target metrics. In order to optimise the model, a grid search is performed for parameter tuning. The results offer negligible improvement, which indicates that the model has reached a global maximum in the hyperparameter space. It is noted that part of the variance in the inclusion type data is attributed to operator error during labelling of the dataset, among several other sources of variance. It is therefore recommended that a Gage R&R study be performed in order to identify sources of variation, among other improvement recommendations. From a research perspective, a vision system is recommended for assessing metal colour, texture and other visual properties in order to provide further insights. Another recommended research extension is the use of Fourier Transform Infrared Spectroscopy to determine signatures of clean metal and of different inclusions for detection purposes. The project is regarded as a success, as the business metrics are met by the solution.
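    A minimal sketch of the modelling pipeline described above, assuming scikit-learn: the descriptors are standardised to unit variance, reduced to three principal components, and classified with a multi-layer perceptron whose hyperparameters are tuned by grid search under repeated 10-fold cross-validation. The feature and label arrays and the parameter grid below are placeholders, not the study's data or tuned values.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Placeholder data: ultrasound descriptors per cast and a pass/fail outcome.
rng = np.random.default_rng(42)
X, y = rng.normal(size=(200, 12)), rng.integers(0, 2, 200)

pipe = Pipeline([
    ("scale", StandardScaler()),           # standardise descriptors to unit variance
    ("pca", PCA(n_components=3)),          # first three components (99.6% of variance in the study)
    ("mlp", MLPClassifier(max_iter=2000)),
])

param_grid = {
    "mlp__hidden_layer_sizes": [(10,), (20,), (20, 10)],
    "mlp__alpha": [1e-4, 1e-3, 1e-2],
}

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy", n_jobs=-1)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))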

    Quantifying the Impact of Change Orders on Construction Labor Productivity Using System Dynamics

    Researchers and industry practitioners agree that changes are unavoidable in construction projects and may become troublesome if poorly managed. One of the root causes of sub-optimal productivity in construction projects is the number and impact of changes introduced to the initial scope of work during project execution. In labor-intensive construction projects, labor costs represent a substantial percentage of the total project budget, so understanding labor productivity is essential to project success. If productivity is impacted for any reason, such as extensive changes or poor managerial policies, labor costs will increase over and above the planned cost. The true challenge of change management is having a comprehensive understanding of change impacts and of how these impacts can be reduced or prevented before they cascade into serious problems. This thesis proposes a change management framework that project teams can use to quantify labor productivity losses due to change orders and managerial policies across all phases of construction projects. The proposed framework comprises three models: fuzzy risk-based change management, AI baseline-productivity estimating, and system dynamics to illustrate cause-impact relationships. These models were developed in five stages. In the first stage, the fuzzy risk-based change management (FRCM) model was developed to prioritize change orders so that only essential change orders are targeted. In this stage, the Fuzzy Analytic Hierarchy Process (F-AHP) and a Hierarchical Fuzzy Inference System are utilized to calculate relative weights of the factors considered and to generate a score for each contemplated change. In the second stage, a baseline productivity model was developed considering a set of environmental and operational variables. In this stage, various techniques were compared, including stepwise regression, best-subset regression, Evolutionary Polynomial Regression (EPR), General Regression Neural Network (GRNN), Artificial Neural Network (ANN), Radial Basis Function Neural Network (RBFNN), and Adaptive Neuro-Fuzzy Inference System (ANFIS), in order to choose the best method for producing that estimate. The selected method was then used in the development of a novel AI model for estimating labor productivity. The developed AI model is based on an RBFNN enhanced with raw-dataset preprocessing and Particle Swarm Optimization (PSO) to extract significant dataset features for better generalization. The model, named PSO-RBFNN, was selected over the other techniques based on its statistical performance and was used to estimate the baseline productivity values that serve as the initial values in the developed system dynamics (SD) model. In the fourth stage, a novel SD model was developed to quantify the impact of change orders, and of the managerial decisions made in response to them, on the expected productivity over the lifecycle of a project. The SD model boundary was defined by clustering key variables into three categories: exogenous, endogenous, and excluded. The relationships among these key variables were extracted from the literature and from experts in this domain. A holistic causal loop diagram was then developed to illustrate the interaction among the various variables.
In the final stage, the developed computational framework and its models were verified and validated through a real case study. The results show that the developed SD model addresses the various consequences derived from a change in combination with the major environmental and operational variables of the project. It allows the cumulative impact of change orders on labor productivity to be identified and quantified in a timely manner to facilitate the decision-making process. The developed framework can be used during the development and execution phases of a project. The findings are expected to enhance the assessment of change orders, facilitate the quantification of productivity losses in construction projects, and help to perform critical analysis of the impact of various internal and external scope-change variables on project time and cost.
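    As a rough illustration of the stock-and-flow reasoning behind such an SD model (not the thesis's actual model), the toy simulation below treats the rework generated by change orders as a stock, applies a simple managerial overtime policy to work it off, and lets the remaining backlog and crew fatigue erode labor productivity; every rate and coefficient is an assumed placeholder.

def simulate(weeks=30, baseline_productivity=1.0, change_orders_per_week=1.0,
             rework_per_change=40.0, overtime_capacity=25.0,
             fatigue_per_overtime_hour=0.002):
    """Toy system-dynamics-style simulation of productivity under change orders."""
    rework_backlog = 0.0   # stock: outstanding rework hours
    fatigue = 0.0          # stock: accumulated crew fatigue (0..1)
    history = []
    for week in range(weeks):
        # Inflow: new rework hours generated by this week's change orders.
        rework_backlog += change_orders_per_week * rework_per_change
        # Managerial policy: work off the backlog with overtime, up to a capacity.
        overtime = min(rework_backlog, overtime_capacity)
        rework_backlog -= overtime
        # Fatigue accumulates with overtime and recovers slowly.
        fatigue = min(1.0, fatigue + fatigue_per_overtime_hour * overtime - 0.01 * fatigue)
        # Effective productivity is eroded by both backlog pressure and fatigue.
        productivity = max(0.0, baseline_productivity
                           * (1.0 - 0.002 * rework_backlog)
                           * (1.0 - 0.3 * fatigue))
        history.append((week, rework_backlog, productivity))
    return history

for week, backlog, productivity in simulate():
    print(f"week {week:2d}  backlog={backlog:6.1f} h  productivity={productivity:.3f}")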

    Enhancing Software Project Outcomes: Using Machine Learning and Open Source Data to Employ Software Project Performance Determinants

    Many factors can influence the ongoing management and execution of technology projects. Some of these elements are known a priori during the project planning phase. Others require real-time data gathering and analysis throughout the lifetime of a project. These real-time project data elements are often neglected, misclassified, or otherwise misinterpreted during the project execution phase, resulting in an increased risk of delays, quality issues, and missed business opportunities. The overarching motivation for this research endeavor is to offer reliable improvements in software technology management and delivery. The primary purpose is to discover and analyze the impact, role, and level of influence of various project-related data on the ongoing management of technology projects. The study leverages open source data regarding software performance attributes. The goal is to temper the subjectivity currently used by project managers (PMs) with quantifiable measures when assessing project execution progress. Modern-day PMs who manage software development projects are charged with an arduous task. Often, they obtain their inputs from technical leads who tend to be significantly more technical than they are. When assessing software projects, PMs perform their role subject to the limitations of their capabilities and competencies, and they must contend with the stresses of the business environment, the policies and procedures dictated by their organizations, and resource constraints. The second purpose of this research study is to propose methods by which conventional project assessment processes can be enhanced using quantitative methods that utilize real-time project execution data. Transferability of academic research to industry application is specifically addressed vis-à-vis a delivery framework that provides meaningful data to industry practitioners.