39 research outputs found

    Meta-data to enhance case-based prediction.

    Get PDF
    The focus of this thesis is to measure the regularity of case bases used in Case-Based Prediction (CBP) systems and the reliability of their constituent cases prior to the system's deployment to influence user confidence on the delivered solutions. The reliability information, referred to as meta-data, is then used to enhance prediction accuracy. CBP is a strain of Case-Based Reasoning (CBR) that differs from the latter only in the solution feature which is a continuous value. Several factors make implementing such systems for prediction domains a challenge. Typically, the problem and solution spaces are unbounded in prediction problems that make it difficult to determine the portions of the domain represented by the case base. In addition, such problem domains often exhibit complex and poorly understood interactions between features and contain noise. As a result, the overall regularity in the case base is distorted which poses a hindrance to delivery of good quality solutions. Hence in this research, techniques have been presented that address the issue of irregularity in case bases with an objective to increase prediction accuracy of solutions. Although, several techniques have been proposed in the CBR literature to deal with irregular case bases, they are inapplicable to CBP problems. As an alternative, this research proposes the generation of relevant case-specific meta-data. The meta-data is made use of in Mantel's randomisation test to objectively measure regularity in the case base. Several novel visualisations using the meta-data have been presented to observe the degree of regularity and help identify suspect unreliable cases whose reuse may very likely yield poor solutions. Further, performances of individual cases are recorded to judge their reliability, which is reflected upon before selecting them for reuse along with their distance from the problem case. The intention is to overlook unreliable cases in favour of relatively distant yet more reliable ones for reuse to enhance prediction accuracy. The proposed techniques have been demonstrated on software engineering data sets where the aim is to predict the duration of a software project on the basis of past completed projects recorded in the case base. Software engineering is a human-centric, volatile and dynamic discipline where many unrecorded factors influence productivity. This degrades the regularity in case bases where cases are disproportionably spread out in the problem and solution spaces resulting in erratic prediction quality. Results from administering the proposed techniques were helpful to gain insight into the three software engineering data sets used in this analysis. The Mantel's test was very effective at measuring overall regularity within a case base, while the visualisations were learnt to be variably valuable depending upon the size of the data set. Most importantly, the proposed case discrimination system, that intended to reuse only reliable similar cases, was successful at increasing prediction accuracy for all three data sets. Thus, the contributions of this research are some novel approaches making use of meta-data to firstly provide the means to assess and visualise irregularities in case bases and cases from prediction domains and secondly, provide a method to identify unreliable cases to avoid their reuse in favour to more reliable cases to enhance overall prediction accuracy

    Meta-data to enhance case-based prediction

    Get PDF
    The focus of this thesis is to measure the regularity of case bases used in Case-Based Prediction (CBP) systems and the reliability of their constituent cases prior to the system's deployment to influence user confidence on the delivered solutions. The reliability information, referred to as meta-data, is then used to enhance prediction accuracy. CBP is a strain of Case-Based Reasoning (CBR) that differs from the latter only in the solution feature which is a continuous value. Several factors make implementing such systems for prediction domains a challenge. Typically, the problem and solution spaces are unbounded in prediction problems that make it difficult to determine the portions of the domain represented by the case base. In addition, such problem domains often exhibit complex and poorly understood interactions between features and contain noise. As a result, the overall regularity in the case base is distorted which poses a hindrance to delivery of good quality solutions. Hence in this research, techniques have been presented that address the issue of irregularity in case bases with an objective to increase prediction accuracy of solutions. Although, several techniques have been proposed in the CBR literature to deal with irregular case bases, they are inapplicable to CBP problems. As an alternative, this research proposes the generation of relevant case-specific meta-data. The meta-data is made use of in Mantel's randomisation test to objectively measure regularity in the case base. Several novel visualisations using the meta-data have been presented to observe the degree of regularity and help identify suspect unreliable cases whose reuse may very likely yield poor solutions. Further, performances of individual cases are recorded to judge their reliability, which is reflected upon before selecting them for reuse along with their distance from the problem case. The intention is to overlook unreliable cases in favour of relatively distant yet more reliable ones for reuse to enhance prediction accuracy. The proposed techniques have been demonstrated on software engineering data sets where the aim is to predict the duration of a software project on the basis of past completed projects recorded in the case base. Software engineering is a human-centric, volatile and dynamic discipline where many unrecorded factors influence productivity. This degrades the regularity in case bases where cases are disproportionably spread out in the problem and solution spaces resulting in erratic prediction quality. Results from administering the proposed techniques were helpful to gain insight into the three software engineering data sets used in this analysis. The Mantel's test was very effective at measuring overall regularity within a case base, while the visualisations were learnt to be variably valuable depending upon the size of the data set. Most importantly, the proposed case discrimination system, that intended to reuse only reliable similar cases, was successful at increasing prediction accuracy for all three data sets. Thus, the contributions of this research are some novel approaches making use of meta-data to firstly provide the means to assess and visualise irregularities in case bases and cases from prediction domains and secondly, provide a method to identify unreliable cases to avoid their reuse in favour to more reliable cases to enhance overall prediction accuracy.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Abstract How Long will it Take to Fix This Bug?

    No full text
    Predicting the time and effort for a software problem has long been a difficult task. We present an approach that automatically predicts the fixing effort, i.e., the person-hours spent on fixing an issue. Our technique leverages existing issue tracking systems: given a new issue report, we use the Lucene framework to search for similar, earlier reports and use their average time as a prediction. Our approach thus allows for early effort estimation, helping in assigning issues and scheduling stable releases. We evaluated our approach using effort data from the JBoss project. Given a sufficient number of issues reports, our automatic predictions are close to the actual effort; for issues that are bugs, we are off by only one hour, beating naïve predictions by a factor of four. 1

    Building Software Cost Estimation Models using Homogenous Data

    No full text
    Several studies have been conducted to determine if company-specific cost models deliver better prediction accuracy than cross-company cost models. However, mixed results have left the question still open for further investigation. We suspect this to be a consequence of heterogenous data used to build cross-company cost models. In this paper, we build cross-company cost models using homogenous data by grouping projects by their business sector. Our results suggest that it is worth to train models using only homogenous data rather than all projects available

    Meta-data to Guide Retrieval in CBR for Software Cost Prediction

    No full text
    Abstract. In recent years, case-based prediction has become a widely advocated technique for software cost estimation. Typically, such approaches are in essence k nearest neighbour methods supported by a case base of a feature vector per software project. Given that software project data is relatively rare – data sets may contain as few as 20 cases – it is common to find a relatively undiscriminating approach to the projects contained within the case bases. We hypothesise that meta-data generated by monitoring case performance can contribute to identifying misleading cases and improve predictions. This paper reports results of a pilot study in which we enriched our case base with meta-data to record performance behaviour of individual data sets. An external fuzzy model was used to classify individual cases as fit or unfit for future use. Misleading cases i.e. with poor predictive ability were seeded into the data set to assess the potential of the approach. Our results show that the model successfully identified the seed cases and refrained from using them during future retrievals

    Towards the next generation of bug tracking systems

    No full text
    Developers typically rely on the information submitted by end-users to resolve bugs. We conducted a survey on information needs and commonly faced problems with bug reporting among several hundred developers and users of the APACHE, ECLIPSE and MOZILLA projects. In this paper, we present the results of a card sort on the 175 comments sent back to us by the responders of the survey. The card sort revealed several hurdles involved in reporting and resolving bugs, which we present in a collection of recommendations for the design of new bug tracking systems. Such systems could provide contextual assistance, reminders to add information, and most important, assistance to collect and report crucial information to developers. 1
    corecore