6 research outputs found

    Predictiveness and Effectiveness of Story Points in Agile Software Development

    Agile Software Development (ASD) is one of the most popular iterative software development methodologies, which takes a different approach from the conventional sequential methods. Agile methods promise a faster response to unanticipated changes during development, typically contrasted with traditional project development, which assumes that software is specifiable and predictable. Traditionally, practitioners and researchers have utilised different Functional Size Measures (FSMs) as the main cost driver to estimate the effort required to develop a project (Software Effort Estimation – SEE). However, FSM methods are not easy to use with ASD. Thus, another measure, namely Story Point (SP), has become popular in this context. SP is a relative unit representing an intuitive mixture of complexity and the required effort of a user requirement. Although recent surveys report on a growing trend toward intelligent effort estimation techniques for ASD, the adoption of these techniques is still limited in practice. Several factors limit the accuracy and adaptability of these techniques. The primary factor is the lack of enough noise-free information at the estimation time, restricting the model’s accuracy and reliability. This thesis concentrates on SEE for ASD from both the technique and data perspectives. Under this umbrella, I first evaluate two prominent state-of-the-art works for SP estimation to understand their strengths and weaknesses. I then introduce and evaluate a novel method for SP estimation based on text clustering. Next, I investigate the relationship between SP and development time by conducting a thorough empirical study. Finally, I explore the effectiveness of SP estimation methods when used to estimate the actual time. To carry out this research, I have curated the TAWOS (Tawosi Agile Web-based Open-Source) dataset, which consists of over half a million issues from Agile, open-source projects. TAWOS has been made publicly available to allow for reproduction and extension in future work.

    Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Replication Study

    In the last decade, several studies have explored automated techniques to estimate the effort of agile software development. We perform a close replication and extension of a seminal work proposing the use of Deep Learning for Agile Effort Estimation (namely Deep-SE), which has set the state-of-the-art since. Specifically, we replicate three of the original research questions aiming at investigating the effectiveness of Deep-SE for both within-project and cross-project effort estimation. We benchmark Deep-SE against three baselines (i.e., Random, Mean and Median effort estimators) and a previously proposed method to estimate agile software project development effort (dubbed TF/IDF-SVM), as done in the original study. To this end, we use the data from the original study and an additional dataset of 31,960 issues mined from TAWOS, as using more data allows us to strengthen the confidence in the results, and to further mitigate external validity threats. The results of our replication show that Deep-SE outperforms the Median baseline estimator and TF/IDF-SVM in only very few cases with statistical significance (8/42 and 9/32 cases, respectively), thus confounding previous findings on the efficacy of Deep-SE. The two additional RQs revealed that neither augmenting the training set nor pre-training Deep-SE leads to an improvement of its accuracy and convergence speed. These results suggest that using semantic similarity is not enough to differentiate user stories with respect to their story points; thus, future work has yet to explore and find new techniques and features that obtain accurate agile software development estimates.
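
The three naive baselines named in this abstract can be illustrated with a minimal sketch. It assumes that the Random estimator draws the story points of a randomly chosen training issue and that accuracy is summarised by the mean absolute error; the function names and the sample values are hypothetical and are not taken from the study.

```python
import random
import statistics


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted story points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def baseline_predictions(train_sp, n_test, seed=0):
    """Predictions of the Random, Mean and Median baselines for n_test issues."""
    rng = random.Random(seed)
    return {
        # Assumption: Random reuses the story points of a random training issue.
        "random": [rng.choice(train_sp) for _ in range(n_test)],
        "mean": [statistics.mean(train_sp)] * n_test,
        "median": [statistics.median(train_sp)] * n_test,
    }


# Hypothetical story points, for illustration only.
train_sp = [1, 2, 3, 5, 8, 13]
test_sp = [2, 3, 5, 8]

preds = baseline_predictions(train_sp, len(test_sp))
for name, predicted in preds.items():
    print(name, round(mean_absolute_error(test_sp, predicted), 3))
```

A learned estimator is only worth its extra complexity if its error is clearly lower than the errors printed here for the same test issues.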

    On the Relationship Between Story Point and Development Effort in Agile Open-Source Software

    Background: Previous work has provided some initial evidence that Story Point (SP) estimated by human experts may not accurately reflect the effort needed to realise Agile software projects. / Aims: In this paper, we aim to shed further light on the relationship between SP and Agile software development effort to understand the extent to which human-estimated SP is a good indicator of user story development effort expressed in terms of time needed to realise it. / Method: To this end, we carry out a thorough empirical study involving a total of 37,440 unique user stories from 37 different open-source projects publicly available in the TAWOS dataset. For these user stories, we investigate the correlation between the issue development time (or its approximation when the actual time is not available) and the SP estimated by human experts by using three widely-used correlation statistics (i.e., Pearson, Kendall and Spearman). Furthermore, we investigate SP estimations made by the human experts in order to assess the extent to which they are consistent in their estimations throughout the project, i.e., we assess whether the development time of the issues is proportionate to the SP assigned to them. / Results: The average results across the three correlation measures reveal that the correlation between the human-expert estimated SP and the approximated development time is strong for only 7% of the projects investigated, and medium (58%) or low (35%) for the remaining ones. Similar results are obtained when the actual development time is considered. Our empirical study also reveals that the estimation made is often not consistent throughout the project and the human estimator tends to misestimate in 78% of the cases. / Conclusions: Our empirical results suggest that SP might not be an accurate indicator of open-source Agile software development effort expressed in terms of development time. The impact of its use as an indicator of effort should be explored in future work, for example as a cost-driver in automated effort estimation models or as the prediction target.
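
As a rough illustration of the three correlation statistics mentioned in this abstract, the sketch below computes Pearson, Spearman and Kendall coefficients in plain Python. It is not the authors' analysis script: the helper names and the sample data are hypothetical, and the Kendall variant shown is the simple tau-a, which applies no correction for ties.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n, score = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical story points and development times (hours) for five issues.
story_points = [1, 2, 3, 5, 8]
dev_hours = [2.0, 3.5, 3.0, 9.0, 20.0]

print("Pearson :", round(pearson(story_points, dev_hours), 3))
print("Spearman:", round(spearman(story_points, dev_hours), 3))
print("Kendall :", round(kendall_tau_a(story_points, dev_hours), 3))
```

Pearson measures linear association, while Spearman and Kendall only depend on the ordering of the values, which is one reason to report all three for skewed data such as development times.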

    Investigating the Effectiveness of Clustering for Story Point Estimation

    Automated techniques to estimate Story Points (SP) for user stories in agile software development came to the fore a decade ago. Yet, the state-of-the-art estimation techniques’ accuracy has room for improvement. In this paper, we present a new approach for SP estimation, based on analysing textual features of software issues by employing latent Dirichlet allocation (LDA) and clustering. We first use LDA to represent issue reports in a new space of generated topics. We then use hierarchical clustering to agglomerate issues into clusters based on their topic similarities. Next, we build estimation models using the issues in each cluster. Then, we find the closest cluster to a new incoming issue and use the model from that cluster to estimate the SP. Our approach is evaluated on a dataset of 26 open source projects with a total of 31,960 issues and compared against both baselines and state-of-the-art SP estimation techniques. The results show that the estimation performance of our proposed approach is as good as the state-of-the-art. However, none of these approaches is statistically significantly better than more naive estimators in all cases, which does not justify their additional complexity. We therefore encourage future work to develop alternative strategies for story point estimation. The experimental data and scripts we used in this work are publicly available to allow for replication and extension.
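
The last two steps of the pipeline described in this abstract (one estimation model per cluster, then estimation from the closest cluster) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the topic vectors and cluster labels are taken as given (the LDA and hierarchical clustering steps are omitted), each cluster's model is simply the median SP of its issues, and closeness is measured as Euclidean distance to the cluster centroid. All names and values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def build_cluster_models(topic_vectors, labels, story_points):
    """Return {label: (centroid, median_sp)} for every cluster."""
    groups = defaultdict(list)
    for vec, label, sp in zip(topic_vectors, labels, story_points):
        groups[label].append((vec, sp))
    models = {}
    for label, members in groups.items():
        vecs = [vec for vec, _ in members]
        # Centroid: component-wise mean of the topic vectors in the cluster.
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        # Assumption: the per-cluster "model" is the median story point value.
        models[label] = (centroid, statistics.median(sp for _, sp in members))
    return models


def estimate(models, new_vector):
    """Estimate SP for a new issue from the cluster with the nearest centroid."""
    nearest = min(models, key=lambda lbl: math.dist(models[lbl][0], new_vector))
    return models[nearest][1]


# Hypothetical two-topic representations of four training issues.
topic_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [0, 0, 1, 1]          # assumed output of a clustering step
story_points = [1, 3, 8, 13]

models = build_cluster_models(topic_vectors, labels, story_points)
print(estimate(models, [0.7, 0.3]))
print(estimate(models, [0.3, 0.7]))
```

In a fuller version, the topic vectors would come from a fitted LDA model and the labels from agglomerative clustering, and the per-cluster median could be swapped for any other estimator trained on the cluster's issues.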

    A Versatile Dataset of Agile Open Source Software Projects

    Agile software development is nowadays a widely adopted practice in both open-source and industrial software projects. Agile teams typically heavily rely on issue management tools to document new issues and keep track of outstanding ones, in addition to storing their technical details, effort estimates, assignment to developers, and more. Previous work utilised the historical information stored in issue management systems for various purposes; however, when researchers make their empirical data public, it is usually relevant solely to the study’s objective. In this paper, we present a more holistic and versatile dataset containing a wealth of information on more than half a million issues from 44 open-source Agile software projects, making it well-suited to several research avenues, and cross-analyses therein, including effort estimation, issue prioritisation, issue assignment and many more. We make this data publicly available on GitHub to facilitate ease of use, maintenance, and extensibility.