5 research outputs found

    Large-Scale Identification and Analysis of Factors Impacting Simple Bug Resolution Times in Open Source Software Repositories

    One of the most prominent issues the ever-growing open-source software community faces is the abundance of buggy code. Well-established version control systems and repository hosting services such as GitHub and Maven provide a checks-and-balances structure to minimize the amount of buggy code introduced. Although these platforms are effective in mitigating the problem, it still remains. To further the efforts toward a more effective and quicker response to bugs, we must understand the factors that affect the time it takes to fix one. We apply a custom traversal algorithm to the commits made to open source repositories to determine when “simple stupid bugs” were first introduced to projects and explore the factors that drive the time it takes to fix them. Using the commit history from the main development branch, we identify the commit that first introduced 13 different types of simple stupid bugs in 617 of the top Java projects on GitHub. Leveraging a statistical survival model and other non-parametric statistical tests, we found two main categories of variables that affect a bug’s lifetime: Time Factors and Author Factors. We find that bugs are fixed more quickly when they are introduced and resolved by the same developer. Further, we discuss how the day of the week and the time of day at which buggy code was written and fixed affect its resolution time. These findings provide vital insight to help the open-source community mitigate the abundance of buggy code and can be used in future research to aid bug-finding programs.
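    The study's own traversal algorithm and bug taxonomy are not reproduced in the abstract, but the core idea, walking the main-branch history to find when a buggy snippet first appeared and when it disappeared, and then relating resolution time to author and time-of-week factors, can be sketched with git's pickaxe search. Everything concrete below (repository path, branch name, the buggy snippet) is a hypothetical placeholder rather than detail taken from the paper.

```python
# Hedged sketch: locate the commits that introduced and removed a buggy
# snippet on the main branch, then compute its resolution time and whether
# the same developer introduced and fixed it.
import subprocess
from datetime import datetime, timezone

def commits_touching_snippet(repo, snippet, branch="main"):
    """Return (sha, author, timestamp) for commits that add or remove `snippet`,
    oldest first, using git's pickaxe search on the first-parent history."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--first-parent", "--reverse",
         f"-S{snippet}", "--format=%H|%an|%at", branch],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines():
        sha, author, ts = line.split("|")
        rows.append((sha, author, datetime.fromtimestamp(int(ts), tz=timezone.utc)))
    return rows

# Hypothetical repository and buggy snippet, for illustration only.
hits = commits_touching_snippet("/path/to/repo", "if (x = 0)")
if len(hits) >= 2:
    (intro_sha, intro_author, t_intro), (fix_sha, fix_author, t_fix) = hits[0], hits[-1]
    days_to_fix = (t_fix - t_intro).total_seconds() / 86400
    print(f"introduced {intro_sha[:8]} ({t_intro:%A}), fixed {fix_sha[:8]} ({t_fix:%A})")
    print(f"resolution time: {days_to_fix:.1f} days, same author: {intro_author == fix_author}")
```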

    Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review

    Context: The International Software Benchmarking Standards Group (ISBSG) maintains a software development repository with over 6000 software projects. This dataset makes it possible to estimate a project's size, effort, duration, and cost. Objective: The aim of this study was to determine how, and to what extent, ISBSG has been used by researchers from 2000, when the first papers were published, until June of 2012. Method: A systematic mapping review was used as the research method, which was applied to over 129 papers obtained after the filtering process. Results: The papers were published in 19 journals and 40 conferences. Thirty-five percent of the papers published between 2000 and 2011 have received at least one citation in journals, and only five papers have received six or more citations. The effort variable is the focus of 70.5% of the papers, 22.5% center their research on a variable other than effort, and 7% do not consider any target variable. Additionally, in as many as 70.5% of papers, effort estimation is the research topic, followed by dataset properties (36.4%). The most frequent methods are Regression (61.2%), Machine Learning (35.7%), and Estimation by Analogy (22.5%). ISBSG is used as the only support in 55% of the papers, while the remaining papers use complementary datasets. ISBSG release 10 is used most frequently, with 32 references. Finally, some benefits and drawbacks of the usage of ISBSG have been highlighted. Conclusion: This work presents a snapshot of the existing usage of ISBSG in software development research. ISBSG offers a wealth of information regarding practices from a wide range of organizations, applications, and development types, which constitutes its main potential. However, a data preparation process is required before any analysis. Lastly, the potential of ISBSG for developing new research is also outlined. Fernández Diego, M.; González-Ladrón-De-Guevara, F. (2014). Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review. Information and Software Technology, 56(6), 527-544. doi:10.1016/j.infsof.2014.01.003
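    The review only tallies which estimation methods appear in the literature; as a rough illustration of one of the most common families it names, the sketch below shows estimation by analogy as a k-nearest-neighbours regressor over project features. The feature names and numbers are invented toy values, not ISBSG records, which, as the review notes, would first require a data preparation step.

```python
# Hedged sketch: analogy-based effort estimation (one of the method families
# the mapping review counts). All data below is made up for illustration.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy "projects": functional size (FP), max team size -> effort (person-hours).
X = np.array([[120, 4], [300, 7], [80, 3], [450, 10], [200, 5]], dtype=float)
y = np.array([1500, 4200, 900, 7000, 2600], dtype=float)

# Estimation by analogy: average the effort of the k most similar past projects.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=3))
model.fit(X, y)

new_project = np.array([[250, 6]], dtype=float)
print(f"estimated effort: {model.predict(new_project)[0]:.0f} person-hours")
```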

    Survival analysis and classification learning of software process improvement initiatives, and their implications for small companies

    116 p. Software plays a fundamental role in most businesses; indeed, it can be considered one of the main keys to any business's competitive advantage. That software may be produced by large, medium-sized, or small companies. In this context, such organizations choose to launch process improvement initiatives with the aim of improving the quality of the services or final products they offer to the market. It is therefore common for large and medium-sized companies to define development processes for their final products, and even to adopt reference quality models built on good practices drawn from industry. Indeed, there are many reference models and standards available to support an improvement initiative, so organizations often have to satisfy the requirements of several models at the same time. These standards tend to contain practices or requirements that overlap (duplications) or that are designed with large organizations in mind. For small organizations, such duplications add overhead to these initiatives: the bureaucratic workload grows when defining the processes tied to the reference models, and they are forced to eliminate the duplications across models and to revise their processes against several standards at once. This situation is particularly delicate for small organizations with fewer than 25 employees, also known as Very Small Entities (VSEs). These organizations use their resources as best they can and, from their point of view, such reference models are an expense rather than an investment, so they do not launch process improvement initiatives at all. Along these lines, ISO/IEC 29110 was created to offer these organizations a model adapted to the needs of VSEs.

    The first edition of ISO/IEC 29110 was published in 2011 and, since then, a number of research works and industry experiences have been developed in this context. On the one hand, there is still little industry experience involving VSEs, so it is not easy to know how VSEs behave. Several works related to ISO/IEC 29110 have been published since 2011, but their typology has so far been very diverse, so it is essential to examine and classify these first experiences. On the other hand, process improvement initiatives are not always successful, and how long an initiative of this kind will last is also uncertain. It is therefore necessary to analyse the survival of these initiatives in such contexts and to identify the working patterns that arise while process improvement initiatives are developed and deployed in VSEs. Finally, VSEs tend to be particularly concerned about the security of the products they develop, so they need mechanisms to manage the main security aspects. First, in this work we carry out a systematic analysis of the articles related to ISO/IEC 29110, collecting the main research areas and the most important types of work carried out. Second, we propose a framework for analysing the survival of the process improvement initiatives launched by VSEs. Third, to characterize their behaviour, we develop an approach for identifying the patterns that occur in these initiatives. Fourth, we propose a way to incorporate security aspects into the software development life cycle of VSEs and to manage technical debt.
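    As a rough illustration of the survival-analysis side of the proposed framework, the sketch below fits a Kaplan-Meier estimator to the durations of hypothetical SPI initiatives, treating initiatives that are still running as right-censored. The numbers and the choice of the lifelines library are assumptions for the example, not material from the thesis.

```python
# Hedged sketch: survival analysis of software process improvement (SPI)
# initiatives in Very Small Entities. Toy data; 1 = the initiative ended
# within the observation window, 0 = still running (right-censored).
from lifelines import KaplanMeierFitter

durations_months = [6, 14, 9, 24, 3, 18, 12, 30]   # observed lifetime of each initiative so far
ended            = [1,  1, 0,  1, 1,  0,  1,  0]

kmf = KaplanMeierFitter()
kmf.fit(durations_months, event_observed=ended, label="SPI initiatives in VSEs")

# Estimated probability that an initiative survives past 12 months,
# and the estimated median survival time.
print(kmf.predict(12))
print(kmf.median_survival_time_)
```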

    Carryover parts and new product reliability

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, 2009. Cataloged from PDF version of thesis. Includes bibliographical references. By studying a unique data set from a motor vehicle manufacturer, we find that carryover parts, common parts used in successive generations of multi-generational products, are a major source of quality problems, contrary to conventional wisdom. Moreover, the failure rate of carryover parts grows from one generation to the next, a phenomenon known as the carryover spike. Motivated by these results and the need to understand the quality dynamics of multi-generational products, we empirically analyze the field problem-solving process and the new product introduction spike. We attempt to answer the following questions: what factors influence the time required to solve problems? Furthermore, what factors influence the cancellation probability of problem-solving projects? In addition to these questions related to the field problem-solving process, we seek to understand the factors that influence the new product introduction spike. We also investigate various ways to offset the failures of carryover parts. Using a novel simulation model, we test different policies that aim for better prioritization and analysis of carryover problems. Simulation results show that product reliability can be improved drastically using these policies. Our results indicate that managers should expect to witness higher warranty costs related to carryover parts on new products, due to trends in the industry. by Gokhan Dogan. Ph.D.
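    The thesis's simulation model is not described in the abstract; the toy Monte Carlo below merely illustrates the mechanism it studies, unresolved carryover problems inflating the next generation's failure rate, and a policy that prioritizes carryover problem solving. All rates, capacities, and the policy itself are invented for the sketch.

```python
# Hedged sketch: a toy Monte Carlo of the "carryover spike" and of a policy
# that prioritizes carryover problem solving. Numbers are illustrative only.
import random

def simulate(generations=4, parts=100, base_rate=0.02, solve_capacity=10,
             prioritize_carryover=False, seed=1):
    random.seed(seed)
    open_problems = [0] * parts          # unsolved field problems per carryover part
    failure_rates = []
    for _ in range(generations):
        # Unsolved carryover problems inflate this generation's failure rate.
        rates = [base_rate * (1 + 0.5 * p) for p in open_problems]
        failure_rates.append(sum(rates) / parts)
        # New problems surface in the field.
        for i in range(parts):
            if random.random() < rates[i]:
                open_problems[i] += 1
        # Problem-solving capacity: tackle the worst carryover parts first,
        # or spend it on arbitrary parts.
        order = (sorted(range(parts), key=lambda i: -open_problems[i])
                 if prioritize_carryover else list(range(parts)))
        for i in order[:solve_capacity]:
            open_problems[i] = max(0, open_problems[i] - 1)
    return failure_rates

print("no prioritization:   ", [round(r, 4) for r in simulate()])
print("prioritize carryover:", [round(r, 4) for r in simulate(prioritize_carryover=True)])
```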

    Data cleaning techniques for software engineering data sets

    Data quality is an important issue which has been addressed and recognised in research communities such as data warehousing, data mining and information systems. It has been agreed that poor data quality will impact the quality of results of analyses and that it will therefore impact on decisions made on the basis of these results. Empirical software engineering has neglected the issue of data quality to some extent. This fact poses the question of how researchers in empirical software engineering can trust their results without addressing the quality of the analysed data. One widely accepted definition for data quality describes it as 'fitness for purpose', and the issue of poor data quality can be addressed either by introducing preventative measures or by applying means to cope with existing data quality issues. The research presented in this thesis addresses the latter with a special focus on noise handling. Three noise handling techniques, which utilise decision trees, are proposed for application to software engineering data sets. Each technique represents a noise handling approach: robust filtering, where training and test sets are the same; predictive filtering, where training and test sets are different; and filtering and polish, where noisy instances are corrected. The techniques were first evaluated in two different investigations by applying them to a large real world software engineering data set. In the first investigation the techniques' ability to improve predictive accuracy under differing noise levels was tested. All three techniques improved predictive accuracy in comparison to the do-nothing approach. Filtering and polish was the most successful technique at improving predictive accuracy. The second investigation utilising the large real world software engineering data set tested the techniques' ability to identify instances with implausible values. These instances were flagged for the purpose of evaluation before applying the three techniques. Robust filtering and predictive filtering decreased the number of instances with implausible values, but substantially decreased the size of the data set too. The filtering and polish technique actually increased the number of implausible values, but it did not reduce the size of the data set. Since the data set contained historical software project data, it was not possible to know the real extent of noise detected. This led to the production of simulated software engineering data sets, which were modelled on the real data set used in the previous evaluations to ensure domain specific characteristics. These simulated versions of the data set were then injected with noise, such that the real extent of the noise was known. After the noise injection the three noise handling techniques were applied to allow evaluation. This procedure of simulating software engineering data sets combined the incorporation of domain specific characteristics of the real world with control over the simulated data. This is seen as a special strength of this evaluation approach. The results of the evaluation of the simulation showed that none of the techniques performed well. Robust filtering and filtering and polish performed very poorly, and based on the results of this evaluation they would not be recommended for the task of noise reduction. The predictive filtering technique was the best performing technique in this evaluation, but it did not perform especially well either.
    An exhaustive systematic literature review has been carried out investigating to what extent the empirical software engineering community has considered data quality. The findings showed that the issue of data quality has been largely neglected by the empirical software engineering community. The work in this thesis highlights an important gap in empirical software engineering. It provides clarification of, and distinctions between, the terms noise and outliers. Noise and outliers overlap, but they are fundamentally different. Since noise and outliers are often treated the same in noise handling techniques, a clarification of the two terms was necessary. To investigate the capabilities of noise handling techniques a single investigation was deemed insufficient. The reasons for this are that the distinction between noise and outliers is not trivial, and that the investigated noise cleaning techniques are derived from traditional noise handling techniques in which noise and outliers are combined. Therefore three investigations were undertaken to assess the effectiveness of the three presented noise handling techniques. Each investigation should be seen as part of a multi-pronged approach. This thesis also highlights possible shortcomings of current automated noise handling techniques. The poor performance of the three techniques led to the conclusion that noise handling should be integrated into a data cleaning process where the input of domain knowledge and the replicability of the data cleaning process are ensured.
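    A minimal sketch of the three decision-tree-based approaches named above, under the simplifying assumption that noise shows up in the class label: robust filtering trains and tests on the same data and drops misclassified instances, predictive filtering uses out-of-fold predictions so that training and test sets differ, and filtering-and-polish keeps every instance but overwrites suspect labels with the tree's prediction. The data set here is random toy data standing in for a software engineering data set; the thesis's techniques may also correct attribute values, which this sketch does not attempt.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

def robust_filter(X, y):
    """Robust filtering: drop instances a tree misclassifies when trained on the full set."""
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    keep = tree.predict(X) == y
    return X[keep], y[keep]

def predictive_filter(X, y, folds=5):
    """Predictive filtering: drop instances misclassified by trees trained on other folds."""
    pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=folds)
    keep = pred == y
    return X[keep], y[keep]

def filter_and_polish(X, y, folds=5):
    """Filtering and polish: keep all instances, but replace labels flagged as noisy
    with the out-of-fold tree prediction (a simplification of the thesis's technique)."""
    pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=folds)
    polished = y.copy()
    noisy = pred != y
    polished[noisy] = pred[noisy]
    return X, polished

# Toy data standing in for a software engineering data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

for name, fn in [("robust", robust_filter), ("predictive", predictive_filter)]:
    _, y_clean = fn(X, y)
    print(f"{name} filtering kept {len(y_clean)} of {len(y)} instances")
_, y_polished = filter_and_polish(X, y)
print(f"filter-and-polish changed {int((y_polished != y).sum())} labels")
```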