142 research outputs found

    Data-Driven Application Maintenance: Views from the Trenches

    In this paper we present our experience during the design, development, and pilot deployments of a data-driven, machine learning based application maintenance solution. We implemented a proof of concept to address a spectrum of interrelated problems encountered in application maintenance projects, including duplicate incident ticket identification, assignee recommendation, theme mining, and mapping of incidents to business processes. In the context of IT services, these problems are frequently encountered, yet there is a gap in bringing automation and optimization to them. Despite long-standing research on mining and analysis of software repositories, such research outputs are not widely adopted in practice because of the constraints these solutions impose on their users. We discuss the need for designing pragmatic solutions with low barriers to adoption that address the right level of problem complexity with respect to the underlying business constraints and the nature of the data.
    Comment: Earlier version of paper appearing in proceedings of the 4th International Workshop on Software Engineering Research and Industrial Practice (SER&IP), IEEE Press, pp. 48-54, 201

    In Pursuit of Optimal Workflow Within The Apache Software Foundation

    The following is a case study composed of three workflow investigations at the Apache Software Foundation (Apache), an organization based on open source software development (OSSD). I start with an examination of the workload inequality within Apache, particularly with regard to requirements writing. I established that the stronger a participant's experience indicators are, the more likely they are to propose a requirement that is not a defect and the more likely the requirement is to be implemented eventually. Requirements at Apache are divided into work tickets (tickets). In the second investigation, I report many insights into the distribution patterns of these tickets. The participants who create the tickets often had the best track records for determining who should participate in a ticket. Tickets that were at some point volunteered for (self-assigned) had a lower incidence of neglect but in some cases were also associated with severe delay. When a participant claims a ticket but postpones the work involved, the ticket exists without a solution for five to ten times as long, depending on the circumstances. I make recommendations that may reduce the incidence of tickets that are claimed but not implemented in a timely manner. After giving an in-depth explanation of how I obtained this data set through web crawlers, I describe the pattern mining platform I developed to make my data mining efforts highly scalable and repeatable. Lastly, I used process mining techniques to show that workflow patterns vary greatly within teams at Apache. I investigated a variety of process choices and how they might be influencing the outcomes of OSSD projects. I report a moderately negative association between how often a team updates the specifics of a requirement and how often requirements are completed. I also verified that the prevalence of volunteerism indicators is positively associated with work completion; surprisingly, this correlation is stronger if I exclude the very large projects. I suggest that the largest projects at Apache may benefit from some level of traditional delegation in addition to the volunteerism that OSSD is normally associated with.
    Dissertation/Thesis: Doctoral Dissertation, Industrial Engineering 201

    Improved method of searching the associative rules while developing the software

    Because delivering good-quality software on time is a very important part of the software development process, organizing this process accurately is an important task. To this end, a new method of searching for association rules was proposed. It is based on classifying all tasks into three groups according to their difficulty and then searching for association rules among them, which helps to determine the time a specific developer needs to perform a specific task.
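As a rough illustration of what searching for association rules over tasks can involve, the sketch below computes the two standard rule measures, support and confidence, over a handful of task records. The record layout, the labels (`dev:A`, `difficulty:hard`, `duration:long`) and the function names are hypothetical and are not taken from the paper, which does not describe its data format in the abstract.

```python
# Minimal sketch of association-rule measures over task records.
# All records and labels below are invented for illustration only.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    matches = sum(1 for t in transactions if itemset <= t)
    return matches / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / antecedent_support

# Each record: who did the task, its difficulty group, and how long it took.
tasks = [
    {"dev:A", "difficulty:hard", "duration:long"},
    {"dev:A", "difficulty:hard", "duration:long"},
    {"dev:A", "difficulty:hard", "duration:short"},
    {"dev:B", "difficulty:easy", "duration:short"},
]

rule_lhs = {"dev:A", "difficulty:hard"}
rule_rhs = {"duration:long"}
print(support(tasks, rule_lhs))               # 0.75
print(confidence(tasks, rule_lhs, rule_rhs))  # 2 of the 3 matching tasks were long
```

A rule with high support and high confidence, such as "hard tasks done by developer A tend to take long", is the kind of pattern that could feed a time estimate; how the paper actually selects and uses its rules is not stated in the abstract.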

    Recommending Issue Reports to Developers Using Machine Learning

    The development of a software system is often an iterative process, and change requests arise when bugs and defects are detected or new features need to be added. These requirements are recorded as issue reports and placed in the backlog of the software project for developers to work on. Issue reports are assigned to developers in different ways. One common approach is self-assignment, where developers pick the issue reports they are interested in and assign those reports to themselves. Practising self-assignment in large projects can be challenging because the backlog of a large project becomes loaded with many issue reports, which makes it hard for developers to find, in good time, the issue reports that match their interests. To tackle this problem, a machine learning-based recommender system is proposed in this thesis. This recommender system learns from the history of the issue reports that each developer worked on previously and recommends new issue reports suited to each developer. To implement the recommender system, issue reports were collected from 6 different open-source projects, and different machine learning techniques were applied and compared in order to determine the most suitable one. To evaluate the performance of the recommender system, the Precision, Recall, F1-score and Mean Average Precision metrics were used. The results show that, from a backlog of 100 issue reports, recommending the top 10 issue reports to each developer achieves a recall ranging from 52.9% to 96%, which is 6 to 9.5 times better than picking 10 issue reports at random. Comparable improvements were also achieved in the other metrics.
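The metrics named in this abstract can be sketched for a single developer as follows. This is a minimal illustration under assumptions, not the thesis's evaluation code: the issue identifiers, the function names, and the choice to normalise average precision by `min(len(relevant), k)` are all hypothetical, since the abstract does not specify them.

```python
# Minimal sketch of top-k recommendation metrics for one developer.
# `recommended` is a ranked list of issue IDs; `relevant` is the set of
# issues the developer actually worked on. Both are invented examples.

def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def average_precision(recommended, relevant, k):
    """Mean of precision values taken at each rank where a hit occurs.

    Normalising by min(len(relevant), k) is one common convention
    and is an assumption here.
    """
    hits = 0
    total = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denominator = min(len(relevant), k)
    return total / denominator if denominator else 0.0

recommended = ["ISSUE-1", "ISSUE-2", "ISSUE-3", "ISSUE-4"]
relevant = {"ISSUE-1", "ISSUE-3", "ISSUE-5"}

p = precision_at_k(recommended, relevant, 4)
r = recall_at_k(recommended, relevant, 4)
print(p, r, f1_score(p, r), average_precision(recommended, relevant, 4))
```

Mean Average Precision would then be the mean of `average_precision` over all developers in the evaluation set.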

    Profiling Developers Through the Lens of Technical Debt

    Context: Technical Debt needs to be managed to avoid disastrous consequences, and investigating developers' habits concerning technical debt management provides invaluable information for software development. Objective: This study aims to characterize how developers manage technical debt based on the code smells they induce and the refactorings they apply. Method: We mined a publicly available Technical Debt dataset for Git commit information, code smells, coding violations, and refactoring activities for each developer of a selected project. Results: By combining this information, we profile developers to recognize prolific coders, highlight activities that discriminate among developer roles (reviewer, lead, architect), and estimate coding maturity and technical debt tolerance.

    Improving Bug Triaging Using Software Analytics

    Bug fixing has become a major activity in software development and maintenance. In this process, bug triaging plays an important role. It assists software managers in allocating their limited resources and allows developers to focus their efforts more efficiently on defects with high severity. Current bug triaging techniques applied in many software organisations may lead to misclassification of bugs and thus to delays in bug resolution, resulting in degraded software quality and user frustration. An improved bug triaging strategy would help software managers make better decisions by assigning the right priority and severity to bugs, allowing developers to address critical bugs as soon as possible and ignore the trivial ones. In this thesis, we leverage analytic approaches to conduct three empirical studies aimed at improving bug triaging techniques. The first study investigates the relation between bug fixes that need supplementary fixes and bugs that have been re-opened. We found that re-opened bugs account for 21.6% to 33.8% of all supplementary bug fixes. A considerable number of re-opened bugs (from 33.0% to 57.5%) had only one commit associated with them: their original bug reports were prematurely closed. The second study focuses on bugs that yield frequent crashes and impact large numbers of users. We found that these bugs were not prioritised by software managers, although they can seriously decrease user-perceived quality and even the reputation of a software organisation. Our third study examines commits that lead to crashes. We found that these commits are often submitted by less experienced developers and that they contain more added and deleted lines of code than other commits. If software organisations can detect these problems early in the bug triaging phase, they can increase their development productivity and their users' satisfaction while decreasing software maintenance overhead. Using multiple regression and machine learning algorithms, we built statistical models to predict re-opened bugs among bugs that required supplementary bug fixes (with a precision of up to 97.0% and a recall of up to 65.3%), bugs with high crashing impact (with a precision of up to 64.2% and a recall of up to 98.3%), and commits inducing future crashes (with a precision of up to 61.4% and a recall of up to 95.0%). Software organisations can apply our proposed models to improve their bug triaging strategy by assigning bugs to the right developers, avoiding misclassification of bugs, reducing the negative impact of crash-related bugs, and addressing fault-prone code early, before it impacts a large user base.