23 research outputs found

    Towards Identifying Paid Open Source Developers - A Case Study with Mozilla Developers

    Open source development contains contributions from both hired and volunteer software developers. Identifying this status is important when we consider the transferability of research results to the closed source software industry, which includes no volunteer developers. While many studies have taken the employment status of developers into account, this information is often gathered manually due to the lack of accurate automatic methods. In this paper, we present an initial step towards predicting paid and unpaid open source development using machine learning and compare our results with automatic techniques used in prior work. By relying on source code repository meta-data from Mozilla and manually collected employment status, we built a dataset of the most active developers, both volunteers and those hired by Mozilla. We define a set of metrics based on developers' usual commit time pattern and use different classification methods (logistic regression, classification tree, and random forest). The results show that our proposed method identifies paid and unpaid commits with an AUC of 0.75 using random forest, which is higher than the AUC of 0.64 obtained with the best of the previously used automatic methods. Comment: International Conference on Mining Software Repositories (MSR) 201
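A minimal sketch of what a commit-time metric of this kind could look like. The abstract does not specify the actual metrics, so the function name `office_hours_share`, the 9-to-17 weekday window, and the sample timestamps are all illustrative assumptions, not the authors' method. Timestamps are assumed to already be in the developer's local time.

```python
from datetime import datetime

def office_hours_share(timestamps, start_hour=9, end_hour=17):
    """Fraction of commits made Monday-Friday with start_hour <= hour < end_hour.

    Hypothetical feature: the 9-17 window is an assumption, not taken from the paper.
    """
    if not timestamps:
        return 0.0
    in_hours = sum(
        1
        for ts in timestamps
        if ts.weekday() < 5 and start_hour <= ts.hour < end_hour
    )
    return in_hours / len(timestamps)

# Made-up commit times for one developer (local time assumed).
commits = [
    datetime(2017, 3, 6, 10, 15),   # Monday morning
    datetime(2017, 3, 7, 14, 0),    # Tuesday afternoon
    datetime(2017, 3, 8, 22, 30),   # Wednesday late evening
    datetime(2017, 3, 11, 11, 0),   # Saturday
]
share = office_hours_share(commits)
print(share)
```

A per-developer value like this could then serve as one input column for a classifier such as the logistic regression or random forest mentioned in the abstract.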

    Individual differences limit predicting well-being and productivity using software repositories : a longitudinal industrial study

    Poor work well-being and fluctuating productivity in software engineering have been reported in both academic and popular sources. Understanding and predicting these issues through repository analysis might help manage software developers' well-being. Our objective is to link data from software repositories, that is, commit activity, communication, expressed sentiments, and job events, with measures of well-being obtained with a daily experience sampling questionnaire. To achieve our objective, we studied a single software project team for eight months in the software industry. Additionally, we performed semi-structured interviews to explain our results. The acquired quantitative data are analyzed with generalized linear mixed-effects models with autocorrelation structure. We find that individual variance accounts for most of the R² values in models predicting developers' experienced well-being and productivity. In other words, using software repository variables to predict developers' well-being or productivity is challenging due to individual differences. Prediction models developed for each developer individually work better, with a fixed-effects R² value of up to 0.24. The semi-structured interviews give insights into the well-being of software developers and the benefits of chat interaction. Our study suggests that individualized prediction models are needed for well-being and productivity prediction in software development. Peer reviewed

    Adult readers evaluating the credibility of social media posts: Prior belief consistency and source's expertise matter most

    The present study investigates the role of source characteristics, the quality of evidence, and prior beliefs about the topic in adult readers' credibility evaluations of short health-related social media posts. The researchers designed content for the posts concerning five health topics by manipulating the source characteristics (source's expertise, gender, and ethnicity), the accuracy of the claims, and the quality of evidence (research evidence, testimony, consensus, and personal experience) of the posts. After this, accurate and inaccurate social media posts varying in the other manipulated aspects were programmatically generated. The crowdworkers (N = 844) recruited from two platforms were asked to evaluate the credibility of up to ten social media posts, resulting in 8380 evaluations. Before the credibility evaluation, participants' prior beliefs on the topics of the posts were assessed. The results showed that prior belief consistency and the source's expertise affected the perceived credibility of the accurate and inaccurate social media posts the most after controlling for the topic of the post and the crowdworking platform. In contrast, the quality of evidence supporting the health claim mattered relatively little. The source's gender and ethnicity did not have any effect. The results are discussed in terms of first- and second-hand evaluation strategies. Comment: 16 pages, 4 figures including the appendix. Submitted to a journal for peer review

    Time pressure and well-being in software engineering: evidence from software repositories, experience sampling, and prior literature

    No full text
    Abstract Popular and academic sources have indicated that high-pressure work environments are commonplace in the software industry, leading to stress and burnout. One cause of stress is time pressure, that is, not having enough time to complete a task at hand. In addition to effects on well-being, time pressure affects software development processes, productivity, and quality. Synthesising prior evidence and providing real-time data to managers could help to minimize the detrimental effects and optimize productivity. This thesis aims to investigate and give a comprehensive view of the existing body of knowledge on the effects of time pressure in software engineering, including processes, methods, and individual developers. Additionally, we aim to investigate ways to link time pressure and work well-being to software repositories to understand the well-being of software developers better. The research consists of two branches: a review branch and a primary study branch. In the review branch, prior knowledge related to sentiment analysis and time pressure was analyzed with bibliometric studies, a systematic map, and a systematic literature review. In the primary study branch, studies were conducted using software repository mining, sentiment analysis, experience sampling, and interviews. Results from the review branch indicate, among other findings, increased productivity and decreased quality under time pressure. The causes of time pressure can be divided into technical and social factors, with errors in cost estimation, project management, and company culture being the most common causes. The results from the primary study branch show the limiting effect of individual differences on the prediction of well-being. Other findings include the detection of work rhythms through mining time stamps of code commits and the stronger predictive ability of chat activity compared with chat sentiment for developer productivity. 
While the research for this thesis could not find clear links between repository variables and developer well-being that would work at a team level, possibilities to study these links further are established. Future work related to time pressure in software engineering should focus on contextual factors such as company culture and trade-offs between productivity, quality, and well-being within different time scales.
Finnish abstract (Tiivistelmä): Professional and academic literature has indicated that high-pressure work environments are common in the software industry, leading to excess stress and burnout. One source of stress is time pressure, i.e., not having enough time to complete a task. In addition to reduced work well-being, time pressure affects productivity and software quality. Synthesising earlier research results and producing real-time information for managers could ease the harmful effects of time pressure and improve efficiency. This dissertation attempts to give a more comprehensive picture of the existing literature on time pressure in the context of software development, including its effects on processes, methods, and software developers. A further aim is to try to link time pressure and work well-being to data available from software development tools. The research consists of two parts: literature reviews and primary studies. The part focusing on literature reviews used, among other techniques, clustering to review large bodies of material related to sentiment analysis and time pressure. In addition, a systematic map and a systematic review of time pressure in the context of software development were produced. The primary studies used the following research methodologies: "mining" of data sources related to software development, sentiment analysis, the experience sampling method (ESM), and interviews. The results of the literature review part show that time pressure increases productivity and lowers quality in software development. The causes of time pressure are technical and social, and they fall into three categories: errors in cost estimates, errors in project management, and company culture. The results of the primary study part show how differences between software developers make it harder to predict well-being from data obtained from software development tools. Other results are that the amount of committed program code follows a daily rhythm in open source projects, and that in a single software project the amount of communication predicted the amount of committed source code better than the sentiment expressed in the communication. Although this dissertation research could not find clear links between variables available from software development tools and developers' well-being that would work as good predictors at the level of a software development team, it points to further possibilities for studying these links. Future research on time pressure in software development should focus on context-dependent variables, such as company culture, and on trade-offs between productivity, quality, and well-being over different time spans.

    Benchmarking configurations for web-testing: Selenium versus Watir

    No full text
    According to the current literature, the benefits of testing automation are reusability, repeatability, and effort saved in test execution, while some of the current difficulties lie in maintainability, initial investment, and test case creation. This thesis presents a brief literature review on the state of testing automation and a larger literature review on the use of Selenium and Watir in a web context. The literature review on the use of Selenium and Watir contains an introduction to the history and use of the tools, as well as a look at the academic literature and blogosphere on the subject. The aim of this research is to identify differences in the performance of configurations used by the open source testing tools Selenium and Watir in a web context. This thesis presents a quantitative controlled experiment measuring and comparing execution times, memory use, and lines of code for different testing configurations of Selenium and Watir. The tools used are Watir and the C#, Java, Python, and Ruby bindings for Selenium. These five tools are paired with the browsers Google Chrome, Internet Explorer, Mozilla Firefox, and Opera for a total of 20 benchmarked configurations. The results of this study show that selecting efficient components for the configuration, in the form of tools, language bindings, and web browsers, can increase performance through shorter execution times, lower memory use, and more concise code. Even for the purposes of cross-browser testing, the selection of tool affects the performance of the testing configuration
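The count of 20 configurations follows from pairing each of the five tools (Watir plus four Selenium bindings) with each of the four browsers. A short sketch of that enumeration is given here; the label strings and the `configurations` helper are illustrative assumptions, not code from the thesis.

```python
from itertools import product

# Tools named in the abstract: Watir plus four Selenium language bindings.
TOOLS = [
    "Watir",
    "Selenium (C#)",
    "Selenium (Java)",
    "Selenium (Python)",
    "Selenium (Ruby)",
]

# Browsers named in the abstract.
BROWSERS = ["Google Chrome", "Internet Explorer", "Mozilla Firefox", "Opera"]

def configurations():
    """Return every (tool, browser) pair as a list of tuples."""
    return list(product(TOOLS, BROWSERS))

configs = configurations()
print(len(configs))  # 5 tools x 4 browsers
for tool, browser in configs[:3]:
    print(f"{tool} + {browser}")
```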

    Chat activity is a better predictor than chat sentiment on software developers productivity

    No full text
    Abstract Recent works have proposed that software developers' positive emotion has a positive impact on their productivity. In this paper we investigate two data sources: developers' chat messages (from Slack and Hipchat) and source code commits of a single co-located Agile team over 200 working days. Our regression analysis shows that the number of chat messages is the best predictor and predicts productivity, measured both in the number of commits and in lines of code, with R² of 0.33 and 0.27, respectively. We then add sentiment analysis variables until the AIC of our model no longer improves and get R² values of 0.37 (commits) and 0.30 (lines of code). Thus, analyzing chat sentiment improves productivity prediction over chat activity alone, but the difference is not massive. This work supports the idea that emotional state and productivity are linked in software development. We find that three positive sentiment metrics, but surprisingly also one negative sentiment metric, are associated with higher productivity

    R Scripts for The Evolution of Sentiment Analysis - A Review of Research Topics, Venues, and Top Cited Papers

    No full text
    R scripts for "The Evolution of Sentiment Analysis - A Review of Research Topics, Venues, and Top Cited Papers" by Mika V. Mäntylä, Daniel Graziotin, and Miikka Kuutila. Because of Scopus TOS, we could regrettably not provide the data that these scripts take as input.

    What do we know about time pressure in software development?

    No full text
    Abstract Time pressure means that the time experienced by an individual is scarce in relation to the task demands at hand. In this article, we summarize findings and provide practitioner takeaways based on a systematic review of existing literature. We find that most empirical evidence supports reduced quality, increased productivity, and negative effects on individuals under time pressure. Time pressure is caused by company culture, poor effort estimates, and project management. The effects of time pressure can be explained by Challenge and Hindrance time pressure, the Yerkes-Dodson Law, and the Job Demands-Resources model. Finally, we conclude the article by giving practitioner takeaways related to minimizing the negative effects of time pressure