238 research outputs found

    Process Models for Learning Patterns in FLOSS Repositories

    Evidence suggests that Free/Libre Open Source Software (FLOSS) environments provide unlimited learning opportunities. Community members engage in a number of activities, both while interacting with their peers and while using these environments' repositories. To date, numerous studies have documented the existence of learning processes in FLOSS through surveys or questionnaires completed by FLOSS project participants. At the same time, there has been a surge in tools and techniques for extracting and analyzing data from different FLOSS data sources, giving rise to a new field called Mining Software Repositories (MSR). Despite this growing body of tools and techniques for mining FLOSS repositories, there are few, if any, approaches that provide empirical evidence of learning processes directly from these repositories. In this work, we therefore seek to start such an initiative by proposing an approach based on Process Mining. With this technique, we aim to trace learning behaviors from the trails of activity that FLOSS participants leave in FLOSS repositories. We classify participants as Novices or Experts: a Novice is any FLOSS member who benefits from a learning experience by acquiring new skills, while an Expert is the provider of these skills. The significance of our work is twofold. First, we extend the MSR field by showing the potential of applying Process Mining techniques to FLOSS repositories. Second, our work provides critical evidence that improves the understanding of learning behavior in FLOSS communities by analyzing the relevant repositories. 
To accomplish this, we have proposed and implemented a methodology that follows a seven-step approach: developing an appropriate terminology or ontology for learning processes in FLOSS; contextualizing learning processes through a-priori models; generating Event Logs; generating the corresponding process models; interpreting and evaluating the value of process discovery; performing conformance analysis; and verifying a number of formulated hypotheses about tracing learning patterns in FLOSS communities. Implementing this approach resulted in the Ontology of Learning in FLOSS (OntoLiFLOSS) environments, which defines the terms needed to describe learning processes in FLOSS and provides a visual representation of these processes through Petri net-like Workflow nets. Another novelty concerns the mining of FLOSS repositories: we define and describe the preprocessing that FLOSS data requires before Process Mining techniques can be applied. Step by step, we detail how the Event Logs are constructed by generating key phrases and using Semantic Search. Taking a FLOSS environment called Openstack as our data source, we apply our proposed techniques to identify learning activities based on key-phrase catalogs and classification rules expressed in pseudocode, together with an appropriate Process Mining tool. We thus produced Event Logs based on the semantic content of messages in Openstack's Mailing archives, Internet Relay Chat (IRC) messages, Reviews, Bug reports and Source code, using this content to retrieve the corresponding activities. Considering these repositories in light of the three learning process phases (Initiation, Progression and Maturation), we produced an Event Log for each participant type (Novice or Expert) in every phase on the corresponding dataset. 
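As a rough illustration of the Event Log generation step, the sketch below turns messages into events by matching them against a key-phrase catalog. The catalog, activity labels, `Message` type and the choice of the sender as case id are all hypothetical assumptions for illustration; they are not the thesis's OntoLiFLOSS catalogs, classification rules or Semantic Search pipeline, and plain substring matching stands in for semantic matching.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical key-phrase catalog (activity label -> trigger phrases).
# The thesis's actual catalogs and classification rules are not reproduced here.
KEY_PHRASES = {
    "FormulateQuestion": ["how do i", "can anyone explain", "i am new"],
    "ProvideAnswer": ["you can try", "the solution is", "have a look at"],
    "AcknowledgeHelp": ["thanks", "that worked"],
}


@dataclass
class Message:
    sender: str          # used as the case id in this sketch (assumption)
    role: str            # "Novice" or "Expert", assumed to be known already
    timestamp: datetime
    text: str


def classify(text):
    """Return the first activity whose key phrases occur in the text, else None."""
    lowered = text.lower()
    for activity, phrases in KEY_PHRASES.items():
        if any(phrase in lowered for phrase in phrases):
            return activity
    return None


def build_event_log(messages):
    """Turn messages into (case id, activity, timestamp, role) events, oldest first.

    Messages that match no key phrase are dropped.
    """
    events = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        activity = classify(msg.text)
        if activity is not None:
            events.append((msg.sender, activity, msg.timestamp.isoformat(), msg.role))
    return events


messages = [
    Message("alice", "Novice", datetime(2013, 1, 1, 9), "Hi, I am new here. How do I run the tests?"),
    Message("bob", "Expert", datetime(2013, 1, 1, 10), "You can try running tox."),
    Message("alice", "Novice", datetime(2013, 1, 1, 11), "Thanks, that worked!"),
]
for event in build_event_log(messages):
    print(event)
```

Dictionary order decides ties when a message matches several activities; a real rule set would need an explicit precedence or a richer classifier.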
Hence, we produced 14 Event Logs, which were used to build 14 corresponding process maps: visual representations of the flow of learning activities in FLOSS for each participant. These process maps provide strong indications that learning processes are present in the analyzed repositories. The results show that learning activities occur at a significant rate during message exchanges on both the Mailing archives and IRC. The two datasets differ only slightly. Most notably, Experts are more involved on IRC than on the Mailing archives, with 7.22% and 0.36% Expert involvement on IRC and the Mailing lists respectively. This may be explained by the difference in message length between the two datasets: the average email is 3261 characters long, compared with 60 characters for a chat message. The evidence produced by this mining experiment strengthens the finding that learning processes exist in FLOSS and indicates the scale at which they occur. While the Initiation phase shows the Novice as the most involved participant at the start of the learning process, during the Progression phase the involvement of the Expert increases significantly. To trace the advanced skills of the Maturation phase, we examine repositories that store data about developing and creating code, examining and reviewing code, and identifying and fixing possible bugs. We therefore consider three repositories: Source code, Bug reports and Reviews. The results obtained in this phase largely justify the choice of these three datasets for tracking learning behavior at this stage. Both the Bug reports and the Source code demonstrate the Novice's commitment to seeking answers and interacting as much as possible to strengthen the acquired skills. 
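The involvement percentages and average message lengths above are simple ratios over a dataset. The abstract does not state the exact denominator, so the sketch below assumes it is the total number of records in a dataset, some of which may be attributed to neither role (marked `None`); `involvement_rates` and `mean_length` are hypothetical helper names.

```python
from collections import Counter


def involvement_rates(roles):
    """Percentage of all records attributed to each role.

    roles: one entry per record in a dataset; None marks a record attributed
    to neither a Novice nor an Expert (an assumption about the denominator).
    """
    roles = list(roles)
    total = len(roles)
    if total == 0:
        return {}
    counts = Counter(role for role in roles if role is not None)
    return {role: round(100 * n / total, 2) for role, n in counts.items()}


def mean_length(texts):
    """Average message length in characters."""
    texts = list(texts)
    return sum(len(t) for t in texts) / len(texts)


# Toy data: 5 Novice, 4 Expert and 1 unattributed record out of 10.
print(involvement_rates(["Novice"] * 5 + ["Expert"] * 4 + [None]))
# {'Novice': 50.0, 'Expert': 40.0}
```

Under this assumption the Novice and Expert shares need not add up to 100%.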
With a participation of 49.22% for the Novice against 46.72% for the Expert on Bug reports, and 46.19% against 42.04% on Source code, the Novice still engages significantly in learning. On the last dataset, Reviews, the Expert's role increases: the Expert performs 40.36% of the total number of activities against 22.17% for the Novice. The last steps of our methodology compare the defined a-priori models with final models that describe how learning processes occur according to the actual behavior recorded in the Event Logs. We first depict process maps to track the actual behavior as it occurs in Openstack repositories, and conclude with final Petri net models representative of learning processes in FLOSS, obtained through conformance analysis. For every dataset in the corresponding learning phase, we produce 3 process maps depicting, respectively, the overall learning behavior of all FLOSS community members (Novices and Experts together), of the Novice, and of the Expert. In total, we produced 21 process maps, which empirically describe process models on real data, and 14 process models in the form of Petri nets, one for every participant on each dataset. We use Artificial Immune System (AIS) algorithms to merge the 14 Event Logs that uniquely capture the behavior of every participant on the different datasets across the three phases. We then reanalyze the resulting logs to produce 6 global models that provide a comprehensive depiction of participants' learning behavior in FLOSS communities. These results suggest that the Workflow nets introduced as our a-priori models give a rather simplistic representation of learning processes in FLOSS. 
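Process maps of the kind described above are commonly drawn from directly-follows counts: how often one activity is immediately followed by another within the same case. The sketch below computes those frequency-annotated edges from a toy event log. The activity names are hypothetical, and the specific Process Mining tool and filtering settings used in the thesis are not reproduced.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count how often activity a is directly followed by activity b within a case.

    events: iterable of (case_id, activity, timestamp) tuples; timestamps must
    sort chronologically (ISO 8601 strings do).
    """
    traces = defaultdict(list)
    for case_id, activity, _ in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    edges = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            edges[(a, b)] += 1
    return edges


events = [
    ("alice", "Ask", "2013-01-01T09:00"),
    ("alice", "Read", "2013-01-01T09:30"),
    ("alice", "Reply", "2013-01-01T10:00"),
    ("carol", "Ask", "2013-01-02T08:00"),
    ("carol", "Reply", "2013-01-02T08:45"),
]
for (a, b), n in sorted(directly_follows(events).items()):
    print(f"{a} -> {b}: {n}")
```

Each edge count would become the thickness or label of an arc in a process map; tools usually also hide infrequent edges to keep the map readable.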
Indeed, our experiments with Event Logs from Openstack repositories, from process discovery to conformance checking, demonstrate that the real learning behaviors are more complete than these simplistic a-priori models and, most importantly, largely go beyond them. Finally, our methodology has proved effective both in providing a novel alternative for mining FLOSS repositories and in providing empirical evidence of how knowledge is exchanged in FLOSS environments. Moreover, our results enrich the MSR field by providing a reproducible, step-by-step problem-solving approach that can be customized to answer subsequent research questions on FLOSS repositories using Process Mining.
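One common conformance measure is token-based replay fitness, f = ½(1 − m/c) + ½(1 − r/p), where p, c, m and r count produced, consumed, missing and remaining tokens when the log is replayed on a Petri net. The abstract does not say which conformance technique was used, so the sketch below is only an illustration of that general measure. The three-transition workflow net is a hypothetical stand-in for an a-priori learning model, and each activity is assumed to label exactly one transition.

```python
from collections import Counter

# Hypothetical workflow net: transition label -> (input places, output places).
# It only loosely mimics a sequential a-priori learning model.
NET = {
    "Ask": (["start"], ["p1"]),
    "Answer": (["p1"], ["p2"]),
    "Thank": (["p2"], ["end"]),
}
INITIAL = Counter({"start": 1})
FINAL = Counter({"end": 1})


def replay(trace, net=NET, initial=INITIAL, final=FINAL):
    """Replay one trace; return (produced, consumed, missing, remaining) token counts.

    Every activity in the trace is assumed to label exactly one transition.
    """
    marking = Counter(initial)
    produced = sum(initial.values())
    consumed = missing = 0
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] == 0:  # token not available: record it as missing
                missing += 1
                marking[place] += 1
            marking[place] -= 1
            consumed += 1
        for place in outputs:
            marking[place] += 1
            produced += 1
    for place, n in final.items():  # the environment consumes the final marking
        available = min(marking[place], n)
        missing += n - available
        marking[place] -= available
        consumed += n
    remaining = sum(v for v in marking.values() if v > 0)
    return produced, consumed, missing, remaining


def fitness(log):
    """Log-level fitness: 1/2 (1 - m/c) + 1/2 (1 - r/p), summed over all traces."""
    p, c, m, r = (sum(col) for col in zip(*(replay(t) for t in log)))
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)


print(fitness([["Ask", "Answer", "Thank"]]))  # 1.0: the trace fits perfectly
print(round(fitness([["Ask", "Answer", "Thank"], ["Ask", "Thank"]]), 3))  # 0.857
```

The second trace skips `Answer`, so one token is missing in `p2` and one is left behind in `p1`. This lowers fitness in the way real behavior that departs from a simplistic a-priori model would.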

    Decision Point Analysis on Learning Process Models in FLOSS Mailing Archives

    Numerous studies continue to explore the potential of social interactions between people in Free/Libre Open Source Software (FLOSS) environments. While the dynamics of interactions in these environments can be understood from different perspectives, we focus particularly on interactions that result in knowledge transfer and acquisition. As learning platforms, FLOSS communities provide immense opportunities for improving software engineering skills: people who engage in FLOSS activities both acquire and improve their software development skills. For this reason, it is very helpful to understand how these learning interactions occur. In this paper, we use the decision miner from process mining to conduct our analysis. The purpose of this endeavour is twofold. First, we provide empirical insights into how people learn while exchanging emails in FLOSS mailing archives. Second, we go a step further by providing insights into the motivation behind participants' decisions along their learning paths.
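Decision point analysis asks which case attributes explain the branch a case takes at a choice in a process model. Decision miners typically learn a decision tree (for example, C4.5) for each decision point. The sketch below deliberately simplifies this to a one-level decision stump on a single numeric attribute. The decision point, the branch names and the `prior_posts` attribute are all hypothetical and are not taken from the paper.

```python
def best_stump(observations, attribute):
    """Pick the threshold on one numeric attribute that best predicts the branch taken.

    observations: list of (attributes, branch) pairs recorded at a single
    decision point, where attributes is a dict of case data.
    Returns (threshold, branch_if_le, branch_if_gt, errors).
    """
    values = sorted({attrs[attribute] for attrs, _ in observations})
    branches = sorted({branch for _, branch in observations})
    best = None
    for t in values:
        for le in branches:
            for gt in branches:
                errors = sum(
                    1
                    for attrs, branch in observations
                    if (le if attrs[attribute] <= t else gt) != branch
                )
                if best is None or errors < best[3]:
                    best = (t, le, gt, errors)
    return best


# Hypothetical decision point: after a question is answered, does the
# participant ask a follow-up question or go on to propose a patch?
observations = [
    ({"prior_posts": 0}, "AskFollowUp"),
    ({"prior_posts": 1}, "AskFollowUp"),
    ({"prior_posts": 5}, "ProposePatch"),
    ({"prior_posts": 8}, "ProposePatch"),
]
print(best_stump(observations, "prior_posts"))
# (1, 'AskFollowUp', 'ProposePatch', 0)
```

The result reads as a rule: participants with at most one prior post ask a follow-up question, and the others propose a patch. A full decision tree would chain such splits over several attributes.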

    Mining Educational Social Network Structures from FLOSS Repositories

    FLOSS environments have been shown to provide an interesting learning platform for software engineers. Research suggests that people who take part in both technical and non-technical activities in FLOSS projects are more likely to improve their software engineering skills. To this end, it has been proposed that computer science and software engineering students in formal higher-education institutions participate in FLOSS projects, giving them an opportunity to develop their programming capacity by working on real-life projects. While some empirical studies have shed light on learning processes in FLOSS environments, little or no work has addressed the social structures that emerge during this process of knowledge transfer and acquisition. In this paper, we use social network analysis techniques to provide insights into the emergence of social structures in FLOSS repositories from an educational point of view. We hope that these educational structures will improve the understanding of how learning occurs in these communities and, especially, of the frequency of participants' involvement that culminates in learning.
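A typical first step in this kind of social network analysis is to build an interaction network from reply relations and compute a centrality measure. The sketch below assumes mailing-list data has already been reduced to hypothetical `(replier, original_poster)` pairs. It builds an undirected weighted network and computes degree centrality with the usual normalisation, which divides each node's number of distinct neighbours by n − 1. Ignoring self-replies is a design choice of this sketch.

```python
from collections import defaultdict


def reply_network(replies):
    """Undirected, weighted network from (replier, original_poster) pairs."""
    graph = defaultdict(lambda: defaultdict(int))
    for replier, poster in replies:
        if replier == poster:
            continue  # self-replies are ignored in this sketch
        graph[replier][poster] += 1
        graph[poster][replier] += 1
    return graph


def degree_centrality(graph):
    """Distinct neighbours of each node divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


replies = [("bob", "alice"), ("carol", "alice"), ("bob", "dave"), ("alice", "alice")]
graph = reply_network(replies)
for node, score in sorted(degree_centrality(graph).items()):
    print(node, round(score, 2))
```

The edge weights, which count how often two people exchange replies, could serve as the involvement frequency the abstract mentions. Degree centrality alone ignores them.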

    Learning and Activity Patterns in OSS Communities and their Impact on Software Quality

    This paper presents a framework to identify and analyse learning and activity patterns that characterise the participation and collaboration of individuals in Open Source Software (OSS) communities. It first describes how participants' activities enable and drive a learning process that occurs in individual participants as well as in the OSS project community as a whole. It then explores how to identify and analyse learning patterns at both the individual and the community level. The objective of such analysis is to determine the impact of these patterns on the quality of the OSS product and to define a descriptive approach to quality that is concerned less with standards than with the facts of OSS peer review and peer production.

    FLOSSSim: Understanding the Free/Libre Open Source Software (FLOSS) Development Process through Agent-Based Modeling

    Free/Libre Open Source Software (FLOSS) is the product of volunteers collaborating to build software in an open, public manner. The large number of FLOSS projects, combined with the data that is inherently archived by this online process, makes studying this phenomenon attractive. Some FLOSS projects, such as Linux, the Apache Web Server, and Firefox, are very functional, well-known, and successful. However, for every successful FLOSS project there are hundreds of unsuccessful ones. These projects fail to attract sufficient interest from developers and users and become inactive or abandoned before useful functionality is achieved. The goal of this research is to better understand the open source development process and to gain insight into why some FLOSS projects succeed while others fail. This dissertation presents an agent-based model of the FLOSS development process. The model is built around the concept that projects must attract contributions from a limited pool of participants in order to progress. In the model, developer and user agents select from a landscape of competing FLOSS projects based on perceived utility. Through the selections they make and their subsequent contributions, some projects are propelled to success while others remain stagnant and inactive. Findings from a diverse set of empirical studies of FLOSS projects are used to formulate the model, which is then calibrated on empirical data from multiple sources of public FLOSS data. The model reproduces key characteristics observed in the FLOSS domain and is capable of making accurate predictions. The model is used to gain a better understanding of the FLOSS development process, including what it means for a FLOSS project to be successful and which conditions increase the probability of project success. It is shown that FLOSS is a producer-driven process, and the project factors that matter to developers when selecting projects are identified. 
In addition, it is shown that projects are sensitive to when core developers make contributions, and that the observed bandwagon effects mean some projects will be successful regardless of competing projects. Recommendations for improving software engineering in general, based on the positive characteristics of FLOSS, are also presented. Dissertation/Thesis, Ph.D. Computer Science 201
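The core mechanism described above can be caricatured in a few lines: agents from a fixed pool repeatedly pick a project based on perceived utility, and contributions feed back into that utility. In this toy sketch, the utility function (intrinsic appeal plus a bandwagon term proportional to past contributions), the proportional selection rule and all parameter values are hypothetical assumptions. They are not FLOSSSim's formulation or its calibrated parameters.

```python
import random


def simulate(n_projects=5, n_agents=50, steps=20, bandwagon=0.1, seed=42):
    """Toy agent-based model of contributors choosing among competing projects.

    Each step, every agent contributes once to a project chosen with probability
    proportional to its perceived utility (hypothetical utility function):
        utility = intrinsic appeal + bandwagon * contributions so far
    Returns the total number of contributions received by each project.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    appeal = [rng.uniform(0.5, 1.5) for _ in range(n_projects)]
    contributions = [0] * n_projects
    for _ in range(steps):
        for _ in range(n_agents):
            utilities = [a + bandwagon * c for a, c in zip(appeal, contributions)]
            choice = rng.choices(range(n_projects), weights=utilities, k=1)[0]
            contributions[choice] += 1
    return contributions


# Compare how concentrated contributions become with and without the bandwagon term.
print(sorted(simulate(bandwagon=0.0), reverse=True))
print(sorted(simulate(bandwagon=0.5), reverse=True))
```

Because early contributions raise a project's utility, small early leads tend to be amplified when `bandwagon` is positive. This is a crude analogue of the bandwagon effect the dissertation reports.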

    Adopting Free/Libre/Open Source Software Practices, Techniques and Methods for Industrial Use

    Today’s software companies face the challenges of highly distributed development projects and constantly changing requirements. This paper proposes the adoption of relevant Free/Libre/Open Source Software (FLOSS) practices in order to improve software development projects in industry. Many FLOSS projects have proven very successful, producing high-quality products with steady and frequent releases. This study aims to identify FLOSS practices that can be adapted for the corporate environment. To achieve this goal, a framework to compare FLOSS and industrial development methodologies was created. Three successful FLOSS projects were selected as study targets (the Linux Kernel, the FreeBSD operating system, and the JBoss application server), as well as two projects from Ericsson, a large telecommunications company. Based on an analysis of these projects, FLOSS best practices were tailored to fit industrial development environments. The final results consisted of a set of key adoption opportunities aimed at improving software quality and overall development productivity by importing best practices from the FLOSS environment. The adoption opportunities were then validated at three large corporations.

    Evidence-based Software Process Recovery

    Developing a large software system involves many complicated, varied, and inter-dependent tasks, and these tasks are typically implemented using a combination of defined processes, semi-automated tools, and ad hoc practices. Stakeholders in the development process (including software developers, managers, and customers) often want to be able to track the actual practices being employed within a project. For example, a customer may wish to be sure that the process is ISO 9000 compliant, a manager may wish to track the amount of testing that has been done in the current iteration, and a developer may wish to determine who has recently been working on a subsystem in which several major bugs have appeared. However, extracting the software development processes from an existing project is expensive if one must rely on manual inspection of artifacts and interviews with developers and their managers. Researchers have previously suggested live observation and instrumentation of a project to allow for more measurement, but this is costly, invasive, and requires a live running project. In this work, we propose an approach that we call software process recovery, based on after-the-fact analysis of various kinds of software development artifacts. We use a variety of supervised and unsupervised techniques from machine learning, topic analysis, natural language processing, and statistics on software repositories such as version control systems, bug trackers, and mailing list archives. We show how all of these methods can be combined to recover process signals that we map back to software development processes such as the Unified Process. The Unified Process has been visualized using a time-line view that shows effort per parallel discipline occurring across time. This visualization is called the Unified Process diagram. 
We use this diagram as inspiration to produce Recovered Unified Process Views (RUPV), which are a concrete version of this theoretical Unified Process diagram. We then validate these methods using case studies of multiple open source software systems.
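A Recovered Unified Process View is essentially effort per discipline per time window. The sketch below shows only that aggregation step. It assigns commit messages to disciplines with hypothetical keyword rules, a crude stand-in for the supervised and unsupervised classifiers the work actually uses, and then counts commits per discipline per month. The keyword lists, discipline subset and monthly bucketing are all assumptions.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical keyword rules standing in for learned classifiers.
# Naive substring matching: "add" would also match "address", for example.
DISCIPLINE_KEYWORDS = {
    "Implementation": ["implement", "add", "refactor"],
    "Testing": ["test", "unittest", "coverage"],
    "Deployment": ["release", "package", "build"],
}


def discipline_of(message):
    """Return the first discipline whose keywords occur in the message, else 'Other'."""
    text = message.lower()
    for discipline, words in DISCIPLINE_KEYWORDS.items():
        if any(w in text for w in words):
            return discipline
    return "Other"


def effort_timeline(commits):
    """Count commits per discipline per (year, month) bucket.

    commits: iterable of (date, message) pairs.
    Returns {(year, month): Counter({discipline: count})}.
    """
    timeline = defaultdict(Counter)
    for day, message in commits:
        timeline[(day.year, day.month)][discipline_of(message)] += 1
    return dict(timeline)


commits = [
    (date(2010, 1, 5), "Implement config parser"),
    (date(2010, 1, 20), "Increase test coverage"),
    (date(2010, 2, 2), "Prepare 1.0 release"),
]
for bucket, counts in sorted(effort_timeline(commits).items()):
    print(bucket, dict(counts))
```

Plotting each discipline's counts as a band over the time buckets gives a rough, RUPV-like picture of effort across time.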

    Analysis and visualization of multimodal socio-technical information of free/libre and open source software (FLOSS) Projects

    Personality traits influence most, if not all, human activities, from those as natural as the way people walk, talk, dress and write to those as complex as the way they interact with others. Most importantly, personality influences the way people make decisions, including, in the case of developers, the criteria they consider when selecting a software project to participate in. Most works that study the influence of social, technical and human factors in software development projects have focused on the impact of communication on software quality, for instance by identifying predictors that detect files likely to contain bugs before an enhanced version of a software product is released. Only a few of these works analyze the personality traits of developers with commit permissions (committers) in Free/Libre and Open-Source Software (FLOSS) projects and their relationship with the software artifacts they interact with. This thesis presents an approach, based on the automatic recognition of personality traits from e-mails sent by committers in FLOSS projects, to uncover relationships between the social and technical aspects of software development processes. Experimental results suggest that some relationships exist between the personality traits that committers project through their e-mails and the social (communication) and technical activities they undertake. Master's degree (Maestría)