9,894 research outputs found

    An Event-based Analysis Framework for Open Source Software Development Projects

    The increasing popularity and success of Open Source Software (OSS) development projects has drawn significant attention from academics and open source participants over the last two decades. As one of the key areas in OSS research, assessing and predicting OSS performance is of great value both to OSS communities and to organizations interested in investing in OSS projects. Most existing research, however, has treated OSS project performance as the outcome of static, cross-sectional factors such as the number of developers, project activity level, and license choice. While such variance studies can identify some predictors of project outcomes, they tend to neglect the actual process of development. Without a closer examination of how events occur, an understanding of OSS projects is incomplete. This dissertation combines process and variance strategies to investigate how OSS projects change over time through their development processes, and to explore how these changes affect project performance. I design, instantiate, and evaluate a framework and an artifact, EventMiner, to analyze OSS projects’ evolution through development activities. The framework integrates concepts from theories such as distributed cognition (DCog) and complexity theory, applying data mining techniques such as decision trees, motif analysis, and hidden Markov modeling to automatically analyze and interpret the trace data of 103 OSS projects from an open source repository. The results support the construction of process theories of OSS development. The study contributes to the literature on DCog, design routines, OSS development, and OSS performance. The resulting framework allows OSS researchers interested in OSS development processes to share and reuse data and data analysis processes in an open-source manner.
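    The abstract names motif analysis among the mining techniques applied to project event traces. As a minimal, hedged sketch of that idea only (the event names, trace format, and motif length below are invented for illustration and are not taken from the dissertation), the following Python snippet counts how often short contiguous subsequences of development events recur in a trace.

```python
from collections import Counter
from typing import Dict, List

def count_motifs(trace: List[str], length: int = 3) -> Counter:
    """Count every contiguous subsequence (motif) of `length` events in a trace."""
    return Counter(
        tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)
    )

# Hypothetical event traces keyed by project; real traces would come from the
# repository logs analyzed in the dissertation.
traces: Dict[str, List[str]] = {
    "project_a": ["commit", "issue_open", "commit", "issue_close",
                  "commit", "issue_open", "commit", "issue_close"],
    "project_b": ["commit", "commit", "release", "commit", "commit", "release"],
}

for project, trace in traces.items():
    print(project, count_motifs(trace).most_common(2))
```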

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Socio-economic data mining has great potential for gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. These can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on natural systems such as the Earth and its ecosystem, as well as on human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise from individually customized services, which, however, should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge amounts of data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. The self-interests of individuals, companies, or institutions reach their limits where the public interest is affected, and the public interest is not a sufficient justification to violate the human rights of individuals. Privacy, like confidentiality, is a high good, and damaging it would have serious side effects for society. Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c

    A study of code change patterns for adaptive maintenance with AST analysis

    Example-based transformational approaches to automating adaptive maintenance changes play an important role in software research. One primary concern with these approaches is that a set of well-qualified, real examples of adaptive changes previously made in the project history must be identified; otherwise, the adoption of such approaches is put in question. Unfortunately, there is rarely enough detail to clearly direct transformation rule developers in overcoming the barrier of finding qualified examples of adaptive changes. This work explores the histories of several open source systems to study the repetitiveness of adaptive changes in software evolution, and hence to recognize the source code change patterns that are strongly related to adaptive maintenance. We collected adaptive commits from the histories of numerous open source systems and then obtained the repetitiveness frequencies of source code changes based on an analysis of the Abstract Syntax Tree (AST) edit actions within each adaptive commit. Using the prevalence of the most common adaptive changes, we suggest a set of change patterns that appear correlated with adaptive maintenance. We observed that 76.93% of the undertaken adaptive changes were represented by 12 AST code differences. Moreover, only 9 change patterns covered 64.69% to 76.58% of the total adaptive change hunks in the examined projects. The most common individual patterns relate to object initialization and method call changes. A correlation analysis of the examined projects shows that they have very similar frequencies of the patterns correlated with adaptive changes. The observed repeated adaptive changes could serve as useful examples for the construction of transformation approaches.
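    The abstract describes the method only at a high level; the study presumably relies on a dedicated AST differencing tool to extract edit actions. As a rough illustration of the general idea rather than the paper's technique, the sketch below compares AST node-type counts between hypothetical before and after versions of a file touched by an adaptive commit.

```python
import ast
from collections import Counter

def node_type_counts(source: str) -> Counter:
    """Count occurrences of each AST node type in a piece of Python source."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))

# Hypothetical before/after versions of a file changed by an adaptive commit.
before = "client = ApiClient()\nclient.connect('http://old.example')\n"
after = "client = ApiClient(timeout=5)\nclient.connect('https://new.example')\n"

# The positive difference in node-type frequencies gives a coarse view of which
# kinds of AST elements (e.g. keyword arguments, constants) the change added.
print(node_type_counts(after) - node_type_counts(before))
```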

    Online division of labour: emergent structures in Open Source Software

    The development of Open Source Software fundamentally depends on the participation and commitment of volunteer developers to progress on a particular task. Several works have presented strategies to increase the on-boarding and engagement of new contributors, but little is known about how these diverse groups of developers self-organise to work together. To understand this, one must consider that, on the one hand, platforms like GitHub provide a virtually unlimited development framework: any number of actors can potentially join to contribute in a decentralised, distributed, remote, and asynchronous manner. On the other hand, it seems reasonable that some sort of hierarchy and division of labour must be in place to meet human biological and cognitive limits, and also to achieve some level of efficiency. These latter features (hierarchy and division of labour) should translate into detectable structural arrangements when projects are represented as developer-file bipartite networks. Thus, in this paper we analyse a set of popular open source projects from GitHub, focusing on three key properties: nestedness, modularity, and in-block nestedness, which typify the emergence of heterogeneities among contributors, the emergence of subgroups of developers working on specific subgroups of files, and a mixture of the two, respectively. These analyses show that projects indeed evolve into internally organised blocks. Furthermore, the distribution of sizes of such blocks is bounded, connecting our results to the celebrated Dunbar number in both offline and online environments. Our conclusions create a link between bio-cognitive constraints, group formation, and online working environments, opening up a rich scenario for future research on (online) work team assembly (e.g. size, composition, and formation). From a complex network perspective, our results pave the way for the study of time-resolved datasets and the design of suitable models that can mimic the growth and evolution of OSS projects.
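    As a hedged, simplified illustration of the data structure involved (not the paper's nestedness or in-block-nestedness analysis, which operates directly on the bipartite matrix), the following Python sketch builds a developer-file bipartite network with networkx and computes a modularity-based partition of its developer projection; all contributor and file names are invented.

```python
import networkx as nx
from networkx.algorithms import bipartite, community

# Hypothetical contribution records: (developer, file) pairs extracted from
# commit histories; real data would come from the GitHub projects studied.
contributions = [
    ("alice", "core.py"), ("alice", "utils.py"), ("bob", "core.py"),
    ("bob", "docs.md"), ("carol", "docs.md"), ("carol", "ci.yml"),
    ("dave", "ci.yml"), ("dave", "utils.py"),
]
developers = {d for d, _ in contributions}
files = {f for _, f in contributions}

# Build the developer-file bipartite network.
G = nx.Graph()
G.add_nodes_from(developers, bipartite=0)
G.add_nodes_from(files, bipartite=1)
G.add_edges_from(contributions)

# Project onto developers: two developers are linked if they touched shared files.
P = bipartite.weighted_projected_graph(G, developers)

# Detect blocks (communities) of developers and score the partition's modularity.
blocks = community.greedy_modularity_communities(P, weight="weight")
Q = community.modularity(P, blocks, weight="weight")
print([sorted(b) for b in blocks], round(Q, 3))
```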

    Knowing the Biosphere: Documentation, Specimens, Archives, and Names Reveal Environmental Change and Emerging Pathogens

    One Health programs and trajectories are now the apparent standard for exploring the occurrence and distribution of emerging pathogens and disease. By definition, One Health has been characterized as a broadly inclusive, collaborative, and transdisciplinary approach with connectivity across local to global scales, which integrates the medical and veterinary communities to recognize health outcomes emerging at the environmental nexus of people, animals, plants, and their shared landscapes. One Health has, however, been an incomplete model, conceptually and operationally: it is focused on reactive, response-based foundations to limit the impact of emerging pathogens and emerging infectious diseases and, as such, lacks a powerful proactive capacity. A proactive, predictive One Health is necessary, emanating in part from geographically and taxonomically broad and temporally deep biological collections of pathogen-host assemblages. The DAMA protocol (Document, Assess, Monitor, Act), the operational extension of the Stockholm paradigm (SP), accomplishes this task by encompassing holistic and strategic biological sampling of reservoir host assemblages and pathogens at environmental interfaces, and more extensively through resurveys, with the development of informatics resources digitally linked to physical specimens held in publicly accessible museum biorepositories. Archives of specimens are the foundations for accumulating interrelated archives of information (the baselines against which change can be identified and tracked), with collections serving as fundamental resources for biodiversity informatics under the conceptual evolutionary and ecological umbrella of the SP. A cultural and conceptual transformation is essential among the diverse practitioners in the One Health community, one that recognizes the necessity of placing pathogens in an evolutionary, ecological, and environmental context by integrating specimens and associated informatics into an infrastructure and networks for actionable information. As a community, it is essential to abandon response-based business as usual and look forward toward proactive, transboundary approaches that maximize our conceptual and taxonomic view of diversity across the interconnected planetary scales that influence the complexity of pathogen-host interfaces. Evolution, where the past always influences the present and the future, defines our trajectory, as the need for sustained archives that describe the biosphere becomes more acute with each passing day.