464,441 research outputs found

    Going farther together:the impact of social capital on sustained participation in open source

    Get PDF
    Sustained participation by contributors in open-source software is critical to the survival of open-source projects and can provide career advancement benefits to individual contributors. However, not all contributors reap the benefits of open-source participation fully, with prior work showing that women are particularly underrepresented and at higher risk of disengagement. While many barriers to participation in open-source have been documented in the literature, relatively little is known about how the social networks that open-source contributors form impact their chances of long-term engagement. In this paper we report on a mixed-methods empirical study of the role of social capital (i.e., the resources people can gain from their social connections) for sustained participation by women and men in open-source GitHub projects. After combining survival analysis on a large, longitudinal data set with insights derived from a user survey, we confirm that while social capital is beneficial for prolonged engagement for both genders, women are at disadvantage in teams lacking diversity in expertise.\u3cbr/\u3

    Essays on exploitation and exploration in software development

    Get PDF
    Software development includes two types of activities: software improvement activities by correcting faults and software enhancement activities by adding new features. Based on organizational theory, we propose that these activities can be classified as implementation-oriented (exploitation) and innovation-oriented (exploration). In the context of open source software (OSS) development, developing a patch would be an example of an exploitation activity. Requesting a new software feature would be an example of an exploration activity. This dissertation consists of three essays which examine exploitation and exploration in software development. The first essay analyzes software patch development (exploitation) in the context of software vulnerabilities which could be exploited by hackers. There is a need for software vendors to make software patches available in a timely manner for vulnerabilities in their products. We develop a survival analysis model of the patch release behavior of software vendors based on a cost-based framework of software vendors. We test this model using a data set compiled from the National Vulnerability Database (NVD), United States Computer Emergency Readiness Team (US-CERT), and vendor web sites. Our results indicate that vulnerabilities with high confidentiality impact or high integrity impact are patched faster than vulnerabilities with high availability impact. Interesting differences in the patch release behavior of software vendors based on software type (new release vs. update) and type of vendor (open source vs. proprietary) are found. The second essay studies exploitation and exploration in the content of OSS development. We empirically examine the differences between exploitation (patch development) and exploration (feature request) networks of developers in OSS projects in terms of their social network structure, using a data set collected from the SourceForge database. We identify a new category of developers (ambidextrous developers) in OSS projects who contribute to patch development as well as feature request activities. Our results indicate that a patch development network has greater internal cohesion and network centrality than a feature request network. In contrast, a feature request network has greater external connectivity than a patch development network. The third essay explores ambidexterity and ambidextrous developers in the context of OSS project performance. Recent research on OSS development has studied the social network structure of software developers as a determinant of project success. However, this stream of research has focused on the project level, and has not recognized the fact that software projects could consist of different types of activities, each of which could require different types of expertise and network structures. We develop a theoretical construct for ambidexterity based on the concept of ambidextrous developers. We empirically illustrate the effects of ambidexterity and network characteristics on OSS project performance. Our results indicate that a moderate level of ambidexterity, external cohesion, and technological diversity are desirable for project success. Project success is also positively related to internal cohesion and network centrality. We illustrate the roles of ambidextrous developers on project performance and their differences compared to other developers

    Towards an Intelligent System for Software Traceability Datasets Generation

    Get PDF
    Software datasets and artifacts play a crucial role in advancing automated software traceability research. They can be used by researchers in different ways to develop or validate new automated approaches. Software artifacts, other than source code and issue tracking entities, can also provide a great deal of insight into a software system and facilitate knowledge sharing and information reuse. The diversity and quality of the datasets and artifacts within a research community have a significant impact on the accuracy, generalizability, and reproducibility of the results and consequently on the usefulness and practicality of the techniques under study. Collecting and assessing the quality of such datasets are not trivial tasks and have been reported as an obstacle by many researchers in the domain of software engineering. In this dissertation, we report our empirical work that aims to automatically generate and assess the quality of such datasets. Our goal is to introduce an intelligent system that can help researchers in the domain of software traceability in obtaining high-quality “training sets”, “testing sets” or appropriate “case studies” from open source repositories based on their needs. In the first project, we present a first-of-its-kind study to review and assess the datasets that have been used in software traceability research over the last fifteen years. It presents and articulates the current status of these datasets, their characteristics, and their threats to validity. Second, this dissertation introduces a Traceability-Dataset Quality Assessment (T-DQA) framework to categorize software traceability datasets and assist researchers to select appropriate datasets for their research based on different characteristics of the datasets and the context in which those datasets will be used. Third, we present the results of an empirical study with limited scope to generate datasets using three baseline approaches for the creation of training data. These approaches are (i) Expert-Based, (ii) Automated Web-Mining, which generates training sets by automatically mining tactic\u27s APIs from technical programming websites, and lastly, (iii) Automated Big-Data Analysis, which mines ultra-large-scale code repositories to generate training sets. We compare the trace-link creation accuracy achieved using each of these three baseline approaches and discuss the costs and benefits associated with them. Additionally, in a separate study, we investigate the impact of training set size on the accuracy of recovering trace links. Finally, we conduct a large-scale study to identify which types of software artifacts are produced by a wide variety of open-source projects at different levels of granularity. Then we propose an automated approach based on Machine Learning techniques to identify various types of software artifacts. Through a set of experiments, we report and compare the performance of these algorithms when applied to software artifacts. Finally, we conducted a study to understand how software traceability experts and practitioners evaluate the quality of their datasets. In addition, we aim at gathering experts’ opinions on all quality attributes and metrics proposed by T-DQA

    Common Attack Surface Detection

    Get PDF
    In the current software development market, many software is being developed using a copy-paste mechanism with little to no change made to the reused code. Such a practice has the potential of causing severe security issues since one fragment of code containing a vulnerability may cause the same vulnerability to appear in many other software with the same cloned fragment. The concept of relying on software diversity for security may also be compromised by such a trend, since seemingly different software may in fact share vulnerable code fragments. Although there exist efforts on detecting cloned code fragments, there lack solutions for formally characterizing the specific impact on security. In this thesis, we revisit the concept of software diversity from a security viewpoint. Specifically, we define the novel concept of common attack surface to model the relative degree to which a pair of software may be sharing potentially vulnerable code fragments. To implement the concept, we develop an automated tool, Dupsec, in order to efficiently identify common attack surface between any given pair of software applications with minimum human intervention. Finally, we conduct experiments by applying our tool to a large number of open source software. Our results demonstrate many seemingly unrelated real-world software indeed share significant common attack surface

    Advancing Protocol Diversity in Network Security Monitoring

    Get PDF
    With information technology entering new fields and levels of deployment, e.g., in areas of energy, mobility, and production, network security monitoring needs to be able to cope with those environments and their evolution. However, state-of-the-art Network Security Monitors (NSMs) typically lack the necessary flexibility to handle the diversity of the packet-oriented layers below the abstraction of TCP/IP connections. In this work, we advance the software architecture of a network security monitor to facilitate the flexible integration of lower-layer protocol dissectors while maintaining required performance levels. We proceed in three steps: First, we identify the challenges for modular packet-level analysis, present a refined NSM architecture to address them and specify requirements for its implementation. Second, we evaluate the performance of data structures to be used for protocol dispatching, implement the proposed design into the popular open-source NSM Zeek and assess its impact on the monitor performance. Our experiments show that hash-based data structures for dispatching introduce a significant overhead while array-based approaches qualify for practical application. Finally, we demonstrate the benefits of the proposed architecture and implementation by migrating Zeek\u27s previously hard-coded stack of link and internet layer protocols to the new interface. Furthermore, we implement dissectors for non-IP based industrial communication protocols and leverage them to realize attack detection strategies from recent applied research. We integrate the proposed architecture into the Zeek open-source project and publish the implementation to support the scientific community as well as practitioners, promoting the transfer of research into practice

    Empirical research on the evaluation model and method of sustainability of the open source ecosystem

    Get PDF
    The development of open source brings new thinking and production modes to software engineering and computer science, and establishes a software development method and ecological environment in which groups participate. Regardless of investors, developers, participants, and managers, they are most concerned about whether the Open Source Ecosystem can be sustainable to ensure that the ecosystem they choose will serve users for a long time. Moreover, the most important quality of the software ecosystem is sustainability, and it is also a research area in Symmetry. Therefore, it is significant to assess the sustainability of the Open Source Ecosystem. However, the current measurement of the sustainability of the Open Source Ecosystem lacks universal measurement indicators, as well as a method and a model. Therefore, this paper constructs an Evaluation Indicators System, which consists of three levels: The target level, the guideline level and the evaluation level, and takes openness, stability, activity, and extensibility as measurement indicators. On this basis, a weight calculation method, based on information contribution values and a Sustainability Assessment Model, is proposed. The models and methods are used to analyze the factors affecting the sustainability of Stack Overflow (SO) ecosystem. Through the analysis, we find that every indicator in the SO ecosystem is partaking in different development trends. The development trend of a single indicator does not represent the sustainable development trend of the whole ecosystem. It is necessary to consider all of the indicators to judge that ecosystem’s sustainability. The research on the sustainability of the Open Source Ecosystem is helpful for judging software health, measuring development efficiency and adjusting organizational structure. It also provides a reference for researchers who study the sustainability of software engineering

    Empirical Evidence of Large-Scale Diversity in API Usage of Object-Oriented Software

    Get PDF
    In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on "usage diversity'", defined as the different statically observable combinations of methods called on the same object. We present empirical evidence that there is a significant usage diversity for many classes. For instance, we observe in our dataset that Java's String is used in 2460 manners. We discuss the reasons of this observed diversity and the consequences on software engineering knowledge and research
    • …
    corecore