14,104 research outputs found

    git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

    Full text link
    Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership, e.g. which exact lines of code have been authored by which developers, that is contained in the commit log of software projects. Addressing this issue, we introduce git2net, a scalable python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.Comment: MSR 2019, 12 pages, 10 figure

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Get PDF
    Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it deems necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters which have been receiving increasing attention recently and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

    Recursive SDN for Carrier Networks

    Full text link
    Control planes for global carrier networks should be programmable (so that new functionality can be easily introduced) and scalable (so they can handle the numerical scale and geographic scope of these networks). Neither traditional control planes nor new SDN-based control planes meet both of these goals. In this paper, we propose a framework for recursive routing computations that combines the best of SDN (programmability) and traditional networks (scalability through hierarchy) to achieve these two desired properties. Through simulation on graphs of up to 10,000 nodes, we evaluate our design's ability to support a variety of routing and traffic engineering solutions, while incorporating a fast failure recovery mechanism

    Fine Grained Component Engineering of Adaptive Overlays: Experiences and Perspectives

    Get PDF
    Recent years have seen significant research being carried out into peer-to-peer (P2P) systems. This work has focused on the styles and applications of P2P computing, from grid computation to content distribution; however, little investigation has been performed into how these systems are built. Component based engineering is an approach that has seen successful deployment in the field of middleware development; functionality is encapsulated in ‘building blocks’ that can be dynamically plugged together to form complete systems. This allows efficient, flexible and adaptable systems to be built with lower overhead and development complexity. This paper presents an investigation into the potential of using component based engineering in the design and construction of peer-to-peer overlays. It is highlighted that the quality of these properties is dictated by the component architecture used to implement the system. Three reusable decomposition architectures are designed and evaluated using Chord and Pastry case studies. These demonstrate that significant improvements can be made over traditional design approaches resulting in much more reusable, (re)configurable and extensible systems

    ScaRR: Scalable Runtime Remote Attestation for Complex Systems

    Full text link
    The introduction of remote attestation (RA) schemes has allowed academia and industry to enhance the security of their systems. The commercial products currently available enable only the validation of static properties, such as applications fingerprint, and do not handle runtime properties, such as control-flow correctness. This limitation pushed researchers towards the identification of new approaches, called runtime RA. However, those mainly work on embedded devices, which share very few common features with complex systems, such as virtual machines in a cloud. A naive deployment of runtime RA schemes for embedded devices on complex systems faces scalability problems, such as the representation of complex control-flows or slow verification phase. In this work, we present ScaRR: the first Scalable Runtime Remote attestation schema for complex systems. Thanks to its novel control-flow model, ScaRR enables the deployment of runtime RA on any application regardless of its complexity, by also achieving good performance. We implemented ScaRR and tested it on the benchmark suite SPEC CPU 2017. We show that ScaRR can validate on average 2M control-flow events per second, definitely outperforming existing solutions.Comment: 14 page
    • 

    corecore