git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable Python tool that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.
Comment: MSR 2019, 12 pages, 10 figures
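The network construction described in this abstract can be illustrated with a minimal sketch. It does not use or reproduce the git2net API. The function name `co_editing_network` and the record layout `(editor, original_author, timestamp)` are hypothetical, and skipping self-edits is an assumption made only for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (editor, original_author, unix_timestamp)
Edit = Tuple[str, str, int]


def co_editing_network(edits: Iterable[Edit]) -> Dict[Tuple[str, str], dict]:
    """Aggregate edit records into directed, weighted, time-stamped links.

    A link (editor, original_author) means the editor changed code that
    was originally written by original_author.
    """
    stamps: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for editor, original_author, timestamp in edits:
        if editor == original_author:
            continue  # assumption: self-edits do not create a link
        stamps[(editor, original_author)].append(timestamp)
    # The weight is the number of co-edits observed on each directed link
    return {
        pair: {"weight": len(ts), "timestamps": sorted(ts)}
        for pair, ts in stamps.items()
    }


if __name__ == "__main__":
    sample = [("bob", "alice", 30), ("bob", "alice", 10), ("alice", "alice", 20)]
    print(co_editing_network(sample))
```

Keeping the raw timestamps on each link, rather than only the count, is what allows the network to be sliced by time window afterwards.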
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to the massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.
Comment: Accepted for Publication in IEEE Communications Surveys and Tutorials
Recursive SDN for Carrier Networks
Control planes for global carrier networks should be programmable (so that
new functionality can be easily introduced) and scalable (so they can handle
the numerical scale and geographic scope of these networks). Neither
traditional control planes nor new SDN-based control planes meet both of these
goals. In this paper, we propose a framework for recursive routing computations
that combines the best of SDN (programmability) and traditional networks
(scalability through hierarchy) to achieve these two desired properties.
Through simulation on graphs of up to 10,000 nodes, we evaluate our design's
ability to support a variety of routing and traffic engineering solutions,
while incorporating a fast failure recovery mechanism.
Fine Grained Component Engineering of Adaptive Overlays: Experiences and Perspectives
Recent years have seen significant research being carried out into peer-to-peer (P2P) systems. This work has focused on the styles and applications of P2P computing, from grid computation to content distribution; however, little investigation has been performed into how these systems are built. Component based engineering is an approach that has seen successful deployment in the field of middleware development; functionality is encapsulated in "building blocks" that can be dynamically plugged together to form complete systems. This allows efficient, flexible and adaptable systems to be built with lower overhead and development complexity. This paper presents an investigation into the potential of using component based engineering in the design and construction of peer-to-peer overlays. It is highlighted that the quality of these properties is dictated by the component architecture used to implement the system. Three reusable decomposition architectures are designed and evaluated using Chord and Pastry case studies. These demonstrate that significant improvements can be made over traditional design approaches resulting in much more reusable, (re)configurable and extensible systems.
ScaRR: Scalable Runtime Remote Attestation for Complex Systems
The introduction of remote attestation (RA) schemes has allowed academia and
industry to enhance the security of their systems. The commercial products
currently available enable only the validation of static properties, such as
an application's fingerprint, and do not handle runtime properties, such as
control-flow correctness. This limitation pushed researchers towards the
identification of new approaches, called runtime RA. However, those mainly work
on embedded devices, which share very few common features with complex systems,
such as virtual machines in a cloud. A naive deployment of runtime RA schemes
for embedded devices on complex systems faces scalability problems, such as the
representation of complex control-flows or a slow verification phase.
In this work, we present ScaRR: the first Scalable Runtime Remote attestation
schema for complex systems. Thanks to its novel control-flow model, ScaRR
enables the deployment of runtime RA on any application regardless of its
complexity, while also achieving good performance. We implemented ScaRR and tested
it on the benchmark suite SPEC CPU 2017. We show that ScaRR can validate on
average 2M control-flow events per second, definitely outperforming existing
solutions.
Comment: 14 pages
DESIGN AND IMPLEMENTATION OF PATH FINDING AND VERIFICATION IN THE INTERNET
In the Internet, network traffic between endpoints typically follows one path that is determined by the control plane. Endpoints have little control over the choice of which path their network traffic takes and little ability to verify if the traffic indeed follows a specific path. With the emergence of software-defined networking (SDN), more control over connections can be exercised, and thus the opportunity for novel solutions exists. However, there remain concerns about the attack surface exposed by fine-grained control, which may allow attackers to inject and redirect traffic.
To address these opportunities and concerns, we consider two specific challenges: (1) How can the network determine the choices of paths available to connect endpoints, especially when multiple criteria can be considered? (2) How can endpoints verify the integrity of the path over which network traffic is sent? The latter consists of two subproblems: determining that the source of traffic is authentic and determining that a specified path is traversed without deviation. In this dissertation, we investigate and present solutions for both the network path finding problem and the verification problem.
We first address path finding, or routing, which is a core functionality in the Internet. Existing approaches are either based on a single criterion (such as path length, delay, or an artificially defined "weight") or use a combinatorial optimization function when there are multiple criteria. We present a multi-criteria routing algorithm that can search the whole space of all possible paths. To achieve the scalability of our solution, we limit the search to only Pareto-optimal paths, which allows us to prune sub-optimal paths quickly and reduce computational complexity. We show that our approach is tractable on a variety of realistic topologies and that the resulting Pareto-optimal paths can be clustered to present a few alternative options.
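The pruning idea can be illustrated with a minimal sketch of a generic label-correcting search that keeps only non-dominated cost vectors at each node. This is not the dissertation's algorithm. The names `dominates` and `pareto_paths` and the graph layout are hypothetical, and the sketch assumes that all criteria are additive, non-negative, and to be minimized.

```python
from typing import Dict, List, Tuple

Cost = Tuple[float, ...]
# Hypothetical layout: node -> list of (neighbor, per-criterion edge cost)
Graph = Dict[str, List[Tuple[str, Cost]]]


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b in every criterion and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_paths(graph: Graph, src: str, dst: str,
                 n_criteria: int) -> List[Tuple[Cost, List[str]]]:
    """Return the Pareto-optimal (cost, path) pairs from src to dst."""
    start: Cost = tuple(0.0 for _ in range(n_criteria))
    # Non-dominated (cost, path) labels currently known for each node
    labels: Dict[str, List[Tuple[Cost, List[str]]]] = {src: [(start, [src])]}
    queue = [(start, [src])]
    while queue:
        cost, path = queue.pop(0)
        node = path[-1]
        if (cost, path) not in labels.get(node, []):
            continue  # this label was pruned after it was queued
        for nxt, edge_cost in graph.get(node, []):
            new_cost = tuple(c + e for c, e in zip(cost, edge_cost))
            known = labels.setdefault(nxt, [])
            # Prune: an existing label is at least as good in every criterion
            if any(c == new_cost or dominates(c, new_cost) for c, _ in known):
                continue
            # Drop existing labels that the new one dominates
            known[:] = [(c, p) for c, p in known if not dominates(new_cost, c)]
            known.append((new_cost, path + [nxt]))
            queue.append((new_cost, path + [nxt]))
    return sorted(labels.get(dst, []))


if __name__ == "__main__":
    g: Graph = {
        "A": [("B", (1, 5)), ("C", (4, 1)), ("D", (3, 3)), ("E", (5, 5))],
        "B": [("D", (1, 5))],
        "C": [("D", (4, 1))],
        "E": [("D", (5, 5))],
    }
    for cost, path in pareto_paths(g, "A", "D", 2):
        print(cost, path)
```

In the example graph, the route through `E` is discarded as soon as its cost vector is found to be worse in both criteria than the direct link, which is the kind of early pruning the paragraph refers to.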
We then address path verification in the Internet, which consists of source authentication and path validation. Once a path has been selected, we show that an endpoint can validate that traffic indeed traverses along the chosen path. Prior work has relied on cryptographic approaches for such validation, which need significant computational resources. In contrast, we propose a lightweight and scalable technique to address this problem, which uses a set of orthogonal sequences as credentials in the packets. The verification of these orthogonal credentials is based on inner product computations, which can be easily implemented by basic bitwise operations in a processor. We show that the proposed approach can achieve the necessary security properties for both source authentication and path validation. Results from a prototype implementation show that the proposed technique can be implemented efficiently and adds only a small computational overhead.
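The claim that inner products of orthogonal sequences reduce to bitwise operations can be illustrated with a minimal sketch. It is not the dissertation's protocol. It assumes Walsh-Hadamard rows as the orthogonal set, packs each ±1 sequence into an integer (bit 0 encodes +1, bit 1 encodes -1), and uses the hypothetical names `hadamard_rows`, `inner_product`, and `verify`.

```python
from typing import List


def hadamard_rows(k: int) -> List[int]:
    """Rows of the 2**k x 2**k Sylvester-Hadamard matrix, packed as integers.

    Entry (i, j) is (-1) ** popcount(i & j); bit j of row i stores 1 when
    that entry is -1 and 0 when it is +1.
    """
    n = 1 << k
    return [
        sum((bin(i & j).count("1") & 1) << j for j in range(n))
        for i in range(n)
    ]


def inner_product(a: int, b: int, n: int) -> int:
    """Inner product of two packed +/-1 sequences of length n.

    Positions where the bits agree contribute +1 and positions where they
    differ contribute -1, so the result is n - 2 * popcount(a XOR b).
    """
    return n - 2 * bin(a ^ b).count("1")


def verify(credential: int, expected: int, n: int) -> bool:
    """Accept only a credential that matches the expected sequence exactly."""
    return inner_product(credential, expected, n) == n


if __name__ == "__main__":
    rows = hadamard_rows(3)
    print(inner_product(rows[2], rows[2], 8))  # same sequence
    print(inner_product(rows[2], rows[5], 8))  # different sequences
```

Because the whole check is one XOR and one population count, it avoids the per-packet cost of a cryptographic primitive, which matches the lightweight motivation given in the paragraph.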
The results of our work enable novel uses of networks with fine-grained traffic control, such as enabling more path choices in networks where multiple performance criteria matter. In addition, our work contributes to efforts to make the Internet more secure by presenting techniques that allow endpoints to validate the source and path of network traffic. We believe that these contributions help with improving both the current Internet and also future networks.