937 research outputs found
On Analyzing the Topology of Commit Histories in Decentralized Version Control Systems
update for BASE on Sep 08 2018 22:43:36International audienceEmpirical analysis of software repositories usually deals with linear histories derived from centralized versioning systems. Decentralized version control systems allow a much richer structure of commit histories, which presents features that are typical of complex graph models. In this paper we bring some evidences of how the very structure of these commit histories carries relevant information about the distributed development process. By means of a novel data structure that we formally define, we analyze the topological characteristics of commit graphs of a sample of git projects. Our findings point out the existence of common recurrent structural patterns which identically occur in different projects and can be consider building blocks of distributed collaborative development
Public Git Archive: a Big Code dataset for all
The number of open source software projects has been growing exponentially.
The major online software repository host, GitHub, has accumulated tens of
millions of publicly available Git version-controlled repositories. Although
the research potential enabled by the available open source code is clearly
substantial, no significant large-scale open source code datasets exist. In
this paper, we present the Public Git Archive -- dataset of 182,014
top-bookmarked Git repositories from GitHub. We describe the novel data
retrieval pipeline to reproduce it. We also elaborate on the strategy for
performing dataset updates and legal issues. The Public Git Archive occupies
3.0 TB on disk and is an order of magnitude larger than the current source code
datasets. The dataset is made available through HTTP and provides the source
code of the projects, the related metadata, and development history. The data
retrieval pipeline employs an optimized worker queue model and an optimized
archive format to efficiently store forked Git repositories, reducing the
amount of data to download and persist. Public Git Archive aims to open a
myriad of new opportunities for ``Big Code`` research
Purposes, concepts, misfits, and a redesign of git
Git is a widely used version control system that is powerful but complicated. Its complexity may not be an inevitable consequence of its power but rather evidence of flaws in its design. To explore this hypothesis, we analyzed the design of Git using a theory that identifies concepts, purposes, and misfits. Some well-known difficulties with Git are described, and explained as misfits in which underlying concepts fail to meet their intended purpose. Based on this analysis, we designed a reworking of Git (called Gitless) that attempts to
remedy these flaws. To correlate misfits with issues reported by users, we
conducted a study of Stack Overflow questions. And to determine whether users experienced fewer complications using Gitless in place of Git, we conducted a small user study. Results suggest our approach can be profitable in identifying, analyzing, and fixing design problems.SUTD-MIT International Design Centre (IDC
Recommended from our members
Distributed virtual environment scalability and security
Distributed virtual environments (DVEs) have been an active area of research and engineering for more than 20 years. The most widely deployed DVEs are network games such as Quake, Halo, and World of Warcraft (WoW), with millions of users and billions of dollars in annual revenue. Deployed DVEs remain expensive centralized implementations despite significant research outlining ways to distribute DVE workloads.
This dissertation shows previous DVE research evaluations are inconsistent with deployed DVE needs. Assumptions about avatar movement and proximity - fundamental scale factors - do not match WoW’s workload, and likely the workload of other deployed DVEs. Alternate workload models are explored and preliminary conclusions presented. Using realistic workloads it is shown that a fully decentralized DVE cannot be deployed to today’s consumers, regardless of its overhead.
Residential broadband speeds are improving, and this limitation will eventually disappear. When it does, appropriate security mechanisms will be a fundamental requirement for technology adoption.
A trusted auditing system (“Carbon”) is presented which has good security, scalability, and resource characteristics for decentralized DVEs. When performing exhaustive auditing, Carbon adds 27% network overhead to a decentralized DVE with a WoW-like workload. This resource consumption can be reduced significantly, depending upon the DVE’s risk tolerance.
Finally, the Pairwise Random Protocol (PRP) is described. PRP enables adversaries to fairly resolve probabilistic activities, an ability missing from most decentralized DVE security proposals.
Thus, this dissertations contribution is to address two of the obstacles for deploying research on decentralized DVE architectures. First, lack of evidence that research results apply to existing DVEs. Second, the lack of security systems combining appropriate security guarantees with acceptable overhead
IST Austria Thesis
The scalability of concurrent data structures and distributed algorithms strongly depends on
reducing the contention for shared resources and the costs of synchronization and communication. We show how such cost reductions can be attained by relaxing the strict consistency conditions required by sequential implementations. In the first part of the thesis, we consider relaxation in the context of concurrent data structures. Specifically, in data structures
such as priority queues, imposing strong semantics renders scalability impossible, since a correct implementation of the remove operation should return only the element with highest priority. Intuitively, attempting to invoke remove operations concurrently creates a race condition. This bottleneck can be circumvented by relaxing semantics of the affected data structure, thus allowing removal of the elements which are no longer required to have the highest priority. We prove that the randomized implementations of relaxed data structures provide provable guarantees on the priority of the removed elements even under concurrency. Additionally, we show that in some cases the relaxed data structures can be used to scale the classical algorithms which are usually implemented with the exact ones. In the second part, we study parallel variants of the stochastic gradient descent (SGD) algorithm, which distribute computation among the multiple processors, thus reducing the running time. Unfortunately, in order for standard parallel SGD to succeed, each processor has to maintain a local copy of the necessary model parameter, which is identical to the local copies of other processors; the overheads from this perfect consistency in terms of communication and synchronization can negate the speedup gained by distributing the computation. We show that the consistency conditions required by SGD can be relaxed, allowing the algorithm to be more flexible in terms of tolerating quantized communication, asynchrony, or even crash faults, while its convergence remains asymptotically the same
SoK: Decentralized Finance (DeFi) Attacks
Within just four years, the blockchain-based Decentralized Finance (DeFi)
ecosystem has accumulated a peak total value locked (TVL) of more than 253
billion USD. This surge in DeFi's popularity has, unfortunately, been
accompanied by many impactful incidents. According to our data, users,
liquidity providers, speculators, and protocol operators suffered a total loss
of at least 3.24 billion USD from Apr 30, 2018 to Apr 30, 2022. Given the
blockchain's transparency and increasing incident frequency, two questions
arise: How can we systematically measure, evaluate, and compare DeFi incidents?
How can we learn from past attacks to strengthen DeFi security?
In this paper, we introduce a common reference frame to systematically
evaluate and compare DeFi incidents, including both attacks and accidents. We
investigate 77 academic papers, 30 audit reports, and 181 real-world incidents.
Our data reveals several gaps between academia and the practitioners'
community. For example, few academic papers address "price oracle attacks" and
"permissonless interactions", while our data suggests that they are the two
most frequent incident types (15% and 10.5% correspondingly). We also
investigate potential defenses, and find that: (i) 103 (56%) of the attacks are
not executed atomically, granting a rescue time frame for defenders; (ii) SoTA
bytecode similarity analysis can at least detect 31 vulnerable/23 adversarial
contracts; and (iii) 33 (15.3%) of the adversaries leak potentially
identifiable information by interacting with centralized exchanges
Cloud-edge hybrid applications
Many modern applications are designed to provide interactions among users, including multi-
user games, social networks and collaborative tools. Users expect application response time to
be in the order of milliseconds, to foster interaction and interactivity.
The design of these applications typically adopts a client-server model, where all interac-
tions are mediated by a centralized component. This approach introduces availability and fault-
tolerance issues, which can be mitigated by replicating the server component, and even relying on
geo-replicated solutions in cloud computing infrastructures. Even in this case, the client-server
communication model leads to unnecessary latency penalties for geographically close clients and
high operational costs for the application provider.
This dissertation proposes a cloud-edge hybrid model with secure and ecient propagation
and consistency mechanisms. This model combines client-side replication and client-to-client
propagation for providing low latency and minimizing the dependency on the server infras-
tructure, fostering availability and fault tolerance. To realize this model, this works makes the
following key contributions.
First, the cloud-edge hybrid model is materialized by a system design where clients maintain
replicas of the data and synchronize in a peer-to-peer fashion, and servers are used to assist
clients’ operation. We study how to bring most of the application logic to the client-side, us-
ing the centralized service primarily for durability, access control, discovery, and overcoming
internetwork limitations.
Second, we dene protocols for weakly consistent data replication, including a novel CRDT
model (∆-CRDTs). We provide a study on partial replication, exploring the challenges and
fundamental limitations in providing causal consistency, and the diculty in supporting client-
side replicas due to their ephemeral nature.
Third, we study how client misbehaviour can impact the guarantees of causal consistency.
We propose new secure weak consistency models for insecure settings, and algorithms to enforce
such consistency models.
The experimental evaluation of our contributions have shown their specic benets and
limitations compared with the state-of-the-art. In general, the cloud-edge hybrid model leads to
faster application response times, lower client-to-client latency, higher system scalability as fewer clients need to connect to servers at the same time, the possibility to work oine or disconnected
from the server, and reduced server bandwidth usage.
In summary, we propose a hybrid of cloud-and-edge which provides lower user-to-user la-
tency, availability under server disconnections, and improved server scalability – while being
ecient, reliable, and secure.Muitas aplicações modernas são criadas para fornecer interações entre utilizadores, incluindo
jogos multiutilizador, redes sociais e ferramentas colaborativas. Os utilizadores esperam que o
tempo de resposta nas aplicações seja da ordem de milissegundos, promovendo a interação e
interatividade.
A arquitetura dessas aplicações normalmente adota um modelo cliente-servidor, onde todas as
interações são mediadas por um componente centralizado. Essa abordagem apresenta problemas
de disponibilidade e tolerância a falhas, que podem ser mitigadas com replicação no componente
do servidor, até com a utilização de soluções replicadas geogracamente em infraestruturas de
computação na nuvem. Mesmo neste caso, o modelo de comunicação cliente-servidor leva a
penalidades de latência desnecessárias para clientes geogracamente próximos e altos custos
operacionais para o provedor das aplicações.
Esta dissertação propõe um modelo híbrido cloud-edge com mecanismos seguros e ecientes
de propagação e consistência. Esse modelo combina replicação do lado do cliente e propagação
de cliente para cliente para fornecer baixa latência e minimizar a dependência na infraestrutura
do servidor, promovendo a disponibilidade e tolerância a falhas. Para realizar este modelo, este
trabalho faz as seguintes contribuições principais.
Primeiro, o modelo híbrido cloud-edge é materializado por uma arquitetura do sistema em
que os clientes mantêm réplicas dos dados e sincronizam de maneira ponto a ponto e onde os
servidores são usados para auxiliar na operação dos clientes. Estudamos como trazer a maior
parte da lógica das aplicações para o lado do cliente, usando o serviço centralizado principalmente
para durabilidade, controlo de acesso, descoberta e superação das limitações inter-rede.
Em segundo lugar, denimos protocolos para replicação de dados fracamente consistentes,
incluindo um novo modelo de CRDTs (∆-CRDTs). Fornecemos um estudo sobre replicação parcial,
explorando os desaos e limitações fundamentais em fornecer consistência causal e a diculdade
em suportar réplicas do lado do cliente devido à sua natureza efémera.
Terceiro, estudamos como o mau comportamento da parte do cliente pode afetar as garantias
da consistência causal. Propomos novos modelos seguros de consistência fraca para congurações
inseguras e algoritmos para impor tais modelos de consistência.
A avaliação experimental das nossas contribuições mostrou os benefícios e limitações em comparação com o estado da arte. Em geral, o modelo híbrido cloud-edge leva a tempos de resposta
nas aplicações mais rápidos, a uma menor latência de cliente para cliente e à possibilidade de
trabalhar oine ou desconectado do servidor. Adicionalmente, obtemos uma maior escalabilidade
do sistema, visto que menos clientes precisam de estar conectados aos servidores ao mesmo tempo
e devido à redução na utilização da largura de banda no servidor.
Em resumo, propomos um modelo híbrido entre a orla (edge) e a nuvem (cloud) que fornece
menor latência entre utilizadores, disponibilidade durante desconexões do servidor e uma melhor
escalabilidade do servidor – ao mesmo tempo que é eciente, conável e seguro
Concurrency and Distribution in Reactive Programming
Distributed Reactive Programming is a paradigm for implementing distributed interactive applications modularly and declaratively. Applications are defined as dynamic distributed dataflow graphs of reactive computations that depend upon each other, similar to formulae in spreadsheets. A runtime library propagates input changes through the dataflow graph, recomputing the results of affected dependent computations while adapting the dataflow graph topology to changing dependency relations on the fly. Reactive Programming has been shown to improve code quality, program comprehension and maintainability over modular interactive application designs based on callbacks. Part of what makes Reactive Programming easy to use is its synchronous change propagation semantics: Changes are propagated such that no computation can ever observe an only partially updated state of the dataflow graph, i.e., each input change together with all dependent recomputations appears to be instantaneous.
Traditionally, in local single-threaded applications, synchronous semantics are achieved through glitch freedom consistency: a recomputation may be executed only after all its dependencies have been recomputed. In distributed applications though, this established approach breaks in two ways. First, glitch freedom implies synchronous semantics only if change propagations execute in isolation. Distributed applications are inherently exposed to concurrency in that multiple threads may propagate different changes concurrently. Ensuring isolation between concurrent change propagations requires the integration of additional algorithms for concurrency control. Second, applications’ dataflow graphs are spread across multiple hosts. Therefore, distributed reactive programming requires algorithms for both glitch freedom and concurrency control that are decentralized, i.e., function without access to shared memory. A comprehensive survey of related prior works shows, that none have managed to solve this issue so far.
This dissertation introduces FrameSweep, the first algorithm to provide synchronous propagation semantics for concurrent change propagation over dynamic distributed dataflow graphs. FrameSweep provides isolation and glitch freedom through a combination of fine-grained concurrency control algorithms for linearizability and serializability from databases with several aspects of change propagation algorithms from Reactive Programming and Automatic Incrementalization. The correctness of FrameSweep is formally proven through the application of multiversion concurrency control theory. FrameSweep decentralizes all its algorithmic components, and can therefore be integrated into any reactive programming library in a syntactically transparent manner: Applications can continue to use the same syntax for Reactive Programming as before, without requiring any adaptations of their code. FrameSweep therefore provides the exact same semantics as traditional local single-threaded Reactive Programming through the exact same syntax. As a result, FrameSweep proves by example that interactive applications can reap all benefits of Reactive Programming even if they are concurrent or distributed.
A comprehensive empirical evaluation measures through benchmarks, how FrameSweep’s performance and scalability are affected by a multitude of factors, e.g., thread contention, dataflow topology, dataflow topology changes, or distribution topology. It shows that FrameSweep’s performance compares favorably to alternative scheduling approaches with weaker guarantees in local applications. An existing Reactive Programming application is migrated to FrameSweep to empirically verify the claims of semantic and syntactic transparency
- …