454 research outputs found
Revisiting Snapshot Algorithms by Refinement-based Techniques
International audienceThe snapshot problem addresses a collection of important algorithmic issues related to the distributed computations, which are used for debugging or recovering the distributed programs. Among the existing solutions, Chandy and Lamport propose a simple distributed algorithm. In this paper, we explore the correct-by-construction process to formalize the snapshot algorithms in distributed system. The formalization process is based on a modeling language Event B, which supports a refinement-based incremental development using RODIN platform. These refinement-based techniques help to derive a correct distributed algorithm. Moreover, we demonstrate how this class of other distributed algorithms can be revisited. A consequence is to provide a fully mechanized proof of the distributed algorithms
Easier Parallel Programming with Provably-Efficient Runtime Schedulers
Over the past decade processor manufacturers have pivoted from increasing uniprocessor performance to multicore architectures. However, utilizing this computational power has proved challenging for software developers. Many concurrency platforms and languages have emerged to address parallel programming challenges, yet writing correct and performant parallel code retains a reputation of being one of the hardest tasks a programmer can undertake.
This dissertation will study how runtime scheduling systems can be used to make parallel programming easier. We address the difficulty in writing parallel data structures, automatically finding shared memory bugs, and reproducing non-deterministic synchronization bugs. Each of the systems presented depends on a novel runtime system which provides strong theoretical performance guarantees and performs well in practice
Efficient Online Processing for Advanced Analytics
With the advent of emerging technologies and the Internet of Things, the importance of online data analytics has become more pronounced. Businesses and companies are adopting approaches that provide responsive analytics to stay competitive in the global marketplace. Online analytics allow data analysts to promptly react to patterns or to gain preliminary insights from early results that aid in research, decision making, and effective strategy planning. The growth of data-velocity in a variety of domains including, high-frequency trading, social networks, infrastructure monitoring, and advertising require adopting online engines that can efficiently process continuous streams of data. This thesis presents foundations, techniques, and systems' design that extend the state-of-the-art in online query processing to efficiently support relational joins with arbitrary join-predicates (beyond traditional equi-joins); and to support other data models (beyond relational) that target machine learning and graph computations. The thesis is divided into two parts: We first present a brief overview of Squall, our open-source online query processing engine that supports SQL-like queries on top of streams. Then, we focus on extending Squall to support efficient theta-join processing. Scalable distributed join processing requires a partitioning policy that evenly distributes the processing load while minimizing the size of maintained state and duplicated messages. Efficient load-balance demands apriori-statistics which are not available in the online setting. We propose a novel operator that continuously adjusts itself to the data dynamics, through adaptive dataflow routing and state repartitioning. It is also resilient to data-skew, maintains high throughput rates, avoids blocking during state repartitioning, and behaves as a black-box dataflow operator with provable performance guarantees. Our evaluation demonstrates that the proposed operator outperforms the state-of-the-art static partitioning schemes in resource utilization, throughput, and execution time up to 7x. In the second part, we present a novel framework that supports the Incremental View Maintenance (IVM) of workloads expressed as linear algebra programs. Linear algebra represents a concrete substrate for advanced analytical tasks including, machine learning, scientific computation, and graph algorithms. Previous works on relational calculus IVM are not applicable to matrix algebra workloads. This is because a single entry change to an input-matrix results in changes all over the intermediate views, rendering IVM useless in comparison to re-evaluation. We present Lago, a unified modular compiler framework that supports the IVM of a broad class of linear algebra programs. Lago automatically derives and optimizes incremental trigger programs of analytical computations, while freeing the user from erroneous manual derivations, low-level implementation details, and performance tuning. We present a novel technique that captures changes as low-rank matrices. Low-rank matrices are representable in a compressed factored form that enables cheaper computations. Lago automatically propagates the factored representation across program statements to derive an efficient trigger program. Moreover, Lago extends its support to other domains that use different semi-ring configurations, e.g., graph applications. Our evaluation results demonstrate orders of magnitude (10x-1
SMA -- The Smyle Modeling Approach
This paper introduces the model-based software development lifecycle model SMA -- the Smyle Modeling Approach -- which is centered around Smyle. Smyle is a dedicated learning procedure to support engineers to interactively obtain design models from requirements, characterized as either being desired (positive) or unwanted (negative) system behavior. Within SMA, the learning approach is complemented by so-called scenario patterns where the engineer can specify clearly desired or unwanted behavior. This way, user interaction is reduced to the interesting scenarios limiting the design effort considerably. In SMA, the learning phase is further complemented by an effective analysis phase that allows for detecting design flaws at an early design stage. Using learning techniques allows us to gradually develop and refine requirements, naturally supporting evolving requirements, and allows for a rather inexpensive redesign in case anomalous system behavior is detected during analysis, testing, or maintenance. This paper describes the approach and reports on first practical experiences
Improved authenticated data structures for blockchain synchronization
One of the most important components in a public blockchain like Bitcoin and Ethereum is the authenticated data structure that keeps track of all block data, transactions, and the world state (account balance, smart contract states, etc.) Thanks to authenticated data structures, lightweight nodes only need to store authentication information and can delegate queries to those nodes with a full replica of data and the authenticated data structure. The lightweight nodes can trust the query results after verifying against the authentication information. It is also critical to have enough nodes in the network that are equipped with the authenticated data structure to ensure scalability and availability, which is especially important for public blockchains. Therefore, every public blockchain highly encourages users to download the authenticated data structure as the first step.
Fetching all elements from the entire authenticated data structure is a novel query type that has not gathered attention in the past. We describe this new emerging query type in the three-party authenticated data structure (ADS). We improve the design and implementation of the authenticated data structure so that the new query type is well-supported. We specifically apply the improvements to the Ethereum blockchain network. With our proposed ADS system in Ethereum, we improve Ethereum state synchronization performance by 216 times
IELE: An Intermediate-Level Blockchain Language Designed and Implemented Using Formal Semantics
Most languages are given an informal semantics until they are implemented, so the formal semantics comes later. Consequently, there are usually inconsistencies among the informal semantics, the implementation, and the formal semantics. IELE is an LLVM-like language for the blockchain that was specified formally and its implementation, a virtual machine, generated from the formal specification. Moreover, its design was based on problems observed formalizing the semantics of the Ethereum Virtual Machine (EVM) and from formally specifying and verifying EVM programs (also called “smart contracts”), so even the design decisions made for IELE are based on formal specifications. A compiler from Solidity, the predominant high-level language for smart contracts, to IELE has also been implemented, so Ethereum contracts can now also be executed on IELE. The virtual machine automatically generated from the semantics of IELE is shown to be competitive in terms of performance with the state of the art and hence can stand as the de facto implementation of the language in a production setting. Indeed, IOHK, a major blockchain company, is currently experimenting with the IELE VM in order to deploy it as its computational layer in a few months. This makes IELE the first practical language that is designed and implemented as a formal specification. It took only 10 man-months to develop IELE, which demonstrates that the programming language semantics field has reached a level of maturity that makes it appealing over the traditional, adhoc approach even for pragmatic reasons.Ope
Multi-Master Replication for Snapshot Isolation Databases
Lazy replication with snapshot isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication requires the execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily replicated system. We propose a set of techniques that support update transaction execution over multiple partitioned sites, thereby allowing the master to scale. Our techniques determine a total SI order for update transactions over multiple master sites without requiring global coordination in the distributed system, and ensure that updates are installed in this order at all sites to provide consistent and scalable replication with SI. We have built our techniques into PostgreSQL and demonstrate their effectiveness through experimental evaluation.1 yea
Recommended from our members
Transformational maintenance by reuse of design histories
This thesis provides theory and procedures for modifying software artifacts implemented by a formal transformation process. Installing modifications requires knowing not only what transformations were applied (a derivation history) to construct the artifact, but also why the application sequence ensures that the artifact meets its specification. The derivation history and the justification are collectively called a design history. A Design Maintenance System (DMS), when provided with a formal change called a maintenance delta, revises a design history to guide construction of a new artifact. A DMS can be used to integrate a stream of deltas into a history, providing implementations as a side effect, leading to an incremental-evolution model for software construction.We provide a broadly applicable formal model of transformation systems in which specifications are performance predicates, subsuming the functional specifications which are traditional for transformation systems. Such performance predicates provide vocabulary used in the design history to describe the effect of applying sets of transformations.A nonprocedural, performance-goal-oriented Transformation Control Language (TCL) is defined to control navigation of the design space for a transformation system. Recording the execution of a TCL metaprogram directly provides a design history.A complete classification of, and representation for, the set of possible maintenance deltas is given in terms of the inputs defined by the transformation system model. Such deltas include not only specification changes, but also changes to implementation support technologies. Delta integration procedures for revising derivation histories given functional or support technology deltas are provided, based on rearranging the order of transformations in the design space. Building on these operations, integration procedures that revise the design history for each type of delta are described. An agenda-oriented TCL execution process dovetails smoothly with the integration procedures.Our DMS is compared to a number of other maintenance systems. By using an explicit delta and verified commutativity, our DMS often reuses transformations correctly when others fail
Towards Efficient Lifelong Machine Learning in Deep Neural Networks
Humans continually learn and adapt to new knowledge and environments throughout their lifetimes. Rarely does learning new information cause humans to catastrophically forget previous knowledge. While deep neural networks (DNNs) now rival human performance on several supervised machine perception tasks, when updated on changing data distributions, they catastrophically forget previous knowledge. Enabling DNNs to learn new information over time opens the door for new applications such as self-driving cars that adapt to seasonal changes or smartphones that adapt to changing user preferences. In this dissertation, we propose new methods and experimental paradigms for efficiently training continual DNNs without forgetting. We then apply these methods to several visual and multi-modal perception tasks including image classification, visual question answering, analogical reasoning, and attribute and relationship prediction in visual scenes
A heuristic-based approach to code-smell detection
Encapsulation and data hiding are central tenets of the object oriented paradigm. Deciding what data and behaviour to form into a class and where to draw the line between its public and private details can make the difference between a class that is an understandable, flexible and reusable abstraction and one which is not. This decision is a difficult one and may easily result in poor encapsulation which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late) and attempting to perform such analysis manually can also be tedious and error prone. Two of the common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together – data classes are lacking in functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool which automatically detects data and god classes that has been developed as a plug-in for the Eclipse IDE. The technique has been evaluated in a controlled study on two large open source systems which compare the tool results to similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approache
- …