
    Revisiting Snapshot Algorithms by Refinement-based Techniques

    The snapshot problem addresses a collection of important algorithmic issues related to distributed computations, which are used for debugging or recovering distributed programs. Among the existing solutions, Chandy and Lamport propose a simple distributed algorithm. In this paper, we explore a correct-by-construction process to formalize snapshot algorithms in distributed systems. The formalization is based on the modeling language Event-B, which supports refinement-based incremental development using the Rodin platform. These refinement-based techniques help to derive a correct distributed algorithm. Moreover, we demonstrate how other distributed algorithms of this class can be revisited. A consequence is a fully mechanized proof of the distributed algorithms.
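    For orientation, the Chandy-Lamport algorithm the paper refines can be stated in a few lines: the initiator records its state and floods markers; every other process records its state on the first marker it sees and records per-channel in-transit messages until a marker closes that channel. A minimal Python sketch (illustrative only; the names are invented, not taken from the paper's Event-B models, and FIFO channels are assumed, as the algorithm requires):

        # Minimal sketch of the Chandy-Lamport marker protocol (FIFO channels assumed).
        class Process:
            def __init__(self, pid, in_channels):
                self.pid = pid
                self.in_channels = set(in_channels)  # ids of incoming channels
                self.state_snapshot = None           # recorded local state
                self.channel_msgs = {}               # channel id -> in-transit messages
                self.open_channels = set()           # channels still being recorded

            def _record(self, state, skip=None):
                self.state_snapshot = state
                self.open_channels = self.in_channels - ({skip} if skip is not None else set())
                self.channel_msgs = {ch: [] for ch in self.open_channels}

            def start_snapshot(self, state, broadcast_marker):
                # Initiator: record local state, then send a marker on every outgoing channel.
                self._record(state)
                broadcast_marker(self.pid)

            def on_marker(self, channel, state, broadcast_marker):
                if self.state_snapshot is None:
                    # First marker seen: record state, start recording the other
                    # incoming channels, and forward markers.
                    self._record(state, skip=channel)
                    broadcast_marker(self.pid)
                else:
                    # A marker closes this channel; its recorded messages are final.
                    self.open_channels.discard(channel)

            def on_message(self, channel, msg):
                # A message on a still-open channel was in transit at snapshot time.
                if channel in self.open_channels:
                    self.channel_msgs[channel].append(msg)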

    Easier Parallel Programming with Provably-Efficient Runtime Schedulers

    Over the past decade, processor manufacturers have pivoted from increasing uniprocessor performance to multicore architectures. However, utilizing this computational power has proved challenging for software developers. Many concurrency platforms and languages have emerged to address parallel programming challenges, yet writing correct and performant parallel code retains a reputation as one of the hardest tasks a programmer can undertake. This dissertation studies how runtime scheduling systems can be used to make parallel programming easier. We address the difficulty of writing parallel data structures, of automatically finding shared-memory bugs, and of reproducing non-deterministic synchronization bugs. Each of the systems presented depends on a novel runtime system that provides strong theoretical performance guarantees and performs well in practice.
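    The dissertation's own schedulers are not described here, but the canonical provably-efficient baseline in this line of work is randomized work stealing, which executes a computation with work T1 and span T_inf on P workers in expected time O(T1/P + T_inf). A toy sketch of the stealing discipline (illustrative; single-threaded and unsynchronized for clarity, whereas a real runtime protects the deques):

        # Toy randomized work stealing: owners pop newest work from the bottom of
        # their own deque; idle workers steal oldest work from a random victim's top.
        import random
        from collections import deque

        class Worker:
            def __init__(self, workers):
                self.workers = workers  # shared list of all workers
                self.tasks = deque()    # local double-ended work queue

            def spawn(self, task):
                self.tasks.append(task)            # owner pushes at the bottom

            def step(self):
                if self.tasks:
                    task = self.tasks.pop()        # owner pops newest (bottom)
                else:
                    victim = random.choice(self.workers)
                    if victim is self or not victim.tasks:
                        return False               # failed steal attempt; retry later
                    task = victim.tasks.popleft()  # thief steals oldest (top)
                task()
                return True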

    Efficient Online Processing for Advanced Analytics

    With the advent of emerging technologies and the Internet of Things, the importance of online data analytics has become more pronounced. Businesses and companies are adopting approaches that provide responsive analytics to stay competitive in the global marketplace. Online analytics allow data analysts to promptly react to patterns or to gain preliminary insights from early results that aid in research, decision making, and effective strategy planning. The growth of data velocity in a variety of domains, including high-frequency trading, social networks, infrastructure monitoring, and advertising, requires adopting online engines that can efficiently process continuous streams of data. This thesis presents foundations, techniques, and system designs that extend the state of the art in online query processing to efficiently support relational joins with arbitrary join predicates (beyond traditional equi-joins), and to support other data models (beyond relational) that target machine learning and graph computations. The thesis is divided into two parts.

    We first present a brief overview of Squall, our open-source online query processing engine that supports SQL-like queries on top of streams. Then, we focus on extending Squall to support efficient theta-join processing. Scalable distributed join processing requires a partitioning policy that evenly distributes the processing load while minimizing the size of maintained state and the number of duplicated messages. Efficient load balancing demands a priori statistics, which are not available in the online setting. We propose a novel operator that continuously adjusts itself to the data dynamics through adaptive dataflow routing and state repartitioning. It is also resilient to data skew, maintains high throughput rates, avoids blocking during state repartitioning, and behaves as a black-box dataflow operator with provable performance guarantees. Our evaluation demonstrates that the proposed operator outperforms state-of-the-art static partitioning schemes in resource utilization, throughput, and execution time by up to 7x.

    In the second part, we present a novel framework that supports incremental view maintenance (IVM) of workloads expressed as linear algebra programs. Linear algebra is a concrete substrate for advanced analytical tasks including machine learning, scientific computation, and graph algorithms. Previous work on IVM for relational calculus is not applicable to matrix algebra workloads, because a single-entry change to an input matrix results in changes all over the intermediate views, rendering naive IVM useless in comparison to re-evaluation. We present Lago, a unified modular compiler framework that supports IVM of a broad class of linear algebra programs. Lago automatically derives and optimizes incremental trigger programs of analytical computations, freeing the user from erroneous manual derivations, low-level implementation details, and performance tuning. We present a novel technique that captures Δ changes as low-rank matrices. Low-rank matrices are representable in a compressed factored form that enables cheaper computations. Lago automatically propagates the factored representation across program statements to derive an efficient trigger program. Moreover, Lago extends its support to other domains that use different semiring configurations, e.g., graph applications. Our evaluation results demonstrate orders of magnitude (10x-1
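    The low-rank idea in the last paragraph can be made concrete: a single-entry change to an input matrix A is a rank-1 update Δ = u vᵀ, so a view V = A·B changes by u (vᵀ B), an O(n²) trigger instead of an O(n³) re-evaluation. A small NumPy sketch of this factored propagation (illustrative; not Lago's generated trigger code):

        # Rank-1 incremental maintenance of the view V = A @ B.
        import numpy as np

        n = 500
        A, B = np.random.rand(n, n), np.random.rand(n, n)
        V = A @ B                          # materialize the view once, O(n^3)

        # A single-entry change to A is rank-1: delta = u @ v.T
        u = np.zeros((n, 1)); u[42] = 1.5  # changed row ...
        v = np.zeros((n, 1)); v[7] = 1.0   # ... and column

        A += u @ v.T                       # apply the base-table update
        V += u @ (v.T @ B)                 # trigger: O(n^2), no re-evaluation

        assert np.allclose(V, A @ B)       # agrees with full re-computation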

    SMA -- The Smyle Modeling Approach

    This paper introduces the model-based software development lifecycle model SMA -- the Smyle Modeling Approach -- which is centered around Smyle, a dedicated learning procedure that supports engineers in interactively deriving design models from requirements, characterized as either desired (positive) or unwanted (negative) system behavior. Within SMA, the learning approach is complemented by so-called scenario patterns, with which the engineer can specify clearly desired or unwanted behavior. This way, user interaction is reduced to the interesting scenarios, limiting the design effort considerably. In SMA, the learning phase is further complemented by an effective analysis phase that allows design flaws to be detected at an early design stage. Using learning techniques allows us to gradually develop and refine requirements, naturally supporting evolving requirements, and allows for a rather inexpensive redesign in case anomalous system behavior is detected during analysis, testing, or maintenance. This paper describes the approach and reports on first practical experiences.

    Improved authenticated data structures for blockchain synchronization

    One of the most important components in a public blockchain like Bitcoin or Ethereum is the authenticated data structure that keeps track of all block data, transactions, and the world state (account balances, smart contract states, etc.). Thanks to authenticated data structures, lightweight nodes only need to store authentication information and can delegate queries to nodes holding a full replica of the data and the authenticated data structure; the lightweight nodes can trust the query results after verifying them against the authentication information. It is also critical to have enough nodes in the network equipped with the authenticated data structure to ensure scalability and availability, which is especially important for public blockchains. Therefore, every public blockchain strongly encourages users to download the authenticated data structure as a first step. Fetching all elements of the entire authenticated data structure is a novel query type that has not received attention in the past. We describe this emerging query type in the three-party authenticated data structure (ADS) model, and we improve the design and implementation of the authenticated data structure so that the new query type is well supported. We apply the improvements specifically to the Ethereum blockchain network; with our proposed ADS system, we improve Ethereum state synchronization performance by a factor of 216.
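    For context, the verification step a lightweight node performs in the three-party ADS model reduces to checking an answer against a trusted root digest. A minimal binary Merkle inclusion-proof check looks as follows (illustrative; Ethereum's world state actually uses a Merkle Patricia trie, which is more involved than a plain binary Merkle tree):

        # Minimal Merkle inclusion-proof check against a trusted root digest.
        import hashlib

        def h(data: bytes) -> bytes:
            return hashlib.sha256(data).digest()

        def verify(leaf: bytes, proof: list, root: bytes) -> bool:
            # proof: list of (sibling_hash, side) pairs on the leaf-to-root path.
            node = h(leaf)
            for sibling, side in proof:
                node = h(sibling + node) if side == "left" else h(node + sibling)
            return node == root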

    IELE: An Intermediate-Level Blockchain Language Designed and Implemented Using Formal Semantics

    Most languages are given only an informal semantics until they are implemented, so a formal semantics, if any, comes later. Consequently, there are usually inconsistencies among the informal semantics, the implementation, and the formal semantics. IELE is an LLVM-like language for the blockchain that was specified formally, with its implementation, a virtual machine, generated from the formal specification. Moreover, its design was informed by problems observed while formalizing the semantics of the Ethereum Virtual Machine (EVM) and while formally specifying and verifying EVM programs (also called “smart contracts”), so even the design decisions made for IELE are based on formal specifications. A compiler from Solidity, the predominant high-level language for smart contracts, to IELE has also been implemented, so Ethereum contracts can now also be executed on IELE. The virtual machine automatically generated from the semantics of IELE is shown to be competitive in performance with the state of the art and hence can stand as the de facto implementation of the language in a production setting. Indeed, IOHK, a major blockchain company, is currently experimenting with the IELE VM in order to deploy it as its computational layer in a few months. This makes IELE the first practical language that is designed and implemented as a formal specification. It took only 10 man-months to develop IELE, which demonstrates that the field of programming language semantics has reached a level of maturity that makes it appealing over the traditional, ad hoc approach even for pragmatic reasons.

    Multi-Master Replication for Snapshot Isolation Databases

    Lazy replication with snapshot isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication requires the execution of update transactions at one (master) site so that a total SI order can be determined relatively easily for consistent installation of updates in the lazily replicated system. We propose a set of techniques that support update transaction execution over multiple partitioned sites, thereby allowing the master to scale. Our techniques determine a total SI order for update transactions over multiple master sites without requiring global coordination in the distributed system, and ensure that updates are installed in this order at all sites to provide consistent and scalable replication with SI. We have built our techniques into PostgreSQL and demonstrate their effectiveness through experimental evaluation.
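    One generic way to obtain a total commit order across master sites without a central sequencer is Lamport-style timestamps that pair a site-local counter with a site id and compare lexicographically. A toy sketch of that idea follows (illustrative of the general technique only; not necessarily the mechanism the paper builds into PostgreSQL):

        # Toy total commit order from Lamport-style timestamps (counter, site_id).
        class MasterSite:
            def __init__(self, site_id):
                self.site_id = site_id
                self.clock = 0

            def commit_stamp(self, observed=0):
                # Advance past everything this site has seen, then tick once.
                self.clock = max(self.clock, observed) + 1
                return (self.clock, self.site_id)  # lexicographic total order

        a, b = MasterSite(0), MasterSite(1)
        t1 = a.commit_stamp()
        t2 = b.commit_stamp(observed=t1[0])  # b saw a's commit via replication
        assert t1 < t2  # every replica installs updates in this one order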

    Towards Efficient Lifelong Machine Learning in Deep Neural Networks

    Humans continually learn and adapt to new knowledge and environments throughout their lifetimes, and learning new information rarely causes humans to catastrophically forget previous knowledge. While deep neural networks (DNNs) now rival human performance on several supervised machine perception tasks, they catastrophically forget previous knowledge when updated on changing data distributions. Enabling DNNs to learn new information over time opens the door to new applications, such as self-driving cars that adapt to seasonal changes or smartphones that adapt to changing user preferences. In this dissertation, we propose new methods and experimental paradigms for efficiently training continual DNNs without forgetting. We then apply these methods to several visual and multi-modal perception tasks, including image classification, visual question answering, analogical reasoning, and attribute and relationship prediction in visual scenes.

    A heuristic-based approach to code-smell detection

    Encapsulation and data hiding are central tenets of the object-oriented paradigm. Deciding what data and behaviour to form into a class, and where to draw the line between its public and private details, can make the difference between a class that is an understandable, flexible and reusable abstraction and one which is not. This decision is a difficult one and may easily result in poor encapsulation, which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late), and attempting to perform such analysis manually can be tedious and error prone. Two of the common encapsulation problems that arise from this decomposition process are data classes and god classes. Typically, the two problems occur together: data classes lack functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool, developed as a plug-in for the Eclipse IDE, which automatically detects data and god classes. The technique has been evaluated in a controlled study on two large open source systems, comparing the tool's results with similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approaches.
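    A flavor of such heuristics: a data class exposes state through accessors but carries little behaviour of its own, while a god class concentrates an outsized share of the system's behaviour and complexity. A toy metrics-based detector (metric names and thresholds are invented for illustration; the paper's heuristics and Marinescu's detection strategies differ in detail):

        # Toy metrics-based code-smell detector; thresholds are illustrative only.
        from dataclasses import dataclass

        @dataclass
        class ClassMetrics:
            name: str
            n_fields: int
            n_accessors: int       # getters/setters
            n_other_methods: int   # methods carrying real behaviour
            complexity: int        # e.g. summed cyclomatic complexity

        def is_data_class(m: ClassMetrics) -> bool:
            # Exposes state, offers little behaviour of its own.
            mostly_accessors = m.n_accessors >= 2 * max(m.n_other_methods, 1)
            return m.n_fields > 0 and mostly_accessors and m.complexity < 10

        def is_god_class(m: ClassMetrics) -> bool:
            # Concentrates an outsized share of the system's behaviour.
            return m.complexity > 50 and m.n_other_methods > 20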