251 research outputs found

    Well-Structured Futures and Cache Locality

    Full text link
    In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur Ω(PT∞+tT∞)\Omega(P T_\infty + t T_\infty) deviations, which implies Ω(CPT∞+CtT∞)\Omega(C P T_\infty + C t T_\infty) additional cache misses, where CC is the number of cache lines, PP is the number of processors, tt is the number of touches, and T∞T_\infty is the \emph{computation span}. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a thread to which the future has been passed from the thread that created it, then parallel executions with work stealing can incur at most O(CPT∞2)O(C P T^2_\infty) additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications

    Distributed Computability in Byzantine Asynchronous Systems

    Full text link
    In this work, we extend the topology-based approach for characterizing computability in asynchronous crash-failure distributed systems to asynchronous Byzantine systems. We give the first theorem with necessary and sufficient conditions to solve arbitrary tasks in asynchronous Byzantine systems where an adversary chooses faulty processes. In our adversarial formulation, outputs of non-faulty processes are constrained in terms of inputs of non-faulty processes only. For colorless tasks, an important subclass of distributed problems, the general result reduces to an elegant model that effectively captures the relation between the number of processes, the number of failures, as well as the topological structure of the task's simplicial complexes.Comment: Will appear at the Proceedings of the 46th Annual Symposium on the Theory of Computing, STOC 201

    Presentation and Publication: Loss and Slippage in Networks of Automated Market Makers

    Get PDF
    Automated market makers (AMMs) are smart contracts that automatically trade electronic assets according to a mathematical formula. This paper investigates how an AMM's formula affects the interests of liquidity providers, who endow the AMM with assets, and traders, who exchange one asset for another at the AMM's rates. *Linear slippage* measures how a trade's size affects the trader's return, *angular slippage* measures how a trade's size affects the subsequent market price, *divergence loss* measures the opportunity cost of providers' investments, and *load* balances the costs to traders and providers. We give formal definitions for these costs, show that they obey certain conservation laws: these costs can be shifted around but never fully eliminated. We analyze how these costs behave under *composition*, when simple individual AMMs are linked to form more complex networks of AMMs.Comment: To appear in the Proceedings of **Tokenomics 2021** (3rd International Conference on Blockchain Economics, Security and Protocols

    Energy-efficient and high-performance lock speculation hardware for embedded multicore systems

    Full text link
    Embedded systems are becoming increasingly common in everyday life and like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured.This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095 as well the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation
    • …
    corecore