Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world---petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems.
Understanding Optimization Phase Interactions to Reduce the Phase Order Search Space
Compiler optimization phase ordering is a longstanding problem, and is of particular relevance to the performance-oriented and cost-constrained domain of embedded systems applications. Optimization phases are known to interact with each other, enabling and disabling opportunities for successive phases. Therefore, varying the order of applying these phases often generates distinct output codes, with different speed, code-size and power consumption characteristics. Most current approaches to address this issue focus on developing innovative methods to selectively evaluate the vast phase order search space to produce a good (but potentially suboptimal) representation for each program. In contrast, the goal of this thesis is to study and reduce the phase order search space by: (1) identifying common causes of optimization phase interactions across all phases, and then devising techniques to eliminate them, and (2) exploiting natural phase independence to prune the phase order search space. We observe that several phase interactions are caused by false register dependence during many optimization phases. We explore the potential of cleanup phases, such as register remapping and copy propagation, at reducing false dependences. We show that innovative implementation and application of these phases not only reduces the size of the phase order search space substantially, but can also improve the quality of code generated by optimizing compilers. We examine the effect of removing cleanup phases, such as dead assignment elimination, which should not interact with other compiler phases, from the phase order search space. Finally, we show that reorganization of the phase order search into a multi-staged approach employing sets of mutually independent optimizations can reduce the search space to a fraction of its original size without sacrificing performance.
Formal Compiler Implementation in a Logical Framework
The task of designing and implementing a compiler can be a difficult and error-prone process. In this paper, we present a new approach based on the use of higher-order abstract syntax and term rewriting in a logical framework. All program transformations, from parsing to code generation, are cleanly isolated and specified as term rewrites. This has several advantages. The correctness of the compiler depends solely on a small set of rewrite rules that are written in the language of formal mathematics. In addition, the logical framework guarantees the preservation of scoping, and it automates many frequently-occurring tasks including substitution and rewriting strategies. As we show, compiler development in a logical framework can be easier than in a general-purpose language like ML, in part because of automation, and also because the framework provides extensive support for examination, validation, and debugging of the compiler transformations. The paper is organized around a case study, using the MetaPRL logical framework to compile an ML-like language to Intel x86 assembly. We also present a scoped formalization of x86 assembly in which all registers are immutable.
Data optimizations for constraint automata
Constraint automata (CA) constitute a coordination model based on finite
automata on infinite words. Originally introduced for modeling of coordinators,
an interesting new application of CAs is implementing coordinators (i.e.,
compiling CAs into executable code). Such an approach guarantees
correctness-by-construction and can even yield code that outperforms
hand-crafted code. The extent to which these two potential advantages
materialize depends on the smartness of CA-compilers and the existence of
proofs of their correctness.
Every transition in a CA is labeled by a "data constraint" that specifies an
atomic data-flow between coordinated processes as a first-order formula. At
run-time, compiler-generated code must handle data constraints as efficiently
as possible. In this paper, we present, and prove the correctness of, two
optimization techniques for CA-compilers related to handling of data
constraints: a reduction to eliminate redundant variables and a translation
from (declarative) data constraints to (imperative) data commands expressed in
a small sequential language. Through experiments, we show that these
optimization techniques can have a positive impact on the performance of generated
executable code.
Reconciling Low-Level Features of C with Compiler Optimizations
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: Chung-Kil Hur.
To improve the performance of C programs, mainstream compilers perform aggressive optimizations that may change the behaviors of programs that use low-level features in unidiomatic ways. Unfortunately, despite many years of research and industrial efforts, it has proven very difficult to adequately balance the conflicting criteria for low-level features and compiler optimizations in the design of the C programming language. On the one hand, C should support the common usage patterns of the low-level features in systems programming. On the other hand, C should also support the sophisticated and yet effective optimizations performed by mainstream compilers. None of the existing proposals for C semantics, however, sufficiently support low-level features and compiler optimizations at the same time.
In this dissertation, we resolve the conflict between some of the low-level features crucially used in systems programming and major compiler optimizations. Specifically, we develop the first formal semantics of relaxed-memory concurrency, separate compilation, and cast between integers and pointers that (1) supports their common usage patterns and reasoning principles for programmers, and (2) provably validates major compiler optimizations at the same time. To establish confidence in our formal semantics, we have formalized most of our key results in the Coq theorem prover, which automatically and rigorously checks the validity of the results.
Abstract
Acknowledgements
Chapter I Prologue
Chapter II Relaxed-Memory Concurrency
Chapter III Separate Compilation and Linking
Chapter IV Cast between Integers and Pointers
Chapter V Epilogue
Abstract (in Korean)
SmartTrack: Efficient Predictive Race Detection
Widely used data race detectors, including the state-of-the-art FastTrack
algorithm, incur performance costs that are acceptable for regular in-house
testing, but miss races detectable from the analyzed execution. Predictive
analyses detect more data races in an analyzed execution than FastTrack
detects, but at significantly higher performance cost.
This paper presents SmartTrack, an algorithm that optimizes predictive race
detection analyses, including two analyses from prior work and a new analysis
introduced in this paper. SmartTrack's algorithm incorporates two main
optimizations: (1) epoch and ownership optimizations from prior work, applied
to predictive analysis for the first time; and (2) novel conflicting critical
section optimizations introduced by this paper. Our evaluation shows that
SmartTrack achieves performance competitive with FastTrack, a qualitative
improvement in the state of the art for data race detection.
Comment: Extended arXiv version of PLDI 2020 paper (adds Appendices A-E)