9 research outputs found
ShareJIT: JIT Code Cache Sharing across Processes and Its Practical Implementation
Just-in-time (JIT) compilation coupled with code caching is widely used to
improve performance in dynamic programming language implementations. These code
caches, along with the associated profiling data for the hot code, however,
consume significant amounts of memory. Furthermore, they incur extra JIT
compilation time for their creation. On Android, the current standard JIT
compiler and its code caches are not shared among processes---that is, the
runtime system maintains a private code cache, and its associated data, for
each runtime process. However, applications running on the same platform tend
to share multiple libraries in common. Sharing cached code across multiple
applications and multiple processes can reduce memory use, directly reduce
compile time, and reduce the cumulative amount of time spent interpreting
code. All three of these effects can improve actual runtime performance.
In this paper, we describe ShareJIT, a global code cache for JITs that can
share code across multiple applications and multiple processes. We implemented
ShareJIT in the context of the Android Runtime (ART), a widely used,
state-of-the-art system. To increase sharing, our implementation constrains the
amount of context that the JIT compiler can use to optimize the code. This
exposes a fundamental tradeoff: increased specialization to a single process's
context decreases the extent to which the compiled code can be shared. In
ShareJIT, we limit some optimizations to increase shareability. To evaluate
ShareJIT, we tested 8 popular Android apps in a total of 30 experiments.
ShareJIT improved overall performance by 9% on average, while decreasing memory
consumption by 16% on average and JIT compilation time by 37% on average.
Comment: OOPSLA 201
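The core idea above can be sketched as a global cache keyed by the method identity plus a deliberately coarse compilation context, so that two processes requesting the same library method hit the same entry. This is a minimal illustration only; the class, method names, and context keys are invented and do not reflect ART's actual API.

```python
# Hypothetical sketch of a cross-process JIT code-cache lookup in the spirit
# of ShareJIT: compiled code is keyed by the method plus only the context the
# compiler was allowed to specialize on. Names are illustrative, not ART's API.

class SharedCodeCache:
    """A global cache shared by all processes; values stand in for compiled code."""

    def __init__(self):
        self._cache = {}  # (method_id, context_key) -> compiled stub

    def lookup_or_compile(self, method_id, context_key, compile_fn):
        key = (method_id, context_key)
        if key not in self._cache:
            # Cache miss: compile once; later processes reuse the result.
            self._cache[key] = compile_fn(method_id, context_key)
        return self._cache[key]

def compile_method(method_id, context_key):
    # Stand-in for the JIT compiler. Restricting context_key to a coarse
    # value (e.g. the library, not the whole process state) increases the
    # chance two apps produce the same key and can share the code.
    return f"code[{method_id}|{context_key}]"

cache = SharedCodeCache()
# Two "processes" requesting the same library method with the same coarse context:
a = cache.lookup_or_compile("String.indexOf", "core-libart", compile_method)
b = cache.lookup_or_compile("String.indexOf", "core-libart", compile_method)
```

The tradeoff the paper names shows up directly in `context_key`: the finer it is, the better each entry can be specialized, but the fewer processes ever produce the same key.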
Coverage-Based Debloating for Java Bytecode
Software bloat is code that is packaged in an application but is actually not
necessary to run the application. The presence of software bloat is an issue
for security, for performance, and for maintenance. In this paper, we introduce
a novel technique for debloating Java bytecode, which we call coverage-based
debloating. We leverage a combination of state-of-the-art Java bytecode
coverage tools to precisely capture what parts of a project and its
dependencies are used at runtime. Then, we automatically remove the parts that
are not covered to generate a debloated version of the compiled project. We
successfully generate debloated versions of 220 open-source Java libraries,
which are syntactically correct and preserve their original behavior according
to the workload. Our results indicate that 68.3% of the libraries' bytecode and
20.5% of their total dependencies can be removed through coverage-based
debloating. Meanwhile, we present the first experiment that assesses the
utility of debloated libraries with respect to client applications that reuse
them. We show that 80.9% of the clients with at least one test that uses the
library successfully compile and pass their test suite when the original
library is replaced by its debloated version.
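The removal step described above can be sketched as a simple filter over a project's class files, keeping only what coverage observed at runtime. This is an illustrative sketch under invented names; the real technique operates on Java bytecode members via coverage tools, not on a Python dictionary.

```python
# Minimal sketch of coverage-based debloating: keep only the bytecode entries
# that execution coverage observed, drop the rest. Class names and the
# coverage set are made up for illustration.

def debloat(class_files, covered):
    """Return a 'debloated' artifact: only classes the workload exercised."""
    return {name: code for name, code in class_files.items() if name in covered}

original = {
    "com/example/Parser.class": b"\xca\xfe\xba\xbe",
    "com/example/Emitter.class": b"\xca\xfe\xba\xbe",
    "com/example/LegacyCodec.class": b"\xca\xfe\xba\xbe",  # never executed
}
covered = {"com/example/Parser.class", "com/example/Emitter.class"}

debloated = debloat(original, covered)
```

As the abstract notes, correctness is "according to the workload": anything the coverage run never exercised is removed, so the result is only as safe as the coverage is representative.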
An Abstract Interpretation-based Model of Tracing Just-In-Time Compilation
Tracing just-in-time compilation is a popular compilation technique for the
efficient implementation of dynamic languages, which is commonly used for
JavaScript, Python and PHP. We provide a formal model of tracing JIT
compilation of programs using abstract interpretation. Hot path detection
corresponds to an abstraction of the trace semantics of the program. The
optimization phase corresponds to a transform of the original program that
preserves its trace semantics up to an observation modeled by some abstraction.
We provide a generic framework to express dynamic optimizations and prove them
correct. We instantiate it to prove the correctness of dynamic type
specialization and constant variable folding. We show that our framework is
more general than the model of tracing compilation introduced by Guo and
Palsberg [2011] based on operational bisimulations.
Comment: To appear in ACM Transactions on Programming Languages and System
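The hot-path detection the paper models abstractly can be illustrated concretely with the counter scheme tracing JITs commonly use: count executions of loop headers, and start recording a trace once a counter crosses a threshold. The threshold and program below are invented for illustration.

```python
# Toy model of hot-path detection in a tracing JIT: count executions of loop
# headers (backward-branch targets); once a counter crosses a threshold, the
# path is "hot" and trace recording would begin.

HOT_THRESHOLD = 3  # illustrative; real systems tune this empirically

def run(program, counters, hot):
    """Interpret a list of (pc, is_loop_header) steps, flagging hot headers."""
    for pc, is_loop_header in program:
        if is_loop_header:
            counters[pc] = counters.get(pc, 0) + 1
            if counters[pc] == HOT_THRESHOLD:
                hot.add(pc)  # in a real JIT: start recording a trace here

counters, hot = {}, set()
# A loop whose header at pc=10 executes five times:
steps = [(10, True), (11, False)] * 5
run(steps, counters, hot)
```

In the paper's terms, this counting is exactly an abstraction of the program's trace semantics: the counter forgets everything about an execution except how often each header was reached.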
Just-In-Time Compilation: History, Architecture, Principles, and Systems
Many high-level language implementations focus on systems built around just-in-time compilation mechanisms. This mechanism is attractive because it improves the performance of such languages while preserving portability, albeit at the price of adding compilation time to total execution time. Research in the area has therefore turned to balancing compilation cost against execution efficiency. The first just-in-time compilation systems employed static strategies to select and optimize the code regions likely to yield good performance. More sophisticated systems refined these strategies in order to apply optimizations more judiciously. This tutorial presents the principles underlying just-in-time compilation and its evolution over the years, as well as the approaches used by several systems to balance cost and efficiency. Although it is difficult to define the best approach, recent work shows that strict strategies for code detection and optimization, together with the parallelism offered by multi-core architectures, will form the basis of future just-in-time compilation systems.
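The cost/benefit balance this tutorial surveys is commonly realized as tiered compilation: a method starts interpreted, gets a cheap baseline compile after a few calls, and receives expensive optimization only once it proves genuinely hot. The thresholds below are invented for illustration.

```python
# Illustrative sketch of tiered compilation: compilation effort is spent in
# proportion to how hot a method is. Thresholds are made up.

BASELINE_AT, OPTIMIZE_AT = 2, 5

def tier_for(call_count):
    if call_count >= OPTIMIZE_AT:
        return "optimized"    # expensive compile, fastest code
    if call_count >= BASELINE_AT:
        return "baseline"     # cheap compile, moderate speedup
    return "interpreted"      # no compile cost at all

# How a method's tier evolves over its first six invocations:
history = [tier_for(n) for n in range(1, 7)]
```

Cold code never pays any compile time, while the hottest code ends up with the most aggressively optimized version, which is the balance the tutorial describes.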
OpenISA, a hybrid instruction set
Advisor: Edson Borin. PhD thesis, Universidade Estadual de Campinas, Instituto de Computação.
Abstract: OpenISA is designed as the interface of processors that aim to be highly flexible. This is achieved by means of three strategies: first, the ISA is empirically chosen to be easily translated to others, providing software flexibility in case a physical OpenISA processor is not available; in that case there is no need to implement an OpenISA virtual processor in software, because the ISA is prepared to be statically translated to other ISAs. Second, the ISA is neither a concrete ISA nor a virtual ISA, but a hybrid one with the capability of admitting modifications to opcodes without impacting backwards compatibility. This mechanism allows future versions of the ISA to have real changes instead of simple extensions of previous versions, a common problem with concrete ISAs such as the x86. Third, the use of a permissive license allows the ISA to be freely used by any party interested in the project. In this PhD thesis, we focus on the user-level instructions of OpenISA. The thesis discusses (1) ISA alternatives, program distribution alternatives, and the impact of each choice, (2) important features of OpenISA for achieving its goals, and (3) a thorough evaluation of the chosen ISA with respect to emulation performance on two popular host CPUs, one from Intel and another from ARM. We conclude that the version of OpenISA presented here can preserve close-to-native performance when translated to other hosts, working as a promising model for next-generation, flexible ISAs that can be easily extended while preserving backwards compatibility. Furthermore, we show how this can also serve as a user-level program distribution format.
Doctorate in Computer Science. Grant 2011/09630-1, FAPES
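The static translation strategy the thesis relies on can be illustrated with a toy translator: each guest instruction is lowered to a host operation once, ahead of time, rather than being decoded on every execution. The opcodes, encoding, and register names below are invented; this is a sketch of the concept, not OpenISA's actual format.

```python
# Hypothetical sketch of static translation from a small three-address guest
# "ISA" to host operations (here, Python closures). The point is one-time,
# ahead-of-time lowering instead of per-instruction decoding at run time.

HOST_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def translate(guest_program):
    """Statically lower each guest instruction to a host operation."""
    compiled = []
    for op, dst, src1, src2 in guest_program:
        fn = HOST_OPS[op]  # opcode resolved once, at translation time
        compiled.append((dst, src1, src2, fn))
    return compiled

def execute(compiled, regs):
    # Running the translated program touches no guest decoding logic at all.
    for dst, src1, src2, fn in compiled:
        regs[dst] = fn(regs[src1], regs[src2])
    return regs

prog = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")]
regs = execute(translate(prog), {"r0": 3, "r1": 4, "r2": 0, "r3": 0})
```

An ISA deliberately chosen to make this lowering simple is what lets translated code stay close to native speed, which is the property the evaluation measures.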
Low-Level Haskell Code: Measurements and Optimization Techniques
Haskell is a lazy functional language with a strong static type system and
excellent support for parallel programming. The language features of Haskell
make it easier to write correct and maintainable programs, but execution speed
often suffers from the high levels of abstraction. While much past research
focuses on high-level optimizations that take advantage of the functional
properties of Haskell, relatively little attention has been paid to the
optimization opportunities in the low-level imperative code generated during
translation to machine code. One problem with current low-level optimizations
is that their effectiveness is limited by the obscured control flow caused by
Haskell's high-level abstractions. My thesis is that trace-based optimization
techniques can be used to improve the effectiveness of low-level optimizations
for Haskell programs. I claim three unique contributions in this work.
The first contribution is to expose some properties of low-level Haskell codes
by looking at the mix of operations performed by the selected benchmark codes
and comparing them to the low-level codes coming from traditional programming
languages. The low-level measurements reveal that the control flow is obscured
by indirect jumps caused by the implementation of lazy evaluation,
higher-order functions, and the separately managed stacks used by Haskell
programs.
My second contribution is a study on the effectiveness of a dynamic binary
trace-based optimizer running on Haskell programs. My results show that while
viable program traces frequently occur in Haskell programs, the overhead
associated with maintaining the traces in a dynamic optimization system
outweighs the benefits we get from running the traces. To reduce the runtime overheads,
I explore a way to find traces in a separate profiling step.
My final contribution is to build and evaluate a static trace-based optimizer
for Haskell programs. The static optimizer uses profiling data to find traces
in a Haskell program and then restructures the code around the traces to
increase the scope available to the low-level optimizer. My results show that
we can successfully build traces in Haskell programs, and the optimized code
yields a speedup over existing low-level optimizers of up to 86%
with an average speedup of 5% across 32 benchmarks.
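The trace-formation step described above, finding traces from profiling data, can be sketched as greedily following the hottest control-flow edge out of each block until the path loops or dies. The control-flow graph and edge counts below are invented for illustration.

```python
# Sketch of static trace formation from profile data: grow a trace by
# repeatedly taking the hottest outgoing edge until the path closes into a
# loop or runs out of profiled successors. CFG and counts are made up.

def form_trace(start, edge_counts):
    trace, seen = [start], {start}
    node = start
    while True:
        successors = {dst: n for (src, dst), n in edge_counts.items() if src == node}
        if not successors:
            break  # fell off profiled code
        node = max(successors, key=successors.get)  # hottest outgoing edge
        if node in seen:
            break  # trace closed into a loop
        trace.append(node)
        seen.add(node)
    return trace

# Loop A -> B -> (C hot | D cold) -> A, with execution counts per edge:
edges = {("A", "B"): 100, ("B", "C"): 95, ("B", "D"): 5,
         ("C", "A"): 95, ("D", "A"): 5}
hot_trace = form_trace("A", edges)
```

Laying the blocks of such a trace out contiguously is what gives the low-level optimizer the straight-line scope that Haskell's indirect jumps otherwise hide.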