54,077 research outputs found
Representing numeric data in 32 bits while preserving 64-bit precision
Data files often consist of numbers having only a few significant decimal
digits, whose information content would allow storage in only 32 bits. However,
we may require that arithmetic operations involving these numbers be done with
64-bit floating-point precision, which precludes simply representing the data
as 32-bit floating-point values. Decimal floating point gives a compact and
exact representation, but requires conversion with a slow division operation
before it can be used. Here, I show that interesting subsets of 64-bit
floating-point values can be compactly and exactly represented by the 32 bits
consisting of the sign, exponent, and high-order part of the mantissa, with the
lower-order 32 bits of the mantissa filled in by table lookup, indexed by bits
from the part of the mantissa retained, and possibly from the exponent. For
example, decimal data with 4 or fewer digits to the left of the decimal point
and 2 or fewer digits to the right of the decimal point can be represented in
this way using the lower-order 5 bits of the retained part of the mantissa as
the index. Data consisting of 6 decimal digits with the decimal point in any of
the 7 positions before or after one of the digits can also be represented this
way, and decoded using 19 bits from the mantissa and exponent as the index.
Encoding with such a scheme is a simple copy of half the 64-bit value, followed
if necessary by verification that the value can be represented, by checking
that it decodes correctly. Decoding requires only extraction of index bits and
a table lookup. Lookup in a small table will usually reference cache; even with
larger tables, decoding is still faster than conversion from decimal floating
point with a division operation. I discuss how such schemes perform on recent
computer systems, and how they might be used to automatically compress large
arrays in interpretive languages such as R
Program representation size in an intermediate language with intersection and union types
The CIL compiler for core Standard ML compiles whole programs using a novel typed intermediate language (TIL) with intersection and union types and flow labels on both terms and types. The CIL term representation duplicates portions of the program where intersection types are introduced and union types are eliminated. This duplication makes it easier to represent type information and to introduce customized data representations. However, duplication incurs compile-time space costs that are potentially much greater than are incurred in TILs employing type-level abstraction or quantification. In this paper, we present empirical data on the compile-time space costs of using CIL as an intermediate language. The data shows that these costs can be made tractable by using sufficiently fine-grained flow analyses together with standard hash-consing techniques. The data also suggests that non-duplicating formulations of intersection (and union) types would not achieve significantly better space complexity.National Science Foundation (CCR-9417382, CISE/CCR ESS 9806747); Sun grant (EDUD-7826-990410-US); Faculty Fellowship of the Carroll School of Management, Boston College; U.K. Engineering and Physical Sciences Research Council (GR/L 36963, GR/L 15685
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Relational databases have limited support for data collaboration, where teams
collaboratively curate and analyze large datasets. Inspired by software version
control systems like git, we propose (a) a dataset version control system,
giving users the ability to create, branch, merge, difference and search large,
divergent collections of datasets, and (b) a platform, DataHub, that gives
users the ability to perform collaborative data analysis building on this
version control system. We outline the challenges in providing dataset version
control at scale.Comment: 7 page
Interprocedural Type Specialization of JavaScript Programs Without Type Analysis
Dynamically typed programming languages such as Python and JavaScript defer
type checking to run time. VM implementations can improve performance by
eliminating redundant dynamic type checks. However, type inference analyses are
often costly and involve tradeoffs between compilation time and resulting
precision. This has lead to the creation of increasingly complex multi-tiered
VM architectures.
Lazy basic block versioning is a simple JIT compilation technique which
effectively removes redundant type checks from critical code paths. This novel
approach lazily generates type-specialized versions of basic blocks on-the-fly
while propagating context-dependent type information. This approach does not
require the use of costly program analyses, is not restricted by the precision
limitations of traditional type analyses.
This paper extends lazy basic block versioning to propagate type information
interprocedurally, across function call boundaries. Our implementation in a
JavaScript JIT compiler shows that across 26 benchmarks, interprocedural basic
block versioning eliminates more type tag tests on average than what is
achievable with static type analysis without resorting to code transformations.
On average, 94.3% of type tag tests are eliminated, yielding speedups of up to
56%. We also show that our implementation is able to outperform Truffle/JS on
several benchmarks, both in terms of execution time and compilation time.Comment: 10 pages, 10 figures, submitted to CGO 201
Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs
Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses to larger remote access operations. A straightforward implementation of the inspector executor transformation results in excessive instrumentation that hinders performance.; This paper addresses this issue and introduces various techniques that aim at reducing the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs) [S. Aarseth, Gravitational N-Body Simulations: Tools and Algorithms, Cambridge Monographs on Mathematical Physics, Cambridge University Press, 2003.], the inlining of data locality checks and the usage of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that were propagated through the loop body.; A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations increase performance of UPC programs up to 1.8 x their UPC hand-optimized counterpart for applications with regular accesses and up to 6.3 x for applications with irregular accesses.Peer ReviewedPostprint (author's final draft
- …