49,095 research outputs found
A Formalization of SQL with Nulls
SQL is the world's most popular declarative language, forming the basis of
the multi-billion-dollar database industry. Although SQL has been standardized,
the full standard is based on ambiguous natural language rather than formal
specification. Commercial SQL implementations interpret the standard in
different ways, so that, given the same input data, the same query can yield
different results depending on the SQL system it is run on. Even for a
particular system, mechanically checked formalization of all widely-used
features of SQL remains an open problem. The lack of a well-understood formal
semantics makes it very difficult to validate the soundness of database
implementations.
Although formal semantics for fragments of SQL were designed in the past,
they usually did not support set and bag operations, nested subqueries, and,
crucially, null values. Null values complicate SQL's semantics in profound ways
analogous to null pointers or side-effects in other programming languages.
Since certain SQL queries are equivalent in the absence of null values, but
produce different results when applied to tables containing incomplete data,
semantics which ignore null values are able to prove query equivalences that
are unsound in realistic databases.
A formal semantics of SQL supporting all the aforementioned features was only
proposed recently. In this paper, we report about our mechanization of SQL
semantics covering set/bag operations, nested subqueries, and nulls, written
the Coq proof assistant, and describe the validation of key metatheoretic
properties
Heap Reference Analysis Using Access Graphs
Despite significant progress in the theory and practice of program analysis,
analysing properties of heap data has not reached the same level of maturity as
the analysis of static and stack data. The spatial and temporal structure of
stack and static data is well understood while that of heap data seems
arbitrary and is unbounded. We devise bounded representations which summarize
properties of the heap data. This summarization is based on the structure of
the program which manipulates the heap. The resulting summary representations
are certain kinds of graphs called access graphs. The boundedness of these
representations and the monotonicity of the operations to manipulate them make
it possible to compute them through data flow analysis.
An important application which benefits from heap reference analysis is
garbage collection, where currently liveness is conservatively approximated by
reachability from program variables. As a consequence, current garbage
collectors leave a lot of garbage uncollected, a fact which has been confirmed
by several empirical studies. We propose the first ever end-to-end static
analysis to distinguish live objects from reachable objects. We use this
information to make dead objects unreachable by modifying the program. This
application is interesting because it requires discovering data flow
information representing complex semantics. In particular, we discover four
properties of heap data: liveness, aliasing, availability, and anticipability.
Together, they cover all combinations of directions of analysis (i.e. forward
and backward) and confluence of information (i.e. union and intersection). Our
analysis can also be used for plugging memory leaks in C/C++ languages.Comment: Accepted for printing by ACM TOPLAS. This version incorporates
referees' comment
Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects
Exaggeration or context changes can render maintainability experience into
prejudice. For example, JavaScript is often seen as least elegant language and
hence of lowest maintainability. Such prejudice should not guide decisions
without prior empirical validation. We formulated 10 hypotheses about
maintainability based on prejudices and test them in a large set of open-source
projects (6,897 GitHub repositories, 402 million lines, 5 programming
languages). We operationalize maintainability with five static analysis
metrics. We found that JavaScript code is not worse than other code, Java code
shows higher maintainability than C# code and C code has longer methods than
other code. The quality of interface documentation is better in Java code than
in other code. Code developed by teams is not of higher and large code bases
not of lower maintainability. Projects with high maintainability are not more
popular or more often forked. Overall, most hypotheses are not supported by
open-source data.Comment: 20 page
Gradual Program Analysis
Dataflow analysis and gradual typing are both well-studied methods to gain information about computer programs in a finite amount of time. The gradual program analysis project seeks to combine those two techniques in order to gain the benefits of both. This thesis explores the background information necessary to understand gradual program analysis, and then briefly discusses the research itself, with reference to publication of work done so far. The background topics include essential aspects of programming language theory, such as syntax, semantics, and static typing; dataflow analysis concepts, such as abstract interpretation, semilattices, and fixpoint computations; and gradual typing theory, such as the concept of an unknown type, liftings of predicates, and liftings of functions
Interprocedural Type Specialization of JavaScript Programs Without Type Analysis
Dynamically typed programming languages such as Python and JavaScript defer
type checking to run time. VM implementations can improve performance by
eliminating redundant dynamic type checks. However, type inference analyses are
often costly and involve tradeoffs between compilation time and resulting
precision. This has lead to the creation of increasingly complex multi-tiered
VM architectures.
Lazy basic block versioning is a simple JIT compilation technique which
effectively removes redundant type checks from critical code paths. This novel
approach lazily generates type-specialized versions of basic blocks on-the-fly
while propagating context-dependent type information. This approach does not
require the use of costly program analyses, is not restricted by the precision
limitations of traditional type analyses.
This paper extends lazy basic block versioning to propagate type information
interprocedurally, across function call boundaries. Our implementation in a
JavaScript JIT compiler shows that across 26 benchmarks, interprocedural basic
block versioning eliminates more type tag tests on average than what is
achievable with static type analysis without resorting to code transformations.
On average, 94.3% of type tag tests are eliminated, yielding speedups of up to
56%. We also show that our implementation is able to outperform Truffle/JS on
several benchmarks, both in terms of execution time and compilation time.Comment: 10 pages, 10 figures, submitted to CGO 201
Permission-Based Separation Logic for Multithreaded Java Programs
This paper presents a program logic for reasoning about multithreaded
Java-like programs with dynamic thread creation, thread joining and reentrant
object monitors. The logic is based on concurrent separation logic. It is the
first detailed adaptation of concurrent separation logic to a multithreaded
Java-like language. The program logic associates a unique static access
permission with each heap location, ensuring exclusive write accesses and
ruling out data races. Concurrent reads are supported through fractional
permissions. Permissions can be transferred between threads upon thread
starting, thread joining, initial monitor entrancies and final monitor exits.
In order to distinguish between initial monitor entrancies and monitor
reentrancies, auxiliary variables keep track of multisets of currently held
monitors. Data abstraction and behavioral subtyping are facilitated through
abstract predicates, which are also used to represent monitor invariants,
preconditions for thread starting and postconditions for thread joining.
Value-parametrized types allow to conveniently capture common strong global
invariants, like static object ownership relations. The program logic is
presented for a model language with Java-like classes and interfaces, the
soundness of the program logic is proven, and a number of illustrative examples
are presented
- …