4,226 research outputs found
Rumble: Data Independence for Large Messy Data Sets
This paper introduces Rumble, an engine that executes JSONiq queries on
large, heterogeneous and nested collections of JSON objects, leveraging the
parallel capabilities of Spark so as to provide a high degree of data
independence. The design is based on two key insights: (i) how to map JSONiq
expressions to Spark transformations on RDDs and (ii) how to map JSONiq FLWOR
clauses to Spark SQL on DataFrames. We have developed a working implementation
of these mappings showing that JSONiq can efficiently run on Spark to query
billions of objects into, at least, the TB range. The JSONiq code is concise in
comparison to Spark's host languages while seamlessly supporting the nested,
heterogeneous data sets that Spark SQL does not. The ability to process this
kind of input, commonly found, is paramount for data cleaning and curation. The
experimental analysis indicates that there is no excessive performance loss,
occasionally even a gain, over Spark SQL for structured data, and a performance
gain over PySpark. This demonstrates that a language such as JSONiq is a simple
and viable approach to large-scale querying of denormalized, heterogeneous,
arborescent data sets, in the same way as SQL can be leveraged for structured
data sets. The results also illustrate that Codd's concept of data independence
makes as much sense for heterogeneous, nested data sets as it does on highly
structured tables.Comment: Preprint, 9 page
Static Application-Level Race Detection in STM Haskell using Contracts
Writing concurrent programs is a hard task, even when using high-level
synchronization primitives such as transactional memories together with a
functional language with well-controlled side-effects such as Haskell, because
the interferences generated by the processes to each other can occur at
different levels and in a very subtle way. The problem occurs when a thread
leaves or exposes the shared data in an inconsistent state with respect to the
application logic or the real meaning of the data. In this paper, we propose to
associate contracts to transactions and we define a program transformation that
makes it possible to extend static contract checking in the context of STM
Haskell. As a result, we are able to check statically that each transaction of
a STM Haskell program handles the shared data in a such way that a given
consistency property, expressed in the form of a user-defined boolean function,
is preserved. This ensures that bad interference will not occur during the
execution of the concurrent program.Comment: In Proceedings PLACES 2013, arXiv:1312.2218. [email protected];
[email protected]
- …