19,973 research outputs found
Code Generation for Efficient Query Processing in Managed Runtimes
In this paper we examine opportunities arising from the conver-gence of two trends in data management: in-memory database sys-tems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mismatch ’ problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also to use the same querying language to query an application’s in-memory collections. The lat-ter offers further transparency to developers as the query language and all data is represented in the data model of the host program-ming language. However, compared to IMDBs, this additional free-dom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying and we leverage query compilation to im-prove query processing on application objects. We explore dif-ferent query compilation strategies and study how they improve the performance of query processing over application data. We take C] as the host programming language as it supports language-integrated query through the LINQ framework. Our techniques de-liver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in-memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying. 1
Rumble: Data Independence for Large Messy Data Sets
This paper introduces Rumble, an engine that executes JSONiq queries on
large, heterogeneous and nested collections of JSON objects, leveraging the
parallel capabilities of Spark so as to provide a high degree of data
independence. The design is based on two key insights: (i) how to map JSONiq
expressions to Spark transformations on RDDs and (ii) how to map JSONiq FLWOR
clauses to Spark SQL on DataFrames. We have developed a working implementation
of these mappings showing that JSONiq can efficiently run on Spark to query
billions of objects into, at least, the TB range. The JSONiq code is concise in
comparison to Spark's host languages while seamlessly supporting the nested,
heterogeneous data sets that Spark SQL does not. The ability to process this
kind of input, commonly found, is paramount for data cleaning and curation. The
experimental analysis indicates that there is no excessive performance loss,
occasionally even a gain, over Spark SQL for structured data, and a performance
gain over PySpark. This demonstrates that a language such as JSONiq is a simple
and viable approach to large-scale querying of denormalized, heterogeneous,
arborescent data sets, in the same way as SQL can be leveraged for structured
data sets. The results also illustrate that Codd's concept of data independence
makes as much sense for heterogeneous, nested data sets as it does on highly
structured tables.Comment: Preprint, 9 page
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, but especially to focus on the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.Comment: To appear in the Encyclopedia of Big Data Technologie
RDF Querying
Reactive Web systems, Web services, and Web-based publish/
subscribe systems communicate events as XML messages, and in
many cases require composite event detection: it is not sufficient to react
to single event messages, but events have to be considered in relation to
other events that are received over time.
Emphasizing language design and formal semantics, we describe the
rule-based query language XChangeEQ for detecting composite events.
XChangeEQ is designed to completely cover and integrate the four complementary
querying dimensions: event data, event composition, temporal
relationships, and event accumulation. Semantics are provided as
model and fixpoint theories; while this is an established approach for rule
languages, it has not been applied for event queries before
State-of-the-art on evolution and reactivity
This report starts by, in Chapter 1, outlining aspects of querying and updating resources on
the Web and on the Semantic Web, including the development of query and update languages
to be carried out within the Rewerse project.
From this outline, it becomes clear that several existing research areas and topics are of
interest for this work in Rewerse. In the remainder of this report we further present state of
the art surveys in a selection of such areas and topics. More precisely: in Chapter 2 we give
an overview of logics for reasoning about state change and updates; Chapter 3 is devoted to briefly describing existing update languages for the Web, and also for updating logic programs;
in Chapter 4 event-condition-action rules, both in the context of active database systems and
in the context of semistructured data, are surveyed; in Chapter 5 we give an overview of some relevant rule-based agents frameworks
A pattern-based approach to a cell tracking ontology
Time-lapse microscopy has thoroughly transformed our understanding of biological motion and developmental dynamics from single cells to entire organisms. The increasing amount of cell tracking data demands the creation of tools to make extracted data searchable and interoperable between experiment and data types. In order to address that problem, the current paper reports on the progress in building the Cell Tracking Ontology (CTO): An ontology framework for describing, querying and integrating data from complementary experimental techniques in the domain of cell tracking experiments. CTO is based on a basic knowledge structure: the cellular genealogy serving as a backbone model to integrate specific biological ontologies into tracking data. As a first step we integrate the Phenotype and Trait Ontology (PATO) as one of the most relevant ontologies to annotate cell tracking experiments. The CTO requires both the integration of data on various levels of generality as well as the proper structuring of collected information. Therefore, in order to provide a sound foundation of the ontology, we have built on the rich body of work on top-level ontologies and established three generic ontology design patterns addressing three modeling challenges for properly representing cellular genealogies, i.e. representing entities existing in time, undergoing changes over time and their organization into more complex structures such as situations
Content-based Video Retrieval
no abstract
- …