To pipeline or not to pipeline, that is the question
In designing query processing primitives, a crucial design choice is the
method for data transfer between two operators in a query plan. As we were
considering this critical design mechanism for an in-memory database system
that we are building, we quickly realized that (surprisingly) there isn't a
clear definition of this concept. Papers are full of ad hoc uses of terms like
pipelining and blocking, but as these terms are not crisply defined, it is hard
to fully understand the results attributed to these concepts. To address this
limitation, we introduce a clear terminology for how to think about data
transfer between operators in a query pipeline. We show that pipelining and
blocking are not two crisply separated alternatives, but endpoints of a full
spectrum of techniques based on a simple concept called unit-of-transfer. Next, we
develop an analytical model for inter-operator communication, and highlight the
key parameters that impact performance (for in-memory database settings). Armed
with this model, we then apply it to the system we are designing and highlight
the insights we gathered from this exercise. We find that the gap between
pipelining and non-pipelining query execution, w.r.t. key factors such as
performance and memory footprint, is quite narrow, and thus system designers
should likely rethink the notion of pipelining vs. blocking for in-memory
database systems.
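
The unit-of-transfer idea can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: the operator names `select` and `run_plan`, the `unit_of_transfer` parameter, and the choice of a filter feeding a sum are all assumptions made for illustration. A producer hands rows to a consumer in chunks; a chunk size of one row resembles tuple-at-a-time pipelining, while a chunk size at least as large as the producer's whole output resembles a blocking, fully materialized transfer.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def select(rows: Iterable[int],
           predicate: Callable[[int], bool],
           unit_of_transfer: int) -> Iterator[List[int]]:
    """Hypothetical producer: filter rows and hand them downstream
    in chunks of at most `unit_of_transfer` rows."""
    if unit_of_transfer < 1:
        raise ValueError("unit_of_transfer must be at least 1")
    buffer: List[int] = []
    for row in rows:
        if predicate(row):
            buffer.append(row)
            if len(buffer) == unit_of_transfer:
                yield buffer          # transfer one full unit downstream
                buffer = []
    if buffer:
        yield buffer                  # transfer the final, partial unit


def run_plan(rows: Iterable[int], unit_of_transfer: int) -> Tuple[int, int, int]:
    """Hypothetical two-operator plan: select even rows, then sum them.

    Returns (result, number of transfers, largest chunk held at once)."""
    total = 0
    transfers = 0
    peak_buffered = 0
    for chunk in select(rows, lambda r: r % 2 == 0, unit_of_transfer):
        transfers += 1
        peak_buffered = max(peak_buffered, len(chunk))
        total += sum(chunk)           # consumer processes one unit at a time
    return total, transfers, peak_buffered
```

Calling `run_plan(range(10), 1)` and `run_plan(range(10), 10)` yields the same query result; only the number of transfers and the amount of buffered data differ, which is the kind of trade-off an analytical model of inter-operator communication would need to capture.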