The End of Slow Networks: It's Time for a Redesign
Next-generation high-performance RDMA-capable networks will require a
fundamental rethinking of the design and architecture of modern distributed
DBMSs. These systems are commonly designed and optimized under the assumption
that the network is the bottleneck: the network is slow and "thin", and thus
needs to be avoided as much as possible. Yet this assumption no longer holds
true. With InfiniBand FDR 4x, the bandwidth available to transfer data across
the network is in the same ballpark as the bandwidth of one memory channel, and it
increases even further with the most recent EDR standard. Moreover, with
continuing advances in RDMA, latency is improving similarly fast. In this
paper, we first argue that the "old" distributed database design is not capable
of taking full advantage of the network. Second, we propose architectural
redesigns for OLTP, OLAP and advanced analytical frameworks to take better
advantage of the improved bandwidth, latency and RDMA capabilities. Finally,
for each of the workload categories, we show that remarkable performance
improvements can be achieved.
The End of a Myth: Distributed Transactions Can Scale
The common wisdom is that distributed transactions do not scale. But what if
distributed transactions could be made scalable using the next generation of
networks and a redesign of distributed databases? Developers would no longer
need to worry about co-partitioning schemes to achieve decent
performance. Application development would become easier as data placement
would no longer determine how scalable an application is. Hardware provisioning
would be simplified as the system administrator can expect a linear scale-out
when adding more machines rather than some complex sub-linear function, which
is highly application-specific.
In this paper, we present the design of our novel scalable database system
NAM-DB and show that distributed transactions with the very common Snapshot
Isolation guarantee can indeed scale using the next generation of RDMA-enabled
network technology without any inherent bottlenecks. Our experiments with the
TPC-C benchmark show that our system scales linearly to over 6.5 million
new-order (14.5 million total) distributed transactions per second on 56
machines.
Comment: 12 page
Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms
The enormous quantity of data produced every day together with advances in
data analytics has led to a proliferation of data management and analysis
systems. Typically, these systems are built around highly specialized
monolithic operators optimized for the underlying hardware. While effective in
the short term, such an approach makes the operators cumbersome to port and
adapt, which is increasingly required due to the speed at which algorithms and
hardware evolve. To address this limitation, we present Modularis, an execution
layer for data analytics based on sub-operators, i.e., composable building
blocks resembling traditional database operators but at a finer granularity. To
demonstrate the advantages of our approach, we use Modularis to build a
distributed query processing system supporting relational queries running on an
RDMA cluster, a serverless cloud platform, and a smart storage engine.
Modularis requires minimal code changes to execute queries across these three
diverse hardware platforms, showing that the sub-operator approach reduces the
amount and complexity of the code. In fact, changes in the platform affect only
sub-operators that depend on the underlying hardware. We show the end-to-end
performance of Modularis by comparing it with a framework for SQL processing
(Presto), a commercial cluster database (SingleStore), as well as
Query-as-a-Service systems (Athena, BigQuery). Modularis outperforms all these
systems, proving that the design and architectural advantages of a modular
design can be achieved without degrading performance. We also compare Modularis
with a hand-optimized implementation of a join for RDMA clusters. We show that
Modularis has the advantage of being easily extensible to a wider range of join
variants and group by queries, all of which are not supported in the hand-tuned
join.
Comment: Accepted at PVLDB vol. 1
Hyperscale Data Processing With Network-Centric Designs
Today’s largest data processing workloads are hosted in cloud data centers. Due to unprecedented data growth and the end of Moore’s Law, these workloads have ballooned to the hyperscale level, encompassing billions to trillions of data items and hundreds to thousands of machines per query. Enabling and expanding with these workloads are highly scalable data center networks that connect up to hundreds of thousands of networked servers. These massive scales fundamentally challenge the designs of both data processing systems and data center networks, and the classic layered designs are no longer sustainable.
Rather than optimize these massive layers in silos, we build systems across them with principled network-centric designs. In current networks, we redesign data processing systems with network-awareness to minimize the cost of moving data in the network. In future networks, we propose new interfaces and services that the cloud infrastructure offers to applications and codesign data processing systems to achieve optimal query processing performance. To transform the network to future designs, we facilitate network innovation at scale.
This dissertation presents a line of systems work that covers all three directions. It first discusses GraphRex, a network-aware system that combines classic database and systems techniques to push the performance of massive graph queries in current data centers. It then introduces data processing in disaggregated data centers, a promising new cloud proposal. It details TELEPORT, a compute pushdown feature that eliminates data processing performance bottlenecks in disaggregated data centers, and Redy, which provides high-performance caches using remote disaggregated memory. Finally, it presents MimicNet, a fine-grained simulation framework that evaluates network proposals at data center scale with machine learning approximation. These systems demonstrate that our ideas in network-centric designs achieve orders of magnitude higher efficiency compared to the state of the art at hyperscale.