Lessons learned from the early performance evaluation of Intel Optane DC
  Persistent Memory in DBMS by Wu, Yinjun et al.
Yinjun Wu 1, Kwanghyun Park 2, Rathijit Sen 2, Brian Kroth 2, Jaeyoung Do 3
1University of Pennsylvania, 2Microsoft Gray Systems Lab, 3Microsoft Research
wuyinjun@seas.upenn.edu
{<name>.<surname>}@microsoft.com
jaedo@microsoft.com
ABSTRACT
Non-volatile memory (NVM) is an emerging technology that
has the persistence characteristics of large-capacity storage devices
(e.g., HDDs and SSDs), while providing the low access latency
and byte-addressability of traditional DRAM. This unique
combination of features opens up several new design considerations
when building database management systems (DBMSs), such as
replacing DRAM (as the main working memory) or block
devices (as the persistent storage), or complementing both at the
same time for several DBMS components (such as access methods,
storage engine, buffer management, logging/recovery, etc.).
However, interacting with NVM requires changes to application
software to make the best use of the device (e.g., mmap and clflush
of small cachelines instead of write and fsync of large page buffers).
Before introducing (potentially major) code changes to the DBMS for
NVM, developers need a clear understanding of NVM performance under
various conditions to help make better design choices.
In this paper, we provide extensive performance evaluations
conducted with a recently released NVM device, Intel Optane DC
Persistent Memory (PMem), under different configurations with
several micro-benchmark tools. Further, we evaluate OLTP and
OLAP database workloads (i.e., TPC-C and TPC-H) with Microsoft
SQL Server 2019 when using the NVM device as an in-memory
buffer pool or as persistent storage. From the lessons learned, we
share some recommendations for future DBMS design with PMem, e.g.,
simple hardware or software changes are not enough to make the best
use of PMem in DBMSs.
1 INTRODUCTION
Non-volatile memory (NVM) is an emerging technology which
forces the database community to revisit various DBMS internals
(e.g., access methods, storage engine, buffer management, logging,
recovery, etc.) [9] because of its unique device characteristics. For
example, one recently released NVM device, Intel Optane DC
persistent memory (PMem for short hereafter), provides large capacity
and persistence like traditional block storage devices (e.g., HDDs
and SSDs), as well as low latency and byte-addressability like DRAM
memory.
Over the last few years, there have been some efforts to measure
the performance of this new device [6–8, 18]. However, some
important PMem device measurements that are critical to the
performance of database query processing are missing (e.g., the
effects of the degree of parallelism, I/O request size, etc.).
Further, some important database design questions that need to be
considered when developing DBMSs for NVM have not been thoroughly
answered. For example, due to its unique characteristics, PMem could
be used in DBMSs to overcome the limited DRAM buffer pool size, to
improve the performance of persistent storage, or both. Such
system design choices remain largely open.
The focus of this paper is to explore PMem's characteristics,
mainly from the DBMS perspective, under several PMem configurations
with micro-benchmark tools. The observations and analysis
in this paper help explain more traditional database workload
performance results (e.g., the impact of writes on OLTP and OLAP
workloads) and inform our suggestions for optimal DBMS internal
parameters for query processing on PMem. For example, our results
show that small I/O request sizes can best take advantage of
PMem's relatively high performance compared to other traditional
storage devices (i.e., SSDs). Additionally, we observe that PMem
devices do not exhibit the same concurrent request scaling rules as
SSDs.
In addition, we evaluate OLTP and OLAP workloads (i.e., TPC-C
and TPC-H) with different PMem device configurations to explore
the potential database design choices when integrating PMem in
DBMSs. Our OLTP and OLAP evaluations reveal some important
design considerations when building DBMSs with PMem: (1) Replacing
a traditional DRAM-based buffer pool with PMem configured
in Memory Mode (e.g., for its increased capacity for working set
memory and the ability to use it directly without software changes)
is a suboptimal solution because of its extreme performance
asymmetry between reads and writes. This design choice hurts not only
the performance of write-intensive OLTP workloads, but also that
of read-intensive OLAP workloads, because all intermediate query
results are also written to PMem; (2) extending a DRAM buffer
pool with PMem requires a PMem-aware policy for placing database
pages in the buffer pool. For example, our results show that the
performance of OLAP workloads can drop significantly if we do not
ensure hot pages land on DRAM, despite the high read performance
of PMem.
Our device micro-benchmark results and database workload
evaluations point out that the solutions introduced so far only
provided initial implementations, but there are huge opportunities
for additional research to properly adopt this new technology for
DBMS internals.
arXiv:2005.07658v1 [cs.DB] 15 May 2020
In summary, the contributions of our paper are as follows:
• We share some important PMem device characteristics and
introduce crucial factors (e.g., degree of parallelism, I/O
request size, and impact of frequent writes) for database
query processing with PMem.
• We evaluate OLTP and OLAP workloads with different
PMem configurations (i.e., replacing the DRAM buffer pool,
extending the DRAM buffer pool, and replacing storage devices
with PMem) in a production-grade DBMS.
• Based on our findings, we make design choice recommendations
for buffer management and storage engines for
DBMSs.
In the remainder of the paper, we present related work and
different possible PMem configurations in Section 2 and Section 3,
followed by the PMem device performance and database workload
evaluations with PMem in Sections 4 and 5, respectively. Some future
research directions are discussed in Section 6.
2 RELATED WORK
After the recent release of PMem, initial efforts on performance
characterization of the device have begun [6, 8, 18, 23, 29, 30],
which have also motivated performance evaluations of PMem in
different applications, e.g., B-tree performance evaluations [20],
scientific benchmark evaluations [22], power usage evaluations
[23], hybrid memory system evaluations [17], integer compression
schemes [31], and some initial database workload evaluations on
PMem [18]. There are also some other initial studies on how to close
the latency gap between DRAM and PMem in latency-sensitive
operations [24], how to provide efficient I/O primitives with PMem
[26], how to design a better page replacement policy with PMem
[21], and how to efficiently provide replication mechanisms with
PMem [32]. However, to our knowledge, no detailed database
workload evaluations and DBMS design recommendations on the
newly released PMem device exist yet.
Prior to the release of PMem, a series of research works were
motivated by the predicted use of future NVM devices. These
efforts focused on how to utilize NVM in database systems for
improved performance, recovery, or other aspects compared to other
storage devices (see [10, 12, 13, 25, 28] and the references therein).
However, due to the lack of real NVM devices when those works
were published, they were based on NVM emulated in DRAM,
typically with an expectation of near-DRAM bandwidths, which
we now know to be an inaccurate assumption. Hence, the feasibility
of those ideas on the real NVM device is still unknown.
3 PMEM CONFIGURATIONS
We use three different modes (shown in Figure 1) of accessing
PMem, each of which has its own set of trade-offs and must be
chosen in the system's firmware configuration.
First, in Memory Mode PMem is used as the main memory of a
system that leverages DRAM as a high-speed L4 cache, managed
by the hardware's memory controller. This mode can be used as a
way to increase the capacity of the working memory for a system,
and to use PMem with existing applications without modification.
However, in this mode, PMem is actually still volatile. That is to say,
all data written to PMem in Memory Mode is lost upon a system
restart (whether the cause of the restart is intentional or not). Thus,
in this mode we are not able to take advantage of a key feature of
the device. Moreover, the cache (re)placement policy is subject to
NUMA and other effects and cannot be managed by software.
Figure 1: Different configurations of PMem. PMem with the
orange color (in the PMem and File I/O modes) indicates the
use of PMem as persistent storage while the gray color (in
the Memory Mode) means its usage as volatile memory.
In both of the other access mode options, Direct Access (DAX)
and File I/O, PMem guarantees persistence of the data it stores.
In DAX mode, applications use PMem via memory semantics
(i.e., load and store instructions), after an initial interaction with
the OS kernel through the mmap syscall to set up a virtual address
space mapping, thus allowing direct CPU data access to PMem's
address space without further kernel intervention. However, this
may involve significant changes to application code to handle things
like torn writes due to CPU cacheline flushing semantics, which most
applications don't generally need to be concerned with. Instead,
application developers have to carefully inject clflush, mfence,
and other low-level instructions, or make use of libraries (e.g., Intel's
PMDK [4]) to do it for them, which are known to introduce other
inefficiencies [19].
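As a rough illustration of the DAX-style access path described above, the following sketch maps a file, updates a cacheline-sized region with plain stores, and then flushes the affected page. It is only an approximation under stated assumptions: an ordinary temporary file stands in for a file on a DAX-mounted PMem file system, Python's mmap.flush (msync) stands in for explicit clflush/mfence sequences, and the names dax_style_update and demo are hypothetical.

```python
import mmap
import os
import tempfile

CACHELINE = 64  # bytes; the cacheline granularity mentioned in the text


def dax_style_update(path: str, offset: int, payload: bytes) -> None:
    """Update a small region through a memory mapping, then flush it."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            # Plain stores into the mapping; no write() syscall per update.
            mm[offset:offset + len(payload)] = payload
            # mmap.flush() needs a page-aligned start, so round down.
            start = (offset // mmap.PAGESIZE) * mmap.PAGESIZE
            mm.flush(start, offset + len(payload) - start)


def demo() -> bytes:
    """Write one cacheline into a scratch file and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * (4 * mmap.PAGESIZE))
        os.close(fd)
        dax_style_update(path, mmap.PAGESIZE + 128, b"x" * CACHELINE)
        with open(path, "rb") as f:
            f.seek(mmap.PAGESIZE + 128)
            return f.read(CACHELINE)
    finally:
        os.remove(path)
```

Real PMem code would typically flush at cacheline rather than page granularity and would need to order its stores to guard against torn writes, which this sketch does not attempt.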
On the other hand, in File I/O mode, PMem is accessed by
applications using standard file system APIs (e.g., read, write, etc.).
This has higher latency than DAX mode when data within a region
is accessed frequently, since each operation must
pass through the entire I/O software stack of the OS. Additionally,
block I/O typically does not provide byte-addressable semantics.
Table 1 provides a summary of the PMem configurations.
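For contrast with the mapping-based path, the File I/O path can be sketched with positional read/write system calls followed by fsync. This is a minimal example and not the paper's code: the 8KB PAGE_SIZE and the helper names are assumptions chosen for illustration, and a temporary file stands in for a file on PMem.

```python
import os
import tempfile

PAGE_SIZE = 8192  # bytes; hypothetical page size for this sketch


def write_page(fd: int, page_no: int, page: bytes) -> None:
    """Write one whole page through the OS I/O stack and make it durable."""
    if len(page) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")
    os.pwrite(fd, page, page_no * PAGE_SIZE)
    os.fsync(fd)


def read_page(fd: int, page_no: int) -> bytes:
    """Read one whole page; every call is a system call."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def demo() -> bool:
    """Round-trip a page and check the file grew to cover it."""
    fd, path = tempfile.mkstemp()
    try:
        page = bytes([7]) * PAGE_SIZE
        write_page(fd, 3, page)
        same = read_page(fd, 3) == page
        return same and os.fstat(fd).st_size == 4 * PAGE_SIZE
    finally:
        os.close(fd)
        os.remove(path)
```

Even a one-byte change costs a full-page write plus an fsync here, which is the overhead the text attributes to passing through the I/O software stack.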
Feature                     Memory Mode   DAX Mode   File I/O Mode
Disk-like features
  Persistence                    -            X            X
  Large capacity                 X            X            X
Memory-like features
  Byte addressability            X            X            -
  CPU direct access              X            X            -
Table 1: A summary of different PMem configurations.
4 PMEM MICRO-BENCHMARK ANALYSIS
This section presents a comprehensive performance analysis of
PMem device characteristics when used as persistent storage (i.e.,
the DAX or File I/O modes) or as main memory (i.e., Memory Mode),
as described in Section 3.
4.1 System configuration
We conduct all experiments on a dual-socket server with Intel Xeon
Platinum 8260L CPUs. Each socket has 24 physical cores, each
of which has its own 32KB L1 and 1MB L2 caches, and a shared
35.75MB L3 cache. Each socket is also populated with 96GB of
DRAM (six 16GB Micron DDR4), and 1 TB PMem (two interleaved
512GB Intel Optane DC Persistent Memory modules installed in
memory DIMM slots).
Note that all experiments in this section are performed on the two
interleaved PMem modules in one socket accessed by the local CPUs
(i.e., no remote NUMA accesses). To compare performance when
PMem is used as persistent storage, we use one NVMe 4TB Intel
DC P4510 Series SSD (called SSD for short hereafter). Finally, we
run Ubuntu 18.10 on the server where hyper-threading is enabled,
and the CPU power governor is configured to performance mode,
forcing the CPU to use the highest possible (turbo) clock frequency.
4.2 PMem as persistent storage
We rst investigate the performance characteristics of PMem when
it is used in the DAX or File I/O modes, comparing with SSD, and
study their implications when designing DBMSs for PMem. For this
evaluation, we use Flexible I/O tester (fio) [1] to issue synchronized
I/O requests, varying several parameters, including the I/O request
size1, access paerns (random/sequential and read/write), and the
number of I/O threads, seeking to answer the following questions:
(Q1) How will the dierent I/O sizes aect the performance of
PMem in the DAX and File I/O modes?
(Q2) How is PMem’s performance aected as the number of I/O
threads increases?
(Q3) What is the performance dierence between read and write
requests in PMem?
While previous work [18] has already addressed portions of the
above questions (e.g., [18] focuses on single-thread experiments
while we vary the number of I/O threads and I/O sizes simultane-
ously), our main contribution in this analysis is to consider how
PMem’s performance characteristics aect when redesigning var-
ious DBMS components for PMem. Note that due to space lim-
its, we present bandwidth results measured with only sequential
read/write workloads (Figure 2–3). We also measured the latency
results, which, however are less relevant to the database workload
experiments and thus not included here.
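The parameter sweep described above (I/O size, access pattern, and thread count) could be generated along the following lines. This is a sketch only: the paper does not list its fio job settings, so the file path, the chosen values, the sync I/O engine, and the run length are all assumptions. Only widely used fio options (--name, --filename, --rw, --bs, --numjobs, --ioengine, --size, --runtime, --time_based, --group_reporting) appear, and the commands are built but not executed.

```python
from itertools import product
from typing import List


def fio_command(filename: str, rw: str, block_size: str,
                num_jobs: int, ioengine: str = "sync") -> List[str]:
    """Build one fio invocation as an argument list (not executed here)."""
    return [
        "fio",
        "--name=pmem-sweep",          # hypothetical job name
        f"--filename={filename}",
        f"--rw={rw}",
        f"--bs={block_size}",
        f"--numjobs={num_jobs}",
        f"--ioengine={ioengine}",
        "--size=1g",                  # assumed working-set size
        "--runtime=30",               # assumed run length in seconds
        "--time_based",
        "--group_reporting",
    ]


def sweep(filename: str) -> List[List[str]]:
    """Cross access pattern, I/O size, and thread count."""
    patterns = ["read", "write", "randread", "randwrite"]
    block_sizes = ["512", "4k", "8k", "64k"]   # assumed sizes
    thread_counts = [1, 4, 12, 24]             # assumed thread counts
    return [fio_command(filename, rw, bs, n)
            for rw, bs, n in product(patterns, block_sizes, thread_counts)]
```

Each returned list could be passed to subprocess.run on a machine with fio installed, for example sweep("/mnt/pmem0/fio.dat"), where the path is a placeholder.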
Observation 1. Figure 2 shows the peak bandwidth of read and
write workloads with varied I/O request sizes. When the I/O size is
small (i.e., < 4KB), PMem DAX shows significant performance gains
compared to PMem File I/O and SSD (e.g., with 512B I/O size, PMem
DAX achieves about 8x and 50x higher throughput than PMem File
I/O and SSD, respectively). However, with bigger I/O sizes, there is no
distinct performance advantage of PMem DAX compared to PMem
File I/O.
Footnote 1: In the context of DAX mode we mean load/store instruction
operations when we refer to "I/O requests", whereas in block mode we
mean block read/write operations.
Figure 2 (panels: (a) Read, (b) Write): Sequential-access bandwidth
results. Note that the minimum I/O size for SSD is limited to its
device sector size (i.e., 512B) while there is no such limitation for
PMem in DAX and File I/O modes.
Analysis: Unlike PMem DAX, whose internal device access
size is 256B with 64B cacheline operations from the perspective of
byte-level CPU load/store instructions, PMem File I/O and SSD access
data at the sector-size granularity exposed to the OS, which is
typically configured to 4KB or 512B. Therefore, PMem File I/O and
SSD waste I/O bandwidth if the required data size is smaller than
the block size. Further, PMem File I/O and SSD always need to
go through the I/O stack to fetch data, adding extra system call
overheads. On the other hand, this overhead can be easily amortized
when issuing large I/Os, resulting in no noticeable advantage of
PMem DAX over the others.
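The bandwidth waste mentioned in this analysis can be made concrete with a small calculation: a request is rounded up to a whole number of access units, so a request smaller than the unit moves more bytes than it needs. The function names below are hypothetical; the 256B and 4KB granularities are the ones quoted in the text.

```python
def bytes_transferred(request_bytes: int, granularity_bytes: int) -> int:
    """Round a request up to a whole number of access units."""
    if request_bytes <= 0 or granularity_bytes <= 0:
        raise ValueError("sizes must be positive")
    units = -(-request_bytes // granularity_bytes)  # ceiling division
    return units * granularity_bytes


def amplification(request_bytes: int, granularity_bytes: int) -> float:
    """Bytes moved per byte actually requested."""
    return bytes_transferred(request_bytes, granularity_bytes) / request_bytes


# A 64B record fetched at 4KB sector granularity vs. 256B device granularity.
sector_waste = amplification(64, 4096)   # 64.0
device_waste = amplification(64, 256)    # 4.0
# An 8KB page is a whole number of units in both cases, so nothing is wasted.
page_waste = amplification(8192, 4096)   # 1.0
```

This captures only the granularity effect; the system call overhead discussed above is a separate cost that the calculation does not model.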
Recommendations: The performance improvement achieved
by using small I/O sizes in PMem DAX provides an interesting
opportunity to speed up query processing in DBMSs with small
page sizes. As an example, consider using a 512B page instead
of an 8KB page for a B-tree node, and further assume that each record
has a size of 64B. Then we can fit 8 and 128 records into 512B
and 8KB nodes, respectively, and the depth of the B-tree index
can be calculated as $\log_{8}(n)$ for the 512B node and
$\log_{128}(n)$ for the 8KB node, where $n$ is the total number of
records in the database. Thus, the depth ratio between the two node
sizes is $\log_{8}(n)/\log_{128}(n) = 7/3 \approx 2.3$, i.e., roughly 2x.
Given that accessing 512B of data is about five times faster than
accessing 8KB (i.e., in a single-thread measurement, we observed that
the access latency in PMem DAX is reduced from 2.6 µs to 0.5 µs),
we can expect about 2.5x faster index lookups when traversing from
the root to a leaf node (i.e., 2x greater depth, but ~5x lower page
access latency) with the 512B node.
Figure 3 (panels: (a) Read, (b) Write): Sequential-access bandwidth
results while varying the number of I/O threads, each of which issues
8KB reads or writes.
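The B-tree node-size estimate in the Recommendations for Observation 1 can be checked numerically. The sketch below uses the fanouts (8 and 128) and single-thread latencies (0.5 µs and 2.6 µs) quoted there; the function names are hypothetical, and the model assumes one page access per level with no caching of upper levels.

```python
import math


def btree_depth(num_records: float, fanout: int) -> float:
    """Idealized depth log_fanout(n) of a B-tree holding n records."""
    return math.log(num_records) / math.log(fanout)


def lookup_speedup(num_records: float,
                   small_fanout: int, small_latency_us: float,
                   large_fanout: int, large_latency_us: float) -> float:
    """Root-to-leaf time with large nodes divided by time with small nodes."""
    small = btree_depth(num_records, small_fanout) * small_latency_us
    large = btree_depth(num_records, large_fanout) * large_latency_us
    return large / small


n = 1e9  # arbitrary record count; both ratios below are independent of n
depth_ratio = btree_depth(n, 8) / btree_depth(n, 128)  # 7/3, about 2.33
speedup = lookup_speedup(n, 8, 0.5, 128, 2.6)          # about 2.23
```

With the unrounded inputs the estimate comes out near 2.2x; rounding the depth ratio to 2 and the latency ratio to 5 gives the 2.5x quoted in the text.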
Observation 2. Figure 3 shows how read/write bandwidth
changes with the number of I/O threads using a fixed I/O size
(i.e., 8KB). As can be seen in the figure, the read/write bandwidth of
PMem gradually drops after a certain number of threads (i.e., 12 and
4 threads for reads and writes, respectively). Thus, although PMem
provides better overall throughput than SSD, it exhibits poorer
concurrent request scaling than SSD.
Analysis: With more threads, the per-thread sequential access
pattern becomes in aggregate more random from a device perspective,
thus causing more congestion at the internal PMem device
controller. (A similar observation can be found in [18].) The results
shown in Figure 3 indicate that, unlike SSD, a large number of
outstanding I/Os are not needed to fully exploit the parallelism of
PMem for both modes.
Recommendations: The relatively poor scalability of PMem
compared to SSD suggests limiting the maximum degree of
parallelism for PMem I/O within a database engine.
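One simple way to cap the degree of parallelism for PMem I/O is to gate requests behind a counting semaphore, so that worker threads beyond the cap wait instead of adding to device contention. The sketch below is illustrative only: BoundedIO, run_demo, and the cap of 4 in-flight requests are hypothetical, and a short sleep stands in for a real I/O call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BoundedIO:
    """Admit at most max_inflight I/O calls at a time."""

    def __init__(self, max_inflight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_inflight)
        self._lock = threading.Lock()
        self._inflight = 0
        self.peak_inflight = 0  # highest concurrency actually observed

    def submit(self, io_fn, *args):
        with self._slots:  # blocks while the cap is reached
            with self._lock:
                self._inflight += 1
                self.peak_inflight = max(self.peak_inflight, self._inflight)
            try:
                return io_fn(*args)
            finally:
                with self._lock:
                    self._inflight -= 1


def run_demo(workers: int = 16, max_inflight: int = 4, requests: int = 64):
    """Drive many worker threads through a gate with a small cap."""
    gate = BoundedIO(max_inflight)

    def fake_io(i: int) -> int:
        time.sleep(0.001)  # stand-in for a PMem read or write
        return i

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda i: gate.submit(fake_io, i),
                                range(requests)))
    return results, gate.peak_inflight
```

A suitable cap would have to come from measurement; the results above suggest that reads and writes may warrant different limits.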
Observation 3. The read and write performance of the DAX
and File I/O modes is asymmetric (peak read bandwidth vs. peak write
bandwidth for both PMem DAX and File I/O: ~10.6GB/s vs. ~3.5GB/s). In
contrast, SSD provides more balanced performance between reads and
writes (~2.7GB/s vs. ~2.8GB/s). As mentioned earlier, this performance
pattern is also observed by [18].
Recommendations: The read-write asymmetry of PMem implies
the necessity of avoiding writes as much as possible for PMem.
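To see what the asymmetry means for a workload, a back-of-the-envelope transfer-time model can weight reads and writes by the peak bandwidths reported in Observation 3. This is a simplification: it assumes both peaks are sustained and that reads and writes do not overlap, and the function name is hypothetical.

```python
PMEM_READ_GBPS = 10.6   # peak read bandwidth from Observation 3
PMEM_WRITE_GBPS = 3.5   # peak write bandwidth from Observation 3


def transfer_seconds(read_gb: float, write_gb: float,
                     read_gbps: float = PMEM_READ_GBPS,
                     write_gbps: float = PMEM_WRITE_GBPS) -> float:
    """Time to move the given volumes at the given peak bandwidths."""
    return read_gb / read_gbps + write_gb / write_gbps


# Moving 10GB takes about three times longer as writes than as reads.
read_only = transfer_seconds(10.0, 0.0)
write_only = transfer_seconds(0.0, 10.0)
write_penalty = write_only / read_only  # 10.6 / 3.5, about 3.03
```

Under this model, each gigabyte of avoided writes saves roughly as much time as three gigabytes of avoided reads.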
4.3 PMem as Memory
In this subsection, we present the performance characteristics of
PMem in Memory Mode, measured with Memory Latency Checker (MLC
hereafter) [3], and compare them with the performance of DRAM
and of PMem in DAX mode.
Figure 4 (panels: (a) Read, (b) Write): Sequential read/write
bandwidth results when PMem is used as memory.
As also observed in [29], performance measurements in Memory
Mode require careful attention to data sizes. If the data is too small,
it will fit entirely in the DRAM L4 cache, and the measurement won't
capture PMem behavior at all. To address this, we initialized two
different sizes of data (100GB and 500GB) in PMem, each of which is
larger than the 96GB of DRAM we have allocated to a single socket
(see footnote 2). Note that during the write performance measurements
on the 500GB dataset in PMem Memory Mode, MLC crashed due to some
internal issues. So, we present the write performance only for the
100GB dataset. In this subsection, we wish to address the following
questions:
(Q4) How does the number of threads influence PMem Memory
Mode performance?
(Q5) How does the dataset size influence PMem Memory Mode
performance?
(Q6) What is the relative performance of PMem Memory Mode
compared to that of PMem DAX and DRAM?
Note that [7] and [29] have provided initial evaluations of PMem
Memory Mode with different dataset sizes under different system
configurations, addressing (Q5) and also partially (Q6). But the
influence of the thread count on performance is still missing.
Due to the space limit, only the sequential read and write
bandwidths are shown in Figure 4, from which we note the following:
Observation 4. As the number of threads increases, the PMem
Memory Mode bandwidth first increases, then saturates and
drops.
Analysis: This performance pattern also appears in PMem DAX
and PMem File I/O, and can thus be explained in the same way
as Observation 2 in Section 4.2. Namely, concurrent requests
appear as random I/O, resulting in increased contention at the device
controller.
Footnote 2: Recall that measurements are restricted to a single socket.
Recommendations: Similar to the discussion in Observation 2,
to use PMem as the buffer pool in a DBMS via Memory Mode, it
is advisable to limit the number of concurrent threads accessing
PMem. However, in this context, especially when the working set
exceeds the size of DRAM, this means limiting concurrency inside the
entire buffer pool as well, since we lack software control over
which pages are in the L4 DRAM cache vs. PMem, thus effectively
limiting the amount of concurrency in the entire DBMS.
Observation 5. PMem Memory Mode read bandwidth drops
as the size of data increases (e.g., the peak bandwidth drops from
~13.9GB/s (100GB) to ~8.7GB/s (500GB) with four threads).
Analysis: A similar performance pattern is also observed in [29]:
as the dataset size increases in PMem Memory Mode, a larger
proportion of requests miss in the L4 DRAM cache and have to be
serviced by PMem. As noted previously in Observation 2, this will
result in a larger number of concurrent accesses to PMem, which
may appear more random and result in more contention.
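A crude way to see why the larger dataset is hit harder is to estimate what fraction of accesses must fall through the DRAM cache. The sketch below assumes uniformly distributed accesses and a cache that holds exactly its capacity worth of data; both are simplifying assumptions made here rather than claims from the paper, and the actual hardware-managed cache and the sequential MLC access pattern will behave differently. The function name is hypothetical.

```python
def l4_miss_fraction(dataset_gb: float, dram_cache_gb: float = 96.0) -> float:
    """Fraction of uniform accesses that cannot be served from the DRAM cache."""
    if dataset_gb <= 0 or dram_cache_gb < 0:
        raise ValueError("dataset size must be positive, cache size non-negative")
    return max(0.0, 1.0 - dram_cache_gb / dataset_gb)


# Dataset sizes used in this section, against 96GB of DRAM per socket.
miss_100gb = l4_miss_fraction(100.0)  # about 0.04
miss_500gb = l4_miss_fraction(500.0)  # about 0.81
```

Under these assumptions only a few percent of accesses reach PMem for the 100GB dataset, while roughly four in five do for the 500GB dataset, which is at least directionally consistent with the bandwidth drop in Observation 5.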
Observation 6. Under write workloads with the smaller data size
(100GB), PMem Memory Mode performs slightly better than PMem
DAX, while with the larger data size, or under highly parallel write
workloads with the smaller data size, PMem Memory Mode performs worse
than PMem DAX. We can also observe a huge gap between DRAM
performance and PMem performance.
Analysis: As explained above, in PMem Memory Mode, if the
data size in PMem is larger, DRAM page misses may happen, thus
slowing down data access, which becomes more severe for
writes in PMem Memory Mode (see the worse write performance
of PMem Memory Mode than that of PMem DAX with 16 threads
in Figure 4b). In contrast, DRAM misses do not happen for PMem
DAX due to the CPU direct access. Also as expected, DRAM
performance is much better than that of PMem.
Recommendations: From Observation 5 and Observation 6,
we know that with a data size larger than DRAM, especially
under write-intensive workloads, PMem DAX performs better than
PMem Memory Mode. As we will see in Section 5, under OLAP
workloads where many writes for intermediate results are issued,
PMem Memory Mode still performs worse than PMem DAX, and
even worse than SSD.
5 DATABASE WORKLOAD EVALUATIONS
We will now present our performance evaluation of typical OLTP
and OLAP workloads, specifically, TPC-C and TPC-H workloads,
with different PMem configurations in Microsoft SQL Server 2019.
5.1 Experimental design
As mentioned earlier, we can directly use PMem in SQL Server
2019 either as the persistent storage or in conjunction with the
buffer pool. For the former use case, File I/O is currently still issued
against PMem. For the latter use case, the DRAM buffer pool in
SQL Server 2019 can be extended with Hybrid buffer pool support
[2], whereby the warm pages cached in PMem are accessed with
DAX without being cached in the DRAM buffer pool.
In the following, we continue to use PMem DAX and PMem File
I/O to denote the use of PMem as the persistent storage, with the
Hybrid buffer pool enabled and disabled, respectively. We compare
these with the traditional DBMS configuration: SSD as the persistent
storage and DRAM as the buffer pool (denoted as SSD). When
PMem is used in Memory Mode as the buffer pool in SQL Server,
DRAM behaves like an L4 cache and SSD is the persistent storage,
which we denote by SSD + PMem Memory Mode. For this case,
we denote the traditional DBMS configuration as SSD + DRAM to
highlight the difference in the two buffer pool configurations.
Similar to the earlier configurations in Section 4, we use PMem
and NVMe SSD in one socket and set the CPU affinity mask in SQL
Server such that only the CPUs from the same socket are used [5].
We observed that the location of the tempdb where the intermediate
results are stored can influence the performance. Therefore,
throughout the experiments, we use the same directory for tempdb,
which is located on a separate SSD drive not used for persistent data
storage.
We study two typical database workloads: TPC-C and TPC-H.
For the TPC-C experiments, we use OLTPBench [14, 15] to
issue the queries and configure each run to last 30 minutes. To
minimize the effect of checkpoints, we also set the checkpointing
recovery interval in SQL Server to 60 minutes, which effectively
disables checkpoints during the TPC-C experiments. For the TPC-H
experiments, we warm up the buffer pool by running each of the
22 queries multiple times and only consider the runtime of the
last execution.
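The warm-up methodology for the TPC-H runs (repeat each query, keep only the final timing) can be sketched as a small harness. The function name and the default of three repetitions are assumptions, since the text says only "multiple times", and run_query stands for any callable that executes one query.

```python
import time
from typing import Callable


def time_after_warmup(run_query: Callable[[], object],
                      repetitions: int = 3) -> float:
    """Run the query several times and return the last run's elapsed seconds."""
    if repetitions < 1:
        raise ValueError("need at least one repetition")
    elapsed = 0.0
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        # Overwritten on each pass, so only the final (warm) run is reported.
        elapsed = time.perf_counter() - start
    return elapsed
```

For example, time_after_warmup(lambda: cursor.execute(sql).fetchall()) would time a query through a database cursor, where cursor and sql are placeholders for whatever client is in use.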
To understand the influence of the dataset size, we use two
different scale factors: SF 100 and SF 500 for TPC-H, and SF 1300
and SF 6500 for TPC-C. These generate database instances with
sizes of approximately 100GB and 500GB, respectively.
5.2 TPC-H benchmark
Due to space constraints, we present a subset of the performance
results for TPC-H queries in Figures 5–6. We note the following:
Observation 7. With the smaller scale factor, i.e., SF=100, PMem
DAX fails to outperform SSD and PMem File I/O. As Figure 5 shows,
when SF=100, the runtime of Query 3 with PMem DAX is ~25s, which
is ~6X slower than PMem File I/O and SSD (< 5s for both). This is
alleviated with the larger SF, i.e., SF=500, with which DAX is slightly
better than SSD (~188s vs. ~197s).
Analysis: As explained in [2], the Hybrid buffer pool
with DAX references the pages in PMem instead of caching
them in the DRAM buffer pool. This implies that with the Hybrid buffer
pool enabled, the hot pages in PMem are fetched by CPUs from
PMem repeatedly, failing to utilize the benefit of the DRAM buffer
pool. In contrast, when SF=100, by using PMem File I/O and SSD,
large portions of the hot pages are cached in the DRAM buffer pool
after their first access, leading to smaller overheads in subsequent
page accesses and, thus, shorter query execution times. We confirm
this phenomenon by monitoring memory usage, which reveals that
up to 150GB of DRAM buffer pool is used for PMem File I/O and SSD,
while only ~20GB of DRAM is used for PMem DAX.
In the case of larger SFs, only a small portion of the hot pages can be
cached in the DRAM buffer pool for PMem File I/O and SSD, which
means that the I/O overhead will become more prominent. We
observe that Query 3 and Query 18 run slightly faster on PMem
due to the smaller I/O overhead compared to SSD. Also note that
the performance difference between PMem DAX and PMem File I/O
is quite small. The reason is that the page size in SQL Server is
8KB, for which there is no performance difference under read
workloads between DAX and File I/O, as observed in Section 4.2.
Figure 5: TPC-H results with PMem as persistent storage
Figure 6: TPC-H results with PMem as buffer pool
Recommendations: This unexpected result indicates the need
for further research into good DAX PMem-aware page placement
policies for the buffer pool in DBMSs. One possible way to improve
this is to use ideas similar to the buffer pool extension support in
SQL Server [16], which prioritizes hot pages in DRAM first and
uses SSD as a second-chance tier.
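To illustrate what a "hot pages in DRAM first" placement policy might look like, here is a toy two-tier model: pages are served from PMem until they have been touched a few times, then promoted into a small DRAM tier managed with LRU eviction. This is a hypothetical sketch for discussion, not SQL Server's policy or the design in [16]; the class name, the promotion threshold, and the tier size are invented for the example.

```python
from collections import OrderedDict


class TieredBufferPool:
    """Toy placement policy: promote repeatedly accessed pages to DRAM."""

    def __init__(self, dram_pages: int, promote_after: int = 2) -> None:
        if dram_pages < 1 or promote_after < 1:
            raise ValueError("dram_pages and promote_after must be >= 1")
        self.dram_pages = dram_pages
        self.promote_after = promote_after
        self.dram = OrderedDict()   # page_id -> True, oldest entry first
        self.access_counts = {}     # touches seen while the page is PMem-only

    def access(self, page_id: int) -> str:
        """Return the tier that served this access: 'dram' or 'pmem'."""
        if page_id in self.dram:
            self.dram.move_to_end(page_id)  # mark as most recently used
            return "dram"
        count = self.access_counts.get(page_id, 0) + 1
        if count >= self.promote_after:
            self.access_counts.pop(page_id, None)
            if len(self.dram) >= self.dram_pages:
                # Evict the least recently used page; it stays in PMem.
                self.dram.popitem(last=False)
            self.dram[page_id] = True
        else:
            self.access_counts[page_id] = count
        return "pmem"  # this access itself was still served from PMem
```

A real policy would also need to account for the read-write asymmetry, for example by preferring to keep frequently written pages in DRAM.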
Observation 8. As Figure 6 shows, the performance of SSD +
PMem Memory Mode is worse than that of SSD + DRAM. For example,
with SF=100, the runtime of Query 3 with SSD + PMem Memory Mode
is ~60s, far longer than the runtime with SSD + DRAM.
Analysis: Although TPC-H is a read-intensive workload,
substantial intermediate results are generated during query execution,
especially when the data size is large. These are written to the buffer
pool and possibly spill to tempdb files if there is not enough memory,
resulting in many write operations. As explained in Section 4.3,
write operations on PMem Memory Mode can lead to performance
loss, and thus longer query processing time.
Recommendations: This result indicates that it is not appropriate
to directly replace DRAM with PMem in Memory Mode as the
buffer pool in a DBMS, even for (read-intensive) TPC-H workloads,
because of the writes to tempdb for intermediate query results.
Figure 7: TPC-C results with varied scale factors (SF)
5.3 TPC-C benchmark
We show TPC-C performance results in Figure 7 and observe that:
Observation 9. There is no significant performance difference
between PMem File I/O, PMem DAX, and SSD under TPC-C workloads.
Analysis: TPC-C is a write-intensive workload. As Figure 2 in
Section 4 shows, with the SQL Server page size of 8KB, the peak
write bandwidth of both PMem DAX and PMem File I/O is only
marginally better than that of SSD. But when the device I/Os are
fully saturated, the write bandwidth of PMem drops compared to
the peak write bandwidth due to congestion, which we do not
observe for SSD (see Figure 3 in Section 4.2).
Recommendations: The downside of write operations on
PMem was known to the database community even before
the appearance of the real PMem device, which has motivated
several related works on limiting write operations on PMem
by redesigning the B-tree index [27], join algorithms [28], the query
optimizer [11], etc., for PMem-aware DBMSs. It would be valuable
to revisit the feasibility of those early works on the real PMem
device.
6 CONCLUSION AND FUTURE WORK
In this paper, we explored some missing device characteristics
that are closely related to database query performance. Our
results revealed that some DBMS internal configurations should be
changed to take advantage of PMem in the system: (1) a different
degree of parallelism, (2) optimal I/O request sizes, and (3) a new
page placement policy (to avoid frequent writes on PMem) should
be considered to optimize the use of PMem in DBMSs.
Our database workload evaluations showed that developers
should clearly understand the new device characteristics before
introducing software/hardware changes to DBMSs for the best use
of PMem in the system. In other words, simple hardware changes
(e.g., replacing the DRAM buffer pool with PMem) or software changes
(e.g., extending the DRAM buffer pool with PMem without introducing
a new page placement policy) will not properly integrate PMem
in DBMSs.
We revealed some important aspects of using PMem in DBMSs,
which can help make better database design choices. However,
there are still many open questions on the best use of PMem in
the system for many other DBMS internals (e.g., access methods,
logging/recovery, etc.).
REFERENCES
[1] Flexible I/O tester (FIO) rev. 3.15. https://fio.readthedocs.io/en/latest/
fio_doc.html.
[2] Hybrid buffer pool feature in SQL server. https://docs.microsoft.com/
en-us/sql/database-engine/configure-windows/hybrid-buffer-pool?view=
sql-server-ver15.
[3] Intel Memory Latency Checker v3.8. https://software.intel.com/en-us/articles/
intelr-memory-latency-checker.
[4] Persistent memory development kit.
[5] Soft-NUMA (SQL Server). https://docs.microsoft.com/en-us/sql/database-engine/
configure-windows/soft-numa-sql-server?view=sql-server-ver15.
[6] Analyzing the Performance of Intel Optane DC Persistent Memory in App Direct
Mode in Lenovo ThinkSystem Servers. https://lenovopress.com/lp1083.pdf, 2019.
[7] Analyzing the Performance of Intel Optane DC Persistent Memory in Memory
Mode in Lenovo ThinkSystem Servers. https://lenovopress.com/lp1084.pdf, 2019.
[8] HPE Persistent Memory performance in HPE ProLiant, HPE Synergy, and
HPE Apollo Gen10 servers with second-generation Intel Xeon Scalable
processors. https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=
a00075993enw, 2019.
[9] J. Arulraj and A. Pavlo. Non-volatile memory database management systems.
Synthesis Lectures on Data Management, 11(1):1–191, 2019.
[10] J. Arulraj, M. Perron, and A. Pavlo. Write-behind logging. Proceedings of the
VLDB Endowment, 10(4):337–348, 2016.
[11] D. Bausch, I. Petrov, and A. Buchmann. Making cost-based query optimization
asymmetry-aware. In Proceedings of the Eighth International Workshop on Data
Management on New Hardware, pages 24–32, 2012.
[12] G. E. Blelloch, J. T. Fineman, P. B. Gibbons, Y. Gu, and J. Shun. Sorting with
asymmetric read and write costs. In Proceedings of the 27th ACM symposium on
Parallelism in Algorithms and Architectures, pages 1–12. ACM, 2015.
[13] S. Chen and Q. Jin. Persistent b+-trees in non-volatile main memory. Proceedings
of the VLDB Endowment, 8(7):786–797, 2015.
[14] C. A. Curino, D. E. Difallah, A. Pavlo, and P. Cudre-Mauroux. Benchmarking
oltp/web databases in the cloud: The oltp-bench framework. In Proceedings of
the fourth international workshop on Cloud data management, pages 17–20, 2012.
[15] D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux. Oltp-bench: An
extensible testbed for benchmarking relational databases. Proceedings of the
VLDB Endowment, 7(4):277–288, 2013.
[16] J. Do, D. Zhang, J. M. Patel, D. J. DeWitt, J. F. Naughton, and A. Halverson.
Turbocharging dbms buffer pool using ssds. In Proceedings of the 2011 ACM
SIGMOD International Conference on Management of data, pages 1113–1124, 2011.
[17] S. Imamura and E. Yoshida. The analysis of inter-process interference on a
hybrid memory system. In Proceedings of the International Conference on High
Performance Computing in Asia-Pacific Region Workshops, pages 1–4, 2020.
[18] J. Izraelevitz, J. Yang, L. Zhang, J. Kim, X. Liu, A. Memaripour, Y. J. Soh, Z. Wang,
Y. Xu, S. R. Dulloor, et al. Basic performance measurements of the intel optane
dc persistent memory module. arXiv preprint arXiv:1903.05714, 2019.
[19] R. Kadekodi, S. K. Lee, S. Kashyap, T. Kim, A. Kolli, and V. Chidambaram. Splitfs:
reducing soware overhead in le systems for persistent memory. In Proceedings
of the 27th ACM Symposium on Operating Systems Principles, pages 494–508, 2019.
[20] L. Lersch, X. Hao, I. Oukid, T. Wang, and T. Willhalm. Evaluating persistent
memory range indexes. Proceedings of the VLDB Endowment, 13(4):574–587, 2019.
[21] L. Lersch, W. Lehner, and I. Oukid. Persistent buffer management with optimistic
consistency. In Proceedings of the 15th International Workshop on Data
Management on New Hardware, pages 1–3, 2019.
[22] V. Mironov, I. Chernykh, I. Kulikov, A. Moskovsky, E. Epifanovsky, and
A. Kudryavtsev. Performance evaluation of the intel optane dc memory with
scientic benchmarks. In 2019 IEEE/ACM Workshop on Memory Centric High
Performance Computing (MCHPC), pages 1–6. IEEE, 2019.
[23] I. B. Peng, M. B. Gokhale, and E. W. Green. System evaluation of the intel optane
byte-addressable nvm. In Proceedings of the International Symposium on Memory
Systems, pages 304–315, 2019.
[24] G. Psaropoulos, I. Oukid, T. Legler, N. May, and A. Ailamaki. Bridging the latency
gap between nvm and dram for latency-bound operations. In Proceedings of the
15th International Workshop on Data Management on New Hardware, pages 1–8,
2019.
[25] A. van Renen, V. Leis, A. Kemper, T. Neumann, T. Hashida, K. Oe, Y. Doi,
L. Harada, and M. Sato. Managing non-volatile memory in database systems. In
Proceedings of the 2018 International Conference on Management of Data, pages
1541–1555. ACM, 2018.
[26] A. van Renen, L. Vogel, V. Leis, T. Neumann, and A. Kemper. Persistent mem-
ory i/o primitives. In Proceedings of the 15th International Workshop on Data
Management on New Hardware, pages 1–7, 2019.
[27] S. D. Viglas. Adapting the b+-tree for asymmetric i/o. In East European Conference
on Advances in Databases and Information Systems, pages 399–412. Springer, 2012.
[28] S. D. Viglas. Write-limited sorts and joins for persistent memory. Proceedings of
the VLDB Endowment, 7(5):413–424, 2014.
[29] M. Weiland, H. Brunst, T. Quintino, N. Johnson, O. Iffrig, S. Smart, C. Herold,
A. Bonanni, A. Jackson, and M. Parsons. An early evaluation of intel’s optane
dc persistent memory module and its impact on high-performance scientific
applications. In Proceedings of the International Conference for High Performance
Computing, Networking, Storage and Analysis, pages 1–19, 2019.
[30] J. Yang, J. Kim, M. Hoseinzadeh, J. Izraelevitz, and S. Swanson. An empirical
guide to the behavior and use of scalable persistent memory. arXiv preprint
arXiv:1908.03583, 2019.
[31] M. Zarubin, P. Damme, T. Kissinger, D. Habich, W. Lehner, and T. Willhalm.
Integer compression in nvram-centric data stores: Comparative experimental
analysis to dram. In Proceedings of the 15th International Workshop on Data
Management on New Hardware, pages 1–11, 2019.
[32] M. Zarubin, T. Kissinger, D. Habich, T. Willhalm, and W. Lehner. Efficient
compute node-local replication mechanisms for nvram-centric data structures.
The VLDB Journal, pages 1–21, 2019.
