Leveraging Secondary Storage to Simulate Deep 54-qubit Sycamore Circuits by Pednault, Edwin et al.
Edwin Pednault∗¹, John A. Gunnels¹, Giacomo Nannicini¹, Lior Horesh¹, and Robert Wisnieff¹
¹IBM T.J. Watson Research Center, Yorktown Heights, NY
Abstract
In a recent paper, we showed that secondary storage can extend the range of quantum circuits that
can be practically simulated with classical algorithms. Here we refine those techniques and apply
them to the simulation of Sycamore circuits with 53 and 54 qubits, with the entanglement pattern
ABCDCDAB that has proven difficult to classically simulate with other approaches. Our analysis
shows that on the Summit supercomputer at Oak Ridge National Laboratory, such circuits can be
simulated with high fidelity to arbitrary depth in a matter of days, outputting all the amplitudes.
1 Introduction
There has been tremendous progress in the construction of quantum computers with superconducting
qubits [4]. As the hardware progresses, it is increasingly difficult to classically simulate the circuits
that can be executed on existing chips, a task that is crucial, among other things, for verifying the degree
to which the hardware behaves as expected. The literature contains several papers that discuss this task
and that have been published or posted online in the past two years [14, 3, 7, 10, 5, 11, 16, 6, 17, 8, 18].
Here we extend the analysis on the use of secondary storage initially reported in our previous work [14].
As we argued in that paper, secondary storage can extend the computational reach of supercomputers
for the simulation of quantum circuits, an idea initially suggested by [9]. We estimate that on the
Summit supercomputer at Oak Ridge National Laboratory, secondary storage allows the simulation of
53- and 54-qubit Sycamore circuits [15] with high fidelity to arbitrary depth. The Sycamore circuits are
a direct descendant of the “universal random circuits” described in [2]. In particular, for 20 cycles of
the entanglement pattern ABCDCDAB, which is specifically designed to challenge classical simulation
algorithms, we estimate that the computations would take approximately two and a half days for the
53-qubit circuit and approximately 5.8 days for the 54-qubit circuit. While
we did not carry out these computations, we provide a detailed description of the proposed simulation
strategy as well as the time estimation methodology, which is based on published results and on internal
benchmarks. The main building blocks of our approach are the same as those that we discussed in
[14], namely: exploitation of separable gates via a hyperedge representation of the tensor network;
allowing contractions between non-adjacent tensors; tensor slicing; and sporadic read/write operations
to access/store slices of the quantum state in secondary storage.
The rest of this paper is organized as follows. In Sect. 2 we provide a review of the main building
blocks of our simulation strategy. Sect. 3 describes the class of circuits studied in this work. Sect. 4 gives
a detailed explanation of the simulation strategy and its use of secondary storage. Sect. 5 concludes the
paper by estimating the time required by the proposed simulations, and discussing the methodology to
compute such estimates. The Appendix contains a detailed listing of the operations performed by the
proposed simulation strategy.

∗Corresponding author; pednault@us.ibm.com

arXiv:1910.09534v2 [quant-ph] 22 Oct 2019
2 Brief overview of tensor contraction deferral
The simulation algorithm that we propose is based on the idea of partitioning a quantum circuit into
subcircuits that can be simulated independently, at the expense of extra bookkeeping to account for
entanglement between subcircuits. We ensure that the final results are correct by appropriately recom-
bining the different subcircuits and, in some sense, “resolving” the entanglement. Rather than insisting
that all subcircuits reside in primary storage, i.e., RAM, we allow for storing the results of some of the
calculations on secondary storage, e.g., disk. This is particularly effective when combined with slic-
ing techniques, which further partition the quantum state by iteratively fixing the value of some of the
indices in the tensor network.
We now give a brief overview of the main components of our simulation strategies; for details and
several examples, we refer the reader to our earlier work [14]. A tensor is a multilinear map with a set of
indices to address its elements. In the context of this paper, each index takes values in {0, 1}. As discussed
in [14], a tensor network is a hypergraph G = (V, E) such that each node is associated with a tensor
and each hyperedge with an index of the adjacent tensors. Hyperedges between nodes represent shared
indices that must be summed over. A summation over shared indices is called a contraction. A tensor
$A_{i_1,\dots,i_m,j_1,\dots,j_m}$ is diagonal if it is nonzero only if $i_k = j_k$ for $k = 1,\dots,m$. A tensor is separable if
it can be obtained from a diagonal tensor with a permutation, i.e., there exist functions $f_1,\dots,f_m$ such
that $A_{f_1(j_1,\dots,j_m),\dots,f_m(j_1,\dots,j_m),j_1,\dots,j_m}$ is diagonal. Our hypergraph representation is designed to take
advantage of separable tensors, since the computational resources necessary to perform a contraction
between several tensors can be significantly reduced in the presence of indices shared among multiple
tensors, which are represented by hyperedges. Given a quantum circuit, we can construct a tensor network by
letting the gates correspond to tensors and the qubit lines correspond, roughly, to the indices (i.e., edges
and hyperedges). To simulate large circuits, we rely extensively on contraction deferral and tensor
slicing, which we describe more fully below.
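To make the role of diagonal tensors concrete, here is a small NumPy sketch of our own (not code from [14]). It builds the CZ gate, which is diagonal, as a four-index tensor `G[i1, i2, j1, j2]`, checks the diagonality condition defined above, and applies the gate to a three-qubit state in two ways: by a full contraction over the gate's input indices, and by keeping only the diagonal entries so that each qubit index becomes a single hyperedge shared by the gate and the state. The two results agree, but the second form performs no summation at all.

```python
import numpy as np

# CZ as a 4x4 matrix, reshaped to G[i1, i2, j1, j2]
# (i = output indices, j = input indices).
G = np.diag([1, 1, 1, -1]).astype(complex).reshape(2, 2, 2, 2)

def is_diagonal(T):
    """True if T[i1, i2, j1, j2] is nonzero only when i1 == j1 and i2 == j2."""
    for i1, i2, j1, j2 in np.ndindex(*T.shape):
        if (i1, i2) != (j1, j2) and T[i1, i2, j1, j2] != 0:
            return False
    return True

rng = np.random.default_rng(0)
psi = rng.normal(size=(2, 2, 2)) + 1j * rng.normal(size=(2, 2, 2))  # psi[a, b, c]

# Full contraction: sum over the gate's input indices j and k.
full = np.einsum("abjk,jkc->abc", G, psi)

# Hyperedge form: keep only D[a, b] = G[a, b, a, b]; a and b are shared by
# the gate and the state, so each amplitude is simply scaled.
D = np.einsum("abab->ab", G)
hyper = D[:, :, None] * psi
```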
Contraction deferral is a technique first introduced in [1]. Its use in large-scale simulations was
pioneered in [14]. Contraction deferral is defined as the contraction of arbitrary sets of (potentially
non-adjacent) tensors in the tensor network; this is in contrast with the adjacent contraction discussed
in the seminal work [12] and in several subsequent papers, e.g., [3]. A deferred contraction performs
the usual summation over shared indices (i.e., edges interior to the set being contracted), and applies an
outer product to the non-shared indices. As it is a generalization of the traditional adjacent contraction,
contraction deferral opens up new simulation strategies that can lead to reduced memory requirements.
In particular, we partition the tensor network into sub-hypergraphs corresponding to subcircuits, each
of which includes fewer qubits than the initial circuit. Within each subcircuit we perform computations
following the so-called “Schrödinger approach” [1], i.e., evolving the full quantum state of the subcircuit
by applying layers of gates one at a time. To do so, we must use contraction deferral whenever we apply
tensors corresponding to entangling gates between different subcircuits.
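The semantics of a deferred contraction can be illustrated with `numpy.einsum`. The helper `deferred_contract` below is hypothetical and only meant to show the behavior described above (summation over indices interior to the contracted set, outer product over the remaining ones); it is not the implementation used in [14].

```python
import numpy as np

def deferred_contract(tensors, labels, interior):
    """Contract an arbitrary (possibly non-adjacent) set of tensors.

    labels[t] is a string with one character per axis of tensors[t].
    Indices in `interior` are summed over; all other indices are kept, so
    tensors sharing no interior index are combined by an outer product.
    A kept index that appears in several tensors acts like a hyperedge
    (elementwise product, no summation)."""
    kept = []
    for lab in labels:
        for c in lab:
            if c not in interior and c not in kept:
                kept.append(c)
    spec = ",".join(labels) + "->" + "".join(kept)
    return np.einsum(spec, *tensors), "".join(kept)
```

For example, contracting tensors labeled `"ab"` and `"cd"` with no interior indices yields their four-index outer product, while labels `"ab"` and `"bc"` with interior index `b` reduce to an ordinary matrix product.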
Tensor slicing is the idea of iterating over several instances of a circuit in which certain hyperedges
(i.e., tensor indices) are fixed to one of their possible values. While this does not necessarily reduce the
number of operations to be performed, it allows reordering the computations so that they take place in
ways that are potentially more efficient. This is particularly crucial when using secondary storage, which
is slower than primary storage and must therefore be used sparingly; with tensor slicing, we reorganize
the calculations so that only a few selected slices (rather than full tensors) reside in primary storage at
any given time. Our scheme extends the simulation strategy in [9]: we choose a set of indices, slice
them by looping over every possible combination of their values, and use a superset of those qubits to
efficiently organize and address information located in secondary storage.

Figure 1: Gate pattern for a 20-cycle, 53-qubit, Sycamore ABCDCDAB circuit. Single-qubit gates are
merged into their neighboring two-qubit gates, and the two-qubit gates in each cycle are partitioned into
two layers for illustration purposes to make the individual gates easy to identify. These transformations
result in the 40-layer circuit depicted. Dots and shading are used to identify which pairs of qubits are
being operated upon.
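A toy illustration of tensor slicing, assuming two small tensors that share the indices `s` and `k`: fixing `s` to each of its values and accumulating the partial contractions reproduces the full contraction, while each iteration touches only half of each tensor. Slicing $n$ indices in this way shrinks the working set by a factor of $2^n$ without reducing the total number of operations. Slicing qubit indices of the quantum state works analogously, except that fixing an output index yields independent pieces of the result rather than partial sums.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2, 2))  # A[i, s, k]
B = rng.normal(size=(2, 2, 2))  # B[s, k, j]

# Unsliced contraction: both tensors must be resident in full.
full = np.einsum("isk,skj->ij", A, B)

# Sliced contraction: loop over the values of s; each iteration uses only
# A[:, s, :] and B[s, :, :], and the partial results are accumulated.
sliced = np.zeros((2, 2))
for s in range(2):
    sliced += np.einsum("ik,kj->ij", A[:, s, :], B[s, :, :])
```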
3 Sycamore circuits
A recent paper [15] describes a new class of random quantum circuits consisting of alternating layers of
single-qubit gates and two-qubit gates. The combination of a layer of single-qubit gates followed by a
layer of two-qubit gates is referred to in [15] as a cycle. Our paper discusses the classical simulation of
circuits of this class; hence, we describe them here in more detail. Gates are applied to all qubits in each
single-qubit gate layer, and to almost all qubits in each two-qubit gate layer. The two-qubit gates are all
non-diagonal and non-separable, and their unitary representation varies as a function of both the location
of the gate within the qubit layout and the depth at which the gate is applied in the circuit. The former
reflects variations in gate tuning, while the latter reflects variations in pulse synchronization over time.
Circuits consist of several cycles of single-qubit gates followed by two-qubit gates, together with a final
layer of single-qubit gates. The single-qubit gates are randomly selected from the set {√X, √Y, √W}.
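For reference, the three single-qubit gates can be written as $\exp(-i\pi P/4) = (I - iP)/\sqrt{2}$ for $P \in \{X, Y, W\}$, with $W = (X+Y)/\sqrt{2}$; these definitions follow our reading of [15] and should be checked against it. The sketch below builds the gates and verifies that each squares to its Pauli-type operator up to a global phase. The uniform random choice in the hypothetical `random_layer` is a simplification; the exact selection rule used in [15] may impose additional constraints.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
W = (X + Y) / np.sqrt(2)  # assumed definition of W

def sqrt_gate(P):
    """exp(-i*pi/4 * P) = (I - iP)/sqrt(2), valid for any P with P @ P = I."""
    return (I2 - 1j * P) / np.sqrt(2)

GATES = {"sqrtX": sqrt_gate(X), "sqrtY": sqrt_gate(Y), "sqrtW": sqrt_gate(W)}

def random_layer(n_qubits, rng):
    """Draw one gate name per qubit uniformly at random (a simplification)."""
    names = list(GATES)
    return [names[i] for i in rng.integers(0, len(names), size=n_qubits)]
```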
Figure 2: Gate pattern for a 20-cycle, 54-qubit, Sycamore ABCDCDAB circuit. Single-qubit gates are
merged into their neighboring two-qubit gates, and the two-qubit gates in each cycle are partitioned into
two layers for illustration purposes to make the individual gates easy to identify. These transformations
result in the 40-layer circuit depicted. Dots and shading are used to identify which pairs of qubits are
being operated upon.
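The two-qubit gates implement the following unitary:

$$
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & e^{i(\Delta_+ + \Delta_-)}\cos\theta & -i\,e^{i(\Delta_+ - \Delta_{-,\mathrm{off}})}\sin\theta & 0 \\
0 & -i\,e^{i(\Delta_+ + \Delta_{-,\mathrm{off}})}\sin\theta & e^{i(\Delta_+ - \Delta_-)}\cos\theta & 0 \\
0 & 0 & 0 & e^{i(2\Delta_+ - \phi)}
\end{pmatrix}, \tag{1}
$$

where θ and φ are nominally 90° and 30°, respectively, and where ∆+, ∆−, and ∆−,off are detuning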
terms. For simulation purposes, all single-qubit gates can be aggregated with neighboring two-qubit
gates, yielding an equivalent circuit consisting of (potentially unique) two-qubit gates only. The method
for selecting single-qubit gates ensures that these two-qubit unitary operations are also randomized. The
difficulty of simulation is therefore determined entirely by the pattern of two-qubit gates in the circuit.
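A NumPy transcription of Eq. (1), together with the aggregation of single-qubit gates into a neighboring two-qubit gate, might look as follows. The function names are ours, angles are in radians, and we assume the basis order |00⟩, |01⟩, |10⟩, |11⟩ with the first qubit as the more significant bit. The aggregated matrix is unitary for any values of the detuning terms, which is what allows the simulation to treat every aggregated gate as a generic two-qubit unitary.

```python
import numpy as np

def sycamore_2q(theta, phi, d_plus=0.0, d_minus=0.0, d_minus_off=0.0):
    """Two-qubit unitary of Eq. (1); angles and detuning phases in radians."""
    c, s = np.cos(theta), np.sin(theta)
    U = np.zeros((4, 4), dtype=complex)
    U[0, 0] = 1.0
    U[1, 1] = np.exp(1j * (d_plus + d_minus)) * c
    U[1, 2] = -1j * np.exp(1j * (d_plus - d_minus_off)) * s
    U[2, 1] = -1j * np.exp(1j * (d_plus + d_minus_off)) * s
    U[2, 2] = np.exp(1j * (d_plus - d_minus)) * c
    U[3, 3] = np.exp(1j * (2 * d_plus - phi))
    return U

def absorb_single_qubit_gates(U, a, b):
    """Fold single-qubit gates a (first qubit) and b (second qubit), applied
    just before U, into a single 4x4 unitary."""
    return U @ np.kron(a, b)
```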
Figs. 1 and 2 illustrate the “ABCDCDAB” patterns of two-qubit gates used in the 53- and 54-qubit
random circuits described in [15] to test the Sycamore quantum device. This pattern is intentionally
devised to make the resulting circuits difficult to simulate classically. For illustration purposes, each
cycle of two-qubit gates is depicted as two layers in the figures, so that we can unambiguously indicate
the pairs of qubits involved in each gate operation (i.e., vertical pairs versus horizontal pairs). Thus, the
first two layers of two-qubit gates illustrated correspond to the “A” cycle, the next two layers to the “B”
cycle, and so on. In this representation, the first row corresponds to an “ABCD” sequence of cycles, the
second row corresponds to a “CDAB” sequence of cycles, and the five rows illustrated in Figs. 1 and 2
correspond to the 20-cycle circuits generated according to the ABCDCDAB rules described in [15].
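The cycle labels can be generated mechanically. This sketch assumes that the eight-cycle ABCDCDAB sequence simply repeats for deeper circuits; grouping the labels four per row reproduces the rows of Figs. 1 and 2, and drawing each cycle as two layers gives the 40 layers depicted there.

```python
PATTERN = "ABCDCDAB"

def cycle_labels(n_cycles):
    """Two-qubit-gate pattern label of each cycle in an ABCDCDAB circuit."""
    return [PATTERN[c % len(PATTERN)] for c in range(n_cycles)]

def figure_rows(n_cycles, cycles_per_row=4):
    """Group cycle labels into rows of four, as in Figs. 1 and 2."""
    labels = "".join(cycle_labels(n_cycles))
    return [labels[i:i + cycles_per_row] for i in range(0, n_cycles, cycles_per_row)]
```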
4 Proposed simulation strategy
In [9] the authors suggest that solid-state disk, or more generally secondary storage, could be used to
supplement main memory in order to simulate circuits whose quantum states are too large to store in
main memory alone. In [14], we combined our in-memory methods with those of [9], describing a
viable computation scheme that exploits secondary storage to simulate deeper circuits than was thought
possible. We now apply the approach discussed in [14] to 53- and 54-qubit Sycamore circuits, showing
in this section a computation scheme that allows their simulation on an existing supercomputer, Summit.
The cost of such a scheme is discussed in Sect. 5.
The simulation method in [9] can be seen as a tensor slicing approach. Qubits (and the corresponding
tensor indices) are divided into “global” qubits, which are sliced and used to address across processing
nodes, and “local” qubits, corresponding to tensor indices used to address tensor slices stored on each
processing node. In [9], circuits are partitioned so that all gates within a subcircuit can be applied to the
local slice of the quantum state, without communicating quantum state information among processing
nodes. Such zero-communication updates of a local quantum state are possible when all non-diagonal
gates in a subcircuit are applied to local qubits only. They are also possible for a handful of additional
circumstances described in [9]. In effect, circuits are partitioned by selecting different subsets of local
qubits and analyzing which gates can be applied to them without communication. This determines
the subcircuits. During simulation, communication between processing nodes occurs only when the
simulation switches from one subcircuit to another. When a communication phase takes place, the
memory layout of quantum state tensors is reorganized so that different global and local qubits (i.e.,
tensor indices) are selected, according to the subcircuits that have to be simulated in the subsequent
phase.
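A single-process sketch of the global/local split of [9], under simplifying assumptions of our own: an n-qubit state vector is cut into $2^g$ rows, one per "node", indexed by the first g (global) qubits, with qubit 0 as the most significant bit. A non-diagonal gate on a local qubit is applied to each row independently, and a diagonal gate on a global qubit reduces to scaling each row by a phase, so neither requires exchanging amplitudes between rows. The helper `apply_1q` is hypothetical.

```python
import numpy as np

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector
    (qubit 0 is taken to be the most significant bit)."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

n, g = 5, 2                                   # 5 qubits; the first 2 are "global"
rng = np.random.default_rng(3)
state = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])

# Row k plays the role of a node: it holds the amplitudes whose global
# qubits (0 and 1) encode the integer k.
nodes = state.reshape(2**g, 2**(n - g)).copy()

# Non-diagonal gate on local qubit 3: every node updates its own slice.
for k in range(2**g):
    nodes[k] = apply_1q(nodes[k], H, 3 - g, n - g)

# Diagonal gate on global qubit 0: every node multiplies by a constant phase.
for k in range(2**g):
    bit = (k >> (g - 1)) & 1                  # value of qubit 0 on node k
    nodes[k] *= Z[bit, bit]

expected = apply_1q(apply_1q(state, H, 3, n), Z, 0, n)
```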
The in-memory method that we presented in [14] considers circuit partitionings in which the re-
sulting tensors either fit in available aggregate primary memory in their entirety, or their slices can be
computed using available primary memory (using other tensors already computed and stored in primary
memory). With this approach, the resulting tensors and/or their slices will generally be larger than the
primary memories of individual processing nodes; this represents a difference between [14] and [9].

Figure 3: Partitioning of a 36-cycle, 53-qubit, Sycamore ABCDCDAB circuit to leverage secondary
storage. Numbers and colors are used to indicate regions of gates within the circuit that are grouped
together to form subcircuits, and also to refer to specific subcircuits in the text. Contraction deferral is
applied to the gates labeled “cd.”

Figure 4: Partitioning of a 36-cycle, 54-qubit, Sycamore ABCDCDAB circuit to leverage secondary
storage. Numbers and colors are used to indicate regions of gates within the circuit that are grouped
together to form subcircuits, and also to refer to specific subcircuits in the text. Contraction deferral is
applied to the gates labeled “cd.”

Figure 5: Qubit numbering scheme and first-level tensor slicing strategy for 53- and 54-qubit Sycamore
circuits.
As discussed in [14], we combine the zero-communication strategies of [9] with our own tensor
partitioning strategy to leverage secondary storage when quantum states are too large to fit in aggre-
gate primary memory. Because secondary storage is typically orders of magnitude slower than main
memory, the viability of using it depends on the extent to which the number of read/write cycles can
be minimized or overlapped with computation. To this end, we first employ the in-memory methods
of [14], aiming to maximize the number of gates that can be simulated using available aggregate mem-
ory; the resulting quantum state is calculated in slices and written to secondary storage. The partitioning
methods discussed in [9] can then be applied to the remaining gates in the circuit, setting the number
of “local” qubits according to the size of aggregate memory, rather than the memory sizes available on
individual processing nodes. This increases the size of the resulting tensor slices, allowing the applica-
tion of many more gates to the local quantum state before additional secondary storage read/write cycles
are needed. The resulting subcircuits can be further partitioned into sub-subcircuits, using the methods
of [9], to minimize internode communication in the overall calculations. We now provide details about
these partitionings for the specific circuits studied in this paper.
Figs. 3 and 4 illustrate the first level of circuit partitioning for 36-cycle Sycamore ABCDCDAB
circuits with 53 and 54 qubits, respectively. In this first phase, we use the in-memory methods of [14]
to simulate subcircuits 1 and 2 illustrated in these figures, performing tensor contraction deferral on
the gates labeled “cd.” The outermost qubits of the resulting tensors are used as “global” qubits, with
the corresponding slices contracted in order to allow the simulation of subcircuit 3, slice by slice. Each
resulting slice for subcircuit 3 is written to disk. Fig. 5 illustrates the “global” qubits that are sliced in
this first phase of simulation. In the case of the 53-qubit circuit, qubits 0–3 and 49–52 are sliced in the
simulation of subcircuit 3; for the 54-qubit circuit, qubits 0–4 and 50–53 are sliced.
In the second phase, qubits 23–30 are sliced for the 53-qubit circuit and qubits 23–31 are sliced for
the 54-qubit circuit. In both cases, the following steps are performed for each slice: the slice is read
from disk, the gates in subcircuit 4 shown in Figs. 3 and 4 for the respective circuits are applied, and the
slice is written back to disk.
This process is repeated for each subsequent subcircuit, with the choice of sliced qubits alternating
between those used for subcircuit 3 and those used for subcircuit 4. Specifically, for subcircuits 5, 7 and
9, in the 53-qubit circuit we slice qubits 0–3 and 49–52, while in the 54-qubit circuit we slice qubits 0–4
and 50–53. For subcircuits 6 and 8, in the 53-qubit circuit we slice qubits 23–30, while in the 54-qubit
circuit we slice qubits 23–31. As with subcircuit 4, slices are read from disk, processed, and then written
back to disk.
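The alternation of sliced qubits can be captured in a small table. The encoding below is a hypothetical restatement of the text; it makes it easy to check that every disk slice keeps 45 local qubits for both circuit sizes, and that the two sets of sliced qubits never overlap.

```python
def q(a, b):
    """Inclusive qubit range a..b as a set."""
    return set(range(a, b + 1))

# Disk-level sliced qubits, restated from the text.
SLICED = {
    53: {"outer": q(0, 3) | q(49, 52), "middle": q(23, 30)},
    54: {"outer": q(0, 4) | q(50, 53), "middle": q(23, 31)},
}

def schedule(n_qubits, first=3, last=9):
    """Sliced qubits per subcircuit: odd-numbered subcircuits (3, 5, 7, 9)
    slice the outer qubits, even-numbered ones (4, 6, 8) the middle qubits."""
    s = SLICED[n_qubits]
    return {k: s["outer"] if k % 2 else s["middle"] for k in range(first, last + 1)}
```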
Figure 6: Partitioning of subcircuit 3 to minimize all-to-all communication for the 20-cycle, 53-qubit
Sycamore circuit shown in Fig. 1.
To ensure that we efficiently transfer data to/from secondary storage, we organize the data on secondary
storage as $2^{16}$ logical files for the 53-qubit circuit, and $2^{18}$ logical files for the 54-qubit circuit.
In the case of the 53-qubit circuit, files are indexed by the values of qubits 0–3, 23–30, and 49–52; each
logical file contains $2^{37}$ complex amplitudes corresponding to qubits 4–22 and 31–48. In the case of
the 54-qubit circuit, files are indexed by the values of qubits 0–4, 23–31, and 50–53; each logical file
contains $2^{36}$ complex amplitudes corresponding to qubits 5–22 and 32–49. Thus, in the first phase of
simulation for the 53-qubit circuit (i.e., the phase in which tensor 3 is written to disk), we write 256 logical
files to secondary storage for each of the 256 values of qubits 0–3 and 49–52 that are being sliced;
these files correspond to the 256 possible values of qubits 23–30. For 54-qubit circuits, 512 logical files
are written to secondary storage for each of the 512 values of qubits 0–4 and 50–53 that are being sliced;
these files correspond to the 512 possible values of qubits 23–31.
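The file organization can be checked with a few lines of arithmetic. In this sketch the dictionary `LAYOUT` restates the qubit ranges given above, `split_index` shows one possible mapping from a basis-state index to a file number and an offset (the bit ordering is our assumption; the text does not specify it), and `pass_days` estimates one full pass over the state assuming single-precision amplitudes on disk and the 2.2 TB/sec random-access file-system rate quoted in Sect. 5.

```python
def qubits(a, b):
    """Inclusive qubit range a..b."""
    return list(range(a, b + 1))

# Hypothetical encoding of the file organization described above.
LAYOUT = {
    53: {"file_index": qubits(0, 3) + qubits(23, 30) + qubits(49, 52),
         "in_file": qubits(4, 22) + qubits(31, 48)},
    54: {"file_index": qubits(0, 4) + qubits(23, 31) + qubits(50, 53),
         "in_file": qubits(5, 22) + qubits(32, 49)},
}
BYTES_PER_AMP = 8  # single-precision complex amplitudes on disk

def layout_stats(n):
    """Return (number of files, amplitudes per file, total bytes on disk)."""
    n_files = 2 ** len(LAYOUT[n]["file_index"])
    amps_per_file = 2 ** len(LAYOUT[n]["in_file"])
    return n_files, amps_per_file, n_files * amps_per_file * BYTES_PER_AMP

def split_index(n, basis_state):
    """Map an n-bit basis-state index to (file number, offset in file),
    assuming qubit 0 is the most significant bit."""
    def pack(qs):
        value = 0
        for q in qs:
            value = (value << 1) | ((basis_state >> (n - 1 - q)) & 1)
        return value
    return pack(LAYOUT[n]["file_index"]), pack(LAYOUT[n]["in_file"])

def pass_days(n, bytes_per_second=2.2e12):
    """Time for one full read or write of the state, in days."""
    return layout_stats(n)[2] / bytes_per_second / 86_400
```

For the 53-qubit circuit this gives $2^{56}$ bytes (64 PiB) per pass, or roughly 0.38 days, in line with the disk I/O subtotal in Table 1 divided by its five disk transfers.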
In the second phase of simulation of the 53-qubit circuit (i.e., the phase in which subcircuit 4 is
simulated), for each of the 256 values of qubits 23–30 that are being sliced, we read 256 logical files
from secondary storage, corresponding to the 256 possible values of qubits 0–3 and 49–52. Once these
256 files of amplitudes are loaded into memory, we apply the gates in subcircuit 4 and write each updated
slice back to storage. Similarly, for the 54-qubit circuit, for each of the 512 values of qubits 23–31 that
are being sliced, we read 512 logical files from secondary storage, corresponding to the 512 possible
values of qubits 0–4 and 50–53. Updated slices are written back to secondary storage as 512 files of
amplitudes. These access patterns are repeated for each subsequent phase of processing. The above
approach guarantees that individual logical files are always read or written in their entirety, and they are
never read or written multiple times in a single read or write cycle. Access overhead per read/write cycle
is thereby minimized.
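The access pattern can be modeled with two hypothetical generators over (outer, middle) file coordinates, i.e., for the 53-qubit circuit, the values of qubits 0–3 and 49–52 and the values of qubits 23–30. The write phase produces one in-memory slice per outer value and the read phase assembles one slice per middle value; counting accesses confirms that each logical file is touched exactly once per read or write cycle.

```python
from collections import Counter

def write_phase(n_outer, n_middle):
    """Subcircuit-3 style: one in-memory slice per outer value, written
    out as one file per middle value."""
    for o in range(n_outer):
        for m in range(n_middle):
            yield ("write", o, m)

def read_phase(n_outer, n_middle):
    """Subcircuit-4 style: one in-memory slice per middle value, assembled
    by reading one file per outer value."""
    for m in range(n_middle):
        for o in range(n_outer):
            yield ("read", o, m)

def accesses_per_file(phase):
    """Count how often each (outer, middle) file is accessed in a phase."""
    return Counter((o, m) for _, o, m in phase)
```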
The above slicing strategy is designed to minimize the number of disk accesses by maximizing the
number of “local” qubits employed in each disk slice, which is 45 qubits for both the 53- and the 54-
qubit circuit. As discussed earlier, the slicing methodology in [9] is applied recursively to these 45-qubit
slices, to minimize the number of all-to-all communication steps that must be performed in order to
simulate subcircuits 3–9, shown in Figs. 3 and 4. Figs. 6–11 illustrate these recursive partitionings for
tensors 3, 4, and 5 in the case of 20-cycle circuits. These partitionings correspond to slicing an additional
13 qubits (i.e., in addition to the qubits sliced for disk access purposes), in order to distribute work across
4096 nodes and across each pair of IBM POWER9 sockets within those nodes. We employ a socket-level
slicing strategy to enable each socket to work independently, and to avoid Non-Uniform Memory Access
(NUMA) overhead when memory accesses cross socket boundaries.

Figure 7: Partitioning of subcircuit 4 to minimize all-to-all communication for the 20-cycle, 53-qubit
Sycamore circuit shown in Fig. 1.
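The socket-level arithmetic behind these figures is simple enough to spell out. The constants restate the text; storing amplitudes in memory as double-precision complex numbers (16 bytes) is our assumption, based on the double-precision calculations assumed in Sect. 5. The 32 qubits left per socket match the “tensor ranks per socket” entries of 32 in Tables 1 and 2.

```python
DISK_SLICE_QUBITS = 45       # local qubits per disk slice (both circuit sizes)
SOCKET_SLICED_QUBITS = 13    # additional qubits sliced across sockets
SOCKETS_PER_NODE = 2         # two POWER9 sockets per node
BYTES_PER_AMPLITUDE = 16     # double-precision complex (assumed)

sockets = 2 ** SOCKET_SLICED_QUBITS
nodes = sockets // SOCKETS_PER_NODE
qubits_per_socket = DISK_SLICE_QUBITS - SOCKET_SLICED_QUBITS
bytes_per_socket = 2 ** qubits_per_socket * BYTES_PER_AMPLITUDE

print(sockets, nodes, qubits_per_socket, bytes_per_socket / 2**30)  # 8192 4096 32 64.0
```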
As shown in Figs. 6 and 9, subcircuit 3 is recursively partitioned into three sub-subcircuits, labeled
3, 4, and 5. For the 53-qubit circuit, the sub-subcircuit labeled 3 is sliced on qubits 4–10 and 43–48,
and for the 54-qubit circuits we slice qubits 5–10 and 43–49. These specific qubits are selected so
that the corresponding slices of tensors 1 and 2, which are small, can be pre-distributed across sockets;
this way the contractions needed to start simulating subcircuit 3 can be performed in-place without
communication. For 53-qubit circuits, the sub-subcircuit labeled 4 is sliced on qubits 4–16, and, for 54-
qubit circuits, on qubits 5–17. Switching to this slicing requires an all-to-all exchange of amplitudes across
sockets. For 53-qubit circuits, the sub-subcircuit labeled 5 is sliced on qubits 36–48, and, for 54-qubit
circuits, on qubits 37–49, again requiring all-to-all communication.
As shown in Figs. 7 and 10, subcircuit 4 is recursively partitioned into two sub-subcircuits labeled
4 and 5. For 53-qubit circuits, the sub-subcircuit labeled 4 is sliced on qubits 40–52, and for 54-qubit
circuits on qubits 41–53. The sub-subcircuit labeled 5 is sliced on qubits 0–12 for both 53- and 54-qubit
circuits.
As shown in Figs. 8 and 11, subcircuit 5 is recursively partitioned into two sub-subcircuits labeled
5 and 6. For 53-qubit circuits, the sub-subcircuit labeled 5 is sliced on qubits 36–48, and, for 54-qubit
circuits, on qubits 37–49. The sub-subcircuit labeled 6 is sliced on qubits 4–16 for 53-qubit circuits, and
on qubits 5–17 for 54-qubit circuits.
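The slicing choices above can be cross-checked mechanically. This sketch restates them as Python sets (the dictionary layout is ours) and verifies two properties implied by the text: each socket-level slicing fixes 13 qubits, i.e., $2^{13} = 8192$ sockets, and none of these qubits coincides with a qubit already sliced for disk access in the same subcircuit.

```python
def q(a, b):
    """Inclusive qubit range a..b as a set."""
    return set(range(a, b + 1))

# Qubits sliced for disk access: "outer" for subcircuits 3 and 5, "middle" for 4.
DISK = {
    53: {"outer": q(0, 3) | q(49, 52), "middle": q(23, 30)},
    54: {"outer": q(0, 4) | q(50, 53), "middle": q(23, 31)},
}

# (subcircuit, sub-subcircuit label) -> qubits sliced across sockets.
SOCKET = {
    53: {(3, 3): q(4, 10) | q(43, 48), (3, 4): q(4, 16), (3, 5): q(36, 48),
         (4, 4): q(40, 52), (4, 5): q(0, 12),
         (5, 5): q(36, 48), (5, 6): q(4, 16)},
    54: {(3, 3): q(5, 10) | q(43, 49), (3, 4): q(5, 17), (3, 5): q(37, 49),
         (4, 4): q(41, 53), (4, 5): q(0, 12),
         (5, 5): q(37, 49), (5, 6): q(5, 17)},
}

def problems(n):
    """List violations of the two properties checked here (empty if none)."""
    found = []
    for (sub, label), sliced in SOCKET[n].items():
        disk = DISK[n]["middle"] if sub == 4 else DISK[n]["outer"]
        if len(sliced) != 13:
            found.append((sub, label, "not 13 qubits"))
        if not sliced.isdisjoint(disk):
            found.append((sub, label, "overlaps disk slicing"))
    return found
```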
Figure 8: Partitioning of subcircuit 5 to minimize all-to-all communication for the 20-cycle, 53-qubit
Sycamore circuit shown in Fig. 1.
5 Estimated running times
We estimate running times for the above simulation strategy on the Summit supercomputer using a
combination of published performance figures and early IBM internal benchmarks.
Because we directly employ the partitioning strategy of [9] in a recursive fashion, and the resulting
45-qubit disk slices coincide with the 45-qubit circuits simulated in [9], we use the performance figures
in [9] to estimate per-disk-slice computational costs. This implicitly assumes that we are directly using
the implementation described in [9] for the computations. Since [9] employs 8,192 nodes of the Cori
II supercomputer, we can more easily extrapolate predicted performance across a corresponding 8,192
sockets on Summit.
To account for the differences between the gate set of [9] and Sycamore circuits, we use the following
two facts: in [9], gates are aggregated together into k-qubit kernels represented by $2^k \times 2^k$ unitary
matrices; and amplitudes are updated using matrix-matrix and/or matrix-vector calculations. The gate
aggregation effectively normalizes computations across gate sets, making them independent from the
details of individual gates. For simulations performed on Cori II, gate aggregation in [9] uses $k_{\max} = 5$,
with actual kernel sizes sometimes being less than 5 qubits depending on the aggregated gates. To
leverage the performance figures reported in [9], we therefore perform the same form of gate aggregation
on Sycamore circuits. Tables 1 and 2 summarize the results obtained with these gate aggregations for the
53- and 54-qubit circuits illustrated in Figs. 1 and 2, respectively. In these tables, the “5-qubit kernels per
disk slice” column identifies the number of aggregate gates constructed for the corresponding subcircuit
or sub-subcircuit.
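To illustrate the idea of aggregating gates into kernels of at most $k_{\max}$ qubits, here is a simple greedy heuristic of our own; it is not the aggregation algorithm of [9]. Gates are described only by the qubits they act on, and each kernel is a contiguous run of the gate sequence, so applying the kernels in order is equivalent to applying the original gates.

```python
def aggregate(gates, k_max=5):
    """Greedily fuse consecutive gates into kernels spanning <= k_max qubits.

    gates: list of tuples of qubit indices, in circuit order.
    Returns a list of (sorted qubits of the kernel, number of fused gates)."""
    kernels = []
    qubits, count = set(), 0
    for g in gates:
        if count and len(qubits | set(g)) > k_max:
            kernels.append((sorted(qubits), count))  # close current kernel
            qubits, count = set(), 0
        qubits |= set(g)
        count += 1
    if count:
        kernels.append((sorted(qubits), count))
    return kernels
```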
After gate aggregation, we estimate execution times for gate operations using Tables 1 and 2 in [9].
Specifically, we use the “Time” and “Comm.” columns of Table 2 in [9] to estimate computation time
from total execution time, by factoring out the reported percentage of communication and synchronization
time. We then divide the computation times by the number of aggregate gates (i.e., clusters) listed
in Table 1 in [9] to obtain overall average execution times per aggregate gate. These averages can be
used to estimate execution times for arbitrary numbers of aggregate gates, assuming simulations are
performed on Cori II. To obtain corresponding time estimates for Summit, we scale the estimates by the
ratio of the High Performance Linpack (HPL) benchmark figure for Cori II (14,014.70 TeraFLOPs/sec)
versus Summit (148,600.00 TeraFLOPs/sec); this accounts for the substantially greater performance of
Summit when performing, e.g., the matrix-vector calculations entailed by the use of gate aggregation.
The calculations for 45-qubit simulations yield an expected execution time of 2.38380 seconds per
aggregate gate on Cori II, and 0.22482 seconds per aggregate gate on Summit. The runtime estimates
shown in Tabs. 1 and 2 for tensors 3, 4 and 5 are obtained by multiplying the number of aggregate gates
in each of their sub-subcircuits by 0.22482 seconds, and further multiplying by the number of disk slices
per tensor (i.e., 256 slices for the 53-qubit circuit and 512 slices for the 54-qubit circuit). Tensors 1 and
2, on the other hand, represent calculations on 30 qubits or fewer; therefore, we use the 30-qubit performance
figures from Tables 1 and 2 in [9] to obtain an estimated 0.025097 seconds per aggregate gate on Summit
for these tensors.

| Tensor | Disk transfers per disk slice | All-to-alls per disk slice | 5Q kernels per disk slice | Tensor ranks per socket | Num. gates | Contraction cost (FLOPs) | Compute/total time (days) | % of total time | Achieved PFLOPS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | 0.000977 | 28 | 28 | 84 | | 0.002082 | 0.08% | 0.0308 |
| 2 | | 0.000977 | 25 | 27 | 84 | | 0.001859 | 0.07% | 0.0173 |
| Contraction | | | | 31 | | $1.181 \cdot 10^{21}$ | 0.117058 | 4.59% | 116.7304 |
| 3.3 | | | 16 | 32 | 63 | | 0.010658 | 0.42% | 18.4865 |
| 3.4 | | 1 | 6 | 32 | 23 | | 0.003997 | 0.16% | 17.9975 |
| 3.5 | | 1 | 8 | 32 | 26 | | 0.005329 | 0.21% | 15.2587 |
| Disk write | 1 | 1 | | | | | | | |
| Disk read | 1 | 1 | | | | | | | |
| 4.4 | | | 11 | 32 | 49 | | 0.007327 | 0.29% | 20.9141 |
| 4.5 | | 1 | 10 | 32 | 45 | | 0.006661 | 0.26% | 21.1275 |
| Disk write | 1 | 1 | | | | | | | |
| Disk read | 1 | 1 | | | | | | | |
| 5.5 | | | 9 | 32 | 35 | | 0.005995 | 0.24% | 18.2583 |
| 5.6 | | 1 | 7 | 32 | 21 | | 0.004663 | 0.18% | 14.0850 |
| Disk write | 1 | 1 | | | | | | | |
| Subtotals | | | | | | | | | |
| Compute | | | 120 | | | $1.181 \cdot 10^{21}$ | 0.165631 | 6.50% | 87.4462 |
| All-to-alls | | 9.001953 | | | | | 0.487725 | 19.13% | |
| Disk I/O | 5 | | | | | | 1.896296 | 74.37% | |
| Total | 5 | 9.001953 | 120 | 32.67243 | 430 | | 2.549652 | 100.00% | 87.4462 |

Table 1: Running time estimates to simulate the 20-cycle, 53-qubit Sycamore circuit shown in Fig. 1.
Tensors 3.3, 3.4, and 3.5 correspond to the partitionings of subcircuit 3 shown in Fig. 6, tensors 4.4 and
4.5 to the partitionings of subcircuit 4 shown in Fig. 7, and tensors 5.5 and 5.6 to the partitionings of
subcircuit 5 shown in Fig. 8. The number of 5-qubit kernels is the number of aggregated gates spanning
no more than 5 qubits, created by grouping gates together within each subcircuit. The contraction cost
is the total number of floating-point operations needed to perform the tensor contractions associated with
entanglement indices, when tensors 1 and 2 are contracted in preparation for simulating subcircuit 3.
The tensor ranks per socket indicate the sizes of the corresponding tensors in terms of the effective
number of local qubits per socket. For entries above the subtotal line, the compute times are either the
estimated times to perform gate operations based on the number of 5-qubit kernels, or the estimated
time to perform the tensor 1 and 2 contractions, depending on the row in the table. Entries below the
subtotal line factor in the costs of performing all-to-all communication and disk I/O. In the case of gate
operations, the achieved PetaFLOPs per second column is the total number of floating-point operations
without gate aggregation divided by the estimated compute time with gate aggregation.

| Tensor | Disk transfers per disk slice | All-to-alls per disk slice | 5Q kernels per disk slice | Tensor ranks per socket | Num. gates | Contraction cost (FLOPs) | Compute/total time (days) | % of total time | Achieved PFLOPS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | 0.001953 | 28 | 30 | 84 | | 0.004164 | 0.07% | 0.0616 |
| 2 | | 0.001953 | 26 | 30 | 87 | | 0.003867 | 0.07% | 0.0687 |
| Contraction | | | | 33 | | $9.445 \cdot 10^{21}$ | 0.936466 | 16.14% | 116.7304 |
| 3.3 | | | 15 | 32 | 59 | | 0.019984 | 0.42% | 18.4865 |
| 3.4 | | 1 | 8 | 32 | 31 | | 0.010658 | 0.18% | 18.1931 |
| 3.5 | | 1 | 8 | 32 | 27 | | 0.010658 | 0.18% | 15.8456 |
| Disk write | 1 | 1 | | | | | | | |
| Disk read | 1 | 1 | | | | | | | |
| 4.4 | | | 11 | 32 | 49 | | 0.014655 | 0.25% | 20.9141 |
| 4.5 | | 1 | 10 | 32 | 45 | | 0.013323 | 0.23% | 21.1275 |
| Disk write | 1 | 1 | | | | | | | |
| Disk read | 1 | 1 | | | | | | | |
| 5.5 | | | 9 | 32 | 37 | | 0.011990 | 0.21% | 19.3016 |
| 5.6 | | 1 | 7 | 32 | 21 | | 0.009326 | 0.16% | 14.0850 |
| Disk write | 1 | 1 | | | | | | | |
| Subtotals | | | | | | | | | |
| Compute | | | 122 | | | $9.445 \cdot 10^{21}$ | 1.035091 | 17.84% | 107.2342 |
| All-to-alls | | 9.003906 | | | | | 0.975661 | 16.81% | |
| Disk I/O | 5 | | | | | | 3.792593 | 65.35% | |
| Total | 5 | 9.003906 | 122 | 33.80735 | 440 | | 5.803345 | 100.00% | 107.2342 |

Table 2: Running time estimates to simulate the 20-cycle, 54-qubit Sycamore circuit shown in Fig. 2.
Tensors 3.3, 3.4, and 3.5 correspond to the partitionings of subcircuit 3 shown in Fig. 9, tensors 4.4 and
4.5 to the partitionings of subcircuit 4 shown in Fig. 10, and tensors 5.5 and 5.6 to the partitionings of
subcircuit 5 shown in Fig. 11. The number of 5-qubit kernels is the number of aggregated gates spanning
no more than 5 qubits, created by grouping gates together within each subcircuit. The contraction cost
is the total number of floating-point operations needed to perform the tensor contractions associated with
entanglement indices, when tensors 1 and 2 are contracted in preparation for simulating subcircuit 3.
The tensor ranks per socket indicate the sizes of the corresponding tensors in terms of the effective
number of local qubits per socket. For entries above the subtotal line, the compute times are either the
estimated times to perform gate operations based on the number of 5-qubit kernels, or the estimated
time to perform the tensor 1 and 2 contractions, depending on the row in the table. Entries below the
subtotal line factor in the costs of performing all-to-all communication and disk I/O. In the case of gate
operations, the achieved PetaFLOPs per second column is the total number of floating-point operations
without gate aggregation divided by the estimated compute time with gate aggregation.

Figure 9: Partitioning of subcircuit 3 to minimize all-to-all communication for the 20-cycle, 54-qubit
Sycamore circuit shown in Fig. 2.
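The per-gate timing model of this section can be reproduced as follows. The constants are the figures quoted in the text; `gate_time_days` is a hypothetical helper that multiplies the number of 5-qubit kernels by the time per kernel and by the number of disk slices. With the 30-qubit figure of 0.025097 seconds per kernel, the same multiplication also reproduces the entries for tensors 1 and 2 in Tables 1 and 2.

```python
SECONDS_PER_DAY = 86_400
HPL_CORI_II = 14_014.70           # TeraFLOPs/sec
HPL_SUMMIT = 148_600.00           # TeraFLOPs/sec
CORI_SEC_PER_KERNEL_45Q = 2.38380
SUMMIT_SEC_PER_KERNEL_30Q = 0.025097

# Scale the Cori II time per 45-qubit kernel by the HPL ratio.
summit_sec_per_kernel = CORI_SEC_PER_KERNEL_45Q * HPL_CORI_II / HPL_SUMMIT

def gate_time_days(n_kernels, disk_slices, sec_per_kernel=summit_sec_per_kernel):
    """Kernels x seconds per kernel x number of disk slices, in days."""
    return n_kernels * sec_per_kernel * disk_slices / SECONDS_PER_DAY
```

For instance, `gate_time_days(16, 256)` corresponds to row 3.3 of Table 1.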
The “achieved FLOPs per second” columns in Tabs. 1 and 2 provide a sanity check for the above
time estimates. For rows corresponding to gate operations, this column reports the number of floating-
point operations that would be performed without gate aggregation divided by the estimated execution
times. As such, the resulting figures provide an indication of the implied efficiency of the time estimates.
As can be seen, the time estimates for gate operations imply sustained rates at or below roughly 11% of the 191 PetaFLOPs/sec peak double-precision performance expected across 8,192 sockets. There is therefore room to improve upon these estimates by leveraging Summit's NVIDIA GPUs, which would allow individual gate operations to be simulated without resorting to gate aggregation, and the corresponding matrix-vector operations to be implemented with cuBLAS routines.
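The sanity check in the "achieved FLOPs per second" column can be sketched as follows. This relies on an assumption the text does not state: that one unaggregated two-qubit gate on a 2^n-amplitude state costs 30 * 2^n floating-point operations (each output amplitude is a four-term complex dot product, i.e., 4 complex multiplies at 6 flops plus 3 complex adds at 2 flops). The number of two-qubit gates absorbed by one 5-qubit aggregate gate in the example is likewise a hypothetical value.

```python
from typing import Tuple

PEAK_DP_FLOPS = 191e15  # double-precision peak across 8,192 sockets (FLOPs/sec)


def flops_per_two_qubit_gate(n_qubits: int) -> float:
    """ASSUMED cost of one unaggregated 2-qubit gate on 2**n_qubits amplitudes.

    Each output amplitude is taken to be a 4-term complex dot product:
    4 complex multiplies (6 flops each) + 3 complex adds (2 flops each) = 30 flops.
    """
    return 30.0 * 2 ** n_qubits


def implied_efficiency(n_qubits: int, two_qubit_gates: int, seconds: float,
                       peak: float = PEAK_DP_FLOPS) -> Tuple[float, float]:
    """Return (implied FLOPs/sec without aggregation, fraction of peak)."""
    achieved = two_qubit_gates * flops_per_two_qubit_gate(n_qubits) / seconds
    return achieved, achieved / peak


# Hypothetical: one 45-qubit aggregate gate on Summit that absorbs four 2-qubit gates.
rate, fraction = implied_efficiency(45, 4, 0.22482)
print(f"{rate / 1e15:.1f} PetaFLOPs/sec, {100 * fraction:.1f}% of peak")
```

Under these assumptions the example lands just under 10% of peak, the same order as the figures discussed above; a different flop-counting convention (e.g., 8 flops per complex multiply-add) would shift the result slightly.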
Figure 10: Partitioning of subcircuit 4 to minimize all-to-all communication for the 20-cycle, 54-qubit Sycamore circuit shown in Fig. 2.
To obtain time estimates for the contractions of tensors 1 and 2, we use the performance figures reported in Table 1 in [17]. The simulation method presented in [16, 17] employs the "bristle-brush" strategy outlined in [13]. A key characteristic of this simulation strategy is that computations are dominated by very large tensor contractions across many tensor indices simultaneously. Consequently, we directly use the performance figures reported in Table 1 in [17] to estimate contraction times. Because we assume double-precision calculations, as in [9], we convert the performance figures in [17] from single precision to double precision: we multiply the single-precision computation rates of [17] by the ratio between the double-precision (7.8 TeraFLOPs/sec) and single-precision (15.7 TeraFLOPs/sec) peak performance rates of Summit's NVIDIA GPUs. Performing this calculation and taking the worst case yields an estimated 14.249 TeraFLOPs/sec per socket (116.73 PetaFLOPs/sec for 8,192 sockets). We use this rate in Tabs. 1 and 2 to estimate execution times for the contraction of tensors 1 and 2.
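The precision conversion and the resulting contraction-time rule are simple arithmetic, sketched below. The roughly 28.68 TeraFLOPs/sec single-precision figure is not quoted from [17]; it is only what the 14.249 TeraFLOPs/sec worst case implies when the conversion is inverted. The contraction FLOP count in the example is a placeholder, and the function names are illustrative.

```python
DP_PEAK_TFLOPS = 7.8    # NVIDIA GPU double-precision peak (TeraFLOPs/sec)
SP_PEAK_TFLOPS = 15.7   # NVIDIA GPU single-precision peak (TeraFLOPs/sec)
SOCKETS = 8_192
DP_RATE_PER_SOCKET = 14.249e12  # worst-case double-precision rate (FLOPs/sec)


def sp_to_dp(sp_rate: float) -> float:
    """Convert a single-precision rate by the ratio of GPU peak rates."""
    return sp_rate * DP_PEAK_TFLOPS / SP_PEAK_TFLOPS


def contraction_seconds(flops: float, rate_per_socket: float = DP_RATE_PER_SOCKET,
                        sockets: int = SOCKETS) -> float:
    """Time for a contraction of the given FLOP count at the machine-wide rate."""
    return flops / (rate_per_socket * sockets)


machine_rate = DP_RATE_PER_SOCKET * SOCKETS  # about 116.73e15 FLOPs/sec
# Single-precision rate implied by inverting the conversion (about 28.68e12).
implied_sp_rate = DP_RATE_PER_SOCKET * SP_PEAK_TFLOPS / DP_PEAK_TFLOPS

print(f"{machine_rate / 1e15:.2f} PetaFLOPs/sec machine-wide")
print(f"{contraction_seconds(1e20):.0f} s for a hypothetical 1e20-FLOP contraction")
```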
We estimate all-to-all and disk I/O times using results from early IBM internal benchmarks. These
benchmarks indicate that a network injection rate of 7 GB/sec per node (3.5 GB/sec per socket) should
be easily achieved during an all-to-all across the entire machine. This figure represents ≈ 30% of the 23
GB/sec per node peak injection rate (11.5 GB/sec per socket) that characterizes the bisection bandwidth
of Summit. The maximum reported file-system transfer rate is 2.2 TB/sec for random-access I/O (2.5
TB/sec for pure sequential I/O). We assume that all disk storage operations use single precision, while in-memory calculations use double precision. The estimates in Tabs. 1 and 2 are therefore based on a storage density of 8 bytes per amplitude (i.e., single-precision complex) and a transfer rate of 2 TB/sec, slightly below the reported random-access maximum.
Benchmark tests suggest that allocating only a subset of nodes to the task of performing disk I/O can be
more efficient because it may avoid contention; those nodes then become the distribution points to the
rest of the system when spreading computations across a majority of the nodes. Tabs. 1 and 2 model this
arrangement by incorporating an all-to-all communication cost for every disk read or write operation.
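Using the rates above, the cost of moving one disk slice can be roughly estimated as below. This simplified sketch adds assumptions of its own: TB and GB are read as decimal units, a slice of the 53-qubit circuit holds 2^45 amplitudes (2^53 amplitudes split into 256 slices), and an all-to-all is modeled as every amplitude being injected into the network once in double precision, by all sockets in parallel. It is not meant to reproduce the entries of Tabs. 1 and 2.

```python
DISK_BYTES_PER_SEC = 2e12       # 2 TB/sec file-system rate used in the estimates
DISK_BYTES_PER_AMP = 8          # single-precision complex on disk
MEMORY_BYTES_PER_AMP = 16       # double-precision complex in memory (assumed for all-to-alls)
INJECTION_PER_SOCKET = 3.5e9    # 3.5 GB/sec per socket during an all-to-all
SOCKETS = 8_192


def disk_transfer_seconds(n_amplitudes: int) -> float:
    """Time to read or write n_amplitudes at the assumed file-system rate."""
    return n_amplitudes * DISK_BYTES_PER_AMP / DISK_BYTES_PER_SEC


def all_to_all_seconds(n_amplitudes: int) -> float:
    """Rough all-to-all time if every amplitude is injected into the network once."""
    return n_amplitudes * MEMORY_BYTES_PER_AMP / (INJECTION_PER_SOCKET * SOCKETS)


slice_amplitudes = 2 ** 53 // 256  # one disk slice of the 53-qubit state
print(f"disk transfer per slice: {disk_transfer_seconds(slice_amplitudes):.1f} s")
print(f"all-to-all per slice:    {all_to_all_seconds(slice_amplitudes):.1f} s")
```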
Figure 11: Partitioning of subcircuit 5 to minimize all-to-all communication for the 20-cycle, 54-qubit Sycamore circuit shown in Fig. 2.
The resulting estimated running times are summarized in Tabs. 1 and 2. As these tables show, with the performance model discussed in this section we obtain an overall estimate of 2.55 days to compute all 2^53 amplitudes of a 20-cycle, 53-qubit, Sycamore ABCDCDAB circuit with all amplitudes stored on disk, and 5.80 days for the corresponding 54-qubit circuit. To store amplitudes on disk in single precision, 64 PiB of disk space are required for 53-qubit circuits, and 128 PiB for 54-qubit circuits. Both fit within the 250 PiB available on Summit.
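The disk-space figures follow directly from the state size: 2^n amplitudes at 8 bytes each (single-precision complex), expressed in PiB (2^50 bytes). A quick check in Python, with an illustrative function name:

```python
PIB = 2 ** 50  # bytes in one pebibyte


def disk_space_pib(n_qubits: int, bytes_per_amplitude: int = 8) -> float:
    """Disk space for all 2**n_qubits amplitudes stored in single precision."""
    return 2 ** n_qubits * bytes_per_amplitude / PIB


for n in (53, 54):
    print(f"{n} qubits: {disk_space_pib(n):.0f} PiB")  # 64 PiB, then 128 PiB
```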
The above analysis can be repeated for all depths suggested by Figs. 3 and 4; i.e., 10, 14, 20, 24,
28, 32, and 36 cycles. Doing so yields the results reported in Tables 3 and 4, and plotted in Fig. 12. As
these tables and figure illustrate, estimated execution times grow approximately linearly with the depth of the circuits. We remark that the required disk space remains constant, because with the above approach at most a fixed number of slices are stored on disk at any given time. Thus, the disk occupation is 64 PiB for 53-qubit circuits and 128 PiB for 54-qubit circuits, regardless of the number of cycles.
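The near-linear growth can be checked with an ordinary least-squares fit of the run times in Tables 3 and 4 against the number of cycles. The sketch below uses only the tabulated values; the fit itself is an illustration, not part of the original analysis.

```python
CYCLES = [10, 14, 20, 24, 28, 32, 36]
DAYS_53 = [0.67, 1.61, 2.55, 3.54, 4.47, 5.46, 6.45]    # Table 3, run time (days)
DAYS_54 = [2.05, 3.92, 5.80, 7.78, 9.65, 11.63, 13.62]  # Table 4, run time (days)


def least_squares(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


for label, days in (("53-qubit", DAYS_53), ("54-qubit", DAYS_54)):
    slope, intercept = least_squares(CYCLES, days)
    print(f"{label}: {slope:.3f} days per cycle (intercept {intercept:.2f} days)")
```

The fitted slopes come out at roughly 0.22 and 0.44 days per cycle, a ratio of about two that is consistent with the 54-qubit circuit being processed as twice as many disk slices (512 versus 256).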
Number of   Disk Xfers       All-to-Alls      5-Qubit Kernels   Run Time
Cycles      per Disk Slice   per Disk Slice   per Disk Slice    (days)
10          1                3.002            65                0.67
14          3                6.002            89                1.61
20          5                9.002            120               2.55
24          7                13.002           141               3.54
28          9                16.002           162               4.47
32          11               20.002           182               5.46
36          13               24.002           206               6.45
Table 3: Estimates of total run times for simulating 53-qubit, Sycamore ABCDCDAB circuits of various depths.
Number of   Disk Xfers       All-to-Alls      5-Qubit Kernels   Run Time
Cycles      per Disk Slice   per Disk Slice   per Disk Slice    (days)
10          1                3.004            66                2.05
14          3                6.004            90                3.92
20          5                9.004            122               5.80
24          7                13.004           144               7.78
28          9                16.004           166               9.65
32          11               20.004           187               11.63
36          13               24.004           211               13.62
Table 4: Estimates of total run times for simulating 54-qubit, Sycamore ABCDCDAB circuits of various depths.
Figure 12: Graph of total runtime estimates for fully simulating both 53- and 54-qubit, Sycamore ABCDCDAB circuits of various depths, with all amplitudes calculated and stored on disk.
References
[1] S. Aaronson and L. Chen. Complexity-theoretic foundations of quantum supremacy experiments.
arXiv preprint arXiv:1612.05903, 2016.
[2] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M.
Martinis, and H. Neven. Characterizing quantum supremacy in near-term devices. Nature Physics,
14(6):595–600, 2018.
[3] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, and H. Neven. Simulation of low-depth quantum circuits
as complex undirected graphical models. arXiv preprint arXiv:1712.05384, 2017.
[4] D. Castelvecchi. Quantum computers ready to leap out of the lab in 2017. Nature, 541:9–10, 2017.
[5] J. Chen, F. Zhang, M. Chen, C. Huang, M. Newman, and Y. Shi. Classical simulation of
intermediate-size quantum circuits. arXiv preprint arXiv:1805.01450, 2018.
[6] M.-C. Chen, R. Li, L. Gan, X. Zhu, G. Yang, C.-Y. Lu, and J.-W. Pan. Quantum teleportation-
inspired algorithm for sampling large random quantum circuits. arXiv preprint arXiv:1901.05003,
2019.
[7] Z. Chen, Q. Zhou, C. Xue, X. Yang, G. Guo, and G. Guo. 64-qubit quantum circuit simulation.
arXiv preprint arXiv:1802.06952, 2018.
[8] C. Guo, Y. Liu, M. Xiong, S. Xue, X. Fu, A. Huang, X. Qiang, P. Xu, J. Liu, S. Zheng, H.-L. Huang,
M. Deng, D. Poletti, W.-S. Bao, and J. Wu. General-purpose quantum circuit simulator with pro-
jected entangled-pair states and the quantum supremacy frontier. arXiv preprint arXiv:1905.08394,
2019.
[9] T. Häner and D. S. Steiger. 0.5 petabyte simulation of a 45-qubit quantum circuit. In Proceedings of
the International Conference for High Performance Computing, Networking, Storage and Analysis,
SC ’17, pages 33:1–33:10, New York, NY, USA, 2017. ACM.
[10] R. Li, B. Wu, M. Ying, X. Sun, and G. Yang. Quantum supremacy circuit simulation on Sunway
TaihuLight. arXiv preprint arXiv:1804.04797, 2018.
[11] I. L. Markov, A. Fatima, S. V. Isakov, and S. Boixo. Quantum supremacy is both closer and farther
than it appears. arXiv preprint arXiv:1807.10749, 2018.
[12] I. L. Markov and Y. Shi. Simulating quantum computation by contracting tensor networks. SIAM
Journal on Computing, 38(3):963–981, 2008.
[13] E. Pednault. Quantum computing—breaking through the 49 qubit simulation barrier. IBM Research Blog posting, also posted on phys.org (https://www.ibm.com/blogs/research/2017/10/quantum-computing-barrier/ and https://phys.org/news/2017-10-quantum-computingbreaking-qubit-simulation-barrier.html), 2017.
[14] E. Pednault, J. A. Gunnels, G. Nannicini, L. Horesh, T. Magerlein, E. Solomonik, and R. Wis-
nieff. Breaking the 49-qubit barrier in the simulation of quantum circuits. arXiv preprint
arXiv:1710.05867, 2017.
[15] E. G. Rieffel et al. Quantum supremacy using a programmable superconducting processor. NASA
Ames Research Center Technical Report NASA/TP-2019-220319, 2019.
[16] B. Villalonga, S. Boixo, B. Nelson, C. Henze, E. Rieffel, R. Biswas, and S. Mandrà. A flexible high-performance simulator for the verification and benchmarking of quantum circuits implemented on real hardware. arXiv preprint arXiv:1811.09599, 2018.
[17] B. Villalonga, D. Lyakh, S. Boixo, H. Neven, T. S. Humble, R. Biswas, E. G. Rieffel, A. Ho, and S. Mandrà. Establishing the quantum supremacy frontier with a 281 Pflop/s simulation. arXiv preprint arXiv:1905.00444, 2019.
[18] F. Zhang, C. Huang, M. Newman, J. Cai, H. Yu, Z. Tian, B. Yuan, H. Xu, J. Wu, X. Gao, J. Chen,
M. Szegedy, and Y. Shi. Alibaba cloud quantum development kit: Large-scale classical simulation
of quantum circuits. arXiv preprint arXiv:1907.11217, 2019.
A Implementation details
We give a full description of the implementation of the tensor computations, including cache blocking
to create the 5-qubit aggregate gates. Further optimizations might be possible; this description is meant
as a viable proof of concept modeled on [9].
We describe how to interpret each line in the listing. Qubits are numbered from 0 to n − 1. The
“Mode” column indicates the type of information contained in the corresponding line. We give an
overview of each of the eight possible modes.
• “define”: the first define line indicates the total number of qubits; the second define line lists the qubit indices used to address logical files on disk.
• “new”:
– if the “Gate” column contains “tensor”, it describes a new tensor. The column “Arguments”
indicates the corresponding number of local qubits and of global qubits, followed by the
indices of the qubits in the tensor, starting with local qubits and ending with global qubits.
– if the “Gate” column contains “cache”, it indicates how to partition gates into 5-qubit aggre-
gate gates.
• “gate”: describes a gate. All gates are two-qubit gates after circuit transformations, as discussed
in the paper; the “Arguments” column indicates the qubits involved.
• “entgl”: describes the introduction of an entanglement index due to a deferred contraction between
tensors.
– if the “Gate” column contains “tensor”, it describes the new tensor with the corresponding
list of entanglement indices (labeled with negative numbers).
– if the “Gate” column contains “EI” or “E2Q”, it describes synthesized gate operations that
are employed when introducing entanglement indices.
• “slice”: lists the indices of the qubits that are sliced on the first level of the recursive scheme.
• “all2all”: indicates a communication between nodes used to rearrange tensor indices in prepara-
tion for a contraction, or to swap which qubits are local and which ones are global.
• “write”: indicates which qubit indices to fix, in order to write a slice to disk.
• “read”: indicates which qubit indices to fix, in order to read a slice from disk.
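For readers who want to process the listings programmatically, a minimal parser for the row format described above is sketched below. The class and function names are hypothetical. It relies only on conventions visible in the listings: six whitespace-separated columns, comma-separated integer arguments, and continuation lines that begin with a comma. The sample rows are copied from the 53-qubit listing; for a "slice" row, the number of disk slices is 2 raised to the number of sliced qubits.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ListingRow:
    tensor: int
    phase: int
    mode: str
    depth: int
    gate: str
    args: List[int] = field(default_factory=list)


def parse_listing(lines: Iterable[str]) -> List[ListingRow]:
    """Parse listing rows, merging continuation lines into the previous row."""
    rows: List[ListingRow] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("Tensor"):
            continue  # skip blank lines and column headers
        if line.startswith(","):
            # Continuation of the previous row's argument list.
            rows[-1].args.extend(int(tok) for tok in line.split(",") if tok)
            continue
        tensor, phase, mode, depth, gate, args = line.split(maxsplit=5)
        rows.append(ListingRow(int(tensor), int(phase), mode, int(depth), gate,
                               [int(tok) for tok in args.split(",") if tok]))
    return rows


SAMPLE = """\
Tensor Phase Mode Depth Gate Arguments
0 10 slice 0 disk 0,1,2,3,49,50,51,52
3 1 new 0 tensor 32,13,11,12,13,14,15,16,17,18,19,20,21,22,23
,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38
,39,40,41,42,4,5,6,7,8,9,10,43,44,45,46
,47,48
3 1 new 0 cache 15,18,19,23,27
3 1 gate 11 2Q 27,23
"""

rows = parse_listing(SAMPLE.splitlines())
slice_row, tensor_row = rows[0], rows[1]
n_local, n_global, *indices = tensor_row.args  # "new tensor": counts, then indices
print(f"disk slices: {2 ** len(slice_row.args)}")
print(f"tensor 3: {n_local} local + {n_global} global = {len(indices)} indices")
```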
Note that in the listings, each 2Q gate entry, identified by its depth in the circuit and the qubits to which it is applied, refers to a specific instance of one of the gates shown in Figs. 1 and 2, with its own, potentially unique, associated unitary matrix.
To define the synthesized gate operations EI and E2Q, suppose that one of these 2Q gates bridges qubit $a$ in Tensor 1 and qubit $b$ in Tensor 2. Let $\phi_a$ and $\chi_b$ be the corresponding tensors prior to applying that 2Q gate. Then the resulting quantum state is given by
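$$\psi_{a'',b''} = \sum_{a,b} \mathrm{2Q}_{a'',b'',a,b}\,\phi_a\,\chi_b = \sum_{a',a,b} I_{a'',a'}\,\mathrm{2Q}_{a',b'',a,b}\,\phi_a\,\chi_b \tag{2}$$
where $I$ is the identity matrix. The above equation can be rewritten as
$$\psi_{a'',b''} = \sum_{a',a} \phi'_{a'',a',a}\,\chi'_{b'',a',a} \tag{3}$$
where
$$\phi'_{a'',a',a} = I_{a'',a'}\,\phi_a, \qquad \chi'_{b'',a',a} = \sum_{b} \mathrm{2Q}_{a',b'',a,b}\,\chi_b \tag{4}$$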
These last two equations define the EI and E2Q synthesized gate operations, respectively, where $a''$ is effectively the new index for qubit $a$, $b''$ the new index for qubit $b$, and $a'$ and $a$ are entanglement indices introduced through these synthesized gate operations. Equation (3) defines the contraction performed to eliminate the entanglement indices.
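The identity behind Eqs. (2) to (4) can be checked numerically. The plain-Python sketch below treats φ and χ as single-qubit tensors with all other indices suppressed, and uses a random complex 4-index tensor in place of a particular 2Q gate (the factorization does not depend on unitarity). It applies the gate directly, then applies EI to φ and E2Q to χ and contracts over the entanglement indices a′ and a as in Eq. (3); the two results should agree to rounding error. All names are illustrative.

```python
import itertools
import random

BITS = (0, 1)  # each qubit or entanglement index takes the values 0 and 1


def random_complex(rng: random.Random) -> complex:
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def make_inputs(seed: int = 0):
    """Random stand-ins for the 2Q gate tensor and the two single-qubit tensors."""
    rng = random.Random(seed)
    # gate[a2, b2, a, b] plays the role of 2Q_{a'',b'',a,b}.
    gate = {idx: random_complex(rng) for idx in itertools.product(BITS, repeat=4)}
    phi = {a: random_complex(rng) for a in BITS}
    chi = {b: random_complex(rng) for b in BITS}
    return gate, phi, chi


def apply_directly(gate, phi, chi):
    """First form of Eq. (2): sum over a, b of 2Q_{a'',b'',a,b} phi_a chi_b."""
    return {(a2, b2): sum(gate[a2, b2, a, b] * phi[a] * chi[b]
                          for a in BITS for b in BITS)
            for a2 in BITS for b2 in BITS}


def apply_ei(phi):
    """EI, left half of Eq. (4): phi'_{a'',a',a} = I_{a'',a'} phi_a."""
    return {(a2, ap, a): (1.0 if a2 == ap else 0.0) * phi[a]
            for a2 in BITS for ap in BITS for a in BITS}


def apply_e2q(gate, chi):
    """E2Q, right half of Eq. (4): chi'_{b'',a',a} = sum_b 2Q_{a',b'',a,b} chi_b."""
    return {(b2, ap, a): sum(gate[ap, b2, a, b] * chi[b] for b in BITS)
            for b2 in BITS for ap in BITS for a in BITS}


def contract(phi_p, chi_p):
    """Eq. (3): sum over the entanglement indices a' and a."""
    return {(a2, b2): sum(phi_p[a2, ap, a] * chi_p[b2, ap, a]
                          for ap in BITS for a in BITS)
            for a2 in BITS for b2 in BITS}


gate, phi, chi = make_inputs()
direct = apply_directly(gate, phi, chi)
deferred = contract(apply_ei(phi), apply_e2q(gate, chi))
max_err = max(abs(direct[k] - deferred[k]) for k in direct)
print(f"max |direct - deferred| = {max_err:.2e}")
```

Note that the deferred form carries two extra indices on each side (a′ and a), which is what allows the two tensors to be processed separately before the final contraction.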
The following listing is for the 53-qubit circuit.
Tensor Phase Mode Depth Gate Arguments
0 0 define 0 qubits 53
0 0 define 0 disk 0,1,2,3,4,5,6,7,8,23,24,25,26,27,28
,29,30,49,50,51,52
1 1 new 0 tensor 27,0,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,23,24,25,26
1 1 new 0 cache 10,14,18,19,23
1 1 gate 1 2Q 14,10
1 1 gate 1 2Q 23,19
1 1 gate 3 2Q 18,14
1 1 gate 6 2Q 23,18
1 1 new 0 cache 2,6,10,18,23
1 1 gate 2 2Q 6,2
1 1 gate 4 2Q 10,6
1 1 new 0 cache 1,5,6,18,23
1 1 gate 2 2Q 5,1
1 1 gate 6 2Q 6,1
1 1 new 0 cache 5,9,14,18,23
1 1 gate 3 2Q 9,5
1 1 gate 6 2Q 14,9
1 1 new 0 cache 0,5,18,23
1 1 gate 5 2Q 5,0
1 1 new 0 cache 18,22,23,26
1 1 gate 2 2Q 26,22
1 1 new 0 cache 18,20,23,24,26
1 1 gate 1 2Q 24,20
1 1 new 0 cache 12,16,18,20,23
1 1 gate 2 2Q 16,12
1 1 gate 4 2Q 20,16
1 1 new 0 cache 18,20,21,23,25
1 1 gate 2 2Q 25,21
1 2 entgl 0 tensor 27,8,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,23,24,25,26,-1
,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14
1 2 entgl 3 EI 25,-1,-2
1 2 entgl 3 EI 24,-3,-4
1 2 entgl 4 EI 26,-5,-6
1 2 entgl 7 EI 23,-9,-10
1 2 new 0 cache 18,20,23,25
1 2 gate 5 2Q 25,20
1 2 gate 10 2Q 23,18
1 2 new 0 cache 11,14,15,19,24
1 2 gate 2 2Q 15,11
1 2 gate 3 2Q 19,15
1 2 gate 5 2Q 24,19
1 2 gate 8 2Q 19,14
1 2 new 0 cache 9,14,19,24,25
1 2 gate 10 2Q 14,9
1 3 entgl 0 tensor 27,12,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,23,24,25,26,-1
,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14
1 3 entgl 7 EI 24,-7,-8
1 3 entgl 8 EI 25,-13,-14
1 3 new 0 cache 10,14,15,19,24
1 3 gate 6 2Q 15,10
1 3 gate 9 2Q 24,19
1 3 gate 12 2Q 19,14
1 3 new 0 cache 0,5,10,15,20
1 3 gate 7 2Q 20,15
1 3 gate 8 2Q 10,5
1 3 gate 9 2Q 5,0
1 3 gate 10 2Q 15,10
1 3 gate 12 2Q 10,5
1 3 new 0 cache 10,14,18,20,25
1 3 gate 9 2Q 25,20
1 3 gate 13 2Q 14,10
1 3 gate 15 2Q 18,14
1 3 new 0 cache 13,15,17,20,21
1 3 gate 1 2Q 17,13
1 3 gate 4 2Q 21,17
1 3 gate 11 2Q 20,15
1 3 new 0 cache 3,7,11,21,26
1 3 gate 1 2Q 7,3
1 3 gate 4 2Q 11,7
1 3 gate 6 2Q 26,21
1 3 new 0 cache 1,5,6,11,16
1 3 gate 5 2Q 16,11
1 3 gate 8 2Q 11,6
1 3 gate 10 2Q 6,1
1 3 gate 14 2Q 5,1
1 3 new 0 cache 0,1,5,9,26
1 3 gate 15 2Q 9,5
1 3 gate 18 2Q 5,1
1 3 gate 19 2Q 9,5
1 3 gate 21 2Q 5,0
1 3 new 0 cache 6,11,15,16,21
1 3 gate 7 2Q 21,16
1 3 gate 9 2Q 16,11
1 3 gate 12 2Q 11,6
1 3 gate 14 2Q 15,11
1 3 new 0 cache 3,4,8,12,13
1 3 gate 1 2Q 8,4
1 3 gate 3 2Q 12,8
1 3 gate 5 2Q 8,3
1 3 gate 7 2Q 13,8
1 3 gate 9 2Q 8,3
1 3 gate 11 2Q 13,8
1 3 gate 13 2Q 8,4
1 3 new 0 cache 2,6,7,12,17
1 3 gate 5 2Q 17,12
1 3 gate 6 2Q 7,2
1 3 gate 7 2Q 12,7
1 3 gate 10 2Q 7,2
1 3 gate 14 2Q 6,2
1 3 new 0 cache 6,9,10,14,18
1 3 gate 16 2Q 10,6
1 3 gate 17 2Q 14,10
1 3 gate 19 2Q 18,14
1 3 gate 22 2Q 14,9
1 3 new 0 cache 1,2,6,10,26
1 3 gate 18 2Q 6,2
1 3 gate 20 2Q 10,6
1 3 gate 22 2Q 6,1
1 3 new 0 cache 7,12,13,17,22
1 3 gate 8 2Q 22,17
1 3 gate 9 2Q 17,12
1 3 gate 11 2Q 12,7
1 3 gate 12 2Q 22,17
1 3 gate 13 2Q 17,13
1 3 new 0 cache 3,7,11,21,26
1 3 gate 13 2Q 7,3
1 3 gate 16 2Q 11,7
1 3 gate 17 2Q 7,3
1 4 entgl 0 tensor 28,13,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,23,24,25,26,-1
,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14
1 4 entgl 8 EI 26,-11,-12
1 4 new 0 cache 8,12,16,21,26
1 4 gate 10 2Q 26,21
1 4 gate 11 2Q 21,16
1 4 gate 14 2Q 16,12
1 4 gate 15 2Q 12,8
1 4 new 0 cache 4,8
1 4 gate 17 2Q 8,4
2 5 new 0 tensor 26,0,27,28,29,30,31,32,33,34,35,36,37,38,39
,40,41,42,43,44,45,46,47,48,49,50,51,52
2 5 new 0 cache 36,40,44,45,49
2 5 gate 1 2Q 49,45
2 5 gate 2 2Q 40,36
2 5 gate 3 2Q 44,40
2 5 gate 6 2Q 49,44
2 5 gate 10 2Q 49,44
2 5 new 0 cache 30,34,37,41,45
2 5 gate 2 2Q 34,30
2 5 gate 2 2Q 41,37
2 5 gate 4 2Q 45,41
2 5 new 0 cache 29,33,34,38,42
2 5 gate 1 2Q 33,29
2 5 gate 1 2Q 42,38
2 5 gate 3 2Q 38,34
2 5 new 0 cache 29,30,34,39,43
2 5 gate 1 2Q 43,39
2 6 entgl 0 tensor 26,4,27,28,29,30,31,32,33,34,35,36,37,38,39
,40,41,42,43,44,45,46,47,48,49,50,51,52,-1,-2
,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14
2 6 entgl 4 E2Q 29,-5,-6
2 6 entgl 8 E2Q 30,-11,-12
2 6 new 0 cache 29,33,34,37,39
2 6 gate 3 2Q 37,33
2 6 gate 6 2Q 34,29
2 6 gate 7 2Q 39,34
2 6 new 0 cache 27,29,31,34,39
2 6 gate 2 2Q 31,27
2 6 new 0 cache 29,31,34,35,39
2 6 gate 4 2Q 35,31
2 6 new 0 cache 35,40,45,46,50
2 6 gate 2 2Q 50,46
2 6 gate 5 2Q 40,35
2 6 gate 6 2Q 50,45
2 6 gate 8 2Q 45,40
2 6 gate 9 2Q 40,35
2 6 gate 10 2Q 50,45
2 6 gate 12 2Q 45,40
2 6 new 0 cache 29,34,39,45,49
2 6 gate 13 2Q 49,45
2 6 new 0 cache 29,34,39,42,46
2 6 gate 4 2Q 46,42
2 6 new 0 cache 29,34,37,39,42
2 6 gate 6 2Q 42,37
2 7 entgl 0 tensor 26,8,27,28,29,30,31,32,33,34,35,36,37,38,39
,40,41,42,43,44,45,46,47,48,49,50,51,52,-1,-2
,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14
2 7 entgl 3 E2Q 27,-3,-4
2 7 entgl 8 E2Q 29,-13,-14
2 7 new 0 cache 28,29,32,34,39
2 7 gate 1 2Q 32,28
2 7 gate 10 2Q 34,29
2 7 gate 11 2Q 39,34
2 7 new 0 cache 27,32,36,37,41
2 7 gate 4 2Q 36,32
2 7 gate 5 2Q 32,27
2 7 gate 6 2Q 41,36
2 7 gate 8 2Q 37,32
2 7 new 0 cache 27,41,46,47,51
2 7 gate 2 2Q 51,47
2 7 gate 5 2Q 51,46
2 7 gate 8 2Q 46,41
2 7 gate 9 2Q 51,46
2 7 new 0 cache 31,36,40,41,44
2 7 gate 7 2Q 36,31
2 7 gate 10 2Q 41,36
2 7 gate 11 2Q 36,31
2 7 gate 14 2Q 40,36
2 7 gate 15 2Q 44,40
2 7 new 0 cache 27,32,41,46,50
2 7 gate 12 2Q 46,41
2 7 gate 14 2Q 50,46
2 7 new 0 cache 28,33,38,43,47
2 7 gate 3 2Q 47,43
2 7 gate 5 2Q 43,38
2 7 new 0 cache 37,42,47,48,52
2 7 gate 1 2Q 52,48
2 7 gate 5 2Q 52,47
2 7 gate 7 2Q 47,42
2 7 gate 9 2Q 52,47
2 7 gate 10 2Q 42,37
2 7 gate 11 2Q 47,42
2 8 entgl 0 tensor 26,12,27,28,29,30,31,32,33,34,35,36,37,38,39
,40,41,42,43,44,45,46,47,48,49,50,51,52,-1,-2
,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14
2 8 entgl 3 E2Q 28,-1,-2
2 8 entgl 7 E2Q 27,-9,-10
2 8 new 0 cache 27,32,37,41,45
2 8 gate 9 2Q 32,27
2 8 gate 12 2Q 37,32
2 8 gate 14 2Q 41,37
2 8 gate 16 2Q 45,41
2 8 new 0 cache 45,47,49,51
2 8 gate 14 2Q 51,47
2 8 gate 17 2Q 49,45
2 8 new 0 cache 28,33,38,43,48
2 8 gate 6 2Q 33,28
2 8 gate 7 2Q 48,43
2 8 gate 8 2Q 38,33
2 8 gate 9 2Q 43,38
2 8 gate 11 2Q 48,43
2 8 new 0 cache 28,39,43,47,51
2 8 gate 13 2Q 43,39
2 8 gate 15 2Q 47,43
2 8 gate 17 2Q 43,39
2 8 gate 18 2Q 51,47
2 8 gate 19 2Q 47,43
2 8 new 0 cache 28,33,47,48,52
2 8 gate 13 2Q 52,48
2 8 gate 17 2Q 52,48
2 8 gate 21 2Q 52,47
2 9 entgl 0 tensor 27,13,27,28,29,30,31,32,33,34,35,36,37,38,39
,40,41,42,43,44,45,46,47,48,49,50,51,52,-1,-2
,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14
2 9 entgl 7 E2Q 28,-7,-8
2 9 new 0 cache 28,33,38,42,46
2 9 gate 10 2Q 33,28
2 9 gate 12 2Q 38,33
2 9 gate 13 2Q 42,38
2 9 gate 16 2Q 46,42
2 9 new 0 cache 46,50
2 9 gate 18 2Q 50,46
0 10 slice 0 disk 0,1,2,3,49,50,51,52
1 10 all2all 0 tensor 30,7,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13
,-14,11,12,13,14,15,16,17,18,19,20,21,22,23,24
,25,26,4,5,6,7,8,9,10
2 10 all2all 0 tensor 30,6,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13
,-14,27,28,29,30,31,32,33,34,35,36,37,38,39,40
,41,42,43,44,45,46,47,48
3 1 new 0 tensor 32,13,11,12,13,14,15,16,17,18,19,20,21,22,23
,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38
,39,40,41,42,4,5,6,7,8,9,10,43,44,45,46
,47,48
3 1 new 0 cache 15,18,19,23,27
3 1 gate 11 2Q 27,23
3 1 gate 13 2Q 23,19
3 1 gate 15 2Q 19,15
3 1 gate 17 2Q 23,19
3 1 gate 22 2Q 23,18
3 1 new 0 cache 26,30,34,38,42
3 1 gate 12 2Q 30,26
3 1 gate 14 2Q 34,30
3 1 gate 15 2Q 38,34
3 1 gate 17 2Q 42,38
3 1 gate 18 2Q 34,30
3 1 gate 19 2Q 38,34
3 1 new 0 cache 11,15,19,27,31
3 1 gate 14 2Q 31,27
3 1 gate 18 2Q 15,11
3 1 gate 19 2Q 19,15
3 1 new 0 cache 24,28,32,36,40
3 1 gate 11 2Q 28,24
3 1 gate 13 2Q 32,28
3 1 gate 16 2Q 36,32
3 1 gate 18 2Q 40,36
3 1 new 0 cache 20,24,27,31,35
3 1 gate 13 2Q 24,20
3 1 gate 15 2Q 27,24
3 1 gate 16 2Q 35,31
3 1 gate 18 2Q 31,27
3 1 gate 20 2Q 35,31
3 1 new 0 cache 16,19,20,24,27
3 1 gate 16 2Q 20,16
3 1 gate 17 2Q 24,20
3 1 gate 19 2Q 27,24
3 1 gate 21 2Q 24,19
3 1 new 0 cache 12,14,16,19,20
3 1 gate 18 2Q 16,12
3 1 gate 20 2Q 20,16
3 1 gate 24 2Q 19,14
3 1 new 0 cache 25,29,33,37,41
3 1 gate 12 2Q 29,25
3 1 gate 13 2Q 33,29
3 1 gate 15 2Q 37,33
3 1 gate 18 2Q 41,37
3 1 new 0 cache 22,26,29,33,34
3 1 gate 14 2Q 26,22
3 1 gate 16 2Q 29,26
3 1 gate 17 2Q 33,29
3 1 gate 18 2Q 26,22
3 1 gate 20 2Q 29,26
3 1 gate 22 2Q 34,29
3 1 new 0 cache 33,34,37,39
3 1 gate 19 2Q 37,33
3 1 gate 23 2Q 39,34
3 1 new 0 cache 21,25,28,32,36
3 1 gate 14 2Q 25,21
3 1 gate 15 2Q 28,25
3 1 gate 17 2Q 32,28
3 1 gate 20 2Q 36,32
3 1 new 0 cache 18,23,27,32
3 1 gate 21 2Q 32,27
3 1 gate 23 2Q 27,23
3 1 gate 26 2Q 23,18
3 1 new 0 cache 17,21,25,28,33
3 1 gate 16 2Q 21,17
3 1 gate 18 2Q 25,21
3 1 gate 19 2Q 28,25
3 1 gate 22 2Q 33,28
3 1 new 0 cache 20,25,29,34,39
3 1 gate 21 2Q 25,20
3 1 gate 24 2Q 29,25
3 1 gate 26 2Q 34,29
3 1 gate 27 2Q 39,34
3 1 new 0 cache 13,17,19,24,28
3 1 gate 17 2Q 17,13
3 1 gate 23 2Q 28,24
3 1 gate 25 2Q 24,19
3 1 new 0 cache 17,21,26,30
3 1 gate 20 2Q 21,17
3 1 gate 22 2Q 26,21
3 1 gate 24 2Q 30,26
3 2 all2all 0 tensor 32,13,17,18,19,20,21,22,23,24,25,26,27,28,29
,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44
,45,46,47,48,4,5,6,7,8,9,10,11,12,13,14
,15,16
3 2 new 0 cache 33,38,39,43,48
3 2 gate 21 2Q 43,38
3 2 gate 23 2Q 48,43
3 2 gate 24 2Q 38,33
3 2 gate 25 2Q 43,38
3 2 gate 27 2Q 48,43
3 2 gate 29 2Q 43,39
3 2 new 0 cache 24,28,33,38
3 2 gate 26 2Q 33,28
3 2 gate 27 2Q 28,24
3 2 gate 28 2Q 38,33
3 2 new 0 cache 32,37,42,46,47
3 2 gate 20 2Q 46,42
3 2 gate 22 2Q 42,37
3 2 gate 23 2Q 47,42
3 2 gate 24 2Q 37,32
3 2 gate 26 2Q 42,37
3 2 new 0 cache 23,27,28,32,37
3 2 gate 25 2Q 32,27
3 2 gate 27 2Q 27,23
3 2 gate 28 2Q 37,32
3 2 gate 29 2Q 32,28
3 2 new 0 cache 31,36,41,45
3 2 gate 20 2Q 45,41
3 2 gate 22 2Q 41,36
3 2 gate 23 2Q 36,31
3 2 new 0 cache 35,40,44
3 2 gate 19 2Q 44,40
3 2 gate 21 2Q 40,35
3 3 all2all 0 tensor 32,13,4,5,6,7,8,9,10,11,12,13,14,15,16
,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46
,47,48
3 3 new 0 cache 9,14,19,23
3 3 gate 26 2Q 14,9
3 3 gate 28 2Q 19,14
3 3 gate 29 2Q 23,19
3 3 new 0 cache 10,15,20,25,29
3 3 gate 22 2Q 15,10
3 3 gate 23 2Q 20,15
3 3 gate 25 2Q 25,20
3 3 gate 28 2Q 29,25
3 3 new 0 cache 5,10,15,29,33
3 3 gate 24 2Q 10,5
3 3 gate 26 2Q 15,10
3 3 gate 29 2Q 33,29
3 3 new 0 cache 7,11,15,20,24
3 3 gate 20 2Q 11,7
3 3 gate 27 2Q 20,15
3 3 gate 29 2Q 24,20
3 3 new 0 cache 11,16,21,26,30
3 3 gate 21 2Q 16,11
3 3 gate 23 2Q 21,16
3 3 gate 26 2Q 26,21
3 3 gate 28 2Q 30,26
3 3 new 0 cache 6,11,16,30,34
3 3 gate 24 2Q 11,6
3 3 gate 25 2Q 16,11
3 3 gate 30 2Q 34,30
3 3 new 0 cache 16,21,25,28
3 3 gate 27 2Q 21,16
3 3 gate 30 2Q 25,21
3 3 gate 31 2Q 28,25
3 3 new 0 cache 8,12,17,22
3 3 gate 19 2Q 12,8
3 3 gate 21 2Q 17,12
3 3 gate 24 2Q 22,17
3 3 write 0 disk 0,1,2,3,49,50,51,52
4 1 read 0 disk 23,24,25,26,27,28,29,30
4 1 new 0 tensor 32,13,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,31,32,33,34,35
,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50
,51,52
4 1 new 0 cache 0,5,10,14,18
4 1 gate 25 2Q 5,0
4 1 gate 28 2Q 10,5
4 1 gate 29 2Q 14,10
4 1 gate 31 2Q 18,14
4 1 new 0 cache 0,1,5,6,9
4 1 gate 26 2Q 6,1
4 1 gate 30 2Q 5,1
4 1 gate 31 2Q 9,5
4 1 gate 34 2Q 5,1
4 1 gate 35 2Q 9,5
4 1 gate 37 2Q 5,0
4 1 new 0 cache 6,11,15,19
4 1 gate 28 2Q 11,6
4 1 gate 30 2Q 15,11
4 1 gate 31 2Q 19,15
4 1 new 0 cache 2,6,7,10,12
4 1 gate 22 2Q 7,2
4 1 gate 23 2Q 12,7
4 1 gate 26 2Q 7,2
4 1 gate 30 2Q 6,2
4 1 gate 32 2Q 10,6
4 1 gate 34 2Q 6,2
4 1 new 0 cache 6,9,10,14,18
4 1 gate 33 2Q 14,10
4 1 gate 35 2Q 18,14
4 1 gate 36 2Q 10,6
4 1 gate 38 2Q 14,9
4 1 new 0 cache 1,6,12,17,22
4 1 gate 25 2Q 17,12
4 1 gate 28 2Q 22,17
4 1 gate 38 2Q 6,1
4 1 new 0 cache 3,8,13,17,21
4 1 gate 21 2Q 8,3
4 1 gate 23 2Q 13,8
4 1 gate 25 2Q 8,3
4 1 gate 27 2Q 13,8
4 1 gate 29 2Q 17,13
4 1 gate 32 2Q 21,17
4 1 gate 33 2Q 17,13
4 1 new 0 cache 3,7,11,12,15
4 1 gate 27 2Q 12,7
4 1 gate 29 2Q 7,3
4 1 gate 32 2Q 11,7
4 1 gate 33 2Q 7,3
4 1 gate 34 2Q 15,11
4 1 gate 36 2Q 11,7
4 1 new 0 cache 2,7,12,16,20
4 1 gate 30 2Q 16,12
4 1 gate 32 2Q 20,16
4 1 gate 38 2Q 7,2
4 1 new 0 cache 3,4,8,12,16
4 1 gate 29 2Q 8,4
4 1 gate 31 2Q 12,8
4 1 gate 33 2Q 8,4
4 1 gate 34 2Q 16,12
4 1 gate 35 2Q 12,8
4 1 gate 37 2Q 8,3
4 1 new 0 cache 8,13
4 1 gate 39 2Q 13,8
4 2 all2all 0 tensor 32,13,13,14,15,16,17,18,19,20,21,22,31,32,33
,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48
,49,50,51,52,0,1,2,3,4,5,6,7,8,9,10
,11,12
4 2 new 0 cache 38,42,47,48,52
4 2 gate 25 2Q 52,47
4 2 gate 27 2Q 47,42
4 2 gate 29 2Q 42,38
4 2 gate 29 2Q 52,48
4 2 gate 33 2Q 52,48
4 2 new 0 cache 41,43,46,47,51
4 2 gate 21 2Q 51,46
4 2 gate 24 2Q 46,41
4 2 gate 25 2Q 51,46
4 2 gate 30 2Q 51,47
4 2 gate 31 2Q 47,43
4 2 gate 34 2Q 51,47
4 2 new 0 cache 39,43,47,52
4 2 gate 33 2Q 43,39
4 2 gate 35 2Q 47,43
4 2 gate 37 2Q 52,47
4 2 new 0 cache 33,36,37,41,46
4 2 gate 26 2Q 41,36
4 2 gate 28 2Q 46,41
4 2 gate 30 2Q 41,37
4 2 gate 31 2Q 37,33
4 2 new 0 cache 40,42,45,46,50
4 2 gate 22 2Q 50,45
4 2 gate 24 2Q 45,40
4 2 gate 26 2Q 50,45
4 2 gate 30 2Q 50,46
4 2 gate 32 2Q 46,42
4 2 gate 34 2Q 50,46
4 2 new 0 cache 34,38,42,46,51
4 2 gate 31 2Q 38,34
4 2 gate 33 2Q 42,38
4 2 gate 36 2Q 46,42
4 2 gate 37 2Q 51,46
4 2 new 0 cache 31,35,36,40,45
4 2 gate 25 2Q 40,35
4 2 gate 27 2Q 36,31
4 2 gate 28 2Q 45,40
4 2 gate 30 2Q 40,36
4 2 new 0 cache 37,41,44,45,49
4 2 gate 22 2Q 49,44
4 2 gate 26 2Q 49,44
4 2 gate 29 2Q 49,45
4 2 gate 32 2Q 45,41
4 2 gate 33 2Q 49,45
4 2 gate 34 2Q 41,37
4 2 gate 36 2Q 45,41
4 2 new 0 cache 32,36,40,44,49
4 2 gate 31 2Q 44,40
4 2 gate 32 2Q 36,32
4 2 gate 34 2Q 40,36
4 2 gate 35 2Q 44,40
4 2 gate 38 2Q 49,44
4 2 new 0 cache 45,50
4 2 gate 38 2Q 50,45
4 2 write 0 disk 23,24,25,26,27,28,29,30
5 1 read 0 disk 0,1,2,3,49,50,51,52
5 1 new 0 tensor 32,13,4,5,6,7,8,9,10,11,12,13,14,15,16
,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46
,47,48
5 1 new 0 cache 5,10,15,19,23
5 1 gate 33 2Q 23,19
5 1 gate 35 2Q 19,15
5 1 gate 38 2Q 15,10
5 1 gate 40 2Q 10,5
5 1 new 0 cache 7,12,17,21,25
5 1 gate 34 2Q 25,21
5 1 gate 36 2Q 21,17
5 1 gate 37 2Q 17,12
5 1 gate 39 2Q 12,7
5 1 new 0 cache 17,22,26,29,33
5 1 gate 30 2Q 26,22
5 1 gate 32 2Q 29,26
5 1 gate 33 2Q 33,29
5 1 gate 34 2Q 26,22
5 1 gate 36 2Q 29,26
5 1 gate 40 2Q 22,17
5 1 new 0 cache 21,26,30,34
5 1 gate 34 2Q 34,30
5 1 gate 38 2Q 26,21
5 1 gate 40 2Q 30,26
5 1 new 0 cache 18,23,25,28,32
5 1 gate 33 2Q 32,28
5 1 gate 35 2Q 28,25
5 1 gate 38 2Q 23,18
5 1 new 0 cache 20,24,27,31,35
5 1 gate 30 2Q 31,27
5 1 gate 31 2Q 27,24
5 1 gate 32 2Q 35,31
5 1 gate 33 2Q 24,20
5 1 gate 34 2Q 31,27
5 1 gate 35 2Q 27,24
5 1 gate 36 2Q 35,31
5 1 new 0 cache 14,16,19,20,24
5 1 gate 36 2Q 20,16
5 1 gate 37 2Q 24,19
5 1 gate 40 2Q 19,14
5 1 new 0 cache 6,11,16,21
5 1 gate 37 2Q 16,11
5 1 gate 39 2Q 21,16
5 1 gate 40 2Q 11,6
5 1 new 0 cache 15,20,25
5 1 gate 37 2Q 25,20
5 1 gate 39 2Q 20,15
5 2 all2all 0 tensor 32,13,17,18,19,20,21,22,23,24,25,26,27,28,29
,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44
,45,46,47,48,4,5,6,7,8,9,10,11,12,13,14
,15,16
5 2 new 0 cache 32,35,36,40,45
5 2 gate 36 2Q 36,32
5 2 gate 37 2Q 40,35
5 2 gate 40 2Q 45,40
5 2 new 0 cache 31,36,41,46
5 2 gate 38 2Q 41,36
5 2 gate 39 2Q 36,31
5 2 gate 40 2Q 46,41
5 2 new 0 cache 27,32,33,37,42
5 2 gate 35 2Q 37,33
5 2 gate 37 2Q 32,27
5 2 gate 38 2Q 42,37
5 2 gate 40 2Q 37,32
5 2 new 0 cache 28,33,34,38,43
5 2 gate 35 2Q 38,34
5 2 gate 37 2Q 43,38
5 2 gate 38 2Q 33,28
5 2 gate 40 2Q 38,33
5 2 new 0 cache 25,29,34,42,47
5 2 gate 38 2Q 34,29
5 2 gate 39 2Q 47,42
5 2 gate 40 2Q 29,25
5 2 new 0 cache 23,27,43,48
5 2 gate 39 2Q 48,43
5 2 gate 39 2Q 27,23
5 2 new 0 cache 24,28,34,39
5 2 gate 39 2Q 39,34
5 2 gate 39 2Q 28,24
5 2 write 0 disk 0,1,2,3,49,50,51,52
The following listing is for the 54-qubit circuit.
Tensor Phase Mode Depth Gate Arguments
0 0 define 0 qubits 54
0 0 define 0 disk 0,1,2,3,4,5,6,7,8,23,24,25,26,27,28
,29,30,31,50,51,52,53
1 1 new 0 tensor 27,0,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,23,24,25,26
1 1 new 0 cache 10,14,18,19,23
1 1 gate 1 2Q 14,10
1 1 gate 1 2Q 23,19
1 1 gate 3 2Q 18,14
1 1 new 0 cache 2,6,10,18,23
1 1 gate 2 2Q 6,2
1 1 gate 4 2Q 10,6
1 1 new 0 cache 1,5,6,18,23
1 1 gate 2 2Q 5,1
1 1 gate 6 2Q 6,1
1 1 new 0 cache 5,9,14,18,23
1 1 gate 3 2Q 9,5
1 1 gate 6 2Q 14,9
1 1 new 0 cache 0,5,18,23
1 1 gate 5 2Q 5,0
1 1 new 0 cache 18,22,23,26
1 1 gate 2 2Q 26,22
1 1 new 0 cache 18,20,23,24,26
1 1 gate 1 2Q 24,20
1 1 new 0 cache 12,16,18,20,23
1 1 gate 2 2Q 16,12
1 1 gate 4 2Q 20,16
1 1 new 0 cache 18,20,21,23,25
1 1 gate 2 2Q 25,21
1 2 entgl 0 tensor 27,8,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,23,24,25,26,-1
,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16
1 2 entgl 3 EI 25,-1,-2
1 2 entgl 3 EI 24,-3,-4
1 2 entgl 4 EI 26,-5,-6
1 2 entgl 4 EI 23,-7,-8
1 2 new 0 cache 18,20,23,25
1 2 gate 5 2Q 25,20
1 2 gate 6 2Q 23,18
1 2 new 0 cache 11,15,18,23,25
1 2 gate 2 2Q 15,11
1 2 new 0 cache 9,14,15,19,24
1 2 gate 3 2Q 19,15
1 2 gate 5 2Q 24,19
1 2 gate 8 2Q 19,14
1 2 gate 10 2Q 14,9
1 3 entgl 0 tensor 28,13,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,23,24,25,26,-1
,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16
1 3 entgl 7 EI 24,-9,-10
1 3 entgl 7 EI 23,-11,-12
1 3 entgl 8 EI 25,-15,-16
1 3 new 0 cache 14,18,19,23,24
1 3 gate 9 2Q 24,19
1 3 gate 10 2Q 23,18
1 3 gate 12 2Q 19,14
1 3 new 0 cache 0,5,10,15,20
1 3 gate 6 2Q 15,10
1 3 gate 7 2Q 20,15
1 3 gate 8 2Q 10,5
1 3 gate 9 2Q 5,0
1 3 gate 10 2Q 15,10
1 3 gate 12 2Q 10,5
1 3 new 0 cache 10,14,18,20,25
1 3 gate 9 2Q 25,20
1 3 gate 13 2Q 14,10
1 3 gate 15 2Q 18,14
1 3 new 0 cache 13,15,17,20,21
1 3 gate 1 2Q 17,13
1 3 gate 4 2Q 21,17
1 3 gate 11 2Q 20,15
1 3 new 0 cache 3,7,11,21,26
1 3 gate 1 2Q 7,3
1 3 gate 4 2Q 11,7
1 3 gate 6 2Q 26,21
1 3 new 0 cache 1,5,6,11,16
1 3 gate 5 2Q 16,11
1 3 gate 8 2Q 11,6
1 3 gate 10 2Q 6,1
1 3 gate 14 2Q 5,1
1 3 new 0 cache 0,1,5,9,26
1 3 gate 15 2Q 9,5
1 3 gate 18 2Q 5,1
1 3 gate 19 2Q 9,5
1 3 gate 21 2Q 5,0
1 3 new 0 cache 6,11,15,16,21
1 3 gate 7 2Q 21,16
1 3 gate 9 2Q 16,11
1 3 gate 12 2Q 11,6
1 3 gate 14 2Q 15,11
1 3 new 0 cache 3,4,8,12,13
1 3 gate 1 2Q 8,4
1 3 gate 3 2Q 12,8
1 3 gate 5 2Q 8,3
1 3 gate 7 2Q 13,8
1 3 gate 9 2Q 8,3
1 3 gate 11 2Q 13,8
1 3 gate 13 2Q 8,4
1 3 new 0 cache 2,6,7,12,17
1 3 gate 5 2Q 17,12
1 3 gate 6 2Q 7,2
1 3 gate 7 2Q 12,7
1 3 gate 10 2Q 7,2
1 3 gate 14 2Q 6,2
1 3 new 0 cache 6,9,10,14,18
1 3 gate 16 2Q 10,6
1 3 gate 17 2Q 14,10
1 3 gate 19 2Q 18,14
1 3 gate 22 2Q 14,9
1 3 new 0 cache 1,2,6,10,26
1 3 gate 18 2Q 6,2
1 3 gate 20 2Q 10,6
1 3 gate 22 2Q 6,1
1 3 new 0 cache 7,12,13,17,22
1 3 gate 8 2Q 22,17
1 3 gate 9 2Q 17,12
1 3 gate 11 2Q 12,7
1 3 gate 12 2Q 22,17
1 3 gate 13 2Q 17,13
1 3 new 0 cache 3,7,11,21,26
1 3 gate 13 2Q 7,3
1 3 gate 16 2Q 11,7
1 3 gate 17 2Q 7,3
1 4 entgl 0 tensor 30,13,0,1,2,3,4,5,6,7,8,9,10,11,12
,13,14,15,16,17,18,19,20,21,22,23,24,25,26,-1
,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16
1 4 entgl 8 EI 26,-13,-14
1 4 new 0 cache 8,12,16,21,26
1 4 gate 10 2Q 26,21
1 4 gate 11 2Q 21,16
1 4 gate 14 2Q 16,12
1 4 gate 15 2Q 12,8
1 4 new 0 cache 4,8
1 4 gate 17 2Q 8,4
2 5 new 0 tensor 27,2,27,28,29,30,31,32,33,34,35,36,37,38,39
,40,41,42,43,44,45,46,47,48,49,50,51,52,53,-1
,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16
2 5 new 0 cache 37,41,45,46,50
2 5 gate 1 2Q 50,46
2 5 gate 2 2Q 41,37
2 5 gate 3 2Q 45,41
2 5 gate 6 2Q 50,45
2 5 gate 10 2Q 50,45
2 5 new 0 cache 31,35,38,42,46
2 5 gate 2 2Q 35,31
2 5 gate 2 2Q 42,38
2 5 gate 4 2Q 46,42
2 5 new 0 cache 30,34,35,39,43
2 5 gate 1 2Q 34,30
2 5 gate 1 2Q 43,39
2 5 gate 3 2Q 39,35
2 5 new 0 cache 30,31,35,40,44
2 5 gate 1 2Q 44,40
2 6 entgl 0 tensor 27,6,27,28,29,30,31,32,33,34,35,36,37,38,39
,40,41,42,43,44,45,46,47,48,49,50,51,52,53,-1
,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16
2 6 entgl 4 E2Q 30,-5,-6
2 6 entgl 4 E2Q 27,-7,-8
2 6 entgl 8 E2Q 31,-13,-14
2 6 new 0 cache 30,34,35,38,40
2 6 gate 3 2Q 38,34
2 6 gate 6 2Q 35,30
2 6 gate 7 2Q 40,35
2 6 new 0 cache 28,30,32,35,40
2 6 gate 2 2Q 32,28
2 6 new 0 cache 30,32,35,36,40
2 6 gate 4 2Q 36,32
2 6 new 0 cache 36,41,46,47,51
2 6 gate 2 2Q 51,47
2 6 gate 5 2Q 41,36
2 6 gate 6 2Q 51,46
2 6 gate 8 2Q 46,41
2 6 gate 9 2Q 41,36
2 6 gate 10 2Q 51,46
2 6 gate 12 2Q 46,41
2 6 new 0 cache 30,35,40,46,50
2 6 gate 13 2Q 50,46
2 6 new 0 cache 27,30,32,35,40
2 6 gate 5 2Q 32,27
2 6 new 0 cache 30,35,40,43,47
2 6 gate 4 2Q 47,43
2 6 new 0 cache 30,35,38,40,43
2 6 gate 6 2Q 43,38
2 7 entgl 0 tensor 27,10,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16
2 7 entgl 3 E2Q 28,-3,-4
2 7 entgl 8 E2Q 30,-15,-16
2 7 new 0 cache 29,30,33,35,40
2 7 gate 1 2Q 33,29
2 7 gate 10 2Q 35,30
2 7 gate 11 2Q 40,35
2 7 new 0 cache 27,32,33,37,42
2 7 gate 4 2Q 37,33
2 7 gate 6 2Q 42,37
2 7 gate 7 2Q 37,32
2 7 gate 9 2Q 32,27
2 7 new 0 cache 37,42,47,48,52
2 7 gate 2 2Q 52,48
2 7 gate 5 2Q 52,47
2 7 gate 8 2Q 47,42
2 7 gate 9 2Q 52,47
2 7 gate 10 2Q 42,37
2 7 gate 12 2Q 47,42
2 7 new 0 cache 29,32,37,41,45
2 7 gate 11 2Q 37,32
2 7 gate 14 2Q 41,37
2 7 gate 15 2Q 45,41
2 7 new 0 cache 28,33,38,47,51
2 7 gate 5 2Q 33,28
2 7 gate 8 2Q 38,33
2 7 gate 14 2Q 51,47
2 7 new 0 cache 29,34,39,44,48
2 7 gate 3 2Q 48,44
2 7 gate 5 2Q 44,39
2 7 new 0 cache 38,43,48,49,53
2 7 gate 1 2Q 53,49
2 7 gate 5 2Q 53,48
2 7 gate 7 2Q 48,43
2 7 gate 9 2Q 53,48
2 7 gate 10 2Q 43,38
2 7 gate 11 2Q 48,43
2 8 entgl 0 tensor 28,13,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16
2 8 entgl 3 E2Q 29,-1,-2
2 8 entgl 7 E2Q 28,-11,-12
2 8 new 0 cache 28,33,38,42,46
2 8 gate 9 2Q 33,28
2 8 gate 12 2Q 38,33
2 8 gate 14 2Q 42,38
2 8 gate 16 2Q 46,42
2 8 new 0 cache 46,48,50,52
2 8 gate 14 2Q 52,48
2 8 gate 17 2Q 50,46
2 8 new 0 cache 29,34,39,44,49
2 8 gate 6 2Q 34,29
2 8 gate 7 2Q 49,44
2 8 gate 8 2Q 39,34
2 8 gate 9 2Q 44,39
2 8 gate 11 2Q 49,44
2 8 new 0 cache 29,40,44,48,52
2 8 gate 13 2Q 44,40
2 8 gate 15 2Q 48,44
2 8 gate 17 2Q 44,40
2 8 gate 18 2Q 52,48
2 8 gate 19 2Q 48,44
2 8 new 0 cache 29,34,48,49,53
2 8 gate 13 2Q 53,49
2 8 gate 17 2Q 53,49
2 8 gate 21 2Q 53,48
2 9 entgl 0 tensor 30,13,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16
2 9 entgl 7 E2Q 29,-9,-10
2 9 new 0 cache 29,34,39,43,47
2 9 gate 10 2Q 34,29
2 9 gate 12 2Q 39,34
2 9 gate 13 2Q 43,39
2 9 gate 16 2Q 47,43
2 9 new 0 cache 47,51
2 9 gate 18 2Q 51,47
0 10 slice 0 disk 0,1,2,3,4,50,51,52,53
1 10 all2all 0 tensor 32,6,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,5,6,7,8,9,10
2 10 all2all 0 tensor 32,7,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
3 1 new 0 tensor 32,13,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,5,6,7,8,9,10,43,44,45,46,47,48,49
3 1 new 0 cache 15,19,23,27,28
3 1 gate 11 2Q 28,23
3 1 gate 13 2Q 23,19
3 1 gate 15 2Q 19,15
3 1 gate 16 2Q 27,23
3 1 gate 17 2Q 23,19
3 1 gate 20 2Q 27,23
3 1 new 0 cache 11,15,18,19,23
3 1 gate 18 2Q 15,11
3 1 gate 19 2Q 19,15
3 1 gate 22 2Q 23,18
3 1 new 0 cache 24,29,33,37,41
3 1 gate 11 2Q 29,24
3 1 gate 13 2Q 33,29
3 1 gate 16 2Q 37,33
3 1 gate 18 2Q 41,37
3 1 new 0 cache 20,24,28,32,36
3 1 gate 13 2Q 24,20
3 1 gate 14 2Q 32,28
3 1 gate 15 2Q 28,24
3 1 gate 16 2Q 36,32
3 1 gate 18 2Q 32,28
3 1 gate 20 2Q 36,32
3 1 new 0 cache 12,16,20,27,32
3 1 gate 16 2Q 20,16
3 1 gate 18 2Q 16,12
3 1 gate 21 2Q 32,27
3 1 new 0 cache 14,19,20,24,28
3 1 gate 17 2Q 24,20
3 1 gate 19 2Q 28,24
3 1 gate 21 2Q 24,19
3 1 gate 24 2Q 19,14
3 1 new 0 cache 16,20,26,31,35
3 1 gate 12 2Q 31,26
3 1 gate 14 2Q 35,31
3 1 gate 20 2Q 20,16
3 1 new 0 cache 25,30,34,38,42
3 1 gate 12 2Q 30,25
3 1 gate 13 2Q 34,30
3 1 gate 15 2Q 38,34
3 1 gate 18 2Q 42,38
3 1 new 0 cache 21,25,29,33,37
3 1 gate 14 2Q 25,21
3 1 gate 15 2Q 29,25
3 1 gate 17 2Q 33,29
3 1 gate 20 2Q 37,33
3 1 new 0 cache 18,23,28,33
3 1 gate 21 2Q 33,28
3 1 gate 23 2Q 28,23
3 1 gate 26 2Q 23,18
3 1 new 0 cache 17,20,21,25,29
3 1 gate 16 2Q 21,17
3 1 gate 18 2Q 25,21
3 1 gate 19 2Q 29,25
3 1 gate 21 2Q 25,20
3 1 new 0 cache 22,26,30,34,38
3 1 gate 14 2Q 26,22
3 1 gate 16 2Q 30,26
3 1 gate 17 2Q 34,30
3 1 gate 18 2Q 26,22
3 1 gate 19 2Q 38,34
3 1 gate 20 2Q 30,26
3 1 new 0 cache 19,24,29,34
3 1 gate 22 2Q 34,29
3 1 gate 23 2Q 29,24
3 1 gate 25 2Q 24,19
3 1 new 0 cache 13,17,21,26
3 1 gate 17 2Q 17,13
3 1 gate 20 2Q 21,17
3 1 gate 22 2Q 26,21
3 1 new 0 cache 26,31,35,39
3 1 gate 15 2Q 39,35
3 1 gate 18 2Q 35,31
3 1 gate 24 2Q 31,26
3 2 all2all 0 tensor 32,13,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,5,6,7,8,9,10,11,12,13,14,15,16,17
3 2 new 0 cache 27,32,37,42,46
3 2 gate 20 2Q 46,42
3 2 gate 22 2Q 42,37
3 2 gate 23 2Q 37,32
3 2 gate 25 2Q 32,27
3 2 new 0 cache 33,38,39,43,47
3 2 gate 17 2Q 43,39
3 2 gate 20 2Q 47,43
3 2 gate 22 2Q 43,38
3 2 gate 24 2Q 38,33
3 2 new 0 cache 28,33,38,43,48
3 2 gate 23 2Q 48,43
3 2 gate 25 2Q 33,28
3 2 gate 26 2Q 43,38
3 2 gate 28 2Q 38,33
3 2 new 0 cache 23,28,30,35,39
3 2 gate 19 2Q 39,35
3 2 gate 22 2Q 35,30
3 2 gate 27 2Q 28,23
3 2 new 0 cache 29,34,39,44,49
3 2 gate 21 2Q 44,39
3 2 gate 23 2Q 49,44
3 2 gate 24 2Q 39,34
3 2 gate 25 2Q 44,39
3 2 gate 26 2Q 34,29
3 2 gate 27 2Q 49,44
3 2 gate 28 2Q 39,34
3 2 new 0 cache 24,25,29,30,33
3 2 gate 24 2Q 30,25
3 2 gate 27 2Q 29,24
3 2 gate 29 2Q 33,29
3 2 new 0 cache 30,35,40,44
3 2 gate 23 2Q 40,35
3 2 gate 26 2Q 35,30
3 2 gate 27 2Q 40,35
3 2 gate 29 2Q 44,40
3 2 new 0 cache 36,41,45
3 2 gate 19 2Q 45,41
3 2 gate 21 2Q 41,36
3 3 all2all 0 tensor 32,13,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
3 3 new 0 cache 9,14,19,23,27
3 3 gate 26 2Q 14,9
3 3 gate 28 2Q 19,14
3 3 gate 29 2Q 23,19
3 3 gate 32 2Q 27,23
3 3 new 0 cache 10,15,20,25,30
3 3 gate 22 2Q 15,10
3 3 gate 23 2Q 20,15
3 3 gate 25 2Q 25,20
3 3 gate 28 2Q 30,25
3 3 new 0 cache 5,10,15,30,34
3 3 gate 24 2Q 10,5
3 3 gate 26 2Q 15,10
3 3 gate 29 2Q 34,30
3 3 new 0 cache 7,11,15,20,24
3 3 gate 20 2Q 11,7
3 3 gate 27 2Q 20,15
3 3 gate 29 2Q 24,20
3 3 new 0 cache 11,16,21,26,31
3 3 gate 21 2Q 16,11
3 3 gate 23 2Q 21,16
3 3 gate 26 2Q 26,21
3 3 gate 28 2Q 31,26
3 3 new 0 cache 6,11,16,31,35
3 3 gate 24 2Q 11,6
3 3 gate 25 2Q 16,11
3 3 gate 30 2Q 35,31
3 3 new 0 cache 16,21,25,29
3 3 gate 27 2Q 21,16
3 3 gate 30 2Q 25,21
3 3 gate 31 2Q 29,25
3 3 new 0 cache 8,12,17,22
3 3 gate 19 2Q 12,8
3 3 gate 21 2Q 17,12
3 3 gate 24 2Q 22,17
3 3 write 0 disk 0,1,2,3,4,50,51,52,53
4 1 read 0 disk 23,24,25,26,27,28,29,30,31
4 1 new 0 tensor 32,13,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53
4 1 new 0 cache 0,5,10,14,18
4 1 gate 25 2Q 5,0
4 1 gate 28 2Q 10,5
4 1 gate 29 2Q 14,10
4 1 gate 31 2Q 18,14
4 1 new 0 cache 0,1,5,6,9
4 1 gate 26 2Q 6,1
4 1 gate 30 2Q 5,1
4 1 gate 31 2Q 9,5
4 1 gate 34 2Q 5,1
4 1 gate 35 2Q 9,5
4 1 gate 37 2Q 5,0
4 1 new 0 cache 6,11,15,19
4 1 gate 28 2Q 11,6
4 1 gate 30 2Q 15,11
4 1 gate 31 2Q 19,15
4 1 new 0 cache 2,6,7,10,12
4 1 gate 22 2Q 7,2
4 1 gate 23 2Q 12,7
4 1 gate 26 2Q 7,2
4 1 gate 30 2Q 6,2
4 1 gate 32 2Q 10,6
4 1 gate 34 2Q 6,2
4 1 new 0 cache 6,9,10,14,18
4 1 gate 33 2Q 14,10
4 1 gate 35 2Q 18,14
4 1 gate 36 2Q 10,6
4 1 gate 38 2Q 14,9
4 1 new 0 cache 1,6,12,17,22
4 1 gate 25 2Q 17,12
4 1 gate 28 2Q 22,17
4 1 gate 38 2Q 6,1
4 1 new 0 cache 3,8,13,17,21
4 1 gate 21 2Q 8,3
4 1 gate 23 2Q 13,8
4 1 gate 25 2Q 8,3
4 1 gate 27 2Q 13,8
4 1 gate 29 2Q 17,13
4 1 gate 32 2Q 21,17
4 1 gate 33 2Q 17,13
4 1 new 0 cache 3,7,11,12,15
4 1 gate 27 2Q 12,7
4 1 gate 29 2Q 7,3
4 1 gate 32 2Q 11,7
4 1 gate 33 2Q 7,3
4 1 gate 34 2Q 15,11
4 1 gate 36 2Q 11,7
4 1 new 0 cache 2,7,12,16,20
4 1 gate 30 2Q 16,12
4 1 gate 32 2Q 20,16
4 1 gate 38 2Q 7,2
4 1 new 0 cache 3,4,8,12,16
4 1 gate 29 2Q 8,4
4 1 gate 31 2Q 12,8
4 1 gate 33 2Q 8,4
4 1 gate 34 2Q 16,12
4 1 gate 35 2Q 12,8
4 1 gate 37 2Q 8,3
4 1 new 0 cache 8,13
4 1 gate 39 2Q 13,8
4 2 all2all 0 tensor 32,13,13,14,15,16,17,18,19,20,21,22,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,0,1,2,3,4,5,6,7,8,9,10,11,12
4 2 new 0 cache 39,43,48,49,53
4 2 gate 25 2Q 53,48
4 2 gate 27 2Q 48,43
4 2 gate 29 2Q 43,39
4 2 gate 29 2Q 53,49
4 2 gate 33 2Q 53,49
4 2 new 0 cache 42,44,47,48,52
4 2 gate 21 2Q 52,47
4 2 gate 24 2Q 47,42
4 2 gate 25 2Q 52,47
4 2 gate 30 2Q 52,48
4 2 gate 31 2Q 48,44
4 2 gate 34 2Q 52,48
4 2 new 0 cache 40,44,48,53
4 2 gate 33 2Q 44,40
4 2 gate 35 2Q 48,44
4 2 gate 37 2Q 53,48
4 2 new 0 cache 34,37,38,42,47
4 2 gate 26 2Q 42,37
4 2 gate 28 2Q 47,42
4 2 gate 30 2Q 42,38
4 2 gate 31 2Q 38,34
4 2 new 0 cache 41,43,46,47,51
4 2 gate 22 2Q 51,46
4 2 gate 24 2Q 46,41
4 2 gate 26 2Q 51,46
4 2 gate 30 2Q 51,47
4 2 gate 32 2Q 47,43
4 2 gate 34 2Q 51,47
4 2 new 0 cache 35,39,43,47,52
4 2 gate 31 2Q 39,35
4 2 gate 33 2Q 43,39
4 2 gate 36 2Q 47,43
4 2 gate 37 2Q 52,47
4 2 new 0 cache 32,36,37,41,46
4 2 gate 25 2Q 41,36
4 2 gate 27 2Q 37,32
4 2 gate 28 2Q 46,41
4 2 gate 30 2Q 41,37
4 2 new 0 cache 38,42,45,46,50
4 2 gate 22 2Q 50,45
4 2 gate 26 2Q 50,45
4 2 gate 29 2Q 50,46
4 2 gate 32 2Q 46,42
4 2 gate 33 2Q 50,46
4 2 gate 34 2Q 42,38
4 2 gate 36 2Q 46,42
4 2 new 0 cache 33,37,41,45,50
4 2 gate 31 2Q 45,41
4 2 gate 32 2Q 37,33
4 2 gate 34 2Q 41,37
4 2 gate 35 2Q 45,41
4 2 gate 38 2Q 50,45
4 2 new 0 cache 46,51
4 2 gate 38 2Q 51,46
4 2 write 0 disk 23,24,25,26,27,28,29,30,31
5 1 read 0 disk 0,1,2,3,4,50,51,52,53
5 1 new 0 tensor 32,13,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
5 1 new 0 cache 5,10,15,19,23
5 1 gate 33 2Q 23,19
5 1 gate 35 2Q 19,15
5 1 gate 38 2Q 15,10
5 1 gate 40 2Q 10,5
5 1 new 0 cache 7,12,17,21,25
5 1 gate 34 2Q 25,21
5 1 gate 36 2Q 21,17
5 1 gate 37 2Q 17,12
5 1 gate 39 2Q 12,7
5 1 new 0 cache 17,22,26,30,34
5 1 gate 30 2Q 26,22
5 1 gate 32 2Q 30,26
5 1 gate 33 2Q 34,30
5 1 gate 34 2Q 26,22
5 1 gate 36 2Q 30,26
5 1 gate 40 2Q 22,17
5 1 new 0 cache 21,26,31,35
5 1 gate 34 2Q 35,31
5 1 gate 38 2Q 26,21
5 1 gate 40 2Q 31,26
5 1 new 0 cache 18,23,27,29,33
5 1 gate 33 2Q 33,29
5 1 gate 36 2Q 27,23
5 1 gate 38 2Q 23,18
5 1 new 0 cache 24,27,28,32,36
5 1 gate 30 2Q 32,28
5 1 gate 31 2Q 28,24
5 1 gate 32 2Q 36,32
5 1 gate 34 2Q 32,28
5 1 gate 36 2Q 36,32
5 1 gate 37 2Q 32,27
5 1 new 0 cache 14,19,20,24,28
5 1 gate 33 2Q 24,20
5 1 gate 35 2Q 28,24
5 1 gate 37 2Q 24,19
5 1 gate 40 2Q 19,14
5 1 new 0 cache 6,11,16,20,21
5 1 gate 36 2Q 20,16
5 1 gate 37 2Q 16,11
5 1 gate 39 2Q 21,16
5 1 gate 40 2Q 11,6
5 1 new 0 cache 15,20,25,29
5 1 gate 35 2Q 29,25
5 1 gate 37 2Q 25,20
5 1 gate 39 2Q 20,15
5 2 all2all 0 tensor 32,13,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,5,6,7,8,9,10,11,12,13,14,15,16,17
5 2 new 0 cache 33,36,37,41,46
5 2 gate 36 2Q 37,33
5 2 gate 37 2Q 41,36
5 2 gate 40 2Q 46,41
5 2 new 0 cache 32,37,42,47
5 2 gate 38 2Q 42,37
5 2 gate 39 2Q 37,32
5 2 gate 40 2Q 47,42
5 2 new 0 cache 28,33,34,38,43
5 2 gate 35 2Q 38,34
5 2 gate 37 2Q 33,28
5 2 gate 38 2Q 43,38
5 2 gate 40 2Q 38,33
5 2 new 0 cache 29,34,35,39,44
5 2 gate 35 2Q 39,35
5 2 gate 37 2Q 44,39
5 2 gate 38 2Q 34,29
5 2 gate 40 2Q 39,34
5 2 new 0 cache 25,30,35,43,48
5 2 gate 38 2Q 35,30
5 2 gate 39 2Q 48,43
5 2 gate 40 2Q 30,25
5 2 new 0 cache 23,28,44,49
5 2 gate 39 2Q 49,44
5 2 gate 39 2Q 28,23
5 2 new 0 cache 24,29,35,40
5 2 gate 39 2Q 40,35
5 2 gate 39 2Q 29,24
5 2 write 0 disk 0,1,2,3,4,50,51,52,53
