The reliable resource estimation and benchmarking of quantum algorithms is a critical component of the development cycle of viable quantum applications for quantum computers of all sizes. Determining resource bottlenecks in algorithms, especially when resource intensive error correction protocols are required, is crucial to reduce the cost of implementing viable algorithms on actual quantum hardware.
The potential of quantum computing will be proven with quantum computers up and running, but it is uncertain when this will happen. Nevertheless, researchers and companies are racing to build the first large-scale, error-corrected quantum computers. The private sector seems increasingly optimistic about the commercial viability of the project and considers quantum computing to be a breakthrough.
Researchers are still somewhat pessimistic about when the first large-scale computer that conclusively solves useful problems better than classical machines will exist. We argue for a pragmatic approach. Although it could be sooner rather than later, it is going to take some time before quantum machines begin to be cost-effective solutions to a scientifically or commercially useful problem. A pragmatic approach to the assessment of quantum applications begins with accurate resource estimation and deriving useful performance analytics as to the actual physical resources consumed by a given quantum computation. This is enabled by software, which is used to estimate the resources consumed by a quantum program. These estimations are useful to assess the current state of quantum hardware and software development and refine algorithms to speed up their eventual adoption.
THE SHOE BOX
Bin packing 1 is a quintessential computer science problem, and very often, optimization problems are equivalent to packing as many objects with different volumes in a bin of a given volume. We state, right from the beginning, that compiling fault-tolerant, error-corrected quantum algorithms is very similar to bin packing, but we call the bins shoe boxes. This is done not only for rhetorical reasons but because one of the authors used shoe boxes for archiving files related to well-defined topics (e.g., handwritten computer engineering lecture notes). Each shoe box included the content of a much broader topic, i.e., everything necessary to understand and explain the topic. Once the notes were written (synthesized), these were reorganized during weekends to eliminate redundancy (optimization), learned and reproduced during an exam (verification), and, finally, stored into the box. If the same topic needed to be relearned after many years, the shoe box contents were an optimal starting point.
In this discussion, the difficulty of implementing bin packing for fault-tolerant, error-corrected quantum applications (and consequently, of performing resource optimization and benchmarking) will be visually explained. There is a significant practical advantage related to this problem: the smaller the shoe box, the sooner that scientifically or commercially viable quantum computing becomes a practical reality.
The software tools used to generate shoe boxes are sometimes referred to as quantum compilers. Later, we argue that quantum resource estimators (software enabling a realistic view of the quantum computing horizon) are more than compilers. These software tools will form the core of future quantum operating systems and become a key component in the development cycle of quantum applications.
One qubit, two qubits, three qubits, …
A significant amount of research has been invested into finding out how many qubits a quantum computer will need to be more powerful than the most powerful classical computer. This problem is commonly referred to as quantum supremacy, where abstract problems, which are intractable for classical computers of any size, are formulated specifically for quantum computers. 2 Other formulations of the problem are sometimes known as quantum advantage, 3 where metrics such as dollar costs for running programs on classical and quantum hardware are taken into account. Researchers work to define accurate metrics to determine when a quantum computing system reaches these regimes (supremacy, advantage, and so on). This research is also related to the more formal problem of bin packing. 1 In this context, what is the largest instance of the most practical quantum algorithm that can be executed on the largest available quantum computer? Practical is often related to the ability to use a quantum algorithm to obtain a computational result that is either scientifically or commercially valuable beyond the fact that a quantum computer was used to derive it. In other words, what is the largest quantum computation that can be squeezed onto the small-sized quantum chips available now and in the near future [also known as noisy intermediate-scale quantum (NISQ) machines 4 ]?
The term small-size quantum chip does not refer to the dimensions of the chip in centimeters but to some metric that considers both the number of qubits and the topology of the interqubit connections. It is not straightforward to define a particularly expressive metric. Some use individual two-qubit gate fidelities as a marker for the overall quality of a qubit array; others use concepts such as quantum volume. 5 There are also researchers who place a category boundary between applications requiring active quantum error-correcting codes and applications that do not. To date, there is no consensus within the community, and therefore, it is not exactly clear what small refers to. But it is almost certain that the current generation of qubit chips, with fewer than 100 qubits, is small. With these devices, entering the quantum supremacy or quantum advantage regime may be possible, but it is still very unclear if practical quantum computing, using NISQ machines, is even possible.
Multiplied by computing time
The connections supported between qubits within a given chip determine the number of operations necessary to execute an algorithm. The problem is very similar to how computations are executed on classical machines: If not all of the registers are addressable from a given register, or by a certain instruction, some workarounds have to be used. These workarounds increase the execution time of the computation.
One of the main issues is that the quantum hardware is not perfect, such that the information stored in the quantum registers (the physical hardware qubits) loses its precision (decoheres) over time. Furthermore, each quantum instruction (known as a gate) has a negative effect on precision due to inevitable errors that occur when control pulses are applied to the physical qubits to perform gates. Because hardware (both storage and operation) is faulty, in the absence of error correction, 6 a quantum computation has to be executed as fast as possible and with as few gates as possible. If the NISQ regime is defined as the boundary where resource-heavy quantum error-correction protocols need to be used, then these computations need to be executed fast and with few physical qubits.
A reasonably accurate rule of thumb is to take the number of qubits needed for a given algorithm, Q, and the depth of your algorithm, K (depth is the number of parallel gate steps needed in the algorithm), and calculate the quantity, A = 1/ (KQ). If the physical error rate p of your qubit array (the worst error rate associated with qubit initialization, single and two-qubit gates and measurement over the entire computer) is p < A, then your algorithm is potentially implementable without error correction. If p >> A, then extensive error correction will be needed, and your quantum algorithm is highly unlikely to be NISQ compatible.
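The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the factor used to decide when p >> A (`margin`, set to 10 here) is an assumption, since the text does not quantify "much greater than."

```python
def nisq_bound(num_qubits: int, depth: int) -> float:
    """Return A = 1 / (K * Q) for Q qubits and an algorithm of depth K."""
    if num_qubits <= 0 or depth <= 0:
        raise ValueError("num_qubits and depth must be positive")
    return 1.0 / (depth * num_qubits)


def classify(num_qubits: int, depth: int, p: float, margin: float = 10.0) -> str:
    """Apply the rule of thumb to a worst-case physical error rate p.

    `margin` is an assumed stand-in for ">>": p counts as much larger
    than A when p > margin * A.
    """
    a = nisq_bound(num_qubits, depth)
    if p < a:
        return "potentially implementable without error correction"
    if p > margin * a:
        return "extensive error correction needed"
    return "borderline"


if __name__ == "__main__":
    # Illustrative numbers only: 50 qubits, depth 20, so A = 1/1000.
    for p in (1e-4, 5e-3, 5e-1):
        print(p, classify(50, 20, p))
```

With 50 qubits and depth 20, the bound is A = 0.001, so only hardware with a worst-case error rate below 0.1% passes the first test.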
Equals a shoe box
The research community is increasingly in agreement about the need for abstracting the computational resources (hardware and time) necessary to execute a particular computation as a spacetime volume, i.e., the number of physical qubits needed for a particular algorithm (space), multiplied by the gate depth of the algorithm (time). 7 In this discussion, we call this volume a shoe box. The volume of a computation depends on many factors, and once more, there is no exhaustive study that enumerates all of them.
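As a minimal sketch of the definition above, a shoe box can be modeled as a pair (physical qubits, gate depth) whose product is the spacetime volume. The class name and the example numbers are hypothetical and chosen only to show that boxes of equal volume can have very different shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShoeBox:
    """Spacetime footprint of a computation (illustrative model only)."""
    physical_qubits: int  # space
    gate_depth: int       # time, counted in parallel gate steps

    @property
    def volume(self) -> int:
        # Spacetime volume = space multiplied by time.
        return self.physical_qubits * self.gate_depth


if __name__ == "__main__":
    few_qubits_long_runtime = ShoeBox(physical_qubits=1_000, gate_depth=4_000_000)
    many_qubits_short_runtime = ShoeBox(physical_qubits=4_000, gate_depth=1_000_000)
    print(few_qubits_long_runtime.volume, many_qubits_short_runtime.volume)
```

Both example boxes have the same volume, which is why a single volume figure is most useful when reported together with its two factors.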
Determining the shoe box, which encapsulates a quantum algorithm, in principle benchmarks how large a quantum computer will be needed to execute it (Figure 1). When this is done rigorously, all practical quantum algorithms are shown to be too large to run on the current generation of physical hardware. There is a real gap between the available quantum chips/computers and the algorithms one would like to have executed. As a result, three general approaches are being taken:
› increase the chip sizes: more qubits, better connectivity
› improve the algorithms: less hardware and less execution time
› optimize the shoe boxes: the same algorithm in a smaller volume.
We argue that, to accelerate algorithm and hardware development, flexible and fast software tools are required to estimate resources and benchmark quantum algorithms as accurately as possible (a particular case of hardware/software co-design). Accuracy refers not just to final qubit counts and computational time metrics but also includes detailed analytics about where resources are used and what the most costly subcomponents of a quantum algorithm are. Thus, we focus our discussion on this refinement of resource estimation and how the shoe box analogy can be used to accurately analyze and benchmark the performance and hardware requirements of a quantum program/algorithm.
Error correction: Replacing paper with clay tablets
The shoe box analogy was introduced in conjunction with lecture notes written on sheets of paper. These lecture notes stand in relation to the environment they are in (i.e., they are coupled to a bath of entropy). This motivates another analogy: a quantum computer is like a bathtub filled with water, and executing a quantum algorithm is to drop the shoe box in the bathtub. Although paper is light, easy to write on, and cheap to produce, the problem is that it can be easily destroyed by the water. A solution would be to use clay tablets (properly baked) instead of paper. This comes with advantages and disadvantages: the contents of the shoe box would be relatively immune to damage from the water, but the volume of the box would be much larger because each clay tablet is now a big, bulky, and robust object that contains no more information than the original, delicate piece of paper.
[Figure 1 labels: Geometric Layout, Physical Qubits, Time]
However, there does not seem to be another solution in practice. Quantum hardware is so fragile that error correction has been accepted as a necessity for applications of practical size. Therefore, there appears to be little hope for the current NISQ machines (if the definition of NISQ is chosen to be applications that do not need error correction) to be able to execute industrially relevant quantum computations. Such machines have too few qubits to support error correction for the entire quantum program. Thus, as with clay tablets, optimization of shoe boxes includes the following two research directions:
› smaller clay tablets for the same information content, to minimize the number of qubits necessary for error detection and correction
› thinner clay tablets with the same strength, to minimize the time necessary for performing error correction without reducing its resilience to errors.
TECHNICAL BACKGROUND
Quantum resource estimation is the process of synthesizing (compiling), optimizing, and verifying an error-corrected quantum computation that requires the least physical resources in terms of qubit hardware and execution time; think of writing on the thinnest and smallest possible clay tablets. Realistic quantum resource estimation requires explicitly compiling a fully error-corrected algorithm (to the level of essentially generating an execution sequence for every qubit and gate) and mapping this to a given hardware model using the most current technical assumptions. The vast majority of algorithmic benchmarking is done on a case-by-case basis, often by hand, and necessitates significant hardware and compilation assumptions. 8 As is often the case, conservative assumptions are chosen. These assumptions are too conservative for some researchers, while they are not conservative enough for others. Many assumptions also relate to design issues at the hardware level, such as the following:
› The theoretically best error-correcting code is not the easiest one to implement in hardware.
› Hardware capabilities are not as advanced as error correction would require.
For the moment, every assumption should be as conservative as possible (pessimism). The goal is to start from large, thick clay tablets and invent plastic sheets on the way (newer techniques in error-corrected compilation and optimization) by making as few technical assumptions as possible and building a platform that can adapt to changes in experimental hardware design on the fly. In the following, we present the technical aspects that determine the size and number of the clay tablets in a shoe box, which can then be used to benchmark and analyze a fully error-corrected quantum algorithm.
Surface codes
Quantum error correction is a component of all practical quantum-computing architectures, and a wide variety of codes have been proposed. The surface code is used most often because it can tolerate high error rates (compared to other codes) while arguably having the most experimentally feasible hardware configurations (also compared to other codes). Consequently, surface codes are very often used as architectural building blocks for large-scale machines.
Error correction can be implemented with different techniques using the surface code, 9 but all of the recipes start from the same ingredients.
1. Construct an array of qubits arranged in a 2D lattice.
2. Allow each qubit to interact with all its direct neighbors (e.g., for qubits not on the boundary: north, south, east, and west) and perform qubit-specific initialization and measurement in multiple bases.
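The two ingredients above amount to a nearest-neighbor connectivity rule on a rectangular grid. The following sketch, with a hypothetical helper name and (row, column) coordinates chosen for illustration, lists the direct neighbors of a qubit and shows that boundary qubits have fewer of them.

```python
from typing import List, Tuple


def direct_neighbors(row: int, col: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return the north, south, east, and west neighbors inside a rows x cols lattice."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("qubit coordinates lie outside the lattice")
    candidates = [
        (row - 1, col),  # north
        (row + 1, col),  # south
        (row, col + 1),  # east
        (row, col - 1),  # west
    ]
    # Qubits on the boundary simply have fewer neighbors.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


if __name__ == "__main__":
    print(direct_neighbors(1, 1, 3, 3))  # interior qubit: four neighbors
    print(direct_neighbors(0, 0, 3, 3))  # corner qubit: two neighbors
```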
Two of the most common techniques are called braiding and lattice surgery (Figure 2). The latter has recently attracted more attention because it appears to require fewer physical qubits to implement a given computation. 10 Braiding starts from the assumption that holes (also called defects) can be punctured into the imaginary surface spanned across the qubit lattice. Quantum algorithms are encoded into the movement of the holes around the surface. Surgery uses a different concept: the spanned surface is split into patches, which can be glued together (merged) or cut apart (split). For this technique, quantum algorithms are encoded in the sequence of logic operations that are enacted on encoded patches when they are merged and split. Similar to classical error correction, quantum codes have a distance, which determines the number of errors that can be detected and corrected. For the surface code, the distance is related to the physical (x-y) dimensions of the holes (when braiding) or of the patches (when performing surgery). The distance is chosen based on an error model that captures the relevant hardware characteristics. Faulty hardware will require strong error correction and, thus, longer code distances, which means more qubits. Increasing the quality of the qubits reduces the amount of necessary error correction. This leads to hardware optimization, which needs to be built into any analytics platform as an automatic update when higher-fidelity operations are reported for any potential hardware system.
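To make the link between hardware quality and code distance concrete, the sketch below uses a scaling heuristic that is common in the surface-code literature but is not taken from this article: the logical error rate is approximated as c (p / p_th)^((d + 1) / 2). The prefactor c = 0.1, the default threshold, and the function names are all assumptions for illustration; a real estimator would fit these to a detailed error model.

```python
def logical_error_rate(p: float, d: int, p_th: float = 0.007, prefactor: float = 0.1) -> float:
    """Heuristic logical error rate for distance d (assumed model, not from the article)."""
    return prefactor * (p / p_th) ** ((d + 1) / 2)


def min_distance(p: float, target: float, p_th: float = 0.007,
                 prefactor: float = 0.1, max_d: int = 201) -> int:
    """Smallest odd distance whose heuristic logical error rate meets the target."""
    if not 0 < p < p_th:
        raise ValueError("physical error rate must be positive and below threshold")
    for d in range(3, max_d + 1, 2):  # surface-code distances are usually odd
        if logical_error_rate(p, d, p_th, prefactor) <= target:
            return d
    raise ValueError("no distance up to max_d meets the target")


if __name__ == "__main__":
    # Better qubits (smaller p) need a shorter distance for the same target.
    for p in (1e-3, 1e-4):
        print(p, min_distance(p, target=1e-12))
```

Under these assumed parameters, improving the physical error rate by a factor of 10 shrinks the required distance considerably, which is the hardware optimization effect described above.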
The number of qubits used for error correction could be also optimized when keeping the hardware quality constant by arranging the holes or the patches more efficiently on the surface, a case of 2D bin packing. 11 The time necessary to execute a surface code-protected quantum computation can be reduced by determining shorter hole movements or shorter sequences of patch interactions. This is the software optimization in the benchmarking problem.
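The 2D packing view can be illustrated with a simple shelf heuristic, which places rectangular patches left to right on rows ("shelves") of a lattice with fixed width. This is a generic textbook-style heuristic used here as a stand-in, not the method of any particular surface code compiler; the function name and the patch sizes are hypothetical.

```python
from typing import List, Tuple

Patch = Tuple[int, int]               # (width, height) in lattice sites
Placement = Tuple[int, int, int, int]  # (x, y, width, height)


def shelf_pack(patches: List[Patch], lattice_width: int) -> Tuple[List[Placement], int]:
    """Pack patches on shelves, tallest first, and return placements and used height."""
    placements: List[Placement] = []
    x = y = shelf_height = 0
    for w, h in sorted(patches, key=lambda p: p[1], reverse=True):
        if w > lattice_width:
            raise ValueError("patch is wider than the lattice")
        if x + w > lattice_width:
            # The current shelf is full: open a new one above it.
            y += shelf_height
            x = 0
            shelf_height = 0
        placements.append((x, y, w, h))
        x += w
        shelf_height = max(shelf_height, h)
    return placements, y + shelf_height


if __name__ == "__main__":
    patches = [(3, 3)] * 4  # four square patches, sizes chosen for illustration
    for width in (6, 12):
        _, height = shelf_pack(patches, width)
        print(f"lattice width {width}: height {height}, area {width * height}")
```

The used area (width times height) plays the role of the qubit count: a better arrangement of the same patches needs fewer lattice sites.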
An accurate and useful quantum resource estimator assumes that hardware will reach the minimum necessary accuracy for error correction to work (approximately a reliable 99.9% fidelity of all physical-level operations across the qubit chipset). Software optimization to consider includes
› more compact encodings of the algorithm on an imaginary two-dimensional surface of qubits: reduces lattice dimensions ➝ fewer qubits
› more compact, but computationally equivalent, interaction of the holes or patches on the surface: fewer steps necessary to move a hole ➝ less execution time on the computer ➝ less exposure to decoherence (shoe box in the water-filled bathtub) ➝ shorter distance ➝ fewer qubits.
Clifford and T(-Bone)
One of the difficulties of error correction is that arbitrary operations cannot be enacted on encoded data in a simple way (technically known as transversal logic operations). Some gates in a universal gate set have to be reformulated (rewritten/compiled) into a fault-tolerant form compatible with the code. In practice, this means that, with some code exceptions, computations are decomposed into a sequence of operations (instructions/gates) chosen from a set called Clifford + T, because there are known recipes of how to encode fault-tolerant qubit states and logic gates from the Clifford + T set.
While drafting this manuscript, we discovered that the children's book series Clifford the Big Red Dog 12 has strangely encapsulated the qualitative nature of this problem. The cartoon series includes another dog called T-Bone. Clifford is friendly and helpful, and T-Bone is a bulldog with a large appetite. The character traits of the cartoon dogs are surprisingly consistent with the effect of the Clifford and T gates on surface code-protected computations.
Computations consisting entirely of Clifford gates are friendly because they can be efficiently simulated on classical computers. Thus, universal quantum computations require something more than only Clifford gates. This extra component is realized by the T gate, where T^4 = Z and Z is the Pauli phase flip gate. Clifford + T computations are universal and cannot be efficiently simulated on classical computers. When implementing error-corrected computations, at least one gate is generally difficult to implement directly in the code space, and for surface codes, this is generally the T gate. (One difficulty in this context is that additional physical resources, either time or space, are needed to enact the gate fault tolerantly. The specifics of the additional overhead vary.) Each time a T gate has to be applied in a fault-tolerant, error-corrected manner, an additional complex mechanism is included into the computation: a distillation procedure. 13 This procedure, as its name suggests, is used to distill (purify) faulty computational resources into less faulty ones. T gates have a large appetite for computational resources because each distillation uses additional qubits and time, ultimately to enact a single logic gate.
FIGURE 2. Two planar patches of surface code are merged. The large green circles represent data qubits that collectively encode a single logical qubit for each patch. The small red circles are ancillary qubits that continuously measure the parity of the surrounding data qubits and extract information related to physical qubit errors.
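The relation between the T gate and the Pauli Z gate stated above can be checked numerically. The sketch below uses the standard matrix form T = diag(1, e^(iπ/4)) and plain Python lists; the helper names are illustrative.

```python
import cmath

Matrix = list  # 2x2 matrix stored as a list of two rows

IDENTITY = [[1, 0], [0, 1]]
T_GATE = [[1, 0], [0, cmath.exp(1j * cmath.pi / 4)]]  # standard T gate
Z_GATE = [[1, 0], [0, -1]]                            # Pauli phase flip


def matmul2(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power(m: Matrix, n: int) -> Matrix:
    """Raise a 2x2 matrix to a non-negative integer power."""
    result = IDENTITY
    for _ in range(n):
        result = matmul2(result, m)
    return result


def close(a: Matrix, b: Matrix, tol: float = 1e-12) -> bool:
    """Compare two 2x2 matrices entry by entry up to a tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


if __name__ == "__main__":
    print("T^4 == Z:", close(power(T_GATE, 4), Z_GATE))
    print("T^8 == I:", close(power(T_GATE, 8), IDENTITY))
```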
Researchers have targeted this problem because reducing the number of T gates in a quantum algorithm automatically results in a reduction in shoe box volume. At the same time, reducing the volume of distillation protocols to enact individual T gates results in vast improvements because (at least for the moment) all distillation procedures implement the same gate. Surprisingly, there are computations for which the number of T gates does not dominate the error-correction resource overhead. 8 The number and properties of such algorithms have not been investigated thoroughly, but it has been suggested that, in some cases, Clifford optimization is the dominant factor in generating smaller shoe boxes.
REALISTIC ASSUMPTIONS
In the previous section, we noted that assumptions related to hardware, compilation, and implementation are generally pessimistic and conservative. The surface code choice and the fact that computations are most often prepared to the Clifford + T gate set imply that some technical assumptions have already been made.
These assumptions may prove to be true as hardware development continues. Even with fixed choices, we still have significant flexibility regarding how shoe boxes are generated and look: thin (few qubits) and long (long computational times) versus thick (lots of qubits) and short (fast). Or we may choose more futuristic materials than clay [different quantum error correction (QEC) codes?], which may require more flexible hardware architectures. Other design considerations include issues such as the speed of the fastest classical computer that is economically viable to control the error-corrected quantum computer and how this influences design choices of both the quantum code and the hardware it is run on.
One distillation at a time
Parallelization of quantum algorithms will only be possible with cheap and scalable hardware. This will not be the case in the foreseeable future because qubits will be expensive and in short supply. It seems that the most sensible decision is to allow distillations to be executed only sequentially, thus saving significant amounts of physical hardware. The tradeoff is that computations will take longer to execute. It is not possible to speed up computations by being optimistic about future qubit numbers that are not reflected in the available developmental roadmap for large-scale hardware.
This assumption opens novel research directions because it has always been assumed that computation depth can be reduced by parallelizing T gates. Moreover, depth is routinely reduced by inserting additional qubits into the computation (used as temporary workbenches, similar to how one adds random-access memory to a computer when the current processor is too slow and a new processor is too expensive). For the error-corrected regime, and in the situation where all qubits require the same error correction, such an optimization method is not always feasible.
Future work should address new methods to reduce circuit depth without increasing the number of computation qubits. Such approaches would need to account for distillation sequentially because T gates cannot be executed at any time, but only when their corresponding distillation protocols are finished. 14 For the moment, a distillation circuit is the execution of multiple error-corrected Clifford gates. This implies that, in the worst case, two sequential error-corrected T gates will be separated in time by a block of gates used to perform distillation [i.e., temporal penalties in sequential T gate distillation can be reduced by compacting the space-time volume (Figure 3) of individual T gate distillation protocols].
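The worst-case timing argument above can be sketched with a deliberately simple model, in which every T gate waits for one full distillation block and blocks can only overlap if more than one distillation circuit ("factory") runs at a time. The function names, the factory abstraction, and all numbers are hypothetical.

```python
import math


def worst_case_depth(t_count: int, distillation_depth: int, factories: int = 1) -> int:
    """Depth spent waiting on distillation when T gates are served in rounds."""
    if t_count < 0 or distillation_depth <= 0 or factories <= 0:
        raise ValueError("invalid parameters")
    rounds = math.ceil(t_count / factories)  # one round serves `factories` T gates
    return rounds * distillation_depth


def total_qubits(data_qubits: int, factory_qubits: int, factories: int = 1) -> int:
    """Qubits for the computation plus those reserved for distillation."""
    return data_qubits + factories * factory_qubits


if __name__ == "__main__":
    t_count, depth = 1000, 100  # illustrative values
    for n in (1, 4):
        print(n, "factories:", worst_case_depth(t_count, depth, n), "steps,",
              total_qubits(200, 500, n), "qubits")
    # Compacting the distillation block itself also shortens the sequential case.
    print("compacted block:", worst_case_depth(t_count, depth // 2), "steps")
```

In this model, the sequential case (one factory) uses the fewest qubits and the most time, and halving the depth of the distillation block halves the temporal penalty without adding qubits.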
The practical alternative to surface codes is the surface code
There are many alternatives to the surface code, and most of them were proposed because it requires distillation protocols for universality. The hope is that codes exist that have the same, or even better, error-correction properties and do not require the inclusion of distillations to achieve universal, error-corrected computation. Such codes have been found, but the devil is always in the details. Many of these codes are not feasible from a hardware engineering point of view; for example, some require dense, long-range qubit connections that continue to expand as the computer scales. That is fine in principle but unacceptable to any experimental hardware engineer who needs to build large arrays of high-fidelity qubits and gates.
The architectural requirements of the surface code are to the advantage of one of the most advanced hardware platforms for quantum computers: superconducting qubits. Even so, currently connecting the qubits to support the simplest surface code is a very complex task, which was visible when the first IBM quantum chips had missing connections between qubits. 15 Increasing the degree of the connections while maintaining or increasing their fidelity will be more complex. Thus, the realistic (pessimistic) assumption is that the ability to produce arbitrary, long-range connections, with high-fidelity and a maximal amount of parallelizability, will not be available in the near term.
Most current hardware systems have issues related to scalability. While qubit chipsets of the order of 1,000 physical qubits could be possible, expanding to millions or billions of qubits for fully fault-tolerant, error-corrected algorithms will require significant work. Superconductors suffer from issues related to the cooling needed for large arrays of physical qubits and the ability to send very large numbers of control signals in and out of dilution refrigeration systems. Ion traps have issues related to creating large vacuum systems to house a large quantum computing system and the intrinsic slow speed of their quantum gates (which becomes compounded when error correction is implemented). These scalability issues exist already for surface code-based architectures, where qubit manipulation and connectivity are the simplest. Moving to other coding techniques simply makes these issues worse.
Another significant issue is related to the minimal error rates needed for the codes to become functional (known as the fault-tolerant threshold).
The surface code has one of the highest thresholds of any coding technique (around 0.7%). More complex codes that have better asymptotic behavior or allow for a more resource-friendly implementation of universal gate sets also have lower fault-tolerant thresholds. The 3D gauge-fixed color codes, which allow transversal gates for the relevant Clifford + T gates, have thresholds simulated without explicit circuit constructions (phenomenological model) of approximately 0.3%. 16 It is expected that these codes would have a full-circuit threshold of around 0.05%, which is more than an order of magnitude below that of the surface code.
This leads to an interesting tradeoff. Even though gauge-fixed color codes have transversality available to support the full universal gate set, the error rates achieved by hardware will have to be 0.01% or lower. Even if this were possible for the hardware in the near term, you would now be sitting nearly two orders of magnitude below the threshold of the surface code. Consequently, the distance of the surface code for a given quantum circuit would be much lower than that of the color code. It is still unclear whether the additional qubit/time resources needed for state distillation at lower distances to use the surface code would be better or worse than those required when not distilling but, rather, using a transversal gate on a larger code distance gauge-fixed color code.
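The order-of-magnitude statements in this comparison follow directly from the quoted percentages, as the short arithmetic sketch below shows. The constant names are illustrative; the values are the ones given in the text.

```python
import math

SURFACE_CODE_THRESHOLD = 0.7 / 100        # around 0.7%
COLOR_CODE_CIRCUIT_THRESHOLD = 0.05 / 100  # expected full-circuit value, around 0.05%
COLOR_CODE_HARDWARE_TARGET = 0.01 / 100    # hardware error rate quoted for color codes


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return log10 of the ratio between two error rates."""
    return math.log10(larger / smaller)


if __name__ == "__main__":
    print("threshold gap:",
          round(orders_of_magnitude(SURFACE_CODE_THRESHOLD, COLOR_CODE_CIRCUIT_THRESHOLD), 2))
    print("hardware target vs. surface code threshold:",
          round(orders_of_magnitude(SURFACE_CODE_THRESHOLD, COLOR_CODE_HARDWARE_TARGET), 2))
```

The first ratio is a factor of 14 (a little more than one order of magnitude), and the second is a factor of 70 (a little less than two orders of magnitude).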
While many QEC codes have been proposed over the years 17 (and are still proposed), the experimental constraints of potential hardware systems put heavy restrictions on what schemes are ultimately practical in the foreseeable future. It would be interesting for future research to devise and analyze the emulation of other codes on surface code architectures. To the best of our knowledge, code emulation has not been looked at, and there are compelling arguments against it. Considering the current state-of-the-art quantum computer proposals and the difficulty of constructing quantum chips (Figure 4), we argue that emulation may be the only feasible option to using different codes in practice. Thus, alternative codes will have to use the surface code architecture under the hood.
PERSPECTIVES
We started discussing shoe boxes and cartoon dogs and continued by presenting technical assumptions to achieve realistic quantum resource estimations. We return our discussion to the shoe box analogy and argue that its contents are more than just a stack of clay tablets. When packing a quantum computation into a shoe box, we implicitly presumed that the computation is static (not dynamic). This means that all of the tablets in the box are going to be executed in a well-defined and known order. However, in practice, there are situations when some tablets will not be executed because the shoe box represents a worst-case estimation. For example, at each branching instruction, the handling of each branch is explicitly inserted into the computation; the concepts of functions, code reuse, and so on are not applied. The shoe box represents the worst-case execution of a computation. This is why distillation procedures are repeated throughout the error-corrected computation.
The software tool we have been describing is best called a resource estimator rather than a compiler. Compilers are more advanced resource estimators because they are first able to identify redundancies in the shoe box and, then, to output a more compact representation of the same box. The shoe box can be considered a loop-unrolled version of the compiled computation. The following two sections discuss the forms into which realistic resource estimators could evolve.
Just-in-time compiler
The classical machine controlling the execution of the quantum computer is connected to a permanent feedback loop: signals received from the quantum hardware are interpreted into error-correction syndromes, but these could also be used to dynamically adapt the computation. For example, if a distillation procedure failed, it could be restarted instead of aborting the entire computation. However, restarting a distillation implies that the contents of the shoe box are dynamically updated and, therefore, the shoe box is generated just in time (JIT). Moreover, the software tool can be transformed into a JIT compiler that predicts future execution paths to reduce shoe box volumes, which makes the representation of the shoe box more compact.
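The restart behavior described above can be sketched as a small classical control loop. Everything here is hypothetical: `attempt` stands in for whatever routine would report whether one distillation run succeeded, and the random failure model is only a placeholder for real syndrome processing.

```python
import random
from typing import Callable


def run_with_restarts(attempt: Callable[[], bool], max_attempts: int = 100) -> int:
    """Repeat a distillation attempt until it succeeds; return the attempts used.

    Each failed attempt adds one more distillation block to the shoe box,
    so the returned count is the number of blocks actually executed.
    """
    for used in range(1, max_attempts + 1):
        if attempt():
            return used
    raise RuntimeError("distillation did not succeed within max_attempts")


def noisy_distillation(failure_prob: float, rng: random.Random) -> Callable[[], bool]:
    """Build a placeholder attempt that fails with the given probability."""
    def attempt() -> bool:
        return rng.random() >= failure_prob
    return attempt


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demonstration is repeatable
    attempt = noisy_distillation(failure_prob=0.2, rng=rng)
    blocks = [run_with_restarts(attempt) for _ in range(10)]
    print("distillation blocks executed per T gate:", blocks)
```

Because the number of executed blocks is only known at run time, a static worst-case shoe box has to budget for repeats, whereas a JIT approach would add them only when a failure actually occurs.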
Quantum JIT compilers are still a technological speculation, but their existence could bring quantum computing closer. It may be a realistic design decision to consider that JIT resource estimation/compilation is performed in a hardware abstraction layer managed by a reliable classical runtime system, similar to how classical JIT compilers are integrated components of a runtime system. Such a system integrates the error-correction mechanism, too, and it is, for example, able to dynamically decide code distances of executed components. This will trigger dynamic recompilation (replacement) of the error-corrected structures (holes and patches for surface codes) on the actual quantum hardware.
JIT compilation is a highly complex problem. Thus, it is realistic to assume that the first commercially or scientifically viable quantum computer will be computationally universal but able to execute only a specific algorithm (because it is resource optimized for this problem, not because of constraints on its capability). Fine-tuning all of the parameters of the error correction and the classical control mechanisms would be much easier if the computer implemented a single type of algorithm (e.g., energy optimization of a particular molecule) and did it in a resource-efficient manner. As a result, only very specific portions of the computation would be JIT. The first versions of the computer may very well not include any JIT compilation.
Operating systems
The reliability of a fault-tolerant, large-scale quantum computer will be as good as the reliability of its weakest component. It would be a mistake to assume that once quantum computations are fault-tolerantly executed, the entire computer is fault tolerant (Figure 5). The reliability of the classical computer controlling the quantum machine needs to be taken into consideration. One would not like the quantum operating system equivalent of the blue screen of death when executing an algorithm.
[Figure 5 label: QECC Tracking]
If JIT compilers become a reality, they will operate inside a runtime system (consider it a straightforward operating system) to control quantum hardware. The system ensures that it schedules hardware access in such a way that processes (compilation and error correction) do not reach livelocks or other dangerous situations. Therefore, it may be argued that JIT compiler execution will be managed by a scheduler. For now, these are technical speculations, which should be taken into consideration when discussing realistic and practical quantum operating systems.
We refrain from making forecasts and have already mentioned some future work. A realistic resource estimator should be tailored for surface codes and assume that distillations are sequential. The estimator should have both academic and industrial applications, while being very scalable in the sense that it can estimate circuits up to tens of thousands of qubits. Reaching such a development milestone will start a new research and development race for the construction of a classical software control framework for reliable quantum computers. The time horizon for practical quantum computing is uncertain, but we are almost positive that practical resource estimators and systems to analyze the performance of theoretically designed quantum algorithms will be up and running before the first actual commercially or scientifically viable quantum computer.
