Towards a GPU SDN controller
Abstract: The SDN concept of separating and centralizing the control plane from the data plane has provided more flexibility and programmability to the deployment of networks. On the other hand, the separation of the planes has raised scalability and performance questions, since the SDN controller becomes the bottleneck. In this paper we present an implementation of a GPU SDN controller. The goal of this paper is to mitigate the scalability problem of the SDN controller by offloading all packet inspection and creation to the GPU. Experimental evaluation shows that the controller is able to process 17 million flows/s in the worst-case scenario using only off-the-shelf GPUs.
A Synergistic Compilation Workflow for Tackling Crosstalk in Quantum Machines
Near-term quantum systems tend to be noisy. Crosstalk noise has been
recognized as one of several major types of noises in superconducting Noisy
Intermediate-Scale Quantum (NISQ) devices. Crosstalk arises from the concurrent
execution of two-qubit gates, such as CX, on nearby qubits. It might
significantly raise the error rate of gates in comparison to running them
individually. Crosstalk can be mitigated through scheduling or hardware machine
tuning. Prior studies, however, manage crosstalk at a very late
phase in the compilation process, usually after hardware mapping is done. This
may miss opportunities to optimize algorithm logic, routing, and
crosstalk at the same time. In this paper, we push the envelope by considering
all these factors simultaneously at the very early compilation stage. We
propose a crosstalk-aware quantum program compilation framework called CQC that
can enhance crosstalk mitigation while achieving satisfactory circuit depth.
Moreover, we identify opportunities for translation from intermediate
representation to the circuit for application-specific crosstalk mitigation,
for instance, the CX ladder construction in variational quantum
eigensolvers (VQE). Evaluations through simulation and on real IBM-Q devices
show that our framework can significantly reduce the error rate by up to
6×, with only 60% circuit depth compared to state-of-the-art gate
scheduling approaches. In particular, for VQE, we demonstrate 49% circuit
depth reduction with 9.6% fidelity improvement over prior art on the H4
molecule using IBMQ Guadalupe. Our CQC framework will be released on GitHub
A Structured Method for Compilation of QAOA Circuits in Quantum Computing
The Quantum Approximate Optimization Algorithm (QAOA) is a highly advocated
variational algorithm for solving combinatorial optimization problems. One
critical feature in the quantum circuit of the QAOA algorithm is that it consists
of two-qubit operators that commute. The flexibility in reordering the
two-qubit gates allows compiler optimizations to generate circuits with better
depths, gate count, and fidelity. However, it also imposes significant
challenges due to additional freedom exposed in the compilation. Prior studies
lack the following: (1) Performance guarantee, (2) Scalability, and (3)
Awareness of regularity in scalable hardware. We propose a structured method
that ensures linear depth for any compiled QAOA circuit on multi-dimensional
quantum architectures. We also demonstrate how our method runs on Google
Sycamore and IBM Non-linear architectures in a scalable manner and in linear
time. Overall, we can compile a circuit with up to 1024 qubits in 10 seconds
with a 3.8X speedup in depth, 17% reduction in gate count, and 18X improvement
for circuit ESP. Comment: 11 pages, 22 figures
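Because the two-qubit operators in a QAOA cost layer commute, a compiler is free to reorder them. The sketch below illustrates that freedom with a generic greedy layering, which is not the structured method proposed in the paper. It packs gates into layers so that no qubit is used twice in the same layer. The function name `greedy_layers` and the 4-qubit ring example are assumptions made for illustration, and hardware connectivity is ignored.

```python
def greedy_layers(edges):
    """Pack commuting two-qubit gates into layers with disjoint qubits.

    `edges` is a list of (qubit_a, qubit_b) pairs. Reordering is only
    valid because the gates are assumed to commute with one another.
    """
    layers = []  # each entry: (gates in the layer, set of busy qubits)
    for a, b in edges:
        for gates, busy in layers:
            # Reuse the first layer in which both qubits are still idle.
            if a not in busy and b not in busy:
                gates.append((a, b))
                busy.update((a, b))
                break
        else:
            # No existing layer fits, so open a new one.
            layers.append(([(a, b)], {a, b}))
    return [gates for gates, _ in layers]


# Hypothetical example: ZZ interactions on a 4-qubit ring.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
layers = greedy_layers(ring)
print(len(layers), layers)
```

For the ring, the four gates fit into two layers instead of four sequential steps. A greedy pass like this gives no depth guarantee in general, which is the gap the abstract says prior studies leave open.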
Tetris: A compilation Framework for VQE Applications
Quantum computing has shown promise in solving complex problems by leveraging
the principles of superposition and entanglement. The Variational Quantum
Eigensolver (VQE) algorithm stands as a pivotal approach in the realm of
quantum algorithms, enabling the simulation of quantum systems on quantum
hardware. In this paper, we introduce two innovative techniques, namely
"Tetris" and "Fast Bridging," designed to enhance the efficiency and
effectiveness of VQE tasks. The "Tetris" technique addresses a crucial aspect
of VQE optimization by unveiling cancellation opportunities within the logical
circuit phase of UCCSD ansatz. Tetris demonstrates a reduction of up to
20% in CNOT gate counts, about 119048 CNOT gates, and 30% depth reduction
compared to the state-of-the-art compiler 'Paulihedral'. In addition to Tetris,
we present the "Fast Bridging" technique as an alternative to the conventional
qubit routing methods that heavily rely on swap operations. The fast bridging
offers a novel approach to qubit routing, mitigating the limitations associated
with swap-heavy routing. By integrating the fast bridging into the VQE
framework, we observe further reductions in CNOT gate counts and circuit depth.
The bridging technique can achieve up to 27% CNOT gate reduction in the QAOA
application. Through a combination of Tetris and the fast bridging, we present
a comprehensive strategy for enhancing VQE performance. Our experimental
results showcase the effectiveness of Tetris in uncovering cancellation
opportunities and demonstrate the symbiotic relationship between Tetris and the
fast bridging in minimizing gate counts and circuit depth. This paper
contributes not only to the advancement of VQE techniques but also to the
broader field of quantum algorithm optimization
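The cancellation opportunities mentioned above rest on the fact that a CNOT is its own inverse, so two identical CNOTs applied back to back reduce to the identity. The sketch below is a minimal peephole pass built on that fact alone. It is not the Tetris algorithm, and the name `cancel_adjacent_cnots` and the gate encoding as `(control, target)` pairs are assumptions.

```python
def cancel_adjacent_cnots(gates):
    """Remove back-to-back identical CNOTs from a CNOT-only gate list.

    Each gate is a (control, target) pair. Using a stack lets one
    cancellation expose the next, e.g. [a, b, b, a] collapses fully.
    """
    stack = []
    for gate in gates:
        if stack and stack[-1] == gate:
            stack.pop()          # CNOT followed by the same CNOT is identity
        else:
            stack.append(gate)
    return stack


# Hypothetical circuit fragment with a nested cancelling pair.
circuit = [(0, 1), (1, 2), (1, 2), (0, 1), (2, 3)]
reduced = cancel_adjacent_cnots(circuit)
print(reduced)
```

The pass is deliberately conservative: it only cancels gates that are directly adjacent in the list and does not commute gates past one another to expose further pairs.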
New-Sum: A Novel Online ABFT Scheme for General Iterative Methods
Emerging high-performance computing platforms, with large component counts and lower power margins, are anticipated to be more susceptible to soft errors in both logic circuits and memory subsystems. We present an online algorithm-based fault tolerance (ABFT) approach to efficiently detect and recover from soft errors for general iterative methods. We design a novel checksum-based encoding scheme for matrix-vector multiplication that is resilient to both arithmetic and memory errors. Our design decouples the checksum updating process from the actual computation, and allows adaptive checksum overhead control. Building on this new encoding mechanism, we propose two online ABFT designs that can effectively recover from errors when combined with a checkpoint/rollback scheme. These designs are capable of addressing scenarios under different error rates. Our ABFT approaches apply to a wide range of iterative solvers that primarily rely on matrix-vector multiplication and vector linear operations. We evaluate our designs through comprehensive analytical and empirical analysis. Experimental evaluation on the Stampede supercomputer demonstrates the low performance overheads incurred by our two ABFT schemes for preconditioned CG (0.4% and 2.2%) and preconditioned BiCGSTAB (1.0% and 4.0%) for the largest SPD matrix from the UFL Sparse Matrix Collection. The evaluation also demonstrates the flexibility and effectiveness of our proposed designs for detecting and recovering from various types of soft errors in general iterative methods
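To make the idea of a checksum for matrix-vector multiplication concrete, the sketch below shows the classical ABFT check: for y = A x, the sum of the entries of y must equal the column-sum vector of A dotted with x. This is the textbook scheme, not the New-Sum encoding described in the abstract, and it does not cover memory errors in x or in the checksum itself. All function names and the 3x3 example matrix are assumptions.

```python
def matvec(A, x):
    """Dense matrix-vector product y = A x on plain Python lists."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def column_checksum(A):
    """Checksum row c = 1^T A, i.e. the sum of each column of A."""
    return [sum(col) for col in zip(*A)]


def verify(y, checksum, x, tol=1e-9):
    """Return True when sum(y) matches c . x within the tolerance."""
    expected = sum(c * v for c, v in zip(checksum, x))
    return abs(sum(y) - expected) <= tol


# Hypothetical small SPD matrix and input vector.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0]

c = column_checksum(A)      # computed once, reused every iteration
y = matvec(A, x)
ok_clean = verify(y, c, x)

y_faulty = list(y)
y_faulty[1] += 0.5          # simulate a soft error in one output entry
ok_faulty = verify(y_faulty, c, x)
print(ok_clean, ok_faulty)
```

In an iterative solver, a failed check would trigger recovery, for example a rollback to the last checkpoint as the abstract describes. The tolerance has to absorb ordinary floating-point rounding, so a real implementation would scale it with the problem size and magnitudes.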
QASMTrans: A QASM based Quantum Transpiler Framework for NISQ Devices
The success of a quantum algorithm hinges on the ability to orchestrate a
successful application induction. Detrimental overheads in mapping general
quantum circuits to physically implementable routines can be the deciding
factor between a successful and erroneous circuit induction. In QASMTrans, we
focus on the problem of rapid circuit transpilation. Transpilation plays a
crucial role in converting high-level, machine-agnostic circuits into
machine-specific circuits constrained by physical topology and supported gate
sets. The efficiency of transpilation continues to be a substantial bottleneck,
especially when dealing with larger circuits requiring high degrees of
inter-qubit interaction. QASMTrans is a high-performance C++ quantum transpiler
framework that demonstrates up to 369X speedups compared to the commonly used
Qiskit transpiler. We observe speedups on large dense circuits such as
uccsd_n24 and qft_n320 which require O(10^6) gates. QASMTrans successfully
transpiles the aforementioned circuits in 69s and 31s, whilst Qiskit exceeded
an hour of transpilation time. With QASMTrans providing transpiled circuits in
a fraction of the time of prior transpilers, design space
exploration and heuristic-based transpiler design become substantially more
tractable. QASMTrans is released at http://github.com/pnnl/qasmtrans
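One part of transpilation is mapping two-qubit gates onto a device whose physical topology only couples certain qubit pairs. The sketch below estimates the routing cost of a single gate by finding the shortest path between two physical qubits and counting the SWAPs needed to make them adjacent. It does not use the QASMTrans API or its routing heuristics; the names `shortest_path` and `swaps_needed` and the 4-qubit line topology are assumptions.

```python
from collections import deque


def shortest_path(coupling, src, dst):
    """Breadth-first search over an undirected coupling map (adjacency dict)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:   # walk predecessors back to src
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in coupling[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None                        # dst is unreachable from src


def swaps_needed(coupling, a, b):
    """SWAPs required to bring physical qubits a and b next to each other."""
    path = shortest_path(coupling, a, b)
    if path is None:
        raise ValueError("qubits are not connected in the coupling map")
    # A path with k intermediate qubits needs k SWAPs.
    return max(len(path) - 2, 0)


# Hypothetical 4-qubit line topology: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(swaps_needed(line, 0, 3), swaps_needed(line, 1, 2))
```

A full transpiler repeats a decision like this for every two-qubit gate while tracking how earlier SWAPs have changed the qubit layout, which is why routing cost grows quickly on large, densely connected circuits.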
Spag16, an Axonemal Central Apparatus Gene, Encodes a Male Germ Cell Nuclear Speckle Protein that Regulates SPAG16 mRNA Expression
Spag16 is the murine orthologue of Chlamydomonas reinhardtii PF20, a protein known to be essential to the structure and function of the “9+2” axoneme. In Chlamydomonas, the PF20 gene encodes a single protein present in the central pair of the axoneme. Loss of PF20 prevents central pair assembly/integrity and results in flagellar paralysis. Here we demonstrate that the murine Spag16 gene encodes two proteins: 71 kDa SPAG16L, which is found in all murine cells with motile cilia or flagella, and 35 kDa SPAG16S, representing the C terminus of SPAG16L, which is expressed only in male germ cells, and is predominantly found in specific regions within the nucleus that also contain SC35, a known marker of nuclear speckles enriched in pre-mRNA splicing factors. SPAG16S expression precedes expression of SPAG16L. Mice homozygous for a knockout of SPAG16L alone are infertile, but show no abnormalities in spermatogenesis. Mice chimeric for a mutation deleting the transcripts for both SPAG16L and SPAG16S have a profound defect in spermatogenesis. We show here that transduction of SPAG16S into cultured dispersed mouse male germ cells and BEAS-2B human bronchial epithelial cells increases SPAG16L expression, but has no effect on the expression of several other axoneme components. We also demonstrate that the Spag16L promoter shows increased activity in the presence of SPAG16S. The distinct nuclear localization of SPAG16S and its ability to modulate Spag16L mRNA expression suggest that SPAG16S plays an important role in the gene expression machinery of male germ cells. This is a unique example of a highly conserved axonemal protein gene that encodes two protein products with different functions
Universal architecture of bacterial chemoreceptor arrays
Chemoreceptors are key components of the high-performance signal transduction system that controls bacterial chemotaxis. Chemoreceptors are typically localized in a cluster at the cell pole, where interactions among the receptors in the cluster are thought to contribute to the high sensitivity, wide dynamic range, and precise adaptation of the signaling system. Previous structural and genomic studies have produced conflicting models, however, for the arrangement of the chemoreceptors in the clusters. Using whole-cell electron cryo-tomography, here we show that chemoreceptors of different classes and in many different species representing several major bacterial phyla are all arranged into a highly conserved, 12-nm hexagonal array consistent with the proposed “trimer of dimers” organization. The various observed lengths of the receptors confirm current models for the methylation, flexible bundle, signaling, and linker sub-domains in vivo. Our results suggest that the basic mechanism and function of receptor clustering is universal among bacterial species and was thus conserved during evolution