
6 research outputs found

An Enhanced Multiway Sorting Network Based on n-Sorters

Author: Shi Feng
Wagh Meghanad
Yan Zhiyuan
Publication date: 03/07/2014

Merging-based sorting networks are an important family of sorting networks. Most merge sorting networks are based on 2-way or multiway merging algorithms using 2-sorters as basic building blocks. An alternative is to use n-sorters, instead of 2-sorters, as the basic building blocks so as to greatly reduce the number of sorters as well as the latency. Based on a modified version of Leighton's columnsort algorithm, an n-way merging algorithm, referred to as SS-Mk, that uses n-sorters as basic building blocks was previously proposed. In this work, we first propose a new multiway merging algorithm with n-sorters as basic building blocks that merges n sorted lists of m values each in 1 + ceil(m/2) stages (n <= m). Based on our merging algorithm, we also propose a sorting algorithm, which requires O(N log^2 N) basic sorters to sort N inputs. While the asymptotic complexity (in terms of the required number of sorters) of our sorting algorithm is the same as that of SS-Mk, for wide ranges of N our algorithm requires fewer sorters than SS-Mk. Finally, we consider a binary sorting network, where the basic sorter is implemented in threshold logic and scales linearly with the number of inputs, and compare the complexity in terms of the required number of gates. For wide ranges of N, our algorithm requires fewer gates than SS-Mk.

Comment: 13 pages, 14 figures
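The stage count quoted in the abstract is easy to evaluate for concrete list sizes. The sketch below only computes the formula 1 + ceil(m/2) under the stated condition n <= m; the function name `merge_stages` is hypothetical, and the code does not implement the merging network itself.

```python
def merge_stages(n: int, m: int) -> int:
    """Stage count 1 + ceil(m/2) quoted in the abstract for merging
    n sorted lists of m values each, stated for n <= m."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    if n > m:
        raise ValueError("the quoted stage count is stated for n <= m")
    # (m + 1) // 2 is ceil(m / 2) in exact integer arithmetic.
    return 1 + (m + 1) // 2


print(merge_stages(4, 8))  # 1 + ceil(8/2) = 5
print(merge_stages(3, 7))  # 1 + ceil(7/2) = 5
```

Note that the count depends only on the list length m, not on the number of lists n, as long as n <= m.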

arXiv.org e-Print Archive

Crossref

Improved techniques for preparing eigenstates of fermionic Hamiltonians

Author: Babbush R
Berry DW
Gidney C
Kieferová M
Low GH
Sanders YR
Scherer A
Wiebe N
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2022

Modeling low energy eigenstates of fermionic systems can provide insight into chemical reactions and material properties and is one of the most anticipated applications of quantum computing. We present three techniques for reducing the cost of preparing fermionic Hamiltonian eigenstates using phase estimation. First, we report a polylogarithmic-depth quantum algorithm for antisymmetrizing the initial states required for simulation of fermions in first quantization. This is an exponential improvement over the previous state-of-the-art. Next, we show how to reduce the overhead due to repeated state preparation in phase estimation when the goal is to prepare the ground state to high precision and one has knowledge of an upper bound on the ground state energy that is less than the excited state energy (often the case in quantum chemistry). Finally, we explain how one can perform the time evolution necessary for the phase estimation based preparation of Hamiltonian eigenstates with exactly zero error by using the recently introduced qubitization procedure

OPUS - University of Technology Sydney

Adaptation of multiway-merge sorting algorithm to MIMD architectures with an experimental study

Author: Cantürk Levent
Publication venue: Bilkent University
Publication date: 01/01/2002

Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2002. Thesis (Master's) -- Bilkent University, 2002. Includes bibliographical references, leaves 73-78.

Sorting is perhaps one of the most widely studied problems in computing. Numerous asymptotically optimal sequential algorithms have been discovered. Asymptotically optimal algorithms have been presented for various parallel models as well. Parallel sorting algorithms have already been proposed for a variety of multiple instruction, multiple data stream (MIMD) architectures. In this thesis, we adapt the multiway-merge sorting algorithm, originally designed for product networks, to MIMD architectures. It has good load balancing properties, modest communication needs, and good performance. The multiway-merge sort algorithm requires only two all-to-all personalized communications (AAPC) and two one-to-one communications, independent of the input size. In addition to evenly distributing the load, the algorithm requires only 2N/P local memory per processor in the worst case, where N is the number of items to be sorted and P is the number of processors. We have implemented the algorithm on the PC cluster established at the Computer Engineering Department of Bilkent University. To compare the results, we have implemented a sample sort algorithm (PSRS, Parallel Sorting by Regular Sampling) by X. Liu et al. and a parallel quicksort algorithm (HyperQuickSort) on the same cluster. In the experimental studies we have used three different benchmarks, namely Uniform, Gaussian, and Zero distributed inputs. Although the multiway-merge algorithm did not achieve better results than the other two, which are theoretically cost-optimal algorithms, there are some cases, such as Zero distributed input, in which it outperforms them. The results of the experiments are reported in detail. The multiway-merge sort algorithm is not necessarily the best parallel sorting algorithm, but it is expected to achieve acceptable performance on a wide spectrum of MIMD architectures.

Cantürk, Levent. M.S.

Bilkent University Institutional Repository

SIGNAL PROCESSING TECHNIQUES AND APPLICATIONS

Author: Shi Feng
Publication venue: Lehigh Preserve

As technology scales down, more transistors can be fabricated in the same area, which enables the integration of many components on the same substrate, referred to as a system-on-chip (SoC). The components on an SoC are connected by on-chip global interconnects. The recent International Technology Roadmap for Semiconductors (ITRS) shows that, with scaling, gate delay decreases but global interconnect delay increases due to crosstalk. Interconnect delay has become a bottleneck for overall system performance. Many techniques have been proposed to address crosstalk, such as shielding, buffer insertion, and crosstalk avoidance codes (CACs). The CAC is a promising technique due to its good crosstalk reduction, lower power consumption, and smaller area. In this dissertation, I will present analytical delay models for on-chip interconnects with improved accuracy. These models enable more accurate control of delays for transition patterns and lead to a more efficient CAC, whose worst-case delay is 30-40% smaller than the best of previously proposed CACs. As the clock frequency approaches multi-gigahertz, the parasitic inductance of on-chip interconnects has become significant and its detrimental effects, including increased delay, voltage overshoots and undershoots, and increased crosstalk noise, cannot be ignored. We introduce new CACs to address both capacitive and inductive couplings simultaneously.

Quantum computers are more powerful than classical computers in solving some NP problems. However, quantum computers suffer greatly from unwanted interactions with the environment. Quantum error correction codes (QECCs) are needed to protect quantum information against noise and decoherence. Given the good error-correcting performance of LDPC codes, it is desirable to adapt their existing iterative decoding algorithms to obtain LDPC-based QECCs. Several QECCs based on nonbinary LDPC codes have been proposed with much better error-correcting performance than existing quantum codes over a qubit channel. In this dissertation, I will present stabilizer codes based on nonbinary QC-LDPC codes for qubit channels. The results will confirm the observation that QECCs based on nonbinary LDPC codes appear to achieve better performance than QECCs based on binary LDPC codes.

As technology scales down further to the nanoscale, CMOS devices suffer greatly from quantum mechanical effects. Some emerging nano devices, such as resonant tunneling diodes (RTDs), quantum cellular automata (QCA), and single electron transistors (SETs), have no such issues and are promising candidates to replace traditional CMOS devices. The threshold gate, which can implement complex Boolean functions within a single gate, can be easily realized with these devices. Several applications dealing with real-valued signals have already been realized using nanotechnology-based threshold gates. Unfortunately, applications using finite fields, such as error-correcting coding and cryptography, have not been realized using nanotechnology. The main obstacle is that they require a great number of exclusive-ORs (XORs), which cannot be realized in a single threshold gate. In addition, the fan-in of a threshold gate in RTD nanotechnology needs to be bounded for both reliability and performance. In this dissertation, I will present a majority-class threshold architecture of XORs with bounded fan-in, and compare it with a Boolean-class architecture. I will show an application of the proposed XORs to finite field multiplication. The analysis results will show that the majority-class architecture outperforms the Boolean-class architecture in terms of hardware complexity and latency. I will also introduce a sort-and-search algorithm, which can be used to implement any symmetric function. Since XOR is a special symmetric function, it can be implemented via the sort-and-search algorithm. To leverage the power of multi-input threshold functions, I generalize the previously proposed sort-and-search algorithm from a fan-in of two to arbitrary fan-ins, and propose an architecture of multi-input XORs with bounded fan-ins.
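The sort-and-search idea rests on the fact that a symmetric Boolean function depends only on how many inputs are 1. The following is a minimal software sketch of that general idea, not the dissertation's threshold-gate architecture, whose details the abstract does not give; the function names and the lookup-table representation are assumptions made for illustration.

```python
from typing import Sequence


def sort_and_search(bits: Sequence[int], value_by_count: Sequence[int]) -> int:
    """Evaluate a symmetric Boolean function in two steps.

    value_by_count[k] is the function's output when exactly k inputs are 1.
    """
    # Sort step: move all ones to the front.
    ordered = sorted(bits, reverse=True)
    # Search step: locate the boundary between ones and zeros,
    # which equals the number of ones in the input.
    ones = 0
    while ones < len(ordered) and ordered[ones] == 1:
        ones += 1
    return value_by_count[ones]


def xor_via_sort_and_search(bits: Sequence[int]) -> int:
    """XOR is symmetric: its output is 1 exactly when the count of ones is odd."""
    table = [k % 2 for k in range(len(bits) + 1)]
    return sort_and_search(bits, table)


print(xor_via_sort_and_search([1, 0, 1, 1]))  # three ones, so the result is 1
```

Any other symmetric function, such as majority, can reuse `sort_and_search` by supplying a different table.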

Lehigh University: Lehigh Preserve

Quantum Algorithmic Techniques for Fault-Tolerant Quantum Computers

Author: Kieferova Maria
Publication venue: 'University of Waterloo'
Publication date: 23/09/2019

Quantum computers have the potential to push the limits of computation in areas such as quantum chemistry, cryptography, optimization, and machine learning. Even though many quantum algorithms show asymptotic improvement compared to classical ones, the overhead of running quantum computers limits when quantum computing becomes useful. Thus, by optimizing components of quantum algorithms, we can bring the regime of quantum advantage closer. My work focuses on developing efficient subroutines for quantum computation. I focus specifically on algorithms for scalable, fault-tolerant quantum computers. While it is possible that even noisy quantum computers can outperform classical ones for specific tasks, high depth, and therefore fault tolerance, is likely required for most applications. In this thesis, I introduce three sets of techniques that can be used by themselves or as subroutines in other algorithms. The first components are coherent versions of classical sort and shuffle. We require that a quantum shuffle prepares a uniform superposition over all permutations of a sequence. The quantum sort is used within the shuffle as well as in the next algorithm in this thesis. The quantum shuffle is an essential part of state preparation for quantum chemistry computation in first quantization. Second, I review the progress of Hamiltonian simulation and give a new algorithm for simulating time-dependent Hamiltonians. This algorithm scales polylogarithmically in the inverse error, and the query complexity does not depend on the derivatives of the Hamiltonian. A time-dependent Hamiltonian simulation was recently used for interaction picture simulation with applications to quantum chemistry. Finally, I present a fully quantum Boltzmann machine. I show that our algorithm can train on quantum data and learn a classical description of quantum states. This type of machine learning can be used for tomography, Hamiltonian learning, and approximate quantum cloning.
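As a purely classical illustration of how a sort can drive a shuffle, the sketch below draws a random key per item, retries until all keys are distinct, and then sorts by key. This is a standard classical construction offered as an assumption-laden analogy; the abstract does not state that the thesis uses this exact procedure, and the name `shuffle_by_sorting` is hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffle_by_sorting(items: Sequence[T], rng: Optional[random.Random] = None) -> List[T]:
    """Return a random permutation of items by sorting on random keys."""
    rng = rng or random.Random()
    n = len(items)
    while True:
        # One random key per position; the key range is an illustrative choice.
        keys = [rng.randrange(n * n + 1) for _ in range(n)]
        # Retry on collisions so that ties cannot bias the ordering.
        if len(set(keys)) == n:
            break
    # Sorting positions by their keys yields the permutation.
    order = sorted(range(n), key=keys.__getitem__)
    return [items[i] for i in order]


print(shuffle_by_sorting(["a", "b", "c", "d"]))
```

Conditioned on the keys being distinct, every ordering of the keys is equally likely, so each permutation of the input appears with equal probability.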

University of Waterloo's Institutional Repository

Macquarie University ResearchOnline

A Generalized Bitonic Sorting Network

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref