
    Algorithms for Near-Term and Noisy Quantum Devices

    Quantum computing promises to revolutionise many fields, including chemical simulation and machine learning. Those promises have not yet been realised, owing to the large resource requirements of fault-tolerant quantum computers and the scientific and engineering challenges of building one. Instead, we currently have access to quantum devices that are limited in qubit number and have noisy qubits. This thesis addresses the challenges these devices present by investigating applications in quantum simulation of molecules and solid-state systems and in quantum machine learning, and by presenting a detailed simulation of a real ion-trap device. We first build on a previous algorithm for state discrimination using a quantum machine learning model and show how to adapt it to work on a noisy device; when run on a noisy device, this algorithm outperforms the analytically optimal POVM. We then discuss how to build a quantum perceptron, the building block of a quantum neural network. We also present an algorithm for simulating two-site Dynamical Mean Field Theory (DMFT) on a quantum device, discuss some of the difficulties found in scaling that system up, and present an algorithm for building the DMFT ansatz on the quantum device, along with modifications that make it more ‘device-aware’. Finally, we present a pulse-level simulation of the noise in an ion-trap device, designed to match the specifications of a device at the National Physical Laboratory (NPL), which can be used to direct future experimental focus. Each of these sections is preceded by a review of the relevant literature.
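    A minimal NumPy sketch of the state-discrimination task the first contribution addresses: it computes the Helstrom (optimal-POVM) success probability for two non-orthogonal qubit states and then sweeps a simple projective measurement under depolarising noise. The chosen states, the depolarising noise model and the brute-force angle sweep are illustrative assumptions, not the thesis's quantum machine learning model.

```python
import numpy as np

# Two non-orthogonal single-qubit pure states to discriminate (equal priors).
theta = np.pi / 8
psi0 = np.array([np.cos(theta),  np.sin(theta)])
psi1 = np.array([np.cos(theta), -np.sin(theta)])
rho0, rho1 = np.outer(psi0, psi0), np.outer(psi1, psi1)

# Helstrom bound: best possible success probability for two-state discrimination.
gamma = 0.5 * (rho0 - rho1)
helstrom = 0.5 + 0.5 * np.sum(np.abs(np.linalg.eigvalsh(gamma)))

def depolarise(rho, p):
    """Simple depolarising noise of strength p applied before measurement."""
    return (1 - p) * rho + p * np.eye(2) / 2

def success(phi, p):
    """Success probability of a projective measurement along angle phi."""
    m0 = np.array([np.cos(phi), np.sin(phi)])
    P0 = np.outer(m0, m0)
    P1 = np.eye(2) - P0
    return 0.5 * np.trace(P0 @ depolarise(rho0, p)) + \
           0.5 * np.trace(P1 @ depolarise(rho1, p))

# Crude "training": sweep the measurement angle under noise strength p = 0.2.
best = max(success(phi, 0.2) for phi in np.linspace(0, np.pi, 200))
print(f"Helstrom (noiseless): {helstrom:.3f}, best noisy projective: {best:.3f}")
```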

    Design and Code Optimization for Systems with Next-generation Racetrack Memories

    With the rise of computationally expensive application domains such as machine learning, genomics, and fluid simulation, the quest for performance- and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies, and their suitability for next-generation systems, is also questionable. This has led to the emergence and rise of nonvolatile memory (NVM) technologies. Several NVM technologies, at different stages of development, are today competing for rapid access to the market. Racetrack memory (RTM) is one such nonvolatile memory technology that promises SRAM-comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, RTM is sequential in nature: data in an RTM cell must be shifted to an access port before it can be accessed. These shift operations incur performance and energy penalties. An ideal RTM, requiring at most one shift per access, can easily outperform SRAM; in the worst-case shifting scenario, however, RTM can be an order of magnitude slower than SRAM. This thesis presents an overview of RTM device physics, its evolution, strengths and challenges, and its application in the memory subsystem. We develop tools that allow the programmability and modeling of RTM-based systems. For shift minimization, we propose a set of techniques, including optimal, near-optimal, and evolutionary algorithms, for efficient scalar and instruction placement in RTMs. For array accesses, we explore schedule and layout transformations that eliminate the longer overhead shifts in RTMs. We present an automatic compilation framework that analyzes static control-flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We develop a simulation framework called RTSim that models various RTM parameters and enables accurate architectural-level simulation. Finally, to demonstrate the potential of RTM in non-Von-Neumann in-memory computing paradigms, we exploit its device attributes to implement logic and arithmetic operations. As a concrete use case, we implement an entire hyperdimensional computing framework in RTM to accelerate the language recognition problem. Our evaluation shows considerable performance and energy improvements compared to conventional Von-Neumann models and state-of-the-art accelerators.
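    A toy illustration of why placement matters in RTM: with a single access port per track, serving an access trace costs one shift per position that the requested cell is away from the port, so placing frequently and consecutively accessed scalars close together reduces total shifts. The single-track, single-port cost model below is a deliberate simplification, not the thesis's placement algorithms or the RTSim simulator.

```python
def shift_cost(placement, trace):
    """Count shifts for one RTM track with a single access port.

    placement: dict mapping variable -> cell offset on the track
    trace:     sequence of variables accessed in program order
    """
    port = 0          # track offset currently aligned with the access port
    shifts = 0
    for var in trace:
        target = placement[var]
        shifts += abs(target - port)   # shifts needed to bring the cell to the port
        port = target
    return shifts

# Toy example: two placements of variables a..d for the same access trace.
trace = ["a", "b", "a", "c", "a", "d"]
naive   = {"a": 0, "b": 1, "c": 2, "d": 3}
grouped = {"a": 1, "b": 0, "c": 2, "d": 3}   # hot variable placed centrally
print(shift_cost(naive, trace), shift_cost(grouped, trace))   # -> 9 7
```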

    Cultural Heritage Online

    The 2nd International Conference "Cultural Heritage online – Empowering users: an active role for user communities" was held in Florence on 15-16 December 2009. It was organised by the Fondazione Rinascimento Digitale, the Italian Ministry for Cultural Heritage and Activities, and the Library of Congress, through the National Digital Information Infrastructure and Preservation Program (NDIIPP) partners. The conference topics related to digital libraries, digital preservation and the changing paradigms, focussing on user needs and expectations and analysing how to involve users and the cultural heritage community in creating and sharing digital resources. The sessions also investigated new organisational issues and roles, and cultural and economic limits, from an international perspective.

    A Modern Primer on Processing in Memory

    Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck, as many important applications are increasingly data-intensive and memory bandwidth and energy do not scale well; (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems; (3) data movement, especially off-chip to on-chip, is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are especially severely felt in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many technology scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, the proliferation of different main memory standards and chips specialized for different purposes (e.g., graphics, low-power, high bandwidth, low latency), and the necessity of designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are evidence of this trend. This chapter discusses recent research that aims to practically enable computation close to data, an approach we call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between the computation units and memory is reduced or eliminated.

    Design and Implementation of Multi-rate Multi-edge Type LDPC Codes Decoder

    A fixed-code-length, multi-rate Multi-edge Type LDPC (MET-LDPC) decoder is presented. The decoder achieves multi-rate decoding by puncturing the parity-bit information at regular intervals, and a partial-parallel decoding structure suited to multi-rate MET-LDPC codes is designed. Based on this structure, an MET-LDPC decoder with a code length of 640 bits and code rates from 0.5 to 0.8 is implemented on an FPGA platform. Supported by the National Natural Science Foundation of China (60972053).
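    The rate-adaptation mechanism can be sketched generically: puncturing (not transmitting) every k-th parity bit of a fixed-length mother code raises the effective rate, and the decoder treats the punctured positions as erasures. The systematic layout and the 320 information / 320 parity split assumed below are illustrative, not the exact construction used in this decoder.

```python
def puncture_parity(codeword, n_info, k):
    """Drop every k-th parity bit of a systematic codeword (info bits first)."""
    info, parity = codeword[:n_info], codeword[n_info:]
    kept = [bit for i, bit in enumerate(parity) if (i + 1) % k != 0]
    return info + kept

def effective_rate(n_info, n_parity, k):
    """Code rate after puncturing every k-th parity bit."""
    punctured = n_parity // k
    return n_info / (n_info + n_parity - punctured)

# Mother code: 640-bit codeword at rate 0.5 (320 info + 320 parity bits, assumed).
codeword = [0] * 640                       # stand-in for an encoded block
tx = puncture_parity(codeword, 320, 2)     # transmit only 480 of the 640 bits
print(len(tx), round(effective_rate(320, 320, 2), 3))   # -> 480 0.667
print(round(effective_rate(320, 320, 4), 3))            # -> 0.571
```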

    Area and energy efficient VLSI architectures for low-density parity-check decoders using an on-the-fly computation

    The VLSI implementation complexity of a low-density parity-check (LDPC) decoder is largely influenced by the interconnect and the storage requirements. This dissertation presents decoder architectures for regular and irregular LDPC codes that provide substantial gains over existing academic and commercial implementations. Several structured properties of LDPC codes and decoding algorithms are observed and used to construct hardware implementations with reduced processing complexity. The proposed architectures utilize an on-the-fly computation paradigm that permits scheduling the computations in a way that reduces the memory requirements and re-computations. Using this paradigm, run-time-configurable, multi-rate VLSI architectures for rate-compatible array LDPC codes and irregular block LDPC codes are designed. Rate-compatible array codes are considered for DSL applications. Irregular block LDPC codes are proposed for IEEE 802.16e, IEEE 802.11n, and IEEE 802.20. When compared with a recent implementation of an 802.11n LDPC decoder, the proposed decoder reduces the logic complexity by 6.45x and the memory complexity by 2x for a given data throughput. When compared to the latest reported multi-rate decoders, this decoder design has an area efficiency of around 5.5x and an energy efficiency of 2.6x for a given data throughput. The numbers are normalized for a 180 nm CMOS process. Properly designed array codes have low error floors and meet the requirements of magnetic channels and other applications that need several Gbps of data throughput. A high-throughput, fixed-code architecture for array LDPC codes has been designed. No modification to the code is performed, as such modifications can result in high error floors. This parallel decoder architecture has no routing congestion and is scalable to longer block lengths. When compared to the latest fixed-code parallel decoders in the literature, this design has an area efficiency of around 36x and an energy efficiency of 3x for a given data throughput. Again, the numbers are normalized for a 180 nm CMOS process. In summary, the design and analysis details of the proposed architectures are described in this dissertation. Results from extensive simulation and VHDL verification on FPGA and ASIC design platforms are also presented.
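    One concrete flavour of the on-the-fly idea, sketched here as a generic technique rather than this dissertation's specific architecture, is check-node message compression in min-sum decoding: instead of storing every check-to-variable message, a check row is summarized by its two smallest input magnitudes, the column of the smallest, and the sign information, and any outgoing message is regenerated on demand.

```python
def compress_row(v2c_msgs):
    """Compress one check row; v2c_msgs is a list of (column, message) pairs."""
    mags = sorted((abs(v), col) for col, v in v2c_msgs)
    (min1, idx1), (min2, _) = mags[0], mags[1]
    signs = {col: (1 if v >= 0 else -1) for col, v in v2c_msgs}
    sign_prod = 1
    for s in signs.values():
        sign_prod *= s
    return min1, min2, idx1, sign_prod, signs

def regenerate(col, state):
    """Recompute the check-to-variable message for `col` from the compressed state."""
    min1, min2, idx1, sign_prod, signs = state
    mag = min2 if col == idx1 else min1           # exclude the column's own magnitude
    return sign_prod * signs[col] * mag           # product of the *other* signs

row = [(0, -1.5), (3, 0.4), (7, 2.0)]             # (column, v2c message) for one check
state = compress_row(row)
print([regenerate(col, state) for col, _ in row])  # -> [0.4, -1.5, -0.4]
```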

    Topical Workshop on Electronics for Particle Physics

    The purpose of the workshop was to present results and original concepts for electronics research and development relevant to particle physics experiments as well as accelerator and beam instrumentation at future facilities; to review the status of electronics for the LHC experiments; to identify and encourage common efforts for the development of electronics; and to promote information exchange and collaboration in the relevant engineering and physics communities.

    Data integrity for on-chip interconnects

    With shrinking feature sizes and growing integration density in Deep Sub-Micron (DSM) technologies, global buses are fast becoming the "weakest links" in VLSI design. They have large delays and are error-prone. Especially in system-on-chip (SoC) designs, where parallel interconnects run over large distances, they pose difficult research and design problems. This work presents an approach for evaluating the data carrying capacity of such wires. The method treats delay and reliability in interconnects from an information-theoretic perspective. The results point to an optimal frequency of operation, for a given bus dimension, that maximizes the data transfer rate. Moreover, this optimal frequency is higher than that achieved by present-day designs, which accommodate the worst-case delays. This work also proposes several novel ways to approach this optimal data transfer rate in practical designs. From the analysis of signal propagation delay in long wires, it is seen that the signal delay distribution has a long tail, meaning that most signals arrive at the output much faster than the worst-case delay. Using communication theory, these "good" signals arriving early can be used to predict/correct the "few" signals that arrive late. In addition to this prediction-based correction, the approaches use coding techniques to eliminate high-delay cases and achieve a higher transmission rate. The work also extends communication-theoretic approaches to other areas of VLSI design. Parity groups are generated based on low output-delay correlation to add redundancy in combinational circuits. This redundancy is used to increase the frequency of operation and/or reduce the energy consumption while improving the overall reliability of the circuit.
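    The optimal-frequency observation can be caricatured with a back-of-the-envelope model: if the probability that a transition fails to settle within a clock period is a Gaussian tail around a nominal wire delay, then the per-wire throughput f · C(f), with C the capacity of a binary symmetric channel, peaks at some frequency instead of growing monotonically. The delay, spread and tail model below are invented illustrative numbers, not the dissertation's analysis.

```python
import math

def error_prob(f_ghz, delay_ns=0.8, spread_ns=0.15):
    """Probability a transition has not settled within one clock period
    (Gaussian-tail model around a nominal delay; parameters are illustrative)."""
    period_ns = 1.0 / f_ghz
    z = (period_ns - delay_ns) / spread_ns
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bsc_capacity(p):
    """Capacity in bits per use of a binary symmetric channel with error rate p."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

# Sweep the clock frequency and pick the one maximising throughput = f * C(p(f)).
freqs = [f / 100.0 for f in range(50, 301)]     # 0.50 .. 3.00 GHz
best_f, best_rate = max(((f, f * bsc_capacity(error_prob(f))) for f in freqs),
                        key=lambda t: t[1])
print(f"throughput peaks near {best_f:.2f} GHz at about {best_rate:.2f} Gb/s per wire")
```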

    High throughput low power decoder architectures for low density parity check codes

    A high throughput scalable decoder architecture, a tiling approach to reduce the complexity of the scalable architecture, and two low power decoding schemes have been proposed in this research. The proposed scalable design is generated from a serial architecture by scaling the combinational logic, partitioning the memory, and constructing a novel H matrix to make parallelization possible. The scalable architecture achieves a high throughput for higher values of the parallelization factor M. The switch logic used to route the bit nodes to the appropriate checks is an important constituent of the scalable architecture, and its complexity grows with M. The proposed tiling approach is applied to the scalable architecture to simplify the switch logic and reduce gate complexity. The tiling approach generates patterns that are used to construct the H matrix by repeating a fixed number of those generated patterns. The advantages of the proposed approach are two-fold. First, the information stored about the H matrix is reduced by one-third. Second, the switch logic of the scalable architecture is simplified. The H matrix information is also embedded in the switch, and no external memory is needed to store the H matrix. The scalable architecture and the tiling approach are proposed at the architectural level of the LDPC decoder. We also propose two low power decoding schemes that take advantage of the distribution of errors in the received packets. Both schemes use a hard iteration after a fixed number of soft iterations. The dynamic scheme performs X soft iterations and then computes the syndrome cH^T to determine the number of parity checks in error. Based on this count, the decoder decides whether to perform further soft iterations or a hard iteration. The advantage of the hard iteration is so significant that the second low power scheme simply performs a fixed number of soft iterations followed by a hard iteration. To preserve the bit error rate performance, the number of soft iterations in this case is higher than the number performed before the cH^T check in the first scheme.
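    The dynamic scheme can be paraphrased in a generic software model: run soft (min-sum) iterations, monitor the syndrome weight of the current hard decision, and switch to a cheap hard (bit-flipping) pass once at least X soft iterations have run and only a few parity checks remain unsatisfied. The flooding min-sum update, the single-flip hard pass and the threshold value below are stand-ins chosen for brevity, not the architecture proposed in this research.

```python
import numpy as np

def min_sum_iteration(llr_ch, c2v, H):
    """One flooding min-sum iteration; c2v holds check-to-variable messages."""
    new_c2v = np.zeros_like(c2v)
    # Variable-to-check messages: channel LLR plus all incoming messages, minus own edge.
    v2c = H * (llr_ch + c2v.sum(axis=0)) - c2v
    for i in range(H.shape[0]):
        cols = np.flatnonzero(H[i])
        for j in cols:
            others = v2c[i, cols[cols != j]]
            new_c2v[i, j] = np.prod(np.sign(others)) * np.min(np.abs(others))
    return new_c2v

def hard_bitflip(c, H):
    """One bit-flipping pass: flip the bit involved in the most failed checks."""
    votes = (c @ H.T % 2) @ H
    if votes.max() > 0:
        c = c.copy()
        c[np.argmax(votes)] ^= 1
    return c

def decode(llr_ch, H, X=3, threshold=2, max_soft=10):
    c2v = np.zeros(H.shape, dtype=float)
    c = (llr_ch < 0).astype(int)
    for it in range(max_soft):
        c2v = min_sum_iteration(llr_ch, c2v, H)
        c = ((llr_ch + c2v.sum(axis=0)) < 0).astype(int)   # current hard decision
        unsatisfied = int((c @ H.T % 2).sum())              # parity checks in error
        if unsatisfied == 0:
            return c                                        # valid codeword: stop early
        if it + 1 >= X and unsatisfied <= threshold:
            return hard_bitflip(c, H)                       # cheap finishing pass
    return hard_bitflip(c, H)
```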