10 research outputs found

    A space-efficient quantum computer simulator suitable for high-speed FPGA implementation

    Conventional vector-based simulators for quantum computers are quite limited in the size of the quantum circuits they can handle, due to the worst-case exponential growth of even sparse representations of the full quantum state vector as a function of the number of quantum operations applied. However, this exponential-space requirement can be avoided by using general space-time tradeoffs long known to complexity theorists, which can be appropriately optimized for this particular problem in a way that also illustrates some interesting reformulations of quantum mechanics. In this paper, we describe the design and empirical space-time complexity measurements of a working software prototype of a quantum computer simulator that avoids excessive space requirements. Due to its space-efficiency, this design is well-suited to embedding in single-chip environments, permitting especially fast execution that avoids access latencies to main memory. We plan to prototype our design on a standard FPGA development board. Comment: 12 pages, 6 figures, presented at Quantum Information and Computation VII, Orlando, April 2009. Author reprint of final submitted manuscript.
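The abstract above does not say which space-time tradeoff the simulator uses. As a minimal sketch of the general idea only, the following assumes a path-sum style recursion: instead of storing all 2^n amplitudes, it recomputes the amplitude of one basis state by walking backwards through the gate list. Memory then grows with circuit depth, while time grows exponentially with the number of branching gates. The circuit encoding (`("u", matrix, qubit)` and `("cnot", control, target)` tuples) and the function name `amplitude` are hypothetical and are not taken from the paper.

```python
import math

# Hadamard gate as a 2x2 real matrix (rows index the output bit).
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def amplitude(circuit, basis_state, step=None):
    """Amplitude of `basis_state` after the first `step` gates, starting from |0...0>.

    Only the recursion stack is stored, never the full state vector.
    """
    if step is None:
        step = len(circuit)
    if step == 0:
        return 1.0 if basis_state == 0 else 0.0

    gate = circuit[step - 1]
    if gate[0] == "cnot":
        _, control, target = gate
        # CNOT permutes basis states, so there is exactly one predecessor.
        if (basis_state >> control) & 1:
            prev = basis_state ^ (1 << target)
        else:
            prev = basis_state
        return amplitude(circuit, prev, step - 1)

    # Single-qubit gate: at most two predecessor states contribute.
    _, matrix, qubit = gate
    out_bit = (basis_state >> qubit) & 1
    total = 0.0
    for in_bit in (0, 1):
        coeff = matrix[out_bit][in_bit]
        if coeff != 0:
            prev = (basis_state & ~(1 << qubit)) | (in_bit << qubit)
            total += coeff * amplitude(circuit, prev, step - 1)
    return total


# Bell-state circuit on two qubits: Hadamard on qubit 0, then CNOT(0 -> 1).
bell = [("u", H, 0), ("cnot", 0, 1)]
amps = [amplitude(bell, s) for s in range(4)]
```

Each call to `amplitude` answers a question about one basis state, so reading out a full distribution repeats the work per state; that repeated work is the time cost paid for the reduced memory.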

    ASAM: Automatic Architecture Synthesis and Application Mapping

    This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an overview of the research being currently performed in the scope of the European project ASAM (Architecture Synthesis and Application Mapping) of the ARTEMIS program. The paper briefly presents the results of our analysis of the main problems to be solved and challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to resolve the problems and address the challenges. Finally, it introduces and briefly discusses the design-flow and its main stages proposed by the ASAM project consortium to enable an effective and efficient solution of these problems. Index terms: embedded systems, heterogeneous multiprocessor system-on-chip (MPSoC), customizable ASIPs, architecture synthesis, MPSoC and ASIP design automation.

    Investigating the Potential of Custom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture

    In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide a reference for the complexity of the algorithms on 16-bit architectures, which are commonly used in embedded systems. For each algorithm, we then propose suitable instruction set extensions and implement the modified processor core. We assess the gains in throughput, memory consumption, and the area overhead. Our results show that with less than 10% additional area, it is possible to increase the execution speed on average by almost 40%, while reducing memory requirements on average by more than 40%. In particular, the Grøstl algorithm, which was one of the slowest algorithms in previous reference implementations, ends up being the fastest implementation by some margin, once minor (but dedicated) instruction set extensions are taken into account.

    ASAM: Automatic Architecture Synthesis and Application Mapping

    This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an overview of the research being currently performed in the scope of the European project ASAM of the ARTEMIS program. The paper briefly presents the results of our analysis of the main problems to be solved and challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to resolve the problems and address the challenges. Finally, it introduces and briefly discusses the ASAM design-flow and its main stages.

    Processeurs embarqués configurables pour la reproduction de tons [Configurable embedded processors for tone mapping]

    High dynamic range (HDR) images can capture the details of a scene in both highlights and shadows, imitating the capabilities of the human visual system. Tone mapping (TM) aims to adapt HDR images to conventional display devices. The first part of this work deals with an application of tone mapping algorithms: contrast enhancement. We compare several state-of-the-art contrast adjustment methods, including two TM operators. This comparative analysis was conducted in the context of surveillance applications in which videos are taken in poor lighting conditions. Image quality was evaluated by means of objective metrics such as intensity contrast and brightness error, and by subjective assessment. Moreover, performance was measured based on execution time. Experimental results show that a recent technique based on histogram modification presents a better trade-off when both aspects are considered. TM algorithms usually impose high demands on computational resources. As a result, they are usually implemented on powerful general-purpose processors and graphics processing units. Such platforms may not meet the performance, area, power consumption and flexibility constraints imposed by the embedded system domain. Although these requirements are often contradictory, application-specific instruction-set processors (ASIPs) are becoming an interesting implementation alternative. ASIPs can provide a trade-off between the efficiency of a dedicated hardware solution and the flexibility associated with a software-programmable solution.
    The second part of this master's thesis presents the design and implementation of a customized processor for a global TM algorithm. We analyzed the whole algorithm to estimate the data and computational requirements. Three custom instructions were proposed: to calculate luminance, logarithm and maximum luminance values. Using an architecture description language, the custom instructions were added to a 32-bit RISC-like processor. The logarithm was computed using a specific low-cost technique based on an improved Mitchell approximation. Experimental results demonstrate a 169% performance improvement when all three instructions are added, with a hardware overhead of only 22%.
    Finally, as global TM algorithms may not preserve important local contrasts, we designed and implemented another ASIP for a local algorithm. Custom instructions to accelerate a modified Gaussian pyramid were added to a configurable and extensible 32-bit RISC-like processor. The different pyramid levels were computed using a single 2D Gaussian kernel in an iterative process. Results show a speedup factor of 12.3× for the pyramid computation, which implies a 50% performance improvement for the local algorithm. This customized processor requires a 19% area increase compared to the base configuration.
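The abstract mentions a logarithm instruction based on an improved Mitchell approximation but gives no details of the improvement. As a hedged illustration, the sketch below shows only the basic Mitchell approximation, log2(2^k (1 + m)) ≈ k + m for 0 ≤ m < 1, in fixed-point integer arithmetic. The function name `mitchell_log2` and the `frac_bits` parameter are hypothetical, and the thesis's improved variant is presumably more accurate than this baseline.

```python
import math


def mitchell_log2(x: int, frac_bits: int = 16) -> int:
    """Basic Mitchell approximation of log2(x) for an integer x >= 1.

    Returns a fixed-point value scaled by 2**frac_bits.
    Uses only a leading-one detection, a subtraction and shifts.
    """
    if x < 1:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1      # position of the leading one, floor(log2(x))
    mantissa = x - (1 << k)     # x = 2**k * (1 + m), so mantissa = m * 2**k
    # Align the fractional part m to frac_bits bits (truncating if needed).
    if k >= frac_bits:
        frac = mantissa >> (k - frac_bits)
    else:
        frac = mantissa << (frac_bits - k)
    return (k << frac_bits) + frac


# Worst absolute error against the exact logarithm over a small range.
worst_error = max(
    abs(mitchell_log2(x) / 2**16 - math.log2(x)) for x in range(1, 1001)
)
```

The approximation is exact at powers of two and underestimates in between, since m ≤ log2(1 + m) on [0, 1]; correction schemes typically add a small term that depends on the mantissa to reduce that gap.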

    Efficient VLSI Architectures for Image Compression Algorithms

    An image in its original form contains a huge amount of data, which not only demands a large amount of memory for storage but also makes transmission over a bandwidth-limited channel inconvenient. Image compression reduces the image data in either a lossless or a lossy way. While lossless compression allows the original image data to be recovered completely, it provides very low compression. Lossy compression techniques compress the image data by a variable amount depending on the image quality required in the particular application area. Lossy compression is performed in steps such as image transformation, quantization and entropy coding. JPEG is one of the most widely used image compression standards; it uses the discrete cosine transform (DCT) to transform the image from the spatial to the frequency domain. An image contains little visual information in its high frequencies, so these can be heavily quantized to reduce the size of the transformed representation. Entropy coding follows to further reduce the redundancy in the transformed and quantized image data. Real-time data processing requires high speed, which makes a dedicated hardware implementation the preferred choice. Hardware is favored when its implementation is low-cost and low-power. These two factors are also the most important requirements for battery-powered portable devices such as digital cameras. Image transforms are computationally intensive, and a complete image compression system is realized through various intermediate steps between the transform and the final bit-stream. The intermediate stages require memory to store intermediate results. The cost and power of the design can be reduced both by implementing the transforms efficiently and by reducing or removing intermediate stages using different techniques. The proposed research work focuses on the efficient hardware implementation of transform-based image compression algorithms by optimizing the architecture of the system.
Distributed arithmetic (DA) is an efficient approach to implementing digital signal processing algorithms. DA can be realized in two ways: by storing precomputed values in ROMs, or without ROMs. ROM-free DA is more efficient. For the image transform, architectures for the one-dimensional discrete Hartley transform (1-D DHT) and the one-dimensional DCT (1-D DCT) have been optimized using the ROM-free DA technique. Further, 2-D separable DHT (SDHT) and 2-D DCT architectures have been implemented with the row-column approach, using two 1-D DHTs and two 1-D DCTs respectively. An architecture based on a finite state machine (FSM), covering the stages from DCT to quantization, has been proposed; it uses a modified quantization matrix for JPEG image compression and requires no memory to store the quantization table or the DCT coefficients. In addition, quantization is realized without multipliers, which require more area and are power hungry. For entropy encoding, Huffman coding is more hardware-efficient than arithmetic coding. The use of a Huffman code table further simplifies the implementation. Strategies have been used to significantly reduce the memory bits needed to store the Huffman code table, and the complete Huffman coding architecture encodes the transformed coefficients at one bit per clock cycle. A direct implementation of the DCT has the advantage that it needs no transposition memory to store the intermediate 1-D DCT results. Although recursive algorithms have been the preferred method, they have low accuracy, which degrades image quality. A non-recursive equation for the direct computation of DCT coefficients has been proposed and implemented in both a 0.18 µm ASIC library and an FPGA. It can compute DCT coefficients in any order, and all intermediate computations are free of fractions, so very high image quality is obtained in terms of PSNR.
In addition, one multiplier and one register bit-width need to be changed to increase the accuracy, resulting in very low hardware overhead. The architecture has been implemented to produce zig-zag ordered DCT coefficients. Comparison results show that this implementation has a smaller area in terms of gate count and lower power consumption than existing DCT implementations. Using this architecture, a complete JPEG image compression system has been implemented, with a Huffman coding module, one multiplier and one register as the only additional modules. The intermediate stages (DCT to Huffman encoding) are free of memory, so an efficient architecture is obtained.
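To make the row-column approach and the zig-zag ordering mentioned in this abstract concrete, here is a minimal floating-point reference sketch. It is a software illustration only and does not reproduce the thesis's ROM-free DA architecture or its non-recursive direct DCT equation. The function names are hypothetical, and the orthonormal DCT-II scaling is an assumption. The transpose between the two 1-D passes is where a row-column hardware design would typically need transposition memory.

```python
import math


def dct_1d(values):
    """Orthonormal 1-D DCT-II of a sequence of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        acc = sum(
            values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def dct_2d(block):
    """2-D DCT by the row-column approach: 1-D DCT on rows, then on columns."""
    row_pass = [dct_1d(row) for row in block]
    # zip(*...) transposes, so the second pass runs along the original columns.
    col_pass = [dct_1d(list(col)) for col in zip(*row_pass)]
    return [list(row) for row in zip(*col_pass)]  # transpose back


def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in JPEG-style zig-zag scan order."""
    def key(rc):
        r, c = rc
        s = r + c
        # Odd anti-diagonals run top-right to bottom-left, even ones the reverse.
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


flat_block = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
scan = [coeffs[r][c] for r, c in zigzag_indices(8)]
```

For a constant block, all the energy lands in the DC coefficient, which is first in the zig-zag scan; this is why the scan groups the usually zero high-frequency coefficients at the end before entropy coding.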

    Методика работы с текстами при изучении иностранного языка для профессиональных целей [A methodology for working with texts when studying a foreign language for professional purposes]

    The theoretical part of the graduation thesis characterizes the scientific and technical text, the research article, and the types of professionally oriented reading that are relevant in the context of an interdisciplinary project carried out in the radio engineering programs of a technical university. The practical part of the study describes the methodology developed for working with scientific and professional texts, which is based on a set of exercises for developing specific text-handling skills, and presents the results of an experimental validation of this methodology.