192 research outputs found

    Cyclotomic Identity Testing and Applications

    Full text link
    We consider the cyclotomic identity testing problem: given a polynomial f(x1,…,xk)f(x_1,\ldots,x_k), decide whether f(ζne1,…,ζnek)f(\zeta_n^{e_1},\ldots,\zeta_n^{e_k}) is zero, for ζn=e2πi/n\zeta_n = e^{2\pi i/n} a primitive complex nn-th root of unity and integers e1,…,eke_1,\ldots,e_k. We assume that nn and e1,…,eke_1,\ldots,e_k are represented in binary and consider several versions of the problem, according to the representation of ff. For the case that ff is given by an algebraic circuit we give a randomized polynomial-time algorithm with two-sided errors, showing that the problem lies in BPP. In case ff is given by a circuit of polynomially bounded syntactic degree, we give a randomized algorithm with two-sided errors that runs in poly-logarithmic parallel time, showing that the problem lies in BPNC. In case ff is given by a depth-2 ΣΠ\Sigma\Pi circuit (or, equivalently, as a list of monomials), we show that the cyclotomic identity testing problem lies in NC. Under the generalised Riemann hypothesis, we are able to extend this approach to obtain a polynomial-time algorithm also for a very simple subclass of depth-3 ΣΠΣ\Sigma\Pi\Sigma circuits. We complement this last result by showing that for a more general class of depth-3 ΣΠΣ\Sigma\Pi\Sigma circuits, a polynomial-time algorithm for the cyclotomic identity testing problem would yield a sub-exponential-time algorithm for polynomial identity testing. Finally, we use cyclotomic identity testing to give a new proof that equality of compressed strings, i.e., strings presented using context-free grammars, can be decided in coRNC: randomized NC with one-sided errors

    Optimisations arithmétiques et synthèse de haut niveau

    Get PDF
    High-level synthesis (HLS) tools offer increased productivity regarding FPGA programming.However, due to their relatively young nature, they still lack many arithmetic optimizations.This thesis proposes safe arithmetic optimizations that should always be applied.These optimizations are simple operator specializations, following the C semantic.Other require to a lift the semantic embedded in high-level input program languages, which are inherited from software programming, for an improved accuracy/cost/performance ratio.To demonstrate this claim, the sum-of-product of floating-point numbers is used as a case study. The sum is performed on a fixed-point format, which is tailored to the application, according to the context in which the operator is instantiated.In some cases, there is not enough information about the input data to tailor the fixed-point accumulator.The fall-back strategy used in this thesis is to generate an accumulator covering the entire floating-point range.This thesis explores different strategies for implementing such a large accumulator, including new ones.The use of a 2's complement representation instead of a sign+magnitude is demonstrated to save resources and to reduce the accumulation loop delay.Based on a tapered precision scheme and an exact accumulator, the posit number systems claims to be a candidate to replace the IEEE floating-point format.A throughout analysis of posit operators is performed, using the same level of hardware optimization as state-of-the-art floating-point operators.Their cost remains much higher that their floating-point counterparts in terms of resource usage and performance. Finally, this thesis presents a compatibility layer for HLS tools that allows one code to be deployed on multiple tools.This library implements a strongly typed custom size integer type along side a set of optimized custom operators.À cause de la nature relativement jeune des outils de synthèse de haut-niveau (HLS), de nombreuses optimisations arithmétiques n'y sont pas encore implémentées. Cette thèse propose des optimisations arithmétiques se servant du contexte spécifique dans lequel les opérateurs sont instanciés.Certaines optimisations sont de simples spécialisations d'opérateurs, respectant la sémantique du C.D'autres nécéssitent de s'éloigner de cette sémantique pour améliorer le compromis précision/coût/performance.Cette proposition est démontré sur des sommes de produits de nombres flottants.La somme est réalisée dans un format en virgule-fixe défini par son contexte.Quand trop peu d’informations sont disponibles pour définir ce format en virgule-fixe, une stratégie est de générer un accumulateur couvrant l'intégralité du format flottant.Cette thèse explore plusieurs implémentations d'un tel accumulateur.L'utilisation d'une représentation en complément à deux permet de réduire le chemin critique de la boucle d'accumulation, ainsi que la quantité de ressources utilisées. Un format alternatif aux nombres flottants, appelé posit, propose d'utiliser un encodage à précision variable.De plus, ce format est augmenté par un accumulateur exact.Pour évaluer précisément le coût matériel de ce format, cette thèse présente des architectures d'opérateurs posits, implémentés avec le même degré d'optimisation que celui de l'état de l'art des opérateurs flottants.Une analyse détaillée montre que le coût des opérateurs posits est malgré tout bien plus élevé que celui de leurs équivalents flottants.Enfin, cette thèse présente une couche de compatibilité entre outils de HLS, permettant de viser plusieurs outils avec un seul code. Cette bibliothèque implémente un type d'entiers de taille variable, avec de plus une sémantique strictement typée, ainsi qu'un ensemble d'opérateurs ad-hoc optimisés

    Design for Implementation of Image Processing Algorithms

    Get PDF
    Color image processing algorithms are first developed using a high-level mathematical modeling language. Current integrated development environments offer libraries of intrinsic functions, which on one hand enable faster development, but on the other hand hide the use of fundamental operations. The latter have to be detailed for an efficient hardware and/or software physical implementation. Based on the experience accumulated in the process of implementing a segmentation algorithm, this thesis outlines a design for implementation methodology comprised of a development flow and associated guidelines. The methodology enables algorithm developers to iteratively optimize their algorithms while maintaining the level of image integrity required by their application. Furthermore, it does not require algorithm developers to change their current development process. Rather, the design for implementation methodology is best suited for optimizing a functionally correct algorithm, thus appending to an algorithm developer\u27s design process of choice. The application of this methodology to four segmentation algorithm steps produced measured results with 2-D correlation coefficients (CORR2) better than 0.99, peak-signal-to-noise-ratio (PSNR) better than 70 dB, and structural-similarity-index (SSIM) better than 0.98, for a majority of test cases

    Simple optimizing JIT compilation of higher-order dynamic programming languages

    Get PDF
    Implémenter efficacement les langages de programmation dynamiques demande beaucoup d’effort de développement. Les compilateurs ne cessent de devenir de plus en plus complexes. Aujourd’hui, ils incluent souvent une phase d’interprétation, plusieurs phases de compilation, plusieurs représentations intermédiaires et des analyses de code. Toutes ces techniques permettent d’implémenter efficacement un langage de programmation dynamique, mais leur mise en oeuvre est difficile dans un contexte où les ressources de développement sont limitées. Nous proposons une nouvelle approche et de nouvelles techniques dynamiques permettant de développer des compilateurs performants pour les langages dynamiques avec de relativement bonnes performances et un faible effort de développement. Nous présentons une approche simple de compilation à la volée qui permet d’implémenter un langage en une seule phase de compilation, sans transformation vers des représentations intermédiaires. Nous expliquons comment le versionnement de blocs de base, une technique de compilation existante, peut être étendue, sans effort de développement significatif, pour fonctionner interprocéduralement avec les langages de programmation d’ordre supérieur, permettant d’appliquer des optimisations interprocédurales sur ces langages. Nous expliquons également comment le versionnement de blocs de base permet de supprimer certaines opérations utilisées pour implémenter les langages dynamiques et qui impactent les performances comme les vérifications de type. Nous expliquons aussi comment les compilateurs peuvent exploiter les représentations dynamiques des valeurs par Tagging et NaN-boxing pour optimiser le code généré avec peu d’effort de développement. Nous présentons également notre expérience de développement d’un compilateur à la volée pour le langage de programmation Scheme, pour montrer que ces techniques permettent effectivement de construire un compilateur avec un effort moins important que les compilateurs actuels et qu’elles permettent de générer du code efficace, qui rivalise avec les meilleures implémentations du langage Scheme.Efficiently implementing dynamic programming languages requires a significant development effort. Over the years, compilers have become more complex. Today, they typically include an interpretation phase, several compilation phases, several intermediate representations and code analyses. These techniques allow efficiently implementing these programming languages but are difficult to implement in contexts in which development resources are limited. We propose a new approach and new techniques to build optimizing just-in-time compilers for dynamic languages with relatively good performance and low development effort. We present a simple just-in-time compilation approach to implement a language with a single compilation phase, without the need to use code transformations to intermediate representations. We explain how basic block versioning, an existing compilation technique, can be extended without significant development effort, to work interprocedurally with higherorder programming languages allowing interprocedural optimizations on these languages. We also explain how basic block versioning allows removing operations used to implement dynamic languages that degrade performance, such as type checks, and how compilers can use Tagging and NaN-boxing to optimize the generated code with low development effort. We present our experience of building a JIT compiler using these techniques for the Scheme programming language to show that they indeed allow building compilers with less development effort than other implementations and that they allow generating efficient code that competes with current mature implementations of the Scheme language

    Performance Evaluation of Optimal Ate Pairing on Low-Cost Single Microprocessor Platform

    Get PDF
    The framework of low-cost interconnected devices forms a new kind of cryptographic environment with diverse requirements. Due to the minimal resource capacity of the devices, light-weight cryptographic algorithms are favored. Many applications of IoT work autonomously and process sensible data, which emphasizes security needs, and might also cause a need for specific security measures. A bilinear pairing is a mapping based on groups formed by elliptic curves over extension fields. The pairings are the key-enabler for versatile cryptosystems, such as certificateless signatures and searchable encryption. However, they have a major computational overhead, which coincides with the requirements of the low-cost devices. Nonetheless, the bilinear pairings are the only known approach for many cryptographic protocols so their feasibility should certainly be studied, as they might turn out to be necessary for some future IoT solutions. Promising results already exist for high-frequency CPU:s and platforms with hardware extensions. In this work, we study the feasibility of computing the optimal ate pairing over the BN254 curve, on a 64 MHz Cortex-M33 based platform by utilizing an optimized open-source library. The project is carried out for the company Nordic Semiconductor. As a result, the pairing was effectively computed in under 26* 10^6 cycles, or in 410 ms. The resulting pairing enables a limited usage of pairing-based cryptography, with a capacity of at most few cryptographic operations, such as ID-based key verifications per second. Referring to other relevant works, a competent pairing application would require either a high-frequency - and thus high consuming - microprocessor, or a customized FPGA. Moreover, it is noted that the research in efficient pairing-based cryptography is constantly taking steps forward in every front-line: efficient algorithms, protocols, and hardware-solutions

    The Mordell-Weil sieve: Proving non-existence of rational points on curves

    Full text link
    We discuss the Mordell-Weil sieve as a general technique for proving results concerning rational points on a given curve. In the special case of curves of genus 2, we describe quite explicitly how the relevant local information can be obtained if one does not want to restrict to mod p information at primes of good reduction. We describe our implementation of the Mordell-Weil sieve algorithm and discuss its efficiency.Comment: 47 pages, 4 figures. Revised version, following suggestions/questions by the referee. Main changes: (1) We fixed a gap in the proof of Lemma 4.3 and added a discussion of the case r < g-1. (2) We added a heuristic discussion of the success probability at a single prime in Section
    • …
    corecore