192 research outputs found
Cyclotomic Identity Testing and Applications
We consider the cyclotomic identity testing problem: given a polynomial
, decide whether is
zero, for a primitive complex -th root of unity and
integers . We assume that and are
represented in binary and consider several versions of the problem, according
to the representation of . For the case that is given by an algebraic
circuit we give a randomized polynomial-time algorithm with two-sided errors,
showing that the problem lies in BPP. In case is given by a circuit of
polynomially bounded syntactic degree, we give a randomized algorithm with
two-sided errors that runs in poly-logarithmic parallel time, showing that the
problem lies in BPNC. In case is given by a depth-2 circuit
(or, equivalently, as a list of monomials), we show that the cyclotomic
identity testing problem lies in NC. Under the generalised Riemann hypothesis,
we are able to extend this approach to obtain a polynomial-time algorithm also
for a very simple subclass of depth-3 circuits. We complement
this last result by showing that for a more general class of depth-3
circuits, a polynomial-time algorithm for the cyclotomic
identity testing problem would yield a sub-exponential-time algorithm for
polynomial identity testing. Finally, we use cyclotomic identity testing to
give a new proof that equality of compressed strings, i.e., strings presented
using context-free grammars, can be decided in coRNC: randomized NC with
one-sided errors
Optimisations arithmétiques et synthèse de haut niveau
High-level synthesis (HLS) tools offer increased productivity regarding FPGA programming.However, due to their relatively young nature, they still lack many arithmetic optimizations.This thesis proposes safe arithmetic optimizations that should always be applied.These optimizations are simple operator specializations, following the C semantic.Other require to a lift the semantic embedded in high-level input program languages, which are inherited from software programming, for an improved accuracy/cost/performance ratio.To demonstrate this claim, the sum-of-product of floating-point numbers is used as a case study. The sum is performed on a fixed-point format, which is tailored to the application, according to the context in which the operator is instantiated.In some cases, there is not enough information about the input data to tailor the fixed-point accumulator.The fall-back strategy used in this thesis is to generate an accumulator covering the entire floating-point range.This thesis explores different strategies for implementing such a large accumulator, including new ones.The use of a 2's complement representation instead of a sign+magnitude is demonstrated to save resources and to reduce the accumulation loop delay.Based on a tapered precision scheme and an exact accumulator, the posit number systems claims to be a candidate to replace the IEEE floating-point format.A throughout analysis of posit operators is performed, using the same level of hardware optimization as state-of-the-art floating-point operators.Their cost remains much higher that their floating-point counterparts in terms of resource usage and performance. Finally, this thesis presents a compatibility layer for HLS tools that allows one code to be deployed on multiple tools.This library implements a strongly typed custom size integer type along side a set of optimized custom operators.À cause de la nature relativement jeune des outils de synthèse de haut-niveau (HLS), de nombreuses optimisations arithmétiques n'y sont pas encore implémentées. Cette thèse propose des optimisations arithmétiques se servant du contexte spécifique dans lequel les opérateurs sont instanciés.Certaines optimisations sont de simples spécialisations d'opérateurs, respectant la sémantique du C.D'autres nécéssitent de s'éloigner de cette sémantique pour améliorer le compromis précision/coût/performance.Cette proposition est démontré sur des sommes de produits de nombres flottants.La somme est réalisée dans un format en virgule-fixe défini par son contexte.Quand trop peu d’informations sont disponibles pour définir ce format en virgule-fixe, une stratégie est de générer un accumulateur couvrant l'intégralité du format flottant.Cette thèse explore plusieurs implémentations d'un tel accumulateur.L'utilisation d'une représentation en complément à deux permet de réduire le chemin critique de la boucle d'accumulation, ainsi que la quantité de ressources utilisées. Un format alternatif aux nombres flottants, appelé posit, propose d'utiliser un encodage à précision variable.De plus, ce format est augmenté par un accumulateur exact.Pour évaluer précisément le coût matériel de ce format, cette thèse présente des architectures d'opérateurs posits, implémentés avec le même degré d'optimisation que celui de l'état de l'art des opérateurs flottants.Une analyse détaillée montre que le coût des opérateurs posits est malgré tout bien plus élevé que celui de leurs équivalents flottants.Enfin, cette thèse présente une couche de compatibilité entre outils de HLS, permettant de viser plusieurs outils avec un seul code. Cette bibliothèque implémente un type d'entiers de taille variable, avec de plus une sémantique strictement typée, ainsi qu'un ensemble d'opérateurs ad-hoc optimisés
Design for Implementation of Image Processing Algorithms
Color image processing algorithms are first developed using a high-level mathematical modeling language. Current integrated development environments offer libraries of intrinsic functions, which on one hand enable faster development, but on the other hand hide the use of fundamental operations. The latter have to be detailed for an efficient hardware and/or software physical implementation. Based on the experience accumulated in the process of implementing a segmentation algorithm, this thesis outlines a design for implementation methodology comprised of a development flow and associated guidelines.
The methodology enables algorithm developers to iteratively optimize their algorithms while maintaining the level of image integrity required by their application. Furthermore, it does not require algorithm developers to change their current development process. Rather, the design for implementation methodology is best suited for optimizing a functionally correct algorithm, thus appending to an algorithm developer\u27s design process of choice.
The application of this methodology to four segmentation algorithm steps produced measured results with 2-D correlation coefficients (CORR2) better than 0.99, peak-signal-to-noise-ratio (PSNR) better than 70 dB, and structural-similarity-index (SSIM) better than 0.98, for a majority of test cases
Simple optimizing JIT compilation of higher-order dynamic programming languages
Implémenter efficacement les langages de programmation dynamiques demande beaucoup d’effort de développement.
Les compilateurs ne cessent de devenir de plus en plus complexes.
Aujourd’hui, ils incluent souvent une phase d’interprétation, plusieurs phases de compilation, plusieurs représentations intermédiaires et des analyses de code. Toutes ces techniques permettent d’implémenter efficacement un langage de programmation dynamique, mais leur mise en oeuvre est difficile dans un contexte où les ressources de développement sont limitées.
Nous proposons une nouvelle approche et de nouvelles techniques dynamiques permettant de développer des compilateurs performants pour les langages dynamiques avec de relativement bonnes performances et un faible effort de développement.
Nous présentons une approche simple de compilation à la volée qui permet d’implémenter un langage en une seule phase de compilation, sans transformation vers des représentations intermédiaires.
Nous expliquons comment le versionnement de blocs de base, une technique de compilation existante, peut être étendue, sans effort de développement significatif, pour fonctionner interprocéduralement avec les langages de programmation d’ordre supérieur, permettant d’appliquer des optimisations interprocédurales sur ces langages.
Nous expliquons également comment le versionnement de blocs de base permet de supprimer certaines opérations utilisées pour implémenter les langages dynamiques et qui impactent les performances comme les vérifications de type.
Nous expliquons aussi comment les compilateurs peuvent exploiter les représentations dynamiques des valeurs par Tagging et NaN-boxing pour optimiser le code généré avec peu d’effort de développement.
Nous présentons également notre expérience de développement d’un compilateur à la volée pour le langage de programmation Scheme, pour montrer que ces techniques permettent effectivement de construire un compilateur avec un effort moins important que les compilateurs actuels et qu’elles permettent de générer du code efficace, qui rivalise avec les meilleures implémentations du langage Scheme.Efficiently implementing dynamic programming languages requires a significant development
effort. Over the years, compilers have become more complex. Today, they typically include
an interpretation phase, several compilation phases, several intermediate representations and
code analyses. These techniques allow efficiently implementing these programming languages
but are difficult to implement in contexts in which development resources are limited. We
propose a new approach and new techniques to build optimizing just-in-time compilers for
dynamic languages with relatively good performance and low development effort.
We present a simple just-in-time compilation approach to implement a language with
a single compilation phase, without the need to use code transformations to intermediate
representations. We explain how basic block versioning, an existing compilation technique,
can be extended without significant development effort, to work interprocedurally with higherorder
programming languages allowing interprocedural optimizations on these languages. We
also explain how basic block versioning allows removing operations used to implement dynamic
languages that degrade performance, such as type checks, and how compilers can use Tagging
and NaN-boxing to optimize the generated code with low development effort. We present our
experience of building a JIT compiler using these techniques for the Scheme programming
language to show that they indeed allow building compilers with less development effort
than other implementations and that they allow generating efficient code that competes with
current mature implementations of the Scheme language
Performance Evaluation of Optimal Ate Pairing on Low-Cost Single Microprocessor Platform
The framework of low-cost interconnected devices forms a new kind of cryptographic environment with diverse requirements. Due to the minimal resource capacity of the devices, light-weight cryptographic algorithms are favored.
Many applications of IoT work autonomously and process sensible data, which emphasizes security needs, and might also cause a need for specific security measures.
A bilinear pairing is a mapping based on groups formed by elliptic curves over extension fields. The pairings are the key-enabler for versatile cryptosystems, such as certificateless signatures and searchable encryption. However, they have a major computational overhead, which coincides with the requirements of the low-cost devices. Nonetheless, the bilinear pairings are the only known approach for many cryptographic protocols so their feasibility should certainly be studied, as they might turn out to be necessary for some future IoT solutions. Promising results already exist for high-frequency CPU:s and platforms with hardware extensions.
In this work, we study the feasibility of computing the optimal ate pairing over the BN254 curve, on a 64 MHz Cortex-M33 based platform by utilizing an optimized open-source library. The project is carried out for the company Nordic Semiconductor. As a result, the pairing was effectively computed in under 26* 10^6 cycles, or in 410 ms.
The resulting pairing enables a limited usage of pairing-based cryptography, with a capacity of at most few cryptographic operations, such as ID-based key verifications per second. Referring to other relevant works, a competent pairing application would require either a high-frequency - and thus high consuming - microprocessor, or a customized FPGA. Moreover, it is noted that the research in efficient pairing-based cryptography is constantly taking steps forward in every front-line: efficient algorithms, protocols, and hardware-solutions
The Mordell-Weil sieve: Proving non-existence of rational points on curves
We discuss the Mordell-Weil sieve as a general technique for proving results
concerning rational points on a given curve. In the special case of curves of
genus 2, we describe quite explicitly how the relevant local information can be
obtained if one does not want to restrict to mod p information at primes of
good reduction. We describe our implementation of the Mordell-Weil sieve
algorithm and discuss its efficiency.Comment: 47 pages, 4 figures. Revised version, following suggestions/questions
by the referee. Main changes: (1) We fixed a gap in the proof of Lemma 4.3
and added a discussion of the case r < g-1. (2) We added a heuristic
discussion of the success probability at a single prime in Section
- …