
    Efficient long division via Montgomery multiply

    We present a novel right-to-left long division algorithm based on the Montgomery modular multiply, consisting of separate, highly efficient loops with simple carry structure for computing first the remainder (x mod q) and then the quotient floor(x/q). These loops are ideally suited to the case where x occupies many more machine words than the divisor q, and they run in time strictly linear in the "bitsize ratio" lg(x)/lg(q). For the paradigmatic performance test of a multiword dividend and a single 64-bit-word divisor, exploiting the inherent data parallelism of the algorithm effectively mitigates the long latency of hardware integer MUL operations; as a result, we achieve costs of 6 cycles per dividend word for the remainder-only computation and 12.5 cycles per dividend word for the full DIV (remainder and quotient) on the Intel Core 2 implementation of the x86_64 architecture, in single-threaded execution. We further describe a simple "bit-doubling modular inversion" scheme, which allows the entire iterative computation of the mod inverse required by the Montgomery multiply, at arbitrarily large precision, to be performed at a cost less than that of a single Newton iteration at the full precision of the final result. We also show how Montgomery-multiply-based powering can be used efficiently in Mersenne- and Fermat-number trial factorization via direct computation of a modular inverse power of 2, without any need for explicit radix-mod scalings.
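    The two ingredients are easy to sketch in portable C. Below is a minimal, hedged illustration rather than the paper's multiword algorithm: a fixed-width analogue of the bit-doubling inversion (the paper's version runs each Newton step at only the precision it needs, which is why its total cost stays below one full-precision step) and the MULL/MULH form of the 64-bit Montgomery multiply it builds on. The function names are ours, and gcc/clang __uint128_t support is assumed.

        #include <stdint.h>

        /* Fixed-width analogue of the "bit-doubling" inversion: Newton
           iteration for qinv = q^{-1} mod 2^64, q odd. Starting from
           qinv = q (3 correct bits, since q*q == 1 mod 8), each step
           doubles the number of correct low-order bits:
           3 -> 6 -> 12 -> 24 -> 48 -> 96 >= 64, so five steps suffice. */
        static uint64_t qinv_mod_2_64(uint64_t q)
        {
            uint64_t qinv = q;
            for (int i = 0; i < 5; i++)
                qinv *= 2 - q * qinv;     /* Newton step, mod 2^64 */
            return qinv;
        }

        /* Montgomery multiply in MULL/MULH form: returns a*b*2^{-64} mod q
           for odd q and a, b < q. Writing t = a*b = hi*2^64 + lo, the word
           m = lo*qinv makes m*q agree with t in its low 64 bits, so
           (t - m*q)/2^64 = hi - MULH(m, q) is congruent to a*b*2^{-64}
           mod q and lies in (-q, q); a conditional add fixes the sign. */
        static uint64_t mont_mul(uint64_t a, uint64_t b,
                                 uint64_t q, uint64_t qinv)
        {
            __uint128_t t = (__uint128_t)a * b;
            uint64_t lo = (uint64_t)t, hi = (uint64_t)(t >> 64);
            uint64_t mh = (uint64_t)(((__uint128_t)(lo * qinv) * q) >> 64);
            return hi >= mh ? hi - mh : hi - mh + q;
        }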

    Engineering Planar-Separator and Shortest-Path Algorithms

    "Algorithm engineering" denotes the process of designing, implementing, testing, analyzing, and refining computational proceedings to improve their performance. We consider three graph problems -- planar separation, single-pair shortest-path routing, and multimodal shortest-path routing -- and conduct a systematic study in order to: classify different kinds of input; draw concrete recommendations for choosing the parameters involved; and identify and tune crucial parts of the algorithm

    Hardware Implementation of Barrett Reduction Exploiting Constant Multiplication

    The efficient realization of an Elliptic Curve Cryptosystem is contingent on the efficiency of scalar multiplication. These systems can be improved by optimizing the underlying finite-field arithmetic operations, the most costly of which is modular reduction. For elliptic curves over certain prime fields, very efficient reduction formulas are possible thanks to the special structure of the moduli. For prime moduli of arbitrary form, however, a general reduction method such as Barrett's reduction algorithm is necessary. Barrett's algorithm performs modular reduction efficiently by using multiplication rather than division, an operation that is generally expensive to realize in hardware. We note, however, that when an Elliptic Curve Cryptosystem is defined over a fixed prime field, all multiplication steps in Barrett's scheme can be realized as constant multiplications, which allows for further optimization. In this thesis, we study the influence of using constant multipliers in four different Barrett reduction variants targeting the Virtex-7 (xc7vx485tffg1157-1). We use the FloPoCo core generator to construct constant-multiplier implementations for the different multiplication steps required in each scheme. We then create a hybrid constant-multiplier circuit based on Karatsuba multiplication that uses smaller FloPoCo-generated base multipliers. It is shown that for certain multiplication steps, the hybrid design improves the resource utilization of the constant-multiplier circuit at the cost of an increased critical-path delay. A performance comparison of Barrett reduction circuits using different combinations of constant-multiplier architectures is presented. Additionally, a fully pipelined implementation of each Barrett reduction variant is designed, achieving operating frequencies of 496-504 MHz depending on the Barrett scheme considered. With the addition of a 256-bit pipelined Karatsuba multiplier circuit, we also present a compact, fully pipelined modular multiplier based on these Barrett architectures that achieves very high throughput compared to others in the literature, without the use of embedded multipliers.
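    The arithmetic that Barrett's method substitutes for division is easy to see in software form. Below is a minimal sketch at a 32-bit word size (the thesis targets 256-bit hardware datapaths); the names are ours, and q is assumed odd, as for an ECC prime, so that mu = floor(2^64/q) can be computed as shown. Because q is fixed, the two multiplications, by mu and by q, are precisely the steps that become constant multiplications in the fixed-field setting.

        #include <stdint.h>

        /* Barrett reduction of x < 2^64 modulo a fixed odd 32-bit q.
           mu = floor(2^64/q) is precomputed once. The quotient estimate
           qhat = floor(x*mu / 2^64) satisfies
           floor(x/q) - 1 <= qhat <= floor(x/q), so the raw remainder
           r = x - qhat*q lies in [0, 2q) and one conditional subtraction
           corrects it. */
        typedef struct { uint32_t q; uint64_t mu; } barrett32;

        static barrett32 barrett32_init(uint32_t q)
        {
            /* for odd q > 1, floor((2^64 - 1)/q) == floor(2^64/q) */
            barrett32 b = { q, UINT64_MAX / q };
            return b;
        }

        static uint32_t barrett32_reduce(barrett32 b, uint64_t x)
        {
            uint64_t qhat = (uint64_t)(((__uint128_t)x * b.mu) >> 64);
            uint64_t r    = x - qhat * b.q;
            return (uint32_t)(r >= b.q ? r - b.q : r);
        }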

    Vers une arithmétique efficace pour le chiffrement homomorphe basé sur le Ring-LWE (Towards Efficient Arithmetic for Ring-LWE-Based Homomorphic Encryption)

    Fully homomorphic encryption is a kind of encryption that makes it possible to manipulate encrypted data directly through their ciphertexts. In this way, sensitive data can be processed without being decrypted beforehand, thereby preserving their confidentiality. In the era of ubiquitous computing and the cloud, this kind of encryption has the potential to considerably enhance privacy protection. However, because of its recent discovery by Gentry in 2009, we do not yet have much hindsight about it, and several uncertainties remain, in particular concerning its security and its efficiency in practice; these should be clarified before any eventual widespread use. This thesis addresses this issue and focuses on improving the performance of this kind of encryption in practice. To this end, we have been interested in optimizing the arithmetic used by these schemes, both the arithmetic underlying the Ring Learning With Errors (Ring-LWE) problem on which the security of the schemes considered is based, and the arithmetic specific to the computations required by the procedures of some of these schemes. We have also considered optimizing the computations required by some specific applications of homomorphic encryption, in particular the classification of private data, and we propose methods and innovative techniques to perform these computations efficiently. We illustrate the efficiency of our different methods through software implementations and comparisons to the state of the art.
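    Concretely, the "arithmetic underlying Ring-LWE" is arithmetic in the ring R_q = Z_q[X]/(X^n + 1): polynomials of degree below n with coefficients modulo q, where X^n wraps around to -1. A minimal schoolbook sketch of the core multiplication follows (names ours; coefficients assumed reduced modulo q < 2^32, so products fit in 64 bits). Optimized implementations, such as those this thesis concerns, replace the O(n^2) double loop with a number-theoretic transform (NTT) to reach O(n log n).

        #include <stdint.h>
        #include <string.h>

        /* c = a*b in Z_q[X]/(X^n + 1): a negacyclic convolution, since
           reducing by X^n = -1 folds coefficient i+j >= n back into
           position i+j-n with a sign flip. Schoolbook O(n^2) version. */
        static void negacyclic_mul(const uint32_t *a, const uint32_t *b,
                                   uint32_t *c, int n, uint32_t q)
        {
            memset(c, 0, n * sizeof *c);
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    uint32_t p = (uint32_t)(((uint64_t)a[i] * b[j]) % q);
                    int k = i + j;
                    if (k < n)      /* ordinary term */
                        c[k] = (uint32_t)(((uint64_t)c[k] + p) % q);
                    else            /* X^{i+j} = -X^{i+j-n} */
                        c[k - n] = (uint32_t)(((uint64_t)c[k - n] + q - p) % q);
                }
        }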

    Study of Candidate Genes for Dyslexia in Brazilian Individuals

    Dyslexia, or reading disability (RD), is the most common childhood learning disorder and a significantly heritable trait. Many recent studies have investigated the genetic basis of dyslexia, and several candidate genes have been proposed. Among these, DCDC2 and KIAA0319 have emerged as the strongest candidate genes for dyslexia; however, studies have not provided uniformly supportive results. The aim of this study was to assess the contribution of proposed candidate genes to the molecular etiology of dyslexia in a Brazilian sample. Large deletions and duplications in the candidate genes DCDC2, KIAA0319, and ROBO1 were investigated in 51 dyslexic subjects. Furthermore, a family-based association study was performed to investigate whether associations observed in other populations with variants in the DCDC2 and KIAA0319 genes were reproducible in Brazilian dyslexic individuals. Our analysis did not detect any deletions or duplications in the genes studied, and we found no evidence that the allelic variants in the two candidate genes were significantly associated with RD in our sample. Our data do not support a role of the DCDC2/KIAA0319 locus in influencing dyslexia as a categorical trait. Given the genetic complexity of dyslexia, it is plausible that both genes contribute to an increased risk, but that the relative influence of these two genes on RD varies across study samples and/or depends on the analytical approach.

    Robust Dropping Criteria for F-norm Minimization Based Sparse Approximate Inverse Preconditioning

    Dropping-tolerance criteria play a central role in sparse approximate inverse preconditioning. Such criteria have, however, received little attention and have been treated heuristically in the following manner: if the size of an entry is below some empirically chosen small positive quantity, it is set to zero. The meaning of "small" is vague and has not been considered rigorously, and it has been unclear how dropping tolerances affect the quality and effectiveness of a preconditioner M. In this paper, we focus on the adaptive Power Sparse Approximate Inverse algorithm and establish a mathematical theory of robust selection criteria for dropping tolerances. Using the theory, we derive an adaptive dropping criterion that is used to drop entries of small magnitude dynamically during the setup process of M. The proposed criterion enables us to make M as sparse as possible while keeping it of comparable quality to the potentially denser matrix obtained without dropping. As a byproduct, the theory applies to static F-norm minimization based preconditioning procedures, and a similar dropping criterion is given that can be used to sparsify a matrix after it has been computed by a static sparse approximate inverse procedure. In contrast to the adaptive procedure, dropping in the static procedure does not reduce the setup time of the matrix, but it makes the application of the sparser M within Krylov iterations cheaper. The numerical experiments reported confirm the theory and illustrate the robustness and effectiveness of the dropping criteria.
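    For background, the F-norm minimization the abstract builds on decouples column by column, which is what makes entry-wise dropping natural. In LaTeX notation (a standard identity, not the paper's new criterion):

        \min_{M} \|I - AM\|_F^2 \;=\; \sum_{j=1}^{n} \| e_j - A m_j \|_2^2 ,

    so each column m_j of M is an independent least-squares problem, and the heuristic rule the abstract describes amounts to setting (m_j)_i = 0 whenever |(m_j)_i| falls below some tolerance; the paper's contribution is a rigorous, adaptive choice of that tolerance.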