    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying datapath or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (n×n→2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that a 2n-bit modular multiplication can be performed on an n-bit multiplier in 5n clock cycles, assuming that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier with this extra functionality requires only minor modifications of the datapath and entails a slight increase in silicon area.
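
    The role of the (n×n→2n)-bit multiplication as a building block can be illustrated with a minimal software sketch of CIOS Montgomery multiplication. This is only an illustration under stated assumptions, not the datapath described in the abstract: the word width W_BITS = 32 and the names mul_wide, to_words, from_words and cios_montmul are hypothetical, and Python integers stand in for hardware registers.

```python
W_BITS = 32                      # assumed datapath width n (hypothetical choice)
MASK = (1 << W_BITS) - 1

def mul_wide(x, y):
    """Stand-in for the (n x n -> 2n)-bit multiply: returns (high word, low word)."""
    p = x * y
    return p >> W_BITS, p & MASK

def to_words(x, s):
    """Split x into s words of W_BITS bits, least significant word first."""
    return [(x >> (W_BITS * i)) & MASK for i in range(s)]

def from_words(ws):
    """Reassemble an integer from least-significant-first words."""
    return sum(w << (W_BITS * i) for i, w in enumerate(ws))

def cios_montmul(a, b, n, s):
    """Return a*b*R^-1 mod n with R = 2^(s*W_BITS); n must be odd and a, b < n."""
    A, B, N = to_words(a, s), to_words(b, s), to_words(n, s)
    n0_inv = (-pow(N[0], -1, 1 << W_BITS)) & MASK    # -n^-1 mod 2^W_BITS
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += A * B[i]
        carry = 0
        for j in range(s):
            hi, lo = mul_wide(A[j], B[i])
            acc = t[j] + lo + carry
            t[j] = acc & MASK
            carry = hi + (acc >> W_BITS)
        acc = t[s] + carry
        t[s] = acc & MASK
        t[s + 1] = acc >> W_BITS
        # Reduction step: add m*N so the lowest word becomes zero, then shift by one word
        _, m = mul_wide(t[0], n0_inv)
        hi, lo = mul_wide(m, N[0])
        acc = t[0] + lo                       # low word of acc is zero by construction
        carry = hi + (acc >> W_BITS)
        for j in range(1, s):
            hi, lo = mul_wide(m, N[j])
            acc = t[j] + lo + carry
            t[j - 1] = acc & MASK
            carry = hi + (acc >> W_BITS)
        acc = t[s] + carry
        t[s - 1] = acc & MASK
        t[s] = t[s + 1] + (acc >> W_BITS)
    r = from_words(t[: s + 1])
    return r - n if r >= n else r             # final conditional subtraction
```

    Every product in the sketch goes through mul_wide, so the operand length is set only by the word count s and not by the width of the multiplier.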

    Comparison of Scalable Montgomery Modular Multiplication Implementations Embedded in Reconfigurable Hardware

    This paper compares possible approaches to an efficient implementation of multiple-word radix-2 Montgomery modular multiplication (MM) on modern Field Programmable Gate Arrays (FPGAs). The hardware implementation of the MM coprocessor is fully scalable, which means that it can be reused to generate long-precision results independently of the word length of the originally proposed coprocessor. The first of the analyzed implementations uses a data path based on the traditionally used redundant carry-save adders; the second exploits standard carry-propagate adders with fast carry chain logic, which have not yet been applied in scalable designs. An embedded soft-core processor, Altera NIOS, is employed as the control unit and as a platform for a purely software implementation. All implementations use the large embedded memory blocks available in recent FPGAs. Speed and logic requirements are compared for the optimized software and the combined hardware-software designs in Altera FPGAs. The issues of targeting a design specifically for an FPGA are considered, taking into account the underlying architecture imposed by the target FPGA technology. It is shown that the coprocessors based on carry-save adders and on carry-propagate adders provide comparable results in constrained FPGA implementations, but the carry-propagate solution requires less embedded memory and provides some additional implementation advantages presented in the paper.
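
    The idea of a word-serial, radix-2 Montgomery multiplication can be sketched in software as follows. This is a behavioural illustration only, written in the spirit of multiple-word radix-2 algorithms; it does not model the carry-save or carry-propagate data paths compared in the paper, and the function name mwr2mm, the default word width w = 8 and the headroom rule for the word count are assumptions made here.

```python
def mwr2mm(x, y, m, nbits, w=8):
    """Return x*y*2^-nbits mod m, scanning x bit by bit and y, m word by word.

    Assumes m is odd, m < 2^nbits, and x, y < m.
    """
    mask = (1 << w) - 1
    e = -(-(nbits + 2) // w)                  # words needed, since S + Y + M < 4m
    Y = [(y >> (w * j)) & mask for j in range(e)]
    M = [(m >> (w * j)) & mask for j in range(e)]
    S = [0] * e
    for i in range(nbits):
        xi = (x >> i) & 1
        q = (S[0] + xi * Y[0]) & 1            # add M only if the sum would be odd
        carry = 0
        for j in range(e):
            acc = S[j] + xi * Y[j] + q * M[j] + carry
            carry = acc >> w
            word = acc & mask
            if j > 0:
                # One-bit right shift: the LSB of this word becomes the MSB of the previous one
                S[j - 1] |= (word & 1) << (w - 1)
            S[j] = word >> 1
    r = sum(s_j << (w * j) for j, s_j in enumerate(S))
    return r - m if r >= m else r             # final conditional subtraction
```

    Because only w-bit words are touched in the inner loop, the same routine handles longer operands by increasing nbits, which is the sense in which such a design is scalable.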

    Montgomery Modular Multiplication on Reconfigurable Hardware: Systolic versus Multiplexed Implementation

    This paper compares two Montgomery modular multiplication architectures: a systolic one and a multiplexed one. Both implementations target FPGA devices. Modular multiplication is employed in modular exponentiation, which is the most important operation of some public-key cryptographic algorithms, including the most popular of them, RSA. The proposed systolic architecture is a high-radix implementation with a one-dimensional array of Processing Elements. The multiplexed implementation is a new alternative composed of multiplier blocks in parallel with new, simplified Processing Elements, and it provides a pipelined operation mode. We compare the time × area efficiency of both architectures as well as an RSA application. The systolic implementation can run the 1024-bit RSA decryption process in just 3.23 ms, and the multiplexed architecture executes the same operation in 4.36 ms, but the second approach saves up to 28% of logic resources. These results are competitive with state-of-the-art performance.

    Indicators of the Level of Production Mechanization in the Sectors of the Agro-Industrial Complex

    The article briefly describes the stages of replacing manual labor with machine labor in the sectors of the agro-industrial complex and gives the relations for calculating the coefficients of mechanization of work and of labor.

    Cluster Labeling of Scientific Articles Using TopicRank and Maximum Common Subgraph

    Clustering makes it easier to group unstructured scientific articles by topic similarity. Cluster labeling is then needed to discover the key phrases that best represent the topics covered by each group of articles. Some clusters still share similar topics and need to be merged to give better cluster labels. In addition to word occurrences, topic similarity can be represented by semantic relations between words, which can be modeled as a graph. This research proposes a cluster labeling method for scientific articles in which clusters are merged based on the structural similarity of their graph representations, so that the labels are more representative of the cluster topics. A graph model is chosen because it can map the relationships between words and hence represent the semantic information of the text. The proposed method consists of: (1) grouping the scientific articles with the K-Means++ clustering method; (2) extracting candidate phrases for each cluster with Frequent Phrase Mining (FPM), which yields word tokens capable of forming representative phrases for the cluster topics; (3) constructing a graph with the words that form the phrases as vertices and Word2Vec-based word relations as edges; (4) merging clusters by measuring cluster similarity based on the Maximum Common Subgraph (MCS) structure; (5) labeling the merged clusters with the TopicRank method. The proposed method is evaluated on 2 datasets of scientific articles that differ in their degree of cluster separation and cohesion, using both a Word2Vec-based and a co-occurrence-based graph model. Topic coherence is used as the evaluation measure of how closely the label topics of a cluster are related. The results show that the dataset with high cluster separation and cohesion (the homogeneous dataset 1) yields higher topic coherence for the merged cluster labels than the heterogeneous dataset 2. In addition, building the cluster graph from co-occurrence word relations gives better topic coherence than using Word2Vec word relations, because co-occurrence relations are frequency-based and therefore represent the majority topic of a cluster.
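
    The cluster-merging step can be illustrated with a small sketch of graph similarity based on a common subgraph. It rests on assumptions that are not taken from the abstract: graphs are built from co-occurrence within a phrase, vertices are uniquely labeled by their word (so the maximum common subgraph reduces to the shared edges), and the similarity is normalized by the larger graph. The function names cooccurrence_graph and mcs_similarity are hypothetical.

```python
from itertools import combinations

def cooccurrence_graph(phrases):
    """Build an edge set: two words are linked if they occur in the same phrase."""
    edges = set()
    for phrase in phrases:
        words = sorted(set(phrase.split()))
        edges.update(combinations(words, 2))   # sorted input gives canonical (a, b) pairs
    return edges

def mcs_similarity(g1, g2):
    """Size of the common edge set relative to the larger graph (assumed normalization)."""
    if not g1 or not g2:
        return 0.0
    common = g1 & g2                           # common subgraph for uniquely labeled vertices
    return len(common) / max(len(g1), len(g2))

# Two toy clusters described by their candidate phrases (made-up example data)
cluster_a = cooccurrence_graph(["neural network model", "deep neural network"])
cluster_b = cooccurrence_graph(["neural network training"])
similarity = mcs_similarity(cluster_a, cluster_b)
```

    Two clusters would then be merged when this similarity exceeds a chosen threshold; the threshold itself is a tuning parameter that the abstract does not specify.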

    Serial-serial finite field multiplication

    Hardware Implementation of Parallel Modular Exponentiation Algorithm Based on Pipelining Technique

    To address the high hardware cost of parallel implementations of the right-to-left (R-L) modular exponentiation algorithm, this paper presents an efficient pipelined modular exponentiation architecture. The Montgomery modular multiplication used in the exponentiation algorithm is designed in hardware with a pipelining technique, and the exponentiation architecture is built from it, so that modular exponentiation can be executed in parallel while the hardware cost is reduced. In addition, the pre-processing and post-processing steps of the conventional modular exponentiation algorithm, which convert an integer to and from its N-residue format, are optimized so that the two extra steps are avoided and the number of iterations is reduced. A prototype implemented on a Xilinx Virtex-2 series Field Programmable Gate Array (FPGA) shows that, while maintaining the speed of parallel modular exponentiation, the new architecture occupies only about half of the hardware resources of the conventional parallel architecture and achieves a higher data throughput of more than 14 Mb/s.
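
    For orientation, the conventional right-to-left exponentiation with Montgomery multiplication can be sketched as follows. This is a plain software reference, not the pipelined architecture of the paper: it keeps the usual pre-processing and post-processing conversions that the paper optimizes, and the names montmul and modexp_rl are hypothetical.

```python
def montmul(a, b, n, k):
    """Montgomery product a*b*2^-k mod n for odd n < 2^k and a, b < n."""
    r_mask = (1 << k) - 1
    n_prime = (-pow(n, -1, 1 << k)) & r_mask   # -n^-1 mod 2^k
    t = a * b
    m = ((t & r_mask) * n_prime) & r_mask
    u = (t + m * n) >> k                       # exact division by 2^k
    return u - n if u >= n else u

def modexp_rl(base, exp, n):
    """Compute base^exp mod n (n odd) by scanning the exponent from the LSB."""
    k = n.bit_length()
    R = 1 << k
    r2 = (R * R) % n
    s = montmul(base % n, r2, n, k)            # pre-processing: base in Montgomery form
    acc = R % n                                # the value 1 in Montgomery form
    while exp:
        if exp & 1:
            acc = montmul(acc, s, n, k)        # multiply step
        s = montmul(s, s, n, k)                # square step, independent of the multiply step
        exp >>= 1
    return montmul(acc, 1, n, k)               # post-processing: leave Montgomery form
```

    In each iteration the multiply step reads the old value of s and the square step does not depend on acc, so the two Montgomery products are independent. This independence is what makes the R-L variant suitable for parallel or interleaved pipelined execution.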

    Cryptarray A Scalable And Reconfigurable Architecture For Cryptographic Applications

    Cryptography is increasingly viewed as a critical technology to fulfill the requirements of security and authentication for information exchange between Internet applications. However, software implementations of cryptographic applications are unable to support the quality of service from a bandwidth perspective required by most Internet applications. As a result, various hardware implementations, from Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), to programmable processors, were proposed to improve this inadequate quality of service. Although these implementations provide performances that are considered better than those produced by software implementations, they still fall short of addressing the bandwidth requirements of most cryptographic applications in the context of the Internet for two major reasons: (i) The majority of these architectures sacrifice flexibility for performance in order to reach the performance level needed for cryptographic applications. This lack of flexibility can be detrimental considering that cryptographic standards and algorithms are still evolving. (ii) These architectures do not consider the consequences of technology scaling in general, and particularly interconnect related problems. As a result, this thesis proposes an architecture that attempts to address the requirements of cryptographic applications by overcoming the obstacles described in (i) and (ii). To this end, we propose a new reconfigurable, two-dimensional, scalable architecture, called CRYPTARRAY, in which bus-based communication is replaced by distributed shared memory communication. At the physical level, the length of the wires will be kept to a minimum. CRYPTARRAY is organized as a chessboard in which the dark and light squares represent Processing Elements (PE) and memory blocks respectively. 
The granularity and resource composition of the PEs are specifically designed to support the computing operations encountered in cryptographic algorithms in general, and in symmetric algorithms in particular. Communication can occur only between neighboring PEs through locally shared memory blocks. Because of the chessboard layout, the architecture can be reconfigured to allow computation to proceed as a pipelined wave in any direction. This organization offers a high computational density in terms of datapath resources and a large number of distributed storage resources that easily support a high degree of parallelism and pipelining. Experimental prototyping of a small array on FPGA chips shows that this architecture can run at 80.9 MHz, producing 26,968,716 outputs every second in static reconfiguration mode and 20,226,537 outputs every second in dynamic reconfiguration mode.