Abstract. Hardware implementation aspects of the finite automaton public key cryptosystem are discussed in this paper. A general architecture of the multiplication of a square matrix on a vector over GF(q) is presented in the paper. Our design was implemented on Altera EP3C5E144C8N of the Cyclone III FPGA family. The performance of finite automaton public key cryptosystems is mainly appointed by the efficiency of the underlying finite field arithmetic. The results are compared with reported reconfigurable hardware implementations of RSA. Proposed hardware realization of cryptographic system allows organizing pipeline calculations.
Introduction
Today there are many distributed systems, which use communication resources that must be safeguarded against eavesdropping or unauthorized data alteration. Thus cryptographic protocols are applied to these systems in order to prevent information extraction or to detect data manipulation by unauthorized users. Besides the widelyused RSA method [1] , other public-key schemes have gained more and more importance in this context.
The public key cryptosystem on the basis of finite state machines (FSM) has been offered by Tao Renji [2] and was named FAPKC (Finite Automaton Public Key Cryptosystem). The private key consists of two FSM that are constructed so that their inverses are easily calculated. The public key is the automaton which we get by combining these two automata. It is believed to be hard to invert this combined automaton without knowledge of the private key automata [1] .
FAPKC cryptosystem is a stream ciphe. There are some modifications: FAPKC0 [2] , FAPKC1 and FAPKC2 [3] , FAPKC3 [4, 5] and FAPKC4 [6] . The cryptosystem can be used as for enciphering (with using public key) and for decryption (with using FSM, which is reversed to public key), and for signing (by this FSM) and for check of the signature (by public key) [6, 7] .
Cryptographic transformations can be implemented in both software and hardware [8] . Software implementations are designed and coded in programming languages, such as C, C++, Java, and assembly language, to be executed, among others, on general purpose microprocessors, digital signal processors, and smart cards. Hardware implementations are designed and coded in hardware description languages, such as VHDL and Verilog HDL, and are intended to be realized using two major implementation approaches: application-specific integrated circuits (ASICs) and field programmable gate arrays (FPGAs).
In this paper we describe an implementation method of FAPKC. The method is based on realization of computational structure in GF(q). A project was implemented on the programmed matrix FPGA EP3C5E144C8N of the device family Cyclone III offered by Altera.
FAPKC Principles

Definitions and Denotations
Let us introduce the basic terminology [5] : 
where s ∈ S, x ∈ X, α ∈ X N and α′ ∈ X N ∪ X ω . Definition 2. Let M = <X,Y,S,K,δ,λ> be a finite automaton, X and Y are column vector spaces over GF(q) of dimension l and m, respectively, and τ be a nonnegative integer. M is a weakly invertible with delay τ if, for any x i ∈ X, i =0, 1, …, τ and s ∈ S, x 0 can be uniquely determined by the state s and the output λ(s, x 0 …x τ ).
For any states s ∈ S and s′ ∈ S, if ∀α ∈ X ω , ∃α 0 ∈ X N :
then (s′, s) is a matching pair with delay τ or s′ matches s with delay τ .
Definition 3.
Let M = <X, Y, S, δ ,λ> and M′ = <Y, X, S′, δ′, λ′> be two finite automata and τ be a nonnegative integer. M′ is a weak inverse with delay τ of M if, for any s ∈ S, there exists s′ in S′ that (s′, s) is a matching pair with delay τ.
for any x ∈ X 1 and
If the mapping ϕ is from X h+1 to Y then M ϕ is an h-order input memory finite automaton.
Brief Overview of FAPKC Cryptosystems
In the paper [5] it was shown how to break the FAPKC cryptosystem if not a suitable nonlinear automaton is used. From the mathematical point of view of rules and conditions finding of invertible nonlinear automata of the suitable form and superposition of its are in detail considered in [5, 9] . Therefore here we will consider only the cryptographic algorithm steps for encryption/decryption without mathematical calculations.
Let X and Y be column vector spaces over GF(q), with dimension l. A FAPKC3 user A chooses his own public key and secret key as follows: (1 
Define parts needed in decryption Encryption. When user B wants to send to user A a plaintext x 0 … x n in secret, he first adds some τ 0 + τ 1 letters x n+1 … x n+τ0+τ1 to the end of the plaintext. Then he can compute the ciphertext using A's public key:
Decryption. After receiving the ciphertext … , user A first computes using parts * and , of his secret key and part , … , of his public key
Then using parts * and , of his secret key user A retrieves the plaintext
As present in [5] automaton M 0 is in the form of ( ) = ∑ ( − ) + ∑ ( − ), = 0,1, …
and * in the form of ( ) = ∑ * ( − ) + ∑ * ( − ), = 0,1, … (14)
The nonlinear automaton M 1 is in the form of 
and
Parameters h 0 , h 1 , k 0 , τ 0 and τ 1 may be chosen by users.
FPGA Implementation of FAPKC
Hardware Implementation Environment
In this paper FAPKC3 was implemented on the programmed matrix Altera FPGA of the device family Cyclone III (EP3C5E144C8N). This Cyclone III device has a total of 5,136 logic elements, 23 multipliers, 2 phase-locked loops (PLLs) [10] . Quartus II Web Edition software includes an integrated development environment to accelerate system-level design and seamless integration with leading third-party software tools and flows [11] .
Implementations Details
The equations show that multiplication of a square matrix on a vector is the basic operation in algorithm FAPKC3. Matrix Аr[n, n], a vector V[n], dimension of a matrix are input parameters for the given procedure. P [n] is a resultant vector. Each element of a resultant vector calculated by search of elements of lines of matrix Аr and their multiplication to corresponding elements of a vector V. The result of multiplication increases to current value of an element of vector P. Each element of a resultant vector is calculated by the following formula:
The information graph of calculation realization according to the formula (19) is presented in Figure 1 . 
where the top index means a vector's element, the bottom index means number of vector in the stream. Thus, elements streams of matrix and vectors from information tops Аr and V accordingly are put on multiplier inputs. Further, the result of multiplication is added with earlier keeped value of an element of vector P from the delay line.
The structure of block P, which is used for multiplication of a matrix on vectors stream, is presented on Figure 2 . CR blocks provide communication of the computing scheme with memory cells from which commands and the data are read out and written. CR1 is designed for storage and getting of commands and lines elements of a matrix, and also for writing of calculation results from CalB. CR2 it is designed for storage and getting of vectors stream. -elements of a matrix line for their preservation in the internal buffer (loading of elements of a matrix), -or elements of stream vectors for computing structure (performance of calculations). Commands to CB are arrive from CR1. CB is designed for decoding of commands and formation of control signals for management of calculations by CalB. The structure of CalB block is presented on Figure 3 .
Buffer BUF1 is used for storage of a matrix line. Multiplier MUL, adder ADD and a delay line BUF2 realize calculations according to an information graph as shown on 
FPGA Implementations Results
The proposed architecture was captured by using VHDL. All the system components were described with structural architecture. The system tested using confirmed test vectors [5] in order to examine its correctness. The whole design was synthesized, placed and routed by using Altera FPGA device (Cyclone III, EP3C5E144C8N) [10] . Synthesis results for the proposed implementation are obtained as follows: 4,314 logic elements used (84% from available area) at a maximum frequency of 45 MHz and data rate of 24,3 Kb/s.
The maximum operating frequency F as well as the number of logic elements required for our implementation was obtained from the Quartus II Web Edition software. Comparison between the proposed FAPKC implementation and a 512-bit RSA implementations are presented in Table 1 . Area resource comparisons are not given because they are not provided by the other implementations. [12] 45.6 140-460 RSA in [13] 100 100 RSA in [14] 50 43 Proposed 45 MHz 24,3
Concluding Remarks
FAPKC implementation oriented on multiplication of a square matrix on a vector over GF(q) on the reprogrammable matrix is offered in this paper. FAPKC3 was implemented on Altera FPGA of the device family Cyclone III (EP3C5E144C8N).
The proposed system achieves a data throughput up to 24,3 Kb/s in a clock frequency of 45 MHz. The most part of an integrated circuit area (84%) was used in our FPGA implementation of FAPKC3 with small parameters (q = 2, l = 3, h 0 = 1, k 0 = 2 and τ 0 = 1).
Thus, a future work will to concern of the implementation of FAPKC with the large parameters. The most obvious extension is to design a fully pipelined architecture in order to obtain a higher throughput at the price of area.
