Abstract: Chaotic filter bank schemes have been proposed in the research literature to allow efficient encryption of data in real-time embedded systems. Security flaws have since been found in the underlying approaches, and these flaws make such schemes unsafe for real-life applications. In this paper, we first present an improved scheme that alleviates the weaknesses of the chaotic filter bank scheme and adds enhanced security features, forming a modified chaotic filter bank (MCFB) scheme. Next, we present a reconfigurable hardware implementation of the MCFB scheme. The reconfigurable implementation speeds up the MCFB scheme by mapping some of the multipliers in the design to reconfigurable look-up tables and by removing many unnecessary multipliers. An optimised implementation on a Xilinx Virtex-5 XC5VLX330 FPGA gave a speedup of 30% over a non-optimised direct implementation, with a clock frequency of 88 MHz.
Introduction

Chaos and cryptography
Chaos theory plays an active role in modern cryptography. As a basis for a cryptosystem, chaos is attractive because of its random-like behaviour and its sensitivity to initial conditions and parameter settings, which help fulfil the classical Shannon requirements of confusion and diffusion (Shannon, 1949). A tiny difference in the starting state or parameter setting of such a system can lead to completely different outputs within a few iterations. This sensitivity to initial conditions manifests itself as an exponential growth of errors, and the behaviour of the system appears chaotic.
Considerable research has been devoted to continuous-time chaotic systems such as oscillator circuits (Carroll and Pecora, 1991; Liang et al., 2008; Robilliard et al., 2006). However, these schemes require a synchronisation procedure. Discrete-time chaotic systems, on the other hand, behave like private-key encryption algorithms (Rueppel, 1986) and are amenable to implementation on fixed-point hardware.
Many chaotic block ciphers (Baptista, 1998; Kocarev et al., 1998; Guanrong Chen and Chui, 2004; Yaobin Mao and Lian, 2004; Pichler and Scharinger, 1996) have been proposed in the research literature. For example, Baptista (1998) builds a block cipher based on chaotic encryption: each character of the message is encoded as the number of iterations of the logistic equation needed to move the trajectory from an initial condition into a pre-defined interval inside the logistic chaotic attractor that is associated with that character.
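Baptista's encoding idea can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: a logistic parameter R = 3.99, the interval [0.2, 0.8) split into 256 equal sites (one per byte value), and the rule "take the first hit after n_min iterations" are all illustrative choices, and Baptista's published scheme may differ in these details. The names `encrypt`, `decrypt` and `site_of` are hypothetical.

```python
# Simplified Baptista-style chaotic encryption (illustrative parameters, not Baptista's exact scheme).
R = 3.99                                # logistic-map parameter (assumed)
X_MIN, X_MAX, SITES = 0.2, 0.8, 256     # interval split into one site per byte value (assumed)


def logistic(x: float) -> float:
    return R * x * (1.0 - x)


def site_of(x: float) -> int:
    """Index of the site containing x, or -1 if x lies outside [X_MIN, X_MAX)."""
    if not X_MIN <= x < X_MAX:
        return -1
    return min(int((x - X_MIN) / (X_MAX - X_MIN) * SITES), SITES - 1)


def encrypt(message: bytes, x0: float, n_min: int = 250, n_max: int = 65532) -> list[int]:
    """Encode each byte as the number of iterations until the orbit enters that byte's site."""
    x, cipher = x0, []
    for symbol in message:
        for n in range(1, n_max + 1):
            x = logistic(x)
            if n >= n_min and site_of(x) == symbol:
                cipher.append(n)
                break
        else:  # no break: the site was never reached
            raise RuntimeError("site not reached within n_max iterations")
    return cipher


def decrypt(cipher: list[int], x0: float) -> bytes:
    """Replay the orbit from the shared key x0: iterate n times, then read the site index."""
    x, plain = x0, []
    for n in cipher:
        for _ in range(n):
            x = logistic(x)
        plain.append(site_of(x))
    return bytes(plain)


ct = encrypt(b"hi", x0=0.43)
print(ct, decrypt(ct, x0=0.43))
```

Each ciphertext entry is an iteration count between 250 and 65,532, so every 8-bit symbol costs 16 bits of ciphertext and hundreds of map iterations, which is exactly the size and speed problem discussed next.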
Some limitations of such block ciphers and the logistic chaotic attractor are explained as follows:
Firstly, the distribution of the ciphertext is not flat enough to ensure high security, since the occurrence probability of cipher blocks decays exponentially as the number of iterations increases. Secondly, these schemes encrypt very slowly, since at least 250 iterations of the chaotic map are required to encrypt an 8-bit symbol, and the number of iterations can be as large as 65,532. Thirdly, the ciphertext is at least twice as long as the plaintext: each 8-bit symbol may require tens of thousands of iterations, whose count needs 16 bits to carry, so a message of X bytes needs 2X bytes of ciphertext. Despite the improvements proposed by subsequent research, block ciphers based on Baptista's (1998) work remain too slow for real-time data encryption systems.
A stream cipher based on chaotic maps was presented in early 1991 by Habutsu et al. (1991), and its cryptanalysis was presented at the same conference (Biham, 1991). Guanrong Chen and Chui (2004) and Yaobin Mao and Lian (2004) constructed block ciphers based on three-dimensional maps, while Pichler and Scharinger (1996) proposed a cipher based on a direct discretisation of the two-dimensional Baker map. Good surveys and introductory tutorials on these schemes can be found in Yang (2004) and Kocarev (2001). Masuda and Aihara (2002) present a cryptosystem based on a discretisation of the skew tent map, and Masuda et al. (2006) present chaotic Feistel and chaotic uniform operations for block ciphers. Although various schemes and maps have been proposed in the research literature, the logistic map remains one of the simplest maps and is used in many schemes.
Wavelets and chaotic filter banks
A cipher based on chaotic filter banks was proposed by Ling et al. (2007). It allows great flexibility in design and offers the following advantages:
1 One can embed signals in different frequency bands by employing different chaotic functions.
2 The number of chaotic generators to be employed and their corresponding functions can be selected and designed in a flexible manner because perfect reconstruction does not depend on the invertibility, causality, linearity and time invariance of the corresponding chaotic functions.
3 The ratios of the subband signal powers to the chaotic subband signal powers can be easily changed by the designers and perfect reconstruction is still guaranteed no matter how small these ratios are.
4 The proposed cryptographic system can be easily adapted to the international multimedia standards, such as JPEG 2000 and MPEG-4 (Ling et al., 2007) .
The encryption procedure decomposes the input plaintext signal into two subbands and masks each of them with a pseudorandom number sequence generated by iterating the chaotic logistic map. Ling et al. (2007) use filter banks based on the discrete wavelet transform (DWT) to maintain compatibility with existing image compression standards such as JPEG 2000 (Christopoulos et al., 2000). Arroyo et al. (2009) present a cryptanalysis of Ling et al. (2007) that exposes the weakness of the chaotic filter bank against known-plaintext attacks, as well as the reduction of the key space caused by the use of the logistic map.
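The split-and-mask idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions: an integer Haar split stands in for the DWT filter banks of Ling et al. (2007), the logistic-map state is quantised to 16 bits, and each subband is masked by plain addition. The function names and the key layout `(x0_low, lam_low, x0_high, lam_high)` are hypothetical. The example also makes the known-plaintext weakness visible: anyone who knows a plaintext subband and its ciphertext obtains the mask by subtraction.

```python
def logistic_keystream(x0: float, lam: float, count: int) -> list[int]:
    """Iterate the logistic map and quantise each state to a 16-bit integer mask."""
    x, masks = x0, []
    for _ in range(count):
        x = lam * x * (1.0 - x)
        masks.append(int(x * 65536))
    return masks


def haar_analysis(signal: list[int]) -> tuple[list[int], list[int]]:
    """Integer Haar (S-transform) split of an even-length signal into low and high subbands."""
    low, high = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        d = b - a
        low.append(a + d // 2)
        high.append(d)
    return low, high


def haar_synthesis(low: list[int], high: list[int]) -> list[int]:
    """Exact inverse of haar_analysis (perfect reconstruction)."""
    out = []
    for s, d in zip(low, high):
        a = s - d // 2
        out += [a, a + d]
    return out


def encrypt(signal: list[int], key: tuple) -> tuple[list[int], list[int]]:
    """One chaotic generator per subband; each subband gets its own additive mask."""
    low, high = haar_analysis(signal)
    m0 = logistic_keystream(key[0], key[1], len(low))
    m1 = logistic_keystream(key[2], key[3], len(high))
    return [y + m for y, m in zip(low, m0)], [y + m for y, m in zip(high, m1)]


def decrypt(z0: list[int], z1: list[int], key: tuple) -> list[int]:
    m0 = logistic_keystream(key[0], key[1], len(z0))
    m1 = logistic_keystream(key[2], key[3], len(z1))
    return haar_synthesis([z - m for z, m in zip(z0, m0)], [z - m for z, m in zip(z1, m1)])


KEY = (0.3, 3.99, 0.7, 3.97)            # made-up key material
samples = [10, 12, 200, 3, 7, 7, 0, 255]
z0, z1 = encrypt(samples, KEY)
print(decrypt(z0, z1, KEY))
```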
Scope and organisation of this paper
In this paper, we present the design and implementation of a chaotic stream cipher that uses less hardware, has promising security and has high throughput to serve the requirements of real-time embedded systems. The main contributions of this paper can be summarised as follows:
1 The proposed modified chaotic filter bank (MCFB) scheme is a lightweight cipher designed to satisfy the resource requirements of real-time embedded systems, security requirements of modern communication systems and format-compliance with existing multimedia compression standards such as JPEG 2000, MPEG-4, etc.
2 To the best of the authors' knowledge, this is the first hardware implementation of a chaotic filter bank scheme.
3 A clock frequency of 88 MHz was obtained for a Virtex-5 XC5VLX330 FPGA. The design was synthesised and implemented using Xilinx ISE 10.1 tool.
The paper is organised as follows: Section 2 gives a brief overview of the wavelet transform, and Section 3 describes the previously proposed chaotic filter bank scheme. Section 4 presents the MCFB scheme, whose distinguishing features are discussed in the next two sections: Section 5 explains the improved chaotic oscillator (ICO) and Section 6 gives an overview of wavelet parameterisation. Section 7 discusses the resulting security enhancements, Section 8 gives the details of the hardware implementation on a Xilinx Virtex-5 FPGA and the proposed optimisations, and Section 9 concludes the paper with directions for future work.
Wavelets
The efficient representation of time-frequency information by the wavelet transform has made it popular for signal processing applications. Figure 1 shows a three-level DWT decomposition of an image. The coarse information is preserved in the LL3 subband, and this operation forms the basis of multi-resolution analysis for the DWT (Vetterli and Kovačevic, 1995). The 1D DWT can be viewed as a signal decomposition using specific low-pass and high-pass filters. A single stage of image decomposition can be implemented by successive horizontal (row) and vertical (column) wavelet transforms. Thus, one level of the DWT consists of filtering with low-pass and high-pass filters along the rows and then along the columns, as explained in Figure 2. Each filtering step is followed by downsampling by a factor of 2 to remove redundant information.
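The row-then-column structure of one DWT level can be sketched as follows. This minimal sketch assumes Haar filters (pairwise mean and pairwise half-difference) instead of the 5/3 or 9/7 filters, and plain Python lists for the image. The subband names follow one common convention (first letter: row filter, second letter: column filter); texts differ on LH versus HL.

```python
def analyze_1d(samples: list[float]) -> tuple[list[float], list[float]]:
    """One level of 1D Haar analysis: low-pass and high-pass filtering, each followed by
    downsampling by 2 (one output per input pair)."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def transpose(matrix: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*matrix)]


def filter_columns(rows: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Apply analyze_1d to every column of a row-major matrix."""
    results = [analyze_1d(col) for col in transpose(rows)]
    low = transpose([lo for lo, _ in results])
    high = transpose([hi for _, hi in results])
    return low, high


def dwt2_one_level(image: list[list[float]]):
    """Rows first, then columns; returns four half-resolution subbands."""
    row_results = [analyze_1d(row) for row in image]
    row_low = [lo for lo, _ in row_results]
    row_high = [hi for _, hi in row_results]
    ll, lh = filter_columns(row_low)    # low-pass along rows, then low / high along columns
    hl, hh = filter_columns(row_high)   # high-pass along rows, then low / high along columns
    return ll, lh, hl, hh


stripes = [[0, 2, 0, 2] for _ in range(4)]   # vertical stripes: all detail is horizontal
for name, band in zip(("LL", "LH", "HL", "HH"), dwt2_one_level(stripes)):
    print(name, band)
```

A 4 × 4 input yields four 2 × 2 subbands, and a flat image leaves only the LL subband non-zero, which is the "coarse information" kept at each level.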
Commonly used DWT filters
The two most common DWT filters used in image compression are Gall's 5/3 filter and the Daubechies 9/7 filter (Christopoulos et al., 2000), both of which are included in the JPEG 2000 standard. Gall's filter has rational coefficients, and its hardware implementation requires fewer resources. The Daubechies 9/7 filter (also commonly known as the CDF 9/7 filter) has better compression performance. However, its coefficients are irrational, so its hardware requirements are much larger.
Daubechies 9/7-tap bi-orthogonal filter
The biorthogonal Daubechies 9/7 filter is the most widely used filter for DWT operation. These wavelets have symmetric scaling and wavelet functions, i.e., both the low pass and high pass filters are symmetric. This filter has excellent image compression capabilities. There are four filters that comprise the two-channel biorthogonal wavelet system. The analysis and synthesis low-pass filters are denoted by H 0 and G 0 respectively. The analysis and synthesis high pass filters are denoted by H 1 and G 1 respectively and are obtained by quadrature mirroring the low-pass filters.
If we define D(z) = G_0(z)H_0(z), the perfect reconstruction (PR) condition simplifies to the following:
$$D(z) + D(-z) = 2$$
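This equation is solved using Lagrange half-band filters (LHBF) L_K(z), where:
$$L_K(z) = 2\left(\frac{1+z}{2}\right)^{K}\left(\frac{1+z^{-1}}{2}\right)^{K}\sum_{k=0}^{K-1}\binom{K-1+k}{k}\left(\frac{2-z-z^{-1}}{4}\right)^{k}$$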
For K = 4, this yields the well-known Cohen-Daubechies-Feauveau filter, also known as the Daubechies biorthogonal 9/7 filter, whose coefficients are irrational. Gall and Tabatabai (1988) solved the PR condition by a substitution in terms of a parameter a.
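The half-band property can be checked numerically with exact rational arithmetic. This sketch assumes the standard Lagrange half-band form L_K(z) = 2 c^K Σ_{k=0}^{K−1} C(K−1+k, k) s^k with c = (1+z)(1+z⁻¹)/4 and s = (2−z−z⁻¹)/4, and represents each symmetric Laurent polynomial as a centred coefficient list. D(z) + D(−z) = 2 holds exactly when the centre tap is 1 and every other even-offset tap is 0. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb

# Centred coefficient lists (powers z^-1, z^0, z^1) of the two building blocks.
COS2 = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]    # (1+z)(1+z^-1)/4
SIN2 = [Fraction(-1, 4), Fraction(1, 2), Fraction(-1, 4)]  # (2-z-z^-1)/4


def polymul(p: list, q: list) -> list:
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def polypow(p: list, k: int) -> list:
    out = [Fraction(1)]
    for _ in range(k):
        out = polymul(out, p)
    return out


def centred_add(p: list, q: list) -> list:
    """Add two centred, odd-length coefficient lists."""
    if len(p) < len(q):
        p, q = q, p
    out, off = list(p), (len(p) - len(q)) // 2
    for i, b in enumerate(q):
        out[off + i] += b
    return out


def lagrange_halfband(K: int) -> list:
    """Centred coefficients of L_K(z); the result has 4K - 1 taps."""
    tail = [Fraction(0)]
    for k in range(K):
        tail = centred_add(tail, [comb(K - 1 + k, k) * c for c in polypow(SIN2, k)])
    return [2 * c for c in polymul(polypow(COS2, K), tail)]


def is_halfband(coeffs: list) -> bool:
    """True iff the centre tap is 1 and all other even-offset taps vanish."""
    c = len(coeffs) // 2
    return coeffs[c] == 1 and all(coeffs[c + m] == 0 for m in range(-c, c + 1) if m % 2 == 0 and m != 0)


print(lagrange_halfband(2))
print(len(lagrange_halfband(4)), is_halfband(lagrange_halfband(4)))
```

For K = 2 the product filter is (−1, 0, 9, 16, 9, 0, −1)/16, which factors into the 5/3 low-pass pair (−1, 2, 6, 2, −1)/8 and (1, 2, 1)/2; for K = 4 it has the 15 taps shared between the 9-tap and 7-tap filters of the 9/7 pair.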
Gall's 5/3 filter
For a particular value of a, the simplification leads to the well-known Gall 5/3 filter pair. This filter has lower latency than the 9/7 filter but provides weaker image compression.
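Because the 5/3 coefficients are rational, the transform reduces to integer additions and shifts. This is a minimal sketch assuming the reversible integer lifting form of the 5/3 filter used in JPEG 2000 Part 1, with whole-sample symmetric extension at the edges; it is a standard factorisation rather than the parameterised derivation above, and the function names are illustrative.

```python
def legall53_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """Reversible integer 5/3 lifting for an even-length signal; returns (low, high)."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("signal length must be even and at least 2")
    half = n // 2

    def sample(i: int) -> int:
        # Whole-sample symmetric extension at the right edge: x[n] -> x[n - 2].
        return x[i] if i < n else x[2 * (n - 1) - i]

    # Predict step: detail (high-pass) coefficients; ">> 1" is floor division by 2.
    d = [x[2 * k + 1] - ((x[2 * k] + sample(2 * k + 2)) >> 1) for k in range(half)]
    # Update step: approximation (low-pass) coefficients; d[-1] is mirrored to d[0].
    s = [x[2 * k] + ((d[max(k - 1, 0)] + d[k] + 2) >> 2) for k in range(half)]
    return s, d


def legall53_inverse(s: list[int], d: list[int]) -> list[int]:
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for k in range(half):                       # undo the update step
        x[2 * k] = s[k] - ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
    for k in range(half):                       # undo the predict step
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]
        x[2 * k + 1] = d[k] + ((x[2 * k] + right) >> 1)
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
low, high = legall53_forward(signal)
print(low, high, legall53_inverse(low, high))
```

The `>> 1` and `>> 2` shifts are floor divisions by 2 and 4, the same trick a hardware datapath can use for power-of-two coefficients.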
Reconfigurable hardware implementation
Much research has been done on DWT architectures for image processing (Benkrid et al., 2001, 2003; Ritter and Molitor, 2001; Kotteri et al., 2005; Martina and Masera, 2007). A good survey of DWT coding architectures is given by Tseng et al. (2005). Recent works on partial reconfiguration of FPGAs implement the DWT in a reconfigurable fashion. Claus et al. (2008) compare embedded reconfigurable video-processing architectures. They propose a hybrid of two hardware platforms, one providing easy reconfiguration of modules and the other providing easy implementation with a higher clock frequency, to achieve an optimal FPGA-based dynamically and partially reconfigurable platform for real-time video and image processing. The ReCoBus-Builder tool (Koch et al., 2008) reduces the generation of dynamically reconfigurable systems to almost a push-button process; that work also describes a communication infrastructure for dynamically reconfigurable systems.
Chaotic filter bank scheme
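The chaotic filter bank scheme is illustrated in Figure 3. The plaintext x[n] is split by the analysis filters h_0 and h_1 into two downsampled subband signals y_0[n] and y_1[n], which are masked using the chaotic functions α_0(·) and α_1(·) to give the ciphertext subband signals z_0[n] and z_1[n]. At the receiver, y_0[n] and y_1[n] are recovered from z_0[n] and z_1[n] by removing the chaotic terms, and the plaintext is reconstructed as
$$x[n] = \sum_m y_0[m]\,g_0[n-2m] + \sum_m y_1[m]\,g_1[n-2m]$$
where h_0, h_1 are the analysis filters and g_0, g_1 are the synthesis filters. Choosing Gall's 5/3 filter or the Daubechies 9/7 filter allows correct recovery of the plaintext signal.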
Chaotic maps
As explained above, the chaotic filter bank scheme uses two chaotic maps α_0(·) and α_1(·), both based on the logistic map. The logistic map is a polynomial mapping of degree 2 that exhibits chaotic behaviour despite being a simple non-linear dynamical equation. Mathematically, the logistic map is written as:
$$x_{n+1} = \lambda_{LM}\, x_n (1 - x_n)$$
where λ_LM is a positive number. Roughly speaking, chaotic systems exhibit great sensitivity to initial conditions, and the logistic map has this property for most values of λ_LM between about 3.57 and 4. The repeated stretching and folding of the unit interval by the map does not merely produce a gradual divergence of nearby sequences of iterates but an exponential one, which accounts for the complexity and unpredictability of the chaotic logistic map.
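Sensitivity to initial conditions is easy to observe numerically. This minimal sketch assumes λ_LM = 3.99 and two starting points 10⁻⁶ apart (both values are illustrative) and reports the first iteration at which the two orbits differ by more than 0.1. Since |f(u) − f(v)| = λ_LM |u − v| |1 − u − v| ≤ λ_LM |u − v| on [0, 1], the gap can grow by at most a factor of 3.99 per step, so at least 9 iterations are needed; in practice the threshold is crossed not long after.

```python
def orbit(x0: float, lam: float, steps: int) -> list[float]:
    """Iterates x_0, x_1, ..., x_steps of the logistic map x -> lam * x * (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(lam * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence_step(x0: float, delta: float, lam: float = 3.99,
                    threshold: float = 0.1, steps: int = 200):
    """First iteration at which orbits started delta apart differ by more than threshold."""
    for n, (u, v) in enumerate(zip(orbit(x0, lam, steps), orbit(x0 + delta, lam, steps))):
        if abs(u - v) > threshold:
            return n
    return None


print(divergence_step(0.4, 1e-6))
```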
Key space
The authors in Ling et al. (2007)
2 Vulnerability to known-plaintext attack. The value of λ_LM can be calculated very accurately from two successive iterations of the logistic map, which leads to successful known-plaintext attacks on the scheme.
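The known-plaintext weakness follows directly from the map equation x_{n+1} = λ_LM x_n (1 − x_n): two consecutive states give λ_LM = x_{n+1} / (x_n (1 − x_n)). This is a minimal sketch of a simplified model, assuming the ciphertext is the plaintext plus the raw chaotic state (no quantisation or filtering), so that the attacker recovers the states by subtraction. The secret values are made up.

```python
def logistic(x: float, lam: float) -> float:
    return lam * x * (1.0 - x)


def recover_lambda(x_n: float, x_next: float) -> float:
    """Invert x_next = lam * x_n * (1 - x_n) for lam."""
    return x_next / (x_n * (1.0 - x_n))


SECRET_LAM, SECRET_X0 = 3.8712345, 0.2718               # made-up key material
states = [SECRET_X0, logistic(SECRET_X0, SECRET_LAM)]
known_plain = [0.10, 0.25]                              # plaintext samples known to the attacker
cipher = [p + s for p, s in zip(known_plain, states)]   # simplified additive masking

masks = [c - p for c, p in zip(cipher, known_plain)]    # attacker: mask = ciphertext - plaintext
print(recover_lambda(*masks))                           # approximately 3.8712345
```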
The MCFB scheme
The MCFB scheme makes three modifications to the original scheme, making it more secure and also improving its frequency resolution.
1 The chaotic filter bank scheme (Ling et al., 2007)
3 We replace the DWT filter banks with a parameterised filter bank that has the same properties as the original filters but allows us to choose from a very large number of possible filters when implementing a filter bank.
The choice of filter bank and parameters for the chaotic oscillators used in the design is governed by a key. The overall system is shown in Figure 5 .
The ICO and the parameterised wavelet transform are explained in the following two sections.
Improved chaotic oscillator
In this section, we give a brief description of an ICO based on a modified logistic map (MLM) that alleviates the problems associated with the chaotic generator proposed in Ling et al. (2007). The proposed scheme is robust to the choice of initial conditions (because there are no unsuitable λ values), achieves real-time encryption speed and is resistant to known attacks.
The modified logistic map (MLM)
Our initial experimentation involved generation of pseudo-random number sequences by varying the parameter The output of the MLM (x n ) is quantised to get a 16 bit value p n . x n , 0 < x n < 1 is represented in fixed point as follows:
$$x_n = \sum_{i=1}^{64} b_i\,2^{-i}, \qquad b_i \in \{0, 1\}$$
The quantisation step, i.e., the truncation of the more significant bits, is non-linear (it is a many-to-one function), which increases the complexity of any cryptanalytic attack that tries to recover the logistic map state from the ciphertext. We generate another pseudo-random sequence s_n from the sequence p_n by the following operation:
There is no linear correlation between the two sequences p n and s n . Statistical de-correlation makes it difficult to back-track p n from s n .
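Sampling the low-order bits of a fixed-point state can be sketched with Python integers. This sketch substitutes the standard logistic map for the MLM (an assumption), holds x_n as a 64-bit binary fraction and the parameter with 59 fractional bits, truncates each exact product, discards 100 warm-up iterations and keeps the 16 least significant bits as p_n. The generation of s_n from p_n is not modelled, and the function names are hypothetical.

```python
FRAC_BITS = 64            # x_n held as a 64-bit binary fraction (assumed)
LAM_FRAC_BITS = 59        # parameter held with 59 fractional bits (assumed, as in 5.59 format)
MASK16 = (1 << 16) - 1


def to_fixed(value: float, frac_bits: int) -> int:
    return int(value * (1 << frac_bits))


def step_fixed(x: int, lam: int) -> int:
    """lam * x * (1 - x) in integer arithmetic; the exact product is truncated to 64 fractional bits."""
    one = 1 << FRAC_BITS
    return (lam * x * (one - x)) >> (LAM_FRAC_BITS + FRAC_BITS)


def sample_lsbs(x0: float, lam: float, count: int, warmup: int = 100) -> list[int]:
    """Iterate the stand-in map and keep the 16 least significant bits of each state."""
    x, lam_q = to_fixed(x0, FRAC_BITS), to_fixed(lam, LAM_FRAC_BITS)
    out = []
    for i in range(warmup + count):
        x = step_fixed(x, lam_q)
        if i >= warmup:                 # skip the warm-up iterations
            out.append(x & MASK16)      # p_n: the 16 least significant bits of x_n
    return out


print(sample_lsbs(0.4, 3.99, 8))
```

Many different 64-bit states share the same 16 low bits, which is the many-to-one property the text relies on.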
Wavelet parameterisation
We now present a new layout and configuration scheme for the parameterised DWT. A parameterised construction of the DWT filter with rational coefficients has two advantages: the parameterisation can be used to build a key scheme, while the rational coefficients enable an efficient hardware architecture using fixed-point arithmetic (Pande and Zambreno, 2009). We get the following expression for H_1(z) and H_2(z). Different DWT filters are obtained simply by changing the value of a, which is determined by a secret key. The free parameter a can be varied over a wide range while retaining the perfect reconstruction property of the wavelet transform. However, as a varies over (−∞, +∞), the outputs of the DWT operation acquire a very large dynamic range that requires more bits to represent, which would reduce the compression rates achievable with DWT-based coders. Numerical experiments show that the parameterised DWT gives good PSNR values for image reconstruction with a Set Partitioning In Hierarchical Trees (SPIHT) based coder when a lies in the range 1 to 3. Beyond this range, the output DWT coefficients are spread over a large dynamic range, and at low bit rates the encoder cannot efficiently encode such a wide range of coefficients, leading to poor compression results for natural images.
Security enhancement
A serious drawback of chaotic cryptosystems is their weakness against known-plaintext attacks. If the plaintext and the ciphertext are known, an attacker can simply XOR them to obtain the key value that was XORed with the original plaintext. Our proposed scheme has several advantages over the logistic map:
• The MLM has better security properties than the logistic map. Figure 7 shows the sensitivity of the MLM to initial conditions: a slight difference in the initial condition leads to completely uncorrelated outputs. The bifurcation diagrams for the LM and the MLM are shown in Figure 8. The absence of any white space (periodic windows) in the key space of the MLM allows us to build a continuous key space. Figure 9 shows the Lyapunov exponent of the MLM, which is higher than that of the LM. The Lyapunov exponent measures the rate at which two closely related inputs diverge; a higher positive exponent indicates faster divergence.
• The random feedback scheme makes it difficult to predict the key value XORed to the original plaintext.
• The sequences s n and p n are linearly uncorrelated from each other making it difficult to reverse engineer the values of p n from s n .
• The sequence p_n is obtained by sampling x_n, which is used to iterate the chaotic map. In the hardware implementation (presented in the next section), we sample the 16 least significant bits (out of 64) of x_n to get p_n. Because the chaotic map is more sensitive to the MSBs than to the LSBs (and the 48 MSBs remain unknown), it is practically impossible to trace back the value of x_n.
• We run 100 iterations of the MLM at the beginning to allow the diffusion of the initial key bits and parameter values. We found that the initial parameter values are fully diffused within approximately 20 iterations of the logistic map: two logistic maps with a slight difference in initial conditions appear completely de-correlated in their outputs after at most 20 iterations. Running 100 iterations provides a safety margin for full diffusion of the initial key parameters.
• The DWT parameterisation adds to the security of the scheme. The exact choice of DWT filter is given by a secret key, and without this knowledge the plaintext cannot be extracted exactly when decrypting the ciphertext. The ICO output s[n] (or s_n) also performs well in statistical tests such as the runs test, serial test and correlation test, which are used to assess its randomness.
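The Lyapunov exponent referred to above can be estimated as the orbit average of ln|f′(x_n)|. This is a minimal sketch for the standard logistic map only (the MLM update is not modelled), assuming 2,000 transient iterations and 20,000 averaged ones. In the chaotic regime (for example λ_LM = 3.99) the estimate is positive, while inside a periodic window of the logistic map (for example λ_LM = 3.832, in the period-3 window) it is negative; such windows are the white space that makes some logistic-map parameter values unsuitable as keys.

```python
import math


def lyapunov_logistic(lam: float, x0: float = 0.3, transient: int = 2000, n: int = 20000) -> float:
    """Average of ln|f'(x)| = ln|lam * (1 - 2x)| along an orbit of x -> lam * x * (1 - x)."""
    x = x0
    for _ in range(transient):           # let the orbit settle onto its attractor
        x = lam * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(max(abs(lam * (1.0 - 2.0 * x)), 1e-300))  # guard against log(0)
        x = lam * x * (1.0 - x)
    return total / n


for lam in (3.99, 3.832):
    print(lam, lyapunov_logistic(lam))
```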
Hardware implementation
Figure 10 shows the hardware architecture of the MCFB scheme. The input x[n] is first pipelined for eight cycles, and the parameterised DWT filter is then applied to it. Because the filter is symmetric, the nine pipelined stages are reduced to five by adding together the stages that share the same wavelet coefficient, giving the architecture of Figure 11. Two instances of the ICO are required in the design.
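Merging stages that share a coefficient is the usual folding of a symmetric FIR filter. This is a minimal sketch assuming an integer 9-tap symmetric filter with made-up coefficients, and a software delay line standing in for the pipeline registers. Pairing x[n−k] with x[n−8+k] before multiplying leaves four folded products plus the centre tap, i.e. five multiplications instead of nine, and the folded output matches the direct form exactly.

```python
from collections import deque


def fir_direct(window: list[int], coeffs: list[int]) -> int:
    """Direct form: one multiplication per tap (nine for a 9-tap filter)."""
    return sum(c * x for c, x in zip(coeffs, window))


def fir_folded(window: list[int], coeffs: list[int]) -> int:
    """Folded form for symmetric coefficients (h[k] == h[N-1-k]): add the two samples that
    share a coefficient, then multiply once (five multiplications for a 9-tap filter)."""
    n = len(coeffs)
    if any(coeffs[k] != coeffs[n - 1 - k] for k in range(n)):
        raise ValueError("filter must be symmetric")
    half = n // 2
    acc = sum(coeffs[k] * (window[k] + window[n - 1 - k]) for k in range(half))
    if n % 2:
        acc += coeffs[half] * window[half]  # the centre tap has no partner
    return acc


def run_pipeline(samples: list[int], coeffs: list[int]):
    """Shift one sample per clock into a delay line; yield (folded, direct) outputs."""
    line = deque([0] * len(coeffs), maxlen=len(coeffs))
    for x in samples:
        line.appendleft(x)  # line[k] now holds x[n - k]
        window = list(line)
        yield fir_folded(window, coeffs), fir_direct(window, coeffs)


COEFFS = [1, -2, 3, 5, 8, 5, 3, -2, 1]  # made-up symmetric 9-tap filter
print(list(run_pipeline([4, -1, 7, 0, 3, 9, -5, 2, 6], COEFFS)))
```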
Some optimisation steps performed to reduce the cost of the underlying hardware are summarised below:
1 Division by power-of-two coefficients (e.g., 1/64, 1/16, 1/4) was performed using arithmetic shift operations.
2 The input stream was pipelined. As shown in Figure 10, our architecture takes one pixel (or channel input) as input and outputs the low-pass and high-pass signal coefficients after a finite latency. Increasing the system latency allows us to achieve a higher clock speed (and hence higher throughput).
The hardware implementation of the proposed architecture was done using the Xilinx ISE 10.1 tool. The range for parameter λ is then calculated to be (Robilliard et al., 2006; Masuda et al., 2006) which is implemented with 5.59 I.F format. The range for μ is (−15.0975, −3), which is also represented using the 5.59 I.F format.
Thus, the multiplication λ × x(i) × (1 − x(i)) is truncated to 5.59 I.F format and then added to μ to obtain the new value for x(i). A direct implementation gave a clock frequency of 67.8 MHz while requiring 48 DSP48E slices present in the Virtex-5 FPGA for efficient multiplication and addition operations. We present two optimisations to improve the clock frequency of the design while reducing the hardware requirements of the design.
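The 5.59 I.F arithmetic can be modelled with Python integers scaled by 2^59. This is a minimal sketch assuming 64-bit two's-complement words whose 5 integer bits include the sign (so the representable range is [−16, 16)), truncation after each multiplication, and no saturation or wrap-around on the final addition. `update_model` is a hypothetical model of the datapath described above (truncate λ × x(i) × (1 − x(i)), then add μ), not the exact MLM, and the example inputs are arbitrary.

```python
import math

INT_BITS, FRAC_BITS = 5, 59                      # 5.59 I.F format (sign assumed in the integer field)
WORD_BITS = INT_BITS + FRAC_BITS                 # 64-bit words
MIN_Q, MAX_Q = -(1 << (WORD_BITS - 1)), (1 << (WORD_BITS - 1)) - 1
ONE = 1 << FRAC_BITS


def to_q(value: float) -> int:
    """Quantise a real value to 5.59 by truncation towards minus infinity."""
    q = math.floor(value * ONE)
    if not MIN_Q <= q <= MAX_Q:
        raise OverflowError(f"{value} is outside the 5.59 range [-16, 16)")
    return q


def from_q(q: int) -> float:
    return q / ONE


def mul_q(a: int, b: int) -> int:
    """Exact product of two 5.59 values, truncated back to 5.59 (Python's >> is arithmetic)."""
    return (a * b) >> FRAC_BITS


def update_model(x: int, lam: int, mu: int) -> int:
    """Hypothetical datapath model: truncate lam * x * (1 - x) to 5.59, then add mu.
    No saturation or wrap-around is modelled for the final addition."""
    return mul_q(mul_q(lam, x), ONE - x) + mu


x_next = update_model(to_q(0.5), to_q(4.0), to_q(-0.25))   # arbitrary illustrative values
print(from_q(x_next))  # 0.75
```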
Reconfigurable constant multiplier
The values a, a² and a⁻¹ remain constant in the parameterised DWT architecture for thousands of clock cycles. For example, in image processing, the same value of a is used for an entire frame. Thus, the 13 multipliers used in the design can be replaced by reconfigurable look-up tables (LUTs) to allow fast arithmetic and a more efficient implementation.
If the input is represented by B_1 bits and the constant (the a value) by B_2 bits, we can use (B_1 + B_2) B_2-input LUTs to obtain the output values of H_1(k) and H_2(k). Alternatively, we can break down a B_1 × B_2-bit multiplication into LUTs with fewer inputs. Either way, the LUT-based multiplication can be reconfigured to incorporate any change in the encryption key (Pande and Zambreno, 2010).
Arbitrary hardware multipliers can be implemented using the propagate and generate algorithm (Mano and Ciletti, 2006). Each output bit is a function of the input bits and is characterised uniquely by a logical expression that can be fitted into a LUT. If one of the inputs (say B) is a constant, each output bit S_i can be represented as a logic function f_i(...) of the bits of the other input A. The truth tables of these functions can be evaluated either by logical simplification or by exhaustive search over the input values. We can implement an M × K-bit constant multiplication using (M + K) K-input LUTs. Next, we discuss the mapping of an M × K-bit constant multiplier into 4-input LUTs, which are abundant in commercial FPGAs.
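Breaking a constant multiplication into small look-up tables can be sketched as follows. This is a minimal sketch assuming an unsigned variable operand split into 4-bit slices and an unsigned integer constant (a fractional a would first be scaled to an integer). Each slice indexes a 16-entry table of pre-shifted products, filled by exhaustive evaluation; in hardware, each output bit of such a table corresponds to one 4-input LUT. Reconfiguring for a new key means rewriting the table contents only. The function names are hypothetical.

```python
LUT_INPUTS = 4  # 4-input LUTs


def build_tables(constant: int, a_bits: int) -> list[list[int]]:
    """For every 4-bit slice of the variable operand, tabulate slice_value * constant,
    pre-shifted to the slice position (exhaustive search over the 16 slice values)."""
    slices = -(-a_bits // LUT_INPUTS)  # ceiling division
    return [[(v * constant) << (LUT_INPUTS * s) for v in range(1 << LUT_INPUTS)]
            for s in range(slices)]


def lut_multiply(a: int, tables: list[list[int]]) -> int:
    """Multiply a by the constant baked into the tables: one look-up per slice, then an adder tree."""
    mask = (1 << LUT_INPUTS) - 1
    return sum(table[(a >> (LUT_INPUTS * s)) & mask] for s, table in enumerate(tables))


tables = build_tables(37, a_bits=8)      # constant for the current key
print(lut_multiply(200, tables))         # 7400
tables = build_tables(201, a_bits=8)     # key change: only the table contents are rewritten
print(lut_multiply(200, tables))         # 40200
```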
Mapping a generic RCM into LUTs
The 
Hardware optimisations for ICO
A single DSP48E slice can perform at most a 25 × 18-bit multiplication, so 12 slices are required for a 64 × 64-bit multiplication (⌈64/25⌉ × ⌈64/18⌉ = 3 × 4 = 12). The two multiplications in λ × x(i) × (1 − x(i)) therefore require 24 DSP48E slices.
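The slice count follows from tiling the operands. This is a minimal sketch assuming each signed 25 × 18 DSP48E multiplier is used for unsigned 24 × 17 partial products (synthesis tools may tile differently): one 64-bit operand splits into 3 chunks of 24 bits and the other into 4 chunks of 17 bits, giving 12 partial products that are shifted and summed into the exact 128-bit result. The helper names are hypothetical.

```python
A_CHUNK, B_CHUNK = 24, 17   # unsigned widths usable on signed 25 x 18 DSP48E ports (assumed)


def split(value: int, width: int, total_bits: int = 64) -> list[int]:
    """Split an unsigned total_bits-wide value into width-bit chunks, least significant first."""
    count = -(-total_bits // width)             # ceiling division
    return [(value >> (i * width)) & ((1 << width) - 1) for i in range(count)]


def tiled_multiply(a: int, b: int) -> tuple[int, int]:
    """Rebuild a 64 x 64-bit product from small partial products; returns (product, tiles)."""
    product, tiles = 0, 0
    for i, ap in enumerate(split(a, A_CHUNK)):
        for j, bp in enumerate(split(b, B_CHUNK)):
            product += (ap * bp) << (i * A_CHUNK + j * B_CHUNK)   # one 24 x 17 multiply per tile
            tiles += 1
    return product, tiles


p, tiles = tiled_multiply(2**64 - 1, 2**63 + 12345)
print(tiles, p == (2**64 - 1) * (2**63 + 12345))
```

With two such products per ICO and two ICO instances, this gives 2 × 2 × 12 = 48 slices, consistent with the direct implementation described earlier.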
