Abstract: Neuron-machine interfaces such as dynamic clamp and brain-implantable neuroprosthetic devices require real-time simulations of neuronal ion channel dynamics. The field-programmable gate array (FPGA) has emerged as a high-speed digital platform ideal for such application-specific computations. We propose an efficient and flexible component-based FPGA design framework for neuronal ion channel dynamics simulations, which overcomes certain limitations of the recently proposed memory-based approach. A parallel processing strategy is used to minimize computational delay, and a hardware-efficient factoring approach for calculating exponential and division functions in neuronal ion channel models is used to reduce resource consumption. The performance of the various FPGA design approaches is compared theoretically and experimentally in corresponding implementations of the alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid (AMPA) and N-methyl-D-aspartate (NMDA) synaptic ion channel models. Our results suggest that the component-based design framework provides a more memory-economical solution, as well as more efficient logic utilization for large word lengths, whereas the memory-based approach may be suitable for time-critical applications where a higher throughput rate is desired.
I. INTRODUCTION
REAL-TIME simulation of neuronal ion channel dynamics is an important step in the implementation of neuron-machine interaction, which is fundamental to several emerging neuromorphic and biomimetic applications. For example, in electrophysiological studies of neuronal membrane properties using the dynamic clamp technique [1], a digital computer is used to generate virtual ion channel conductances which continuously interact with a biological neuron in real time. Such software-based experimental applications are highly computation-intensive and often require a judicious choice of operating systems [2] and/or numerical procedures [3] to improve the computational speed and flexibility. A hardware-based, application-specific implementation of the dynamic clamp technique would circumvent the limitations of general-purpose computers.
Another example of real-time neuronal ion channel dynamics computation is found in neuroprosthetic devices using brain-machine interface (BMI). For example, a robotic arm controlled by central brain activity has been shown to be capable of generating complex motions [4] , and such capability may find important applications in patients with Parkinson's disease, essential tremor, dystonia, multiple sclerosis, muscular dystrophy, and other motor dysfunctions [5] . Future applications of such neuroprosthetic devices might incorporate brain-implantable biomimetic electronics as chronic replacements for damaged neurons in central regions of the brain [6] - [8] . One such technology is neuromorphic analog very large-scale integration (VLSI) circuits [9] . Towards this end, we have previously proposed a neuromorphic Hebbian synapse design using analog complementary metal-oxide-semiconductor (CMOS) circuits operating in subthreshold regime [10] , [11] . However, the relatively long design and fabrication cycles for analog CMOS circuits can be a bottleneck in the development of such devices.
In recent years, field-programmable gate array (FPGA) technology has emerged as a high-speed digital computation platform [12] . The flexibility of the FPGA's programmable logic combined with its high-speed operation potentially allows it to control neuroprosthetic devices and dynamic clamp systems in real time. Additionally, FPGAs can be used to accelerate prototyping of analog hardware models of brain processes by quickly building a simulation platform to study the functional behavior of the proposed model in a much shorter design cycle. The applicability of FPGAs for neuronal ion channel dynamics simulation was first proposed in [13] and [14] . Specifically, FPGAs offer an advantage of high-speed signal processing which could be orders of magnitude faster than software-based approaches to simulating biological neuronal signals. This technology can potentially be an effective prototyping tool or permanent platform for dynamic clamp experiments or neuroprosthetic device applications.
However, neuronal ion channel models involve computation-intensive functions which present a challenge for FPGA implementation. Currently, FPGA development toolkits do not always provide efficient built-in operators or basic building blocks for computation-intensive functions such as exponentiation and division. To circumvent this difficulty, a memory-based approach that stores the precomputed function values in lookup tables was adopted in previous implementations [13] and expounded upon in [14]-[16]. This simple approach has a number of limitations that call for a more flexible and hardware-economical design methodology.
In this paper, we propose a component-based FPGA design framework for the modeling and simulation of ion channel dynamics. Under this framework, computational algorithms for exponentiation and division are implemented using FPGA digital logics [17] instead of lookup tables. These FPGA basic arithmetic components provide the computational primitives for constructing any ion channel models readily. In Section II, the background of neuronal ion channel dynamics and the alternative FPGA design approaches are reviewed. The component-based FPGA design framework is presented in Section III, where two design alternatives for achieving maximum speed or minimum resource consumption are introduced. Section IV illustrates this design approach with component-based FPGA implementations of the N-methyl-D-aspartate (NMDA) and alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid (AMPA) excitatory synapse ion channel models. Section V concludes with a summary of the findings.
II. BACKGROUND

A. Neuronal Ion Channel Models
The current $I_i(t)$ flowing through the $i$th ion channel at time $t$ can be described by an ohmic relationship with a time-varying conductance $g_i(t)$ and membrane potential $V(t)$:

$$I_i(t) = g_i(t)\,\bigl(V(t) - E_i\bigr) \tag{1}$$

where $E_i$ is the reversal potential of the $i$th ion channel. The time-varying conductance can be chosen to model voltage-dependent and ligand-dependent ion channels [18]. Computationally intensive functions, such as exponentiation and division, are frequently used in modeling the time-varying conductance.
The ion channel dynamics can be modeled by a first-order differential equation that captures the evolution of the membrane potential:

$$C\,\frac{dV(t)}{dt} = -g_{L}\,\bigl(V(t) - V_{\mathrm{rest}}\bigr) - \sum_i I_i(t) \tag{2}$$

where $C$ is the membrane capacitance, $g_{L}$ is the leak conductance of the membrane patch, $V_{\mathrm{rest}}$ is the resting membrane potential, and $\sum_i I_i(t)$ sums all the currents from the different membrane ion channels.
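As an illustration of how the membrane equation can be stepped in discrete time, the following is a minimal software sketch using forward-Euler integration. It assumes the ohmic channel current $g_i(t)(V - E_i)$ and the sign convention in which leak and channel currents pull the membrane potential toward their respective reversal potentials. The function name, argument names, and default values are hypothetical and chosen only for illustration; they are not taken from the FPGA design.

```python
def simulate_membrane(conductances, reversals, c_m=1.0, g_leak=0.1,
                      v_rest=-65.0, v0=-65.0, dt=0.01, steps=1000):
    """Forward-Euler integration of
        C dV/dt = -g_leak * (V - V_rest) - sum_i g_i(t) * (V - E_i).
    `conductances` is a list of functions g_i(t); `reversals` lists E_i.
    All names and default values are illustrative assumptions."""
    v = v0
    trace = [v]
    for n in range(steps):
        t = n * dt
        # Ohmic current of each channel, summed over all channels.
        i_total = sum(g(t) * (v - e) for g, e in zip(conductances, reversals))
        dv_dt = (-g_leak * (v - v_rest) - i_total) / c_m
        v += dt * dv_dt
        trace.append(v)
    return trace


# One channel with constant conductance 0.1 and reversal potential 0:
# the membrane settles midway between V_rest and the reversal potential.
trace = simulate_membrane([lambda t: 0.1], [0.0], steps=10000)
print(round(trace[-1], 3))
```

With no channels present, the membrane stays at its resting potential, which is a quick sanity check on the sign convention.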
B. Memory-Based Versus Component-Based Approaches
For the memory-based approach, key computational steps of the above neuronal models are precomputed and stored in the FPGA internal memory such that during playback these values can be retrieved readily at a high rate [14] , [16] . The ion channel equations are evaluated with predefined parameters and with a time increment that is indexed in the lookup table. There are two major advantages with memory-based model realizations: 1) small computational delay, and 2) design simplicity. However, the simplicity and high-speed memory retrieval come with a cost, in that the size of the precomputed lookup table grows exponentially with increasing accuracy and input resolution requirements. On-chip memory could be easily exhausted when dealing with large-scale neuronal models with multiple ion channels. [19] . Nevertheless, memory consumption is still a limiting factor when large-scale models with high resolution are involved.
In addition, with the memory-based approach it is difficult to change the model parameters at runtime, because all functions are precomputed assuming predefined parameters. This is a limitation for certain applications such as dynamic clamp, where the ability to modify parameters at runtime is highly desirable.
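The memory-based idea can be sketched in software as follows. This is only an illustrative model, assuming a single-input function sampled uniformly in time and addressed by a truncated time index; the helper names `build_lut` and `lookup` are hypothetical. It also makes the runtime-parameter limitation visible: changing any parameter of the stored function requires rebuilding the whole table.

```python
import math


def build_lut(func, address_bits, t_max):
    """Precompute func(t) at 2**address_bits uniformly spaced time points."""
    n = 1 << address_bits
    dt = t_max / n
    return [func(i * dt) for i in range(n)], dt


def lookup(table, dt, t):
    """Retrieve the stored value whose index corresponds to time t."""
    idx = min(int(t / dt), len(table) - 1)
    return table[idx]


# The decay constant is fixed at table-build time; a different constant
# would need a freshly computed table.
table, dt = build_lut(lambda t: math.exp(-t), address_bits=8, t_max=4.0)
print(len(table), lookup(table, dt, 1.0))
```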
For the component-based approach, basic model functions such as exponentiation and division are evaluated with computational components and implemented using digital logics. Several algorithms are available for mapping these functions to FPGA embedded logics to form basic components [20], [21]. The requirement for on-chip memory can be greatly reduced with this approach. Further, model parameters can be adjusted and adapted in real time, because the model functions are evaluated directly at runtime using FPGA computational primitives. A drawback of the component-based approach is the latency overhead, but the computation speed can be improved by introducing parallelism into the calculations.

Table I summarizes the differences between the memory-based and component-based approaches for neuronal ion channel model implementation. For the memory-based approach, large on-chip memory is required. For the component-based approach, there is a balance between memory and logic resource utilization. For computational speed, the memory-based approach is generally faster than the component-based approach, but the latter has greater flexibility in allowing an optimal tradeoff between speed and resource utilization. For scalability, the availability of hardware resources would be the limiting factor for the memory-based approach with respect to improving resolution and model complexity. Lastly, since the model simulation results from the component-based approach are directly evaluated at runtime, this approach is more adaptable to changing parameter settings.
III. COMPONENT-BASED APPROACH
A. FPGA Design Flow
FPGA systems can be designed with high-level programming languages. Several commercial design platforms and environments provide automatic compilation of high-level programming languages, such as C, C++ and Matlab into FPGA digital logics. Alternatively, FPGAs can also be programmed by using VHDL or Java under a low-level design scheme. Readers are directed to [22] for a detailed survey on FPGA design flow. Our logic-level design makes extensive use of Xilinx's system generator (SG) that works under MATLAB's Simulink environment [23] . The latter provides a schematic design environment of logic gate blocks which are used to implement the FPGA model. SG automatically synthesizes the Simulink model from the schematic design to a bit stream file, which can be readily used to configure the FPGA hardware. FPGA as a standalone simulation device can be properly interfaced with different kinds of software, such as MATLAB and LabView. Analog signals can be converted to digital by using an external analog-to-digital conversion (ADC) chipset before inputting to FPGA.
B. System Architecture
The basic equations (1)-(2) require iterative computations at each time increment $\Delta t$, starting from $t = 0$ with the given initial conditions. A common technique for realizing this time-increment requirement in FPGA design is to construct a counter block, to be triggered by an external ENABLE pulse (the rising edge of which defines the instant $t = 0$). This approach can effectively mimic the time increment with little hardware. The trigger starts the counting at a frequency specified by the FPGA clock, and can be used to provide a steady stream of values of $t$ at regular intervals for subsequent calculations. Fig. 1 shows the data flow of an overall design with multiple ion-channel compartments. We use a register to store the membrane voltage $V$. The membrane leak current is regulated by the difference between the membrane potential and the resting potential. The ion channel blocks run in parallel and generate the net membrane current. The accumulator (1/s) effectively models the conversion of electric charge into a voltage, acting as a resistor-capacitor (R-C) circuit emulating the effects of the membrane capacitance and the leak conductance.
C. Implementation of Exponentiation and Division Using Factoring Approach
FPGAs offer programmable on-chip logics, embedded multipliers and interconnect fabric. There are no readily adaptable hardware resources for the exponentiation and division operations required for the neuronal ion channel models (1), (2). Traditional software routines provide accurate evaluations of these functions given enough time for iterative operations. Most of these routines require repeated multiplicative operations, which are expensive in hardware resources when realized in digital logics. Alternatively, a hardware-efficient routine known as the factoring approach, or additive normalization, approximates the exponential and division functions successively with only shift-and-add operations [17], [20]. Specifically, multiplication by $2^{k}$, where $k$ is an integer, becomes a simple $k$-bit shifting operation in digital logics, which is highly hardware-economical. Basic components or arithmetic units for exponentiation and division can be implemented with the factoring algorithm in a cost-effective manner (see the Appendix).
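The shift-and-add idea can be made concrete with a small fixed-point sketch. It assumes an unsigned fixed-point format with 16 fractional bits, which is an illustrative choice and not a format stated for the FPGA design; the helper names are hypothetical. Multiplying by a factor of the form $1 + 2^{-k}$ then costs one right shift and one addition, with no multiplier.

```python
FRAC_BITS = 16  # assumed fixed-point format: 16 fractional bits


def to_fixed(x):
    """Convert a non-negative float to fixed point."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert fixed point back to float."""
    return q / (1 << FRAC_BITS)


def times_one_plus_pow2(q, k):
    """Multiply fixed-point q by (1 + 2**-k) using one shift and one add."""
    return q + (q >> k)


q = to_fixed(1.5)
print(to_float(times_one_plus_pow2(q, 2)))  # 1.5 * 1.25
```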
D. Maximum Speed Versus Minimum Resource Consumption
FPGA hardware resources can be configured by the designer in various ways. Software support with a full-spectrum library of different granular entities (e.g., basic logic gates, flip-flops, counters, multiplexers, adders, multipliers) is now widely available, along with a high-level hierarchical design platform and graphic interface. This greatly helps our component-based modeling approach by offering flexible design options to achieve either maximum speed or minimum resource consumption.
To take full advantage of available hardware resources, functional operators may be executed in parallel in order to gain greater speed. This is known as the maximum speed design [24] . The drawback of this approach is that extra hardware resources are required. Alternatively, execution in a sequential order can reduce the required number of hardware operators, as some of them may be used for different operations at different time stamps. This minimum resource consumption design option is useful when hardware resources are scarce or when implementing large-scale or complex ion channel models. Table II compares the advantages and disadvantages between the maximum speed and minimum resource consumption schemes.
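The tradeoff between the two schemes can be illustrated with a deliberately simplified cycle-count model. It assumes independent operations, identical non-pipelined operator units, and no routing or control overhead; the function name and the numbers in the usage lines are hypothetical and are not the figures reported in the tables.

```python
import math


def schedule_cycles(num_ops, num_units, latency):
    """Clock cycles needed to finish num_ops independent operations on
    num_units identical, non-pipelined operators of the given latency."""
    rounds = math.ceil(num_ops / num_units)
    return rounds * latency


# Four multiplications, each taking 3 cycles (illustrative numbers):
print(schedule_cycles(4, 4, 3))  # maximum speed: one multiplier per operation
print(schedule_cycles(4, 1, 3))  # minimum resources: one time-shared multiplier
```

Under this model the maximum speed design buys a shorter delay at the cost of proportionally more operator units, which is the tradeoff described above.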
IV. IMPLEMENTATION EXAMPLES AND TESTING RESULTS
In this section, we illustrate the above design principles with the FPGA-based modeling of two types of synaptic ion channels: NMDA and AMPA receptor-gated ion channels (Fig. 2). Both ion channels are present in glutamatergic excitatory synapses and play an important role in synaptic plasticity such as long-term potentiation and depression [25]-[27]. AMPA channels, which carry the bulk of the excitatory synaptic current $I_{\mathrm{AMPA}}$, can be modeled by using (1), with the time-varying conductance modeled by an alpha function (3) [18]. NMDA channels, with excitatory current $I_{\mathrm{NMDA}}$, are the major source of $\mathrm{Ca}^{2+}$ influx into the postsynaptic cell. They have the interesting property that their gating is jointly controlled by neurotransmitter binding and by a voltage-dependent blockage of the channel by $\mathrm{Mg}^{2+}$ ions. This bivariate gating function (4) is characterized by the membrane reversal potential and other parameters: the maximal channel conductance, the channel's activation and deactivation time constants, the extracellular magnesium concentration, and additional constants. Because of their unique dependence on both presynaptic and postsynaptic activation, NMDA channels are generally thought to be a biophysical implementation of the Hebbian adaptation rule [10], [25], [26], [28].
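For readers who want a concrete reference point, the sketch below evaluates commonly used textbook forms of the two conductances: a normalized alpha function for AMPA, and a difference of exponentials divided by a magnesium-block term for NMDA. These forms, the parameter names, and all default values are assumptions made for illustration and may differ from the exact expressions (3) and (4).

```python
import math


def g_ampa(t, g_max=1.0, t_peak=1.0):
    """Alpha-function conductance (assumed form); equals g_max at t == t_peak."""
    if t < 0.0:
        return 0.0
    return g_max * (t / t_peak) * math.exp(1.0 - t / t_peak)


def g_nmda(t, v, g_max=1.0, tau_rise=0.67, tau_decay=80.0,
           mg=1.0, eta=0.33, gamma=0.06):
    """Dual-exponential conductance with voltage-dependent Mg block
    (assumed form; default values are placeholders for illustration)."""
    if t < 0.0:
        return 0.0
    gate = math.exp(-t / tau_decay) - math.exp(-t / tau_rise)
    block = 1.0 + eta * mg * math.exp(-gamma * v)
    return g_max * gate / block


print(g_ampa(1.0))
print(g_nmda(10.0, -80.0), g_nmda(10.0, 0.0))
```

In this form the NMDA conductance is larger at depolarized potentials, reflecting relief of the magnesium block, and both expressions contain the exponentiation and division operations that motivate the factoring approach.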
A. Component-Based FPGA Implementations of AMPA and NMDA Ion Channel Models
For convenience of analysis, the binary operators for addition/subtraction, multiplication, division via FPGA logic hardware implementations are denoted as , , , respectively; the unary operator for exponential is . Also, refers to the latency for multiplication and refers to the latency for exponentiation. Parallelization of independent processes and is denoted as , such that processes and are independent. Computing : Using the above notations, (1), (3) may be rewritten as (5) (6) where denotes the th independent process, which can be realized by using dedicated hardware resource. Fig. 3 shows the maximum speed and minimum resources consumption hardware mapping strategies for the implementation of (5) and (6) . For the maximum speed design, multiplications are parallelized to obtain the speedup. Alternatively, multiplication operators may be time-shared to conserve hardware resources. It is the designer's discretion to decide which approach is appropriate.
Computing : Reformulating (4) gives 
Table III compares the computational delay and resource requirements for the maximum speed and minimum resource consumption implementations of the AMPA and NMDA currents. Table IV shows the experimental results for the tradeoffs between computational delay and resource consumption for these two design alternatives implemented on the Xilinx Virtex-II FPGA. The metric for computational delay is the number of clock cycles required to finish the task. In modern FPGAs, such as the Xilinx Virtex-II, dedicated multipliers and RAM are also embedded in the FPGA fabric, which can significantly improve the computational efficiency. Therefore, the counts of three important hardware resources, 1) slices (or logics), 2) RAM, and 3) multipliers, serve as the metrics for resource evaluation. The experimental results for the computational delay and for the memory and multiplier resource binding are generally in good agreement with the theoretical bounds. However, the number of bound multipliers is double that expected from the above theoretical analysis. This is because the evaluation of the exponential function itself requires a multiplier, which is not counted in the theoretical analysis. Addition and division are realized by using logic slices. It can be seen that the NMDA circuit requires substantially more logic units than the AMPA circuit, as predicted by the theoretical bound. Further, the computing speed of the maximum speed design is double that of the minimum resource consumption design, while most of its logic slice requirements are 87% higher.
B. Resources Utilization of Memory-Based and Component-Based FPGA Designs
Both the memory-based and component-based approaches require lookup tables of certain sizes, which are mapped into FPGA internal memory such as block RAMs (BRAMs). It is possible to use the number of occupied BRAMs as a metric for memory consumption. However, the size of BRAMs varies across FPGAs and technologies. For example, the Xilinx Virtex-II XC2V2000 has a total of 1008 kb of memory, and each BRAM has 18 kb. Greater BRAM capacity is now available in more powerful FPGAs such as the Virtex-5 XC5VLX330, the latest from Xilinx, with 10368 kb. Alternatively, we can directly compare the sizes, in bits, of the lookup tables that each implementation uses.
The size of a lookup table depends on both the word length (number of bits) of the output signal and the required word-length precision of the input. Both word lengths may directly affect data precision and the quality of the computational results. We compare the sizes of the lookup tables required by the two approaches while varying both word lengths. The comparison results are presented in Tables V and VI, in which the numbers are in bits. The memory-based approach requires a much larger lookup table: its table size is two to three orders of magnitude greater than that of the component-based approach. Also, its lookup table size increases exponentially with increasing input word length. This may be a problem for a moderate-size FPGA, e.g., the Xilinx Virtex-II, which can only support a design with word lengths of up to 14 bits and 12 bits before exhausting all memory. In contrast, for the component-based approach the lookup table size increases linearly with both word lengths. In this case, memory will not become a bottleneck as in the memory-based design.
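The exponential growth of a directly addressed table can be checked with a short calculation. The sketch assumes a single-input table with one stored word per address and takes 1 kb to mean 1024 bits; both are assumptions for illustration, and the function and constant names are hypothetical.

```python
def lut_bits(address_bits, output_bits):
    """Size in bits of a table with 2**address_bits words of output_bits each."""
    return (1 << address_bits) * output_bits


# 1008 kb of on-chip memory, assuming 1 kb = 1024 bits.
XC2V2000_MEMORY_BITS = 1008 * 1024

for address_bits in (8, 12, 14, 16):
    size = lut_bits(address_bits, 16)
    print(address_bits, size, size <= XC2V2000_MEMORY_BITS)
```

Each additional address bit doubles the table, whereas each additional output bit only adds one more bit per stored word.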
Another important on-chip resource is logic gates, which are realized using four-input lookup tables in units known as slices. In contrast to lookup tables, logic gate consumption depends mainly on the word length. The results are shown in Fig. 4. The memory-based approach consumes fewer logic slices than the component-based approach for word lengths smaller than 16 bits. This is because more primitive logics are needed to realize the algebraic components in the component-based approach, whereas in the memory-based approach logics are required only for memory retrieval and proper scaling of the output. However, when the word length is longer than 16 bits (for the case of AMPA) or 14 bits (for the case of NMDA), logic slice consumption grows explosively for the memory-based implementation. This is because a large lookup table is needed to achieve such word-length accuracy. A total of 256 and 1024 BRAMs are required for the AMPA and NMDA circuits, respectively. A large number of logic units is also needed to construct the communication bus architecture and the necessary data representation conversion. For word lengths over 24 bits, the memory requirement is 1 Mb, which represents a significant portion of the FPGA embedded memory capacity currently available. On the other hand, logic usage for the component-based approach increases linearly with increasing word length.
A well-known drawback of the component-based approach is the computational delay overhead. In the present design framework, the exponentiation and division operations require only addition and shifting operations, which are relatively economical in time. Still, when compared to the memory-based approach, which directly retrieves precomputed results from RAM, the maximum speed component-based approach takes longer, as reflected by the number of clock cycles needed (Table VII). The computational delay for the component-based approach depends on the word length, as more iteration loops are needed for higher accuracy requirements. In contrast, memory retrieval time is not necessarily dependent on bit accuracy, as all signals are retrieved in parallel. Results show that the memory-based approach could be 3-4 times faster than the component-based approach.
Specifically, since the FPGA is driven by a global clock signal, the computational speed of the model depends on the frequency of the digital clock and the implementation strategy for the model equations. The amount of time it takes to evaluate the model equations can be expressed as $T = N/f$, where $N$ is the number of clock cycles needed and $f$ is the clock frequency. For a Xilinx Virtex-II XC2V2000 FPGA, the maximum frequencies for the memory-based and component-based approaches are 70 and 68 MHz, respectively, due to differences in signal flow critical paths. The model using the memory-based approach requires 4950 clock cycles to generate an output, which means that each presynaptic pulse generates a postsynaptic signal in approximately 70 $\mu$s.

Overall, the component-based model provides a more memory-economical solution, as well as more efficient logic utilization for large word lengths (> 16 bits). Although there is a cost in computational delay for the arithmetic evaluations, the savings in hardware resources may be critical for model scaling. Thus, multiple ion channel models may be simulated in parallel on one chip without exceeding either the memory or logic slice limit. In contrast to multiplexing, which switches different inputs and outputs and reuses the same logics, parallel implementation could better utilize the available input/output ports for higher performance. On the other hand, the memory-based approach may be suitable for time-critical applications where a higher throughput rate is desired.
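The relation between cycle count, clock frequency, and evaluation time is a single division, shown here with the figures quoted above for the memory-based approach. The function name is hypothetical.

```python
def evaluation_time(cycles, clock_hz):
    """Time in seconds to complete `cycles` clock cycles at `clock_hz`."""
    return cycles / clock_hz


# 4950 clock cycles at a 70 MHz clock.
t = evaluation_time(4950, 70e6)
print(f"{t * 1e6:.1f} microseconds")
```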
C. Simulation Results
Fig. 5 compares the FPGA simulation results with biological recordings (EPSCs) mediated by either NMDA channels alone or both AMPA and NMDA channels in a cultured hippocampal neuron from a neonatal rat. The simulation results from the memory-based approach and the maximum speed component-based approach both adequately mimicked, in real time, the actual NMDA and AMPA currents from the experimental data in [29]. Both approaches adopt a 16-bit output resolution; the memory-based approach additionally uses an 8-bit address resolution. However, close-up views of both simulation curves [Fig. 5(c) and (d)] reveal significant truncation errors from the memory-based approach (but not the component-based approach) when compared to a reference curve generated by simulation software with floating-point precision on a digital computer.

Next, we used double-precision floating-point arithmetic as a reference to systematically evaluate the relative errors of both FPGA approaches, defined as the normalized errors averaged ($\pm$ standard deviations) over the first 5000 time stamps of the simulation runs for: 1) the component-based approach (Comp) with 16-bit input and output; 2) the memory-based approach (Mem8) with 8-bit memory address and 16-bit output; and 3) the memory-based approach (Mem14) with 14-bit memory address and 16-bit resolution. Fig. 6 shows the relative errors of the three implementations as one parameter of (3) and one parameter of (4) are varied. For certain parameter values, the relative errors of Mem8 and Mem14 are much larger than that of Comp. The relative error of Comp increased with the AMPA parameter but decreased with the NMDA parameter. This is because when the AMPA parameter increases, the AMPA current decreases and hence the relative error increases for a given truncation error, whereas the NMDA current increases with the NMDA parameter.
For Mem8, there were large variances in the relative errors for the AMPA simulation but not the NMDA simulation, the reason being that AMPA current is very sensitive to while NMDA current is less sensitive to .
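A normalized-error statistic of this kind can be sketched as follows. The example assumes that the normalized error is the absolute deviation divided by the magnitude of the reference, and it models only truncation of the output word, not the address quantization of a lookup table. The test signal, the function names, and the use of the population standard deviation are illustrative assumptions.

```python
import math
import statistics


def truncate(x, frac_bits):
    """Truncate a non-negative value to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return math.floor(x * scale) / scale


def normalized_error_stats(approx, reference):
    """Mean and standard deviation of |approx - ref| / |ref|."""
    errs = [abs(a - r) / abs(r) for a, r in zip(approx, reference)]
    return statistics.mean(errs), statistics.pstdev(errs)


# Double-precision reference over 5000 time stamps (illustrative signal).
reference = [math.exp(-n / 5000.0) for n in range(5000)]
mean16, std16 = normalized_error_stats([truncate(r, 16) for r in reference], reference)
mean8, std8 = normalized_error_stats([truncate(r, 8) for r in reference], reference)
print(mean16, std16)
print(mean8, std8)
```

Because the error is normalized by the reference value, a fixed truncation error weighs more heavily when the signal itself is small, which is the effect described above for the AMPA current.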
V. CONCLUSION
A component-based FPGA design framework for digital simulation of neuronal ion channel dynamics has been presented. The FPGA realization achieves accuracy comparable to that of digital computer implementations while operating at much higher speeds. The programmability and high-speed capability of the FPGA system allow it to prototype analog circuit designs with a much shorter design cycle. The proposed component-based design strategy overcomes the inflexibility and memory limitations of the memory-based design approach. This technology can potentially be used as a valuable tool for dynamic clamp experiments, for controlling neuroprosthetic devices, or for chronic replacement of damaged neurons in central regions of the brain in the future.
APPENDIX
A. Exponentiation Using Factoring Approach
Consider exponentiation function , which can be approximated by product of factors with judicious choice of , such that (10) where and .
The factoring algorithm assumes that its input is a positive number less than one. It is therefore necessary to reformulate the exp function to fit the input number range of the neuronal models. For the ion channel models, the input to the exp function is a negative number, so a method is needed to transform the input into an acceptable domain for the factoring algorithm.
By rearranging the index term, the exp function can be formulated as a product of two terms. One of the terms can be simply evaluated by multiplications of constant and the other is with input at the acceptable domain of factoring algorithm. We let equals to , where , are positive integers. Therefore, we have (11) Since is an integer, the first term, can be evaluated by simple multiplications. The second term can be evaluated by using factoring algorithm, as is a positive number less than one.
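One way to realize this decomposition is sketched below in floating point for readability. It assumes factors of the form $1 + 2^{-k}$ selected greedily (a standard additive-normalization scheme), and a range reduction that writes a negative exponent $-a$ as $-n + f$ with $n = \lceil a \rceil$ and $0 \le f < 1$. The exact factor selection and range reduction used in the FPGA design may differ, and the function names are hypothetical. In hardware, the logarithm constants would come from a small table and each factor update would be a shift and an add.

```python
import math


def exp_factoring(f, iterations=16):
    """Approximate e**f for 0 <= f < 1 as a product of factors (1 + 2**-k)."""
    y = 1.0
    for k in range(iterations):
        c = math.log(1.0 + 2.0 ** -k)  # constant; a small table in hardware
        if f >= c:
            f -= c                     # remove ln(1 + 2**-k) from the exponent
            y += y * 2.0 ** -k         # multiply by (1 + 2**-k): shift and add
    return y


def exp_negative(a, iterations=16):
    """Approximate e**(-a) for a >= 0 as e**(-n) * e**f, n = ceil(a), f = n - a."""
    n = math.ceil(a)
    f = n - a
    scale = 1.0
    for _ in range(n):
        scale *= math.exp(-1.0)        # repeated multiplication by constant 1/e
    return scale * exp_factoring(f, iterations)


print(exp_negative(2.3), math.exp(-2.3))
```

Each additional iteration contributes roughly one more bit of accuracy, which is why the computational delay of this approach grows with the word length.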
In addition, a comparison between the lookup-table and factoring approaches for the exponential function is shown in Table VIII. The mean and standard deviation of the corresponding normalized errors are shown, with the floating-point exponentiation function from MATLAB used as the reference. The address column gives the memory address resolution in bits, and the iteration column gives the number of operation cycles of the algorithm.
B. Division Using Factoring Approach
Factoring is also an effective approach for approximating division [17]. Division can be treated as a reciprocal computation, since the reciprocal is a special case of division with a unit numerator. Similar to exponentiation, division can be formulated as a product of factors, as in (12).
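A common way to carry out division with such factors is multiplicative normalization: the divisor is driven toward one by a sequence of factors $1 + 2^{-k}$, and the same factors are applied to the dividend, which then converges to the quotient. The sketch below assumes this scheme and a divisor pre-scaled into $[0.5, 1)$; both are assumptions for illustration rather than a statement of the exact formulation in (12), and the function name is hypothetical.

```python
def divide_factoring(a, b, iterations=16):
    """Approximate a / b for 0.5 <= b < 1 by driving b toward 1 with
    factors (1 + 2**-k) and applying the same factors to a."""
    if not 0.5 <= b < 1.0:
        raise ValueError("b must be pre-scaled into [0.5, 1)")
    for k in range(1, iterations + 1):
        trial = b + b * 2.0 ** -k   # b * (1 + 2**-k): shift and add
        if trial <= 1.0:
            b = trial
            a += a * 2.0 ** -k      # same factor applied to the numerator
    return a


print(divide_factoring(0.3, 0.75))  # close to 0.4
print(divide_factoring(1.0, 0.5))   # reciprocal as the unit-numerator case
```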
