




The Thesis committee for Thomas Charles Lauzon
certifies that this is the approved version of the following thesis:











Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
MASTER OF SCIENCE IN ENGINEERING
THE UNIVERSITY OF TEXAS AT AUSTIN
December 2009
Dedicated to my parents, Véronique and Charles and my sister Catherine.
Suitability of FPGA-based Computing for
Cyber-Physical Systems
Thomas Charles Lauzon, M.S.E.
The University of Texas at Austin, 2009
Supervisors: Derek Chiou
Aloysius Mok
Cyber-Physical Systems theory is a new concept that is about to rev-
olutionize the way computers interact with the physical world by integrating
physical knowledge into the computing systems and tailoring such computing
systems in a way that is more compatible with the way processes happen in
the physical world. In this master’s thesis, Field Programmable Gate Arrays
(FPGA) are studied as a potential technological asset that may contribute to
the enablement of the Cyber-Physical paradigm. As an example application
that may benefit from cyber-physical system support, the Electro-Slag Remelt-
ing process - a process for remelting metals into better alloys - has been chosen
due to the maturity of its related physical models and controller designs. In
particular, the Particle Filter that estimates the state of the process is stud-
ied as a candidate for FPGA-based computing enhancements. In comparison
with CPUs, through the designs and experiments carried out for
this study, the FPGA reveals itself as a serious contender in the arsenal of
computing means for Cyber-Physical Systems, due to its capacity to mimic
the ubiquitous parallelism of physical processes.
Keywords: Cyber-Physical Systems, Particle Filter, Electro-Slag Remelt-




List of Tables
List of Figures
Chapter 1. Introduction
1.1 Research Context
1.1.1 Cyber-Physical Systems
1.1.2 Plant Model
1.1.3 Specific Applications: Gas Metal Arc Welding and Electro-Slag Remelting
1.1.4 State Estimation for the Electroslag Remelting Process
1.1.5 Particle Filtering Applied to Electroslag Remelting
1.1.5.1 Overview of Particle Filtering
1.1.5.2 Theoretical Background
1.1.5.3 Sampling Importance Resampling - Presentation
1.1.5.4 Sampling Importance Resampling - Algorithm
1.2 Problem to Solve
1.3 Preliminary Study: Influence of the Quantity of Particles on the Precision of the Estimate
1.3.1 Experiments, Analysis and Results
1.3.1.1 Evaluation Means
1.3.1.2 Experimental Setup
1.3.1.3 Results
1.3.1.4 Conclusion
1.3.2 Summary of the Problem to Solve
1.4 Thesis Statement
1.5 Objectives
1.6 Approach
Chapter 2. Study of the Particle Filtering Algorithm as Applied to Electro-slag Remelting
2.1 Particle Filter Algorithm for the Electro-slag Remelting Process
2.2 Parallelization Potential
2.2.1 Physical Model
2.2.2 Measurement Model
2.2.3 Likelihood Function
Chapter 3. Implementation
3.1 CPU implementation in C
3.1.1 Code
3.2 FPGA Implementation
3.2.1 FPGA technology
3.2.2 ESL tool: Bluespec System Verilog
3.2.3 FPGA Implementation Techniques
3.2.4 Mixed control and data flow chart of the Algorithm
3.2.4.1 Physical Model (F)
3.2.4.2 Measurement Model (H)
3.2.4.3 Whole Simulator
3.2.5 Customized Operators
3.2.5.1 Division
3.2.5.2 Inversion
3.2.5.3 Inversion Farm
3.2.5.4 Exponential
3.2.6 FPGA Implementation Toolchain
3.2.6.1 Software Versions
Chapter 4. Implementation Results and Analysis
4.1 Implementation Results
4.1.1 CPU Results
4.1.2 FPGA Results
4.2 Analysis
4.2.1 Speed and Number of Devices
4.2.1.1 CPU Speed
4.2.1.2 Number of CPUs Required for a Given Performance Level
4.2.1.3 FPGA Speed
4.2.1.4 Number of FPGAs Required for a Given Performance Level
4.2.2 FPGA v. CPU performance outlook
4.2.2.1 Speeds
4.2.2.2 Number of Devices
4.2.2.3 Performance Analysis
Chapter 5. Conclusion and Future Work
5.1 Analysis of Results
5.2 Conclusion
5.3 Future Work
Appendix
Appendix 1. Bluespec Implementation Code
1.1 Description
1.2 Source Code
1.2.1 Types.bsv
1.2.2 Params.bsv
1.2.3 inverse.bsv
1.2.4 invFarm.bsv
1.2.5 ExponFix.bsv
1.2.6 HBRAM.bsv
1.2.7 Fpiped.bsv
1.2.8 FHpiped.bsv
1.2.9 FHpipedTb.bsv
1.2.10 MultipleFH.bsv





4.1 FPGA Specifications
4.2 Xilinx ISE Project Configuration
4.3 FPGA Resource Utilization
List of Figures
1.1 Basic Form of a Manufacturing Process
1.2 The Particle Filter: recursively improving the estimate of the posterior distribution by selecting the most important particles for the next iteration
1.3 RMSE Δ
1.4 RMSE Ts
1.5 RMSE d
1.6 RMSE Xram
1.7 RMSE Me
2.1 General Form of the Particle Filter Based Estimator
2.2 Particle Filter with Multiple Sampling Stages in Parallel
3.1 Physical Model Flow Chart
3.2 Full Physical Model Flow Chart
3.3 Physical Model Flow Chart - Inversion Excluded
3.4 Measurement Model Flow Chart
3.5 Whole System Flow Chart
3.6 Inversion Farm Module
3.7 Computing the powers of x for the Taylor series expansion of the exponential function
4.1 Particle Computation Rates for 4 CPU Cores and 1 FPGA based simulator
4.2 Particle Computation Rates for 4 CPU Cores and 2 FPGA based simulator
4.3 Number of Devices Required; 4 Cores per CPU v. 1 Simulator per FPGA
4.4 Number of Devices Required; 4 Cores per CPU v. 2 Simulator






The context of this research is the solution space exploration for the
implementation of algorithms used in cyber-physical systems. Cyber-physical
systems are the combination of control, information technology and
communication. In such systems, timing is critical and processes evolve concurrently.
Examples of systems that will benefit from this technology, some of which
were proposed by Lee [8], are:
Transportation
Automated shipyard management, advanced automotive systems, avionics,
air traffic control;
Medical
Robotic surgery, assisted living, patient monitoring systems;
Utilities and infrastructure






Process control, resource management;
Civil Engineering
Structure health monitoring, robotics assisted construction, smart struc-
tures.
1.1.2 Plant Model
The physical elements in a cyber-physical system may be modeled in
two parts: a physical model, f, and a measurement model, h. The physical
model is a set of equations that describe the dynamics of the system, whereas
the measurement model is a set of equations that describe the link between
the system’s state and the sensor signals. In complex (i.e. high-fidelity)
models these equations are often non-linear.
The physical model outputs a new state based on the command signals
and the current state. Usually, noise is present in the command signals. The
measurement model outputs a new observation vector from the new state from
the physical model. The observation vector also contains measurement noise.










[Figure 1.1 diagram. Block labels: Command Vector, Command Noise, Physical Model (x_k to x_{k+1}), Measurement Model, Measurement Noise, Observation Vector]
Figure 1.1: Basic Form of a Manufacturing Process.
1.1.3 Specific Applications: Gas Metal Arc Welding and Electro-
Slag Remelting
As an example of a process that can benefit from the Cyber-Physical
Systems paradigm, our research group has selected the Gas Metal Arc Welding
(GMAW) process. This process is similar to a previously studied process called
Electroslag Remelting (ESR). The study of ESR was performed by Ahn for
his Ph.D. [1]. Although the GMAW process is much faster than
ESR, it is assumed that the general forms of the control and observation
algorithms are similar. This research is therefore a follow-up of Ahn’s research.
1.1.4 State Estimation for the Electroslag Remelting Process
One particular issue that arises in plants such as the ESR is the presence
of noise on both the control signals and the observation signals. This is
problematic for the purpose of controlling the process, since the state of the
system may be quite different from the raw observations. If these raw
observations were to be used directly, this would lead to improper process control.
To resolve this problem, the state has to be inferred by a state estimator.
Due to the non-linear nature of the high-fidelity stochastic physical
and measurement models of the ESR process, one estimation algorithm under
investigation is the Particle Filter, also known as Sequential Monte Carlo
(SMC) methods. Ahn’s [1] study of estimation methods, which included the Linear
Kalman Filter, the Extended Kalman Filter, the Unscented Kalman
Filter and the Particle Filter, has shown that although Kalman Filters, the
Unscented Kalman Filter in particular, provide good estimates, the accuracy of
these estimates doesn’t match the accuracy of the Particle Filter for the ESR in terms
of root mean squared error. This is due to the fact that, if the Particle Filter is
well designed and the number of samples is large enough, its estimates tend toward
the optimal Bayesian estimate.
1.1.5 Particle Filtering Applied to Electroslag Remelting
Particle Filtering is also known as Sequential Monte Carlo Methods. A
detailed explanation of Particle Filtering would be beyond the scope of this
document.
1.1.5.1 Overview of Particle Filtering
To understand the concept behind Particle Filtering, one may see each
particle as a possible (or hypothetical) physical state of the plant. Additionally,
each particle has a weight that represents how likely the plant is to be in that
state. The weights are normalized such that the sum of all the weights is equal
to one. The particles with the heaviest weights are the ones that are most likely
to be equal to the real plant state. After applying a certain command vector
to the plant, simulations are done using the particles as initial states. These
simulations provide new hypothetical states. The likelihood of each of these
simulated states is obtained by further simulating their measurements and
comparing their simulated measurements with the real measurements made
on the real plant. The most likely states (i.e. the heaviest particles) are
retained for the next round of simulations, hence the filtering. Ultimately,
after a few rounds, the particles should converge to the real state of the
plant if the estimator is well designed.
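The weight bookkeeping described above can be sketched in C. This is a minimal illustration, not the estimator's actual code: the function names normalize_weights and heaviest_particle are hypothetical, and the weights are assumed to be stored as a plain array of doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Scale the weights in place so that they sum to one.
   Returns 0 on success, or -1 if the total weight is not positive,
   in which case the array is left unchanged. */
int normalize_weights(double *w, size_t n)
{
    double total = 0.0;
    size_t i;

    assert(w != NULL && n > 0);
    for (i = 0; i < n; i++)
        total += w[i];
    if (!(total > 0.0))
        return -1;
    for (i = 0; i < n; i++)
        w[i] /= total;
    return 0;
}

/* Index of the heaviest particle, i.e. the hypothetical state that is
   currently considered the most likely. */
size_t heaviest_particle(const double *w, size_t n)
{
    size_t best = 0, i;

    assert(w != NULL && n > 0);
    for (i = 1; i < n; i++)
        if (w[i] > w[best])
            best = i;
    return best;
}
```

With raw weights {1, 1, 2, 4}, normalization yields {0.125, 0.125, 0.25, 0.5} and the heaviest particle is the last one.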
1.1.5.2 Theoretical Background
The basic objective of Sequential Monte Carlo Methods is to estimate
recursively in time the posterior distribution of a random variable and the
expectations for some function of interest.
In the case of cyber-physical systems, the random variable whose
posterior needs to be estimated is the current state of the physical system. Knowing
the current state is necessary to command the system to the next step on the
trajectory that will lead the system into a desired final state. Typically, this
state is not directly available. It needs to be inferred from sensors, which may
or may not directly sense the values that represent the state of the physical
system. Furthermore, these sensed values often include significant noise.
Formally posed, the problem is stated as follows: [5]
The unobserved signal $\{x_t;\ t \in \mathbb{N}\}$, $x_t \in \mathcal{X}$, is modeled as a Markov
process of prior distribution $p(x_0)$ and transition $p(x_t \mid x_{t-1})$. $\{y_t;\ t \in \mathbb{N}^*\}$,
$y_t \in \mathcal{Y}$, is a set of independent observations of the process with marginal
distribution $p(y_t \mid x_t)$.
If we define $x_{0:t} = \{x_0, \ldots, x_t\}$ and $y_{1:t} = \{y_1, \ldots, y_t\}$ being all the






















The probability density functions can be discretized.
For further details, the reader may refer to Sequential Monte Carlo
Methods in Practice by Arnaud Doucet, Nando de Freitas and
Neil Gordon [5]. The following description comes from this source.
1.1.5.3 Sampling Importance Resampling - Presentation
As stated in [2], the key idea behind Monte Carlo methods is to represent the
posterior density function by a set of random samples with associated weights,
which can then be used to compute estimates. In [2], the observation vector
y is called z. In the literature, we find both uses, although usually y stands
for the output, whereas z is the observation; these are often the same, but
not necessarily. For instance y could be the raw data coming out of sensors,




















Figure 1.2: The Particle Filter: recursively improving the estimate of the
posterior distribution by selecting the most important particles for the next
iteration
The Sampling Importance Resampling (SIR) algorithm is the version
that is used for the Electroslag Remelting estimator. As the name suggests,
it is composed of three steps:







Importance This is the computation of the samples' weights using equation
(1.6).
Resampling This eliminates particles with small weights to concentrate on
the most important ones in order to reduce the sample impoverishment
effect.
At each step, n simulations are done by adding noise to the physical
and measurement models. The likelihood of each observation is then
computed based on the real plant observation. The particles that
generated the most likely observations are kept and replicated for the next
round of simulations. The number of replicates depends on the
importance (high likelihood) of the particle.
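The replication of important particles can be illustrated with systematic resampling, one common scheme. The text above does not say which resampling scheme the ESR estimator uses, so this choice is an assumption, as are the function name systematic_resample and the convention of passing the random offset u0 as an argument (which keeps the sketch deterministic and testable).

```c
#include <assert.h>
#include <stddef.h>

/* Systematic resampling (an assumed scheme, chosen for illustration).
   w      : normalized weights, length n, summing to one
   u0     : random offset drawn uniformly from [0, 1)
   parent : output; parent[j] is the index of the particle copied to slot j
   Heavy particles appear several times in parent[]; light ones may vanish. */
void systematic_resample(const double *w, size_t n, double u0, size_t *parent)
{
    double cumulative;
    size_t i = 0, j;

    assert(w != NULL && parent != NULL && n > 0);
    assert(u0 >= 0.0 && u0 < 1.0);

    cumulative = w[0];
    for (j = 0; j < n; j++) {
        /* n evenly spaced pointers across the cumulative weight range */
        double pointer = (u0 + (double)j) / (double)n;
        while (pointer > cumulative && i + 1 < n) {
            i++;
            cumulative += w[i];
        }
        parent[j] = i;
    }
}
```

For weights {0.5, 0.25, 0.125, 0.125} and u0 = 0.5, the parents are {0, 0, 1, 2}: the heaviest particle is replicated twice and the last particle is eliminated.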
1.1.5.4 Sampling Importance Resampling - Algorithm

















• For i = 1 : N











• For i = 1 : N










































1.2 Problem to Solve
The initial problem we set ourselves to solve was the problem faced
by Ahn. His implementation of the Particle Filter was unable to compute a
sufficient number of simulations within the process control deadline. Without
studying the problem in detail, we hypothesized that an FPGA or ASIC would be a
potential answer to this computation speed problem.
1.3 Preliminary Study: Influence of the Quantity of
Particles on the Precision of the Estimate
An intuitive idea is that to improve the state estimates one only needs to
increase the number of simulations. As a preliminary study, the influence of the
number of particles on the precision of the estimates was studied in order to set
a performance goal for the ESR estimator. The result of this study is that there
is a limit to the maximum precision that can be achieved. In other words,
there is a threshold after which increasing the number of particles
has no appreciable effect on the precision of the estimates. Furthermore, a reassuring
result is that increasing the number of particles (with the given models) always
reduces the error of the estimate. While some state variable estimates are
only weakly sensitive to the number of particles, others steadily improve up to 200
particles. After 200 particles, the improvements in precision are negligible (for
this set of physical and measurement models).
1.3.1 Experiments, Analysis and Results
1.3.1.1 Evaluation Means
In order to evaluate the performance of the estimator, the root mean
squared error (RMSE) between the true state of the plant (which is directly
available from the plant model) and the estimated state is computed for each






The estimator was tested in an open-loop configuration with no controller.
The process and measurement noise were drawn from a Gaussian distribution with
parameters that were measured by Ahn at the plant.
1.3.1.2 Experimental Setup
These results were computed using the C implementation. Using the
original implementation under Matlab would yield the same results.





This simulation was run over 1000 seconds divided into 7500 steps and
repeated 10 times for statistical results.
The RMSE was computed for different numbers of particles for each
state variable.
1.3.1.3 Results
The results are shown in Figures 1.3 to 1.7.
Figure 1.3: RMSE Δ
Figure 1.4: RMSE Ts
Figure 1.5: RMSE d
Figure 1.6: RMSE Xram
Figure 1.7: RMSE Me
1.3.1.4 Conclusion
From these experiments it is concluded that 200 particles is the thresh-
old after which the estimates have reached their maximum precision.
1.3.2 Summary of the Problem to Solve
Based on the preliminary experimental results and the fact that the
given sampling period is 133 ms, the goal to achieve is the computation of 200
particles in under 133 ms.
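The timing goal can be checked with a short calculation. The sketch below assumes, for illustration only, that particles are processed strictly one after another and that the whole period is available for sampling (weighting and resampling time are ignored); the function names are hypothetical. Note that 1000 seconds divided into 7500 steps gives a period of about 133.3 ms, consistent with the figure above.

```c
#include <assert.h>

/* Sampling period in milliseconds for a run of total_seconds
   divided into the given number of steps. */
double sampling_period_ms(double total_seconds, unsigned steps)
{
    assert(steps > 0);
    return 1000.0 * total_seconds / (double)steps;
}

/* Time available per particle, in milliseconds, if the particles are
   processed sequentially within one sampling period. */
double per_particle_budget_ms(double period_ms, unsigned particles)
{
    assert(particles > 0);
    return period_ms / (double)particles;
}
```

Under these assumptions, 200 particles in 133 ms leave roughly 0.665 ms per particle on a purely sequential device; running several particle simulations in parallel relaxes this per-particle budget proportionally.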
1.4 Thesis Statement
Based on the fact that the physical world is ubiquitously parallel, i.e.
processes occur concurrently, it seems that parallel computing devices would
be the best suited devices for real-time inline simulation of the physics of
a cyber-physical system. Supporting this argument is the fact that modern
estimation techniques, such as Particle Filtering, make use of independent
probabilistic sampling. Since the sampling is independent, it can be done
in parallel, bolstering the use of parallel devices, such as FPGAs. Although
modern CPUs are now multicore and work at Gigahertz frequencies, FPGAs
appear to be a better platform due to their intrinsic parallel architecture.
An FPGA-based solution would take advantage not only of the fact that
the physical simulations done in Particle Filtering are independent and
can be done in parallel, but also of the fact that the physical processes within
the plant themselves occur concurrently. As a result, a significant amount of
parallelization is possible, for which FPGAs are well suited.
1.5 Objectives
The objective is to determine how to evaluate the performance of
computing devices, in order to assist the design of cyber systems
that provide adequate computing power for controlling a specific physical
process. We have identified and selected three types of computing devices that
differ in nature and present potential applicability to the domain of
cyber-physical systems. The first type is the Central Processing Unit (CPU),
the second is the Graphics Processing Unit (GPU) (not discussed in
this Master’s Thesis) and the last one is the Field Programmable Gate Array
(FPGA). These types may be characterized by the parallelism present in
their computing architecture. CPUs present the lowest order of parallelism.
Current CPUs on the market typically offer 4 or 8 fast cores that can be used
in parallel. In contrast, GPUs provide a higher number of parallel processing
units, on the order of hundreds. At the other end of the spectrum, FPGAs
are inherently parallel devices which can be tailored to concurrently process a
large number of elementary operations.
1.6 Approach
Using a well-known computer-controlled manufacturing process,
Electroslag Remelting (ESR), an application that is mature for experimentation, we
designed an FPGA-based solution and compared its performance with that of a
reference CPU-based implementation. From both implementations we derived
computation speed models that give a macro-level view of the performance
attainable on both platforms.
Chapter 2
Study of the Particle Filtering Algorithm as
Applied to Electro-slag Remelting
In this chapter, the algorithm is described in the special case of the
Electroslag Remelting process. After this, the proposed method for
accelerating the algorithm through the use of parallelization is presented, followed
by a study of the data and data types and of how they may limit
implementation and speedup potential.
2.1 Particle Filter Algorithm for the Electro-slag Remelt-
ing Process
In the previous chapter, the general form of the particle filter was
introduced. However, this form is somewhat abstract. In practice, the question
is “how do we draw particles?” As said earlier, the particles are drawn from
a proposal distribution $q(x_k \mid x_{k-1}, z_k)$. Since this distribution is difficult to
obtain, the transition prior $q(x_k \mid x_{k-1})$ is chosen as the proposal distribution.
This distribution depends only on the previous state x. In many cases
this is acceptable.
We would like to be able to sample from the transition prior by
running a particle through the physical model and adding process noise to the
new state. The problem is that it is difficult to know the parameters of the
distribution of x since we cannot measure it directly by experiment. Instead,
particles are drawn by adding noise to the control vector, u, running those
noisy input values through the physical model and the measurement model,














where: $x_k^{(i)}$ is particle $i$ at step $k$, $f$ is the physical model, $u_k$ is
the command vector at step $k$ and $m_k$ is the generated process noise vector
at step $k$ for particle $i$. This noise may come from any distribution.
The next step is to evaluate the weights. This is done recursively by
multiplying the previous weights by the likelihood of each observation. The
likelihood function in the case of the ESR is a Gaussian.
For each particle, the simulated observation is:
$z_k = h(x_k) + n_k$ (2.2)
where: $h$ is the measurement model, $x_k$ is the drawn state and $n_k$ is
the measurement noise vector. The noise vector is random and different for
each particle. As can be seen from this equation, the observation is a direct
function of the state and the noise.




















Rr is the covariance matrix, zk is the observation from the plant at step






$w_k^{(i)}$ is the weight of particle $i$ at step $k$, and $w_{k-1}^{(i)}$ is the weight of particle $i$
at step $k-1$.
Once all the weights have been computed, they must be normalized so
that the sum of the weights is equal to 1.
The last part is the resampling of the particles. This part is not covered
in detail in this document as it is not the target of this study. The
basic mechanism is that the particles that are the most important (highest



















Figure 2.1: General Form of the Particle Filter Based Estimator.
2.2 Parallelization Potential
One specificity of the Particle Filter is that the sampling part can be
parallelized since each particle is drawn independently of the others. It is
supposed that the sampling part may be accelerated through parallelization on
an FPGA, for example. Not only can the sampling of each particle be done in
parallel, many of the internal computations, which are the physical and
measurement models of the concurrent processes, can also be done in parallel. As
seen in the estimator diagram, the likelihood function can also be computed in
parallel. But the parallelization potential stops here, as the next step
(normalization) requires all the weights from all the particles. Finally, the resampling
may potentially benefit from internal parallelization, but this study has not
been done yet and it is believed at this point that the speed improvement from
this parallelization will not be as significant as that from the parallelization of the
sampling stage, the mechanism of which remains the same independently of the
physical and measurement models used. The equations for the models come





















Figure 2.2: Particle Filter with Multiple Sampling Stages in Parallel
2.2.1 Physical Model
The physical model is a set of differential equations that describe the









































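$\dot{s}(\Delta, T_s) = -\dfrac{\alpha_r C_{s\Delta}}{\Delta} + \dfrac{C_{sp}\, p_{m,r}(T_s)}{h_m}$
$p_{m,r}(T_s) = (1 + \mu_r)\dfrac{Q_m(T_s)}{A_e}$
$Q_m(T_s) = H_e A_e (T_s - T_m)$
$Q_s(T_s) = H_s\, 2\pi r_i h_{s0} (T_s - T_{ss})$
$p_{in,r}(T_s, d, I_c) = \mathrm{Volt}(T_s, d, I_c)\, I_c$
$\mathrm{Volt}(T_s, d, I_c) = R(T_s, d)\, I_c$
$R(T_s, d) = R_d(d)\, e^{-A_{elect}(T_s - T_s^*)}$
$R_d(d) = \begin{cases} R_1 - m_0 d & d < d_{inflection} \\ R_1 - m_1 d & d \geq d_{inflection} \end{cases}$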
While it is important for the design of these models to know what
all these names mean, what they stand for is insignificant for the problem of
studying the computation. It is however important to realize that all the values
that are not part of the state vector x or the command vector u are constants,
i.e. the state and command vectors contain all the variables of these equations. All other
values are parameters that can be pre-computed.
The discrete version is obtained by integrating f(x,u) over the
sampling period T.
$x_{k+1} = x_k + f(x_k, u_k)\, T$ (2.6)
Finally, the command noise $m_k$ is added to the command vector,
yielding:
$x_{k+1} = x_k + f(x_k, u_k + m_k)\, T$ (2.7)
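Equation (2.7) amounts to a single forward Euler step with the noise injected through the command vector. The sketch below shows this generically; the names euler_step, dynamics_fn, toy_dynamics and MAX_DIM are hypothetical, and the one-dimensional toy dynamics stands in for the ESR model purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIM 8  /* hypothetical upper bound on state and command sizes */

/* Continuous-time dynamics: writes dx/dt = f(x, u) into xdot. */
typedef void (*dynamics_fn)(const double *x, const double *u, double *xdot);

/* One step of equation (2.7): x_next = x + f(x, u + m) * T */
void euler_step(dynamics_fn f, const double *x, size_t nx,
                const double *u, const double *m, size_t nu,
                double T, double *x_next)
{
    double u_noisy[MAX_DIM], xdot[MAX_DIM];
    size_t i;

    assert(f != NULL && nx > 0 && nx <= MAX_DIM && nu <= MAX_DIM);
    for (i = 0; i < nu; i++)
        u_noisy[i] = u[i] + m[i];   /* command noise enters through u */
    f(x, u_noisy, xdot);
    for (i = 0; i < nx; i++)
        x_next[i] = x[i] + xdot[i] * T;
}

/* Toy dynamics used only for illustration: dx/dt = -x + u */
void toy_dynamics(const double *x, const double *u, double *xdot)
{
    xdot[0] = -x[0] + u[0];
}
```

With x = 2, u = 1, m = 0.5 and T = 0.5, the toy dynamics gives dx/dt = -0.5 and the next state is 1.75.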
2.2.2 Measurement Model
In the case of the electroslag remelting model, the measurement equa-
tions are rather simple. The output vector y contains the penetration depth
d, the ram position Xram, the current Ir, the weight read on the load cell LC
and the voltage Volt.



















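$R_d = \begin{cases} R_1 - m_0 d & d < 0 \\ R_1 - m_1 d & d > 0 \end{cases}$
$LC = \begin{cases} M_e - \rho_s A_e d & d > 0 \\ M_e & \text{otherwise} \end{cases}$
$\mathrm{Volt} = R\, I_c$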
Finally, the measurement noise $n_k$ is added to the output vector,
yielding:
$y_k = h(x_k) + n_k$ (2.9)
2.2.3 Likelihood Function
The weights are finally computed through the use of the likelihood










A working version of the estimator for the ESR process has been written
in Matlab by Ahn. Both the CPU and FPGA implementations perform the same functions
as this Matlab code. In fact, the Matlab results were used to validate both
implementations. The CPU implementation was done in C, whereas the FPGA
implementation was done using Bluespec System Verilog.
3.1 CPU implementation in C
As discussed in the introduction of this chapter, the CPU
implementation was done in C. The primary reason for this choice is the fact that
many modern real-time control systems using CPUs use this language due to
its simplicity and its robustness. Although C doesn’t natively provide many
types (e.g. no fixed-point type, which can be useful for speeding up some of
the computation), it natively supports double precision data types and comes
with many mathematical functions such as the exponential function, which is
required to compute the resistance values in the ESR’s models. In order to
verify that the C code performs the same tasks as the Matlab code, the process
and measurement noise were stored in text files that are accessible
to both the Matlab and C code. In the end it was verified that the C implementation
provides the same results as the original Matlab code.
3.1.1 Code
Only the physical model, f, and the measurement model, h, are included,
since we are only studying the sampling part.
void f(
    /* IN: */
    double x[SIZEOFX],
    double u[SIZEOFU],







    double tdelta, Rd, R, Volt, P, Qm, pm, Qs, Vram,
           Sdot, deltadot, Tsdot, ddot, Xramdot, Medot;
    tdelta = ts;
    if (X_D < dInflection)
        Rd = R1 - m0 * X_D;
    else
        Rd = R1 - m1 * X_D;
    R = Rd * exp(-Aelect * (X_TS - Tsstar));
    *I = U_IC + Ib + noise[0];
    Volt = R * *I + Voltb;




    Qs = Hs * 2 * M_PI * ri * hs0 * (X_TS - Tss);
    Vram = UVRAMC + Vramb + noise[1];
    Sdot = -alphar * Csd0 / X_DELTA + Csp * pm / hm;
    // rate equations
    deltadot = alphar * Cdd / X_DELTA - Cdp * pm / hm;
    Tsdot = (P - Qm - Qs) / rhos / Vs / Cs0;
    ddot = alphar * Csd0 / X_DELTA - Csp * pm / hm + Vram / a;
    Xramdot = Vram;
    Medot = -rhom * Ae * Sdot;
    // Output
    newX[0] = X_DELTA + deltadot * tdelta;
    newX[1] = X_TS + Tsdot * tdelta;
    newX[2] = X_D + ddot * tdelta;
    newX[3] = X_XRAM + Xramdot * tdelta;




    double x[SIZEOFX], double u[SIZEOFU],
    double noise[SIZEOFY],
    /* OUT */
    double y[SIZEOFY]
)
{
    double Rd, Ir, R, Volt, LC;
    if (X_D < 0)
        Rd = R1 - m0 * X_D;
    else
        Rd = R1 - m1 * X_D;
    Ir = U_IC + Ib;
    R = Rd * exp(-Aelect * (X_TS - Tsstar));
    // estimated resistance
    Volt = R * Ir + Voltb;
    if (X_D > 0)
        LC = X_ME - rhos * Ae * X_D;
    else
        LC = X_ME;
    y[0] = X_D + noise[0];
    y[1] = X_XRAM + noise[1];
    y[2] = Ir + noise[2];
    y[3] = LC + noise[3];




Field Programmable Gate Arrays (FPGA) are devices that can be
configured to perform sequential logic functions. FPGAs are the largest type
of logic device in the programmable device family. One of the advantages of
an FPGA is that the logic architecture can be designed specifically for a certain
application. In particular, the architecture can be designed such that several
computations are made in parallel, which can create a significant speedup
in comparison with CPUs. One of the drawbacks is that an FPGA is more
complicated to program than a CPU.
In our case, we hope to achieve a significant speedup through the
parallelization of the sampling process. As discussed earlier, the sampling part
of the algorithm can be done in parallel due to the fact that the samples are
independent.
3.2.2 ESL tool: Bluespec System Verilog
ESL stands for Electronic System Level. ESL design is the creation of hardware
from an algorithmic description. Bluespec has been identified as a tool that
has a level of abstraction that is close to the algorithmic description, while
still giving the designer control over the architecture
of the hardware. Bluespec is capable of generating Verilog code that is directly
usable in synthesis tools such as Xilinx ISE.
The first problem to consider when designing a system at the Electronic
System Level is the set of operators that are available. At its most basic level,
the FPGA is capable of performing logic operations on bits. On Xilinx FPGAs
this is done through the use of 6-input look-up tables (LUT). Xilinx FPGAs
also feature DSP48 multipliers, which are 48 × 48 bit multipliers that return
a 48 bit result. The Bluespec language provides additional operators, mostly
for bits, or sets of bits such as integers.
The models, however, require rational numbers. Two options exist to
represent rational numbers: fixed-point or floating point representation. Two
points give fixed-point the advantage over floating-point. The first point is that
fixed-point operations usually require less hardware, therefore are faster and
require less physical resources. The second is that Bluespec already comes with
a synthesizable fixed-point library whereas it currently provides no support for
34
IEEE 754 floating point natively.
The Bluespec fixed point library [3] handles additions, subtractions
and multiplications. It also provide function for bit field width extension or
truncation and a multiplication function that returns data with the proper bit
width. Unfortunately, the division and the exponential functions required for
the algorithm are missing. They need to be implemented. On the upside, this
allows for performance tradeo#s.
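To make the fixed-point representation concrete, the following is a minimal software sketch in Python, not the Bluespec library itself. It assumes the 20-bit integer and 28-bit fractional split used later in Types.bsv. The function names (`to_fixed`, `to_float`, `fx_mul`) and the choice of round-to-nearest on conversion and truncation after multiplication are illustrative assumptions, not documented behavior of the Bluespec FixedPoint package.

```python
I_BITS, F_BITS = 20, 28            # integer / fractional widths, as in FixedPoint#(20,28)
WIDTH = I_BITS + F_BITS            # 48 bits in total


def to_fixed(x):
    """Quantize a real number to a signed WIDTH-bit integer holding x * 2**F_BITS."""
    raw = int(round(x * (1 << F_BITS)))
    lo, hi = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1
    if not lo <= raw <= hi:
        raise OverflowError("value does not fit in the fixed-point format")
    return raw


def to_float(raw):
    """Convert a raw fixed-point integer back to a Python float."""
    return raw / (1 << F_BITS)


def fx_mul(a, b):
    """Multiply two raw values; the full product carries 2*F_BITS fractional
    bits, so it is shifted right by F_BITS (truncation, an assumption here)."""
    return (a * b) >> F_BITS


# Addition needs no rescaling: both operands share the same scale factor.
product = fx_mul(to_fixed(1.5), to_fixed(-2.25))
print(to_float(product))
```

The resolution of this format is 2^-28, so a value such as 0.1 is stored with an error of at most half of that.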
3.2.3 FPGA Implementation Techniques
For the FPGA implementation, it is convenient to have a flow graph
representation of the algorithm. The algorithm is separated into two parts:
the computation of the physical model F and the measurement model H. Both
models contain stages, which are separated by dotted lines in the diagrams.
Operators within the same stage function in parallel. Furthermore, the design
is made to operate in a pipelined fashion, so each stage is a stage of the
pipeline. One final important detail, which is absent from the figures to
avoid overcrowding the illustrations, is the set of memory queues between
operators separated by multiple stages. The depth of each of these queues is
naturally equal to the number of stages separating the operators minus one.
It is essential to note that the main operations natively provided are
logic operations (AND, OR, NOT...) and 48 bit multiplications (on the Xilinx
Virtex V FPGA). Any other operation needs to be provided by libraries or
custom made. The first technique used is the pre-computation of parameters.
Unlike in C, where the compiler precomputes constant parameters (such as
constant resistance values) automatically, parameters in the FPGA modules
should be precomputed to avoid using operators unnecessarily. Furthermore, a
useful side effect of pre-computing parameters is that complicated operations
such as divisions are avoided, saving many cycles and hardware resources.
The second technique is the use of non-native operations in the form of
modules. For the physical and measurement models of the ESR, two non-native
operations are required: divisions and exponentiations. These operations can
be implemented in the form of modules. These modules may require several
cycles to complete their functions. When this is the case, the length of the
stage that contains such modules is equal to the length of the longest module
- or longest chain of modules - in terms of number of cycles.
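The two sizing rules stated above (stage length set by the longest chain of modules, and queue depth set by the number of stages crossed) can be sketched in Python as follows. The stage contents and cycle counts below are hypothetical placeholders, not the thesis's actual figures, and "number of stages separating the operators" is interpreted here as the difference between stage indices, which is an assumption.

```python
def stage_length(chains):
    """Length of one stage in cycles.

    `chains` lists the module chains that run in parallel inside the stage;
    each chain is a list of per-module cycle counts executed back to back.
    """
    return max(sum(chain) for chain in chains)


def pipeline_latency(stages):
    """Cycles between applying an input and obtaining the first output."""
    return sum(stage_length(chains) for chains in stages)


def queue_depth(producer_stage, consumer_stage):
    """Depth of the memory queue between two operators (stage indices)."""
    return consumer_stage - producer_stage - 1


# Hypothetical three-stage example: a native one-cycle stage, a stage holding
# a 56-cycle module next to a chain of two short modules, and another
# one-cycle stage.
example = [[[1]], [[56], [3, 4]], [[1]]]
print(pipeline_latency(example))
```

An operator in stage 2 feeding an operator in stage 5 would need a queue of depth 2 under this interpretation.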
3.2.4 Mixed control and data flow chart of the Algorithm
3.2.4.1 Physical Model (F)
The ESR’s physical model has been divided into ten stages. While eight of
these stages last only one cycle, two of them require multiple cycles due to
the use of non-native operations. These non-native operations are presented
later. The physical model computes the rates of variation of each state
variable (i.e. δ, Ts, d, Xram and Me) based on the current state and the
command vector. Additionally, the command noise is added to the command
vector within this





























































Figure 3.1: Physical Model Flow Chart
For the FPGA implementation it is useful to decompose the physical
model of the ESR plant into three parts: inversion, dynamics and integration.
The inversion is separated from the rest of the physical equations because it
takes many more steps to compute. During this time, some of the other
computations can be done in parallel, so that once the inversion is complete
the rest of the computations can be completed in a fully pipelined fashion,
with a result at each cycle. In order to keep the pipeline full, the number
of parallel inverters needs to be equal to the number of steps it takes to
complete an inversion. For the sake of comprehension, we will call the




















Figure 3.2: Full Physical Model Flow Chart
























































Figure 3.3: Physical Model Flow Chart - Inversion Excluded
3.2.4.2 Measurement Model (H)
The measurement model is rather straightforward compared to the
physical model. The only non-native operation is an exponential used for
the computation of the ESR’s resistance. The observation vector is computed


























Figure 3.4: Measurement Model Flow Chart
3.2.4.3 Whole Simulator
The whole simulator includes the physical and measurement models.










Figure 3.5: Whole System Flow Chart
3.2.5 Customized Operators
As discussed in the previous section, division and exponentiation are
not natively supported by the Virtex 5 FPGA. Therefore, they have to be
designed. This is a point on which performance may be gained (or lost if
poorly executed).
3.2.5.1 Division
According to [9] there are two categories of algorithms for division:
recurrence and convergence. Recurrence approaches are slow, as they produce
one bit of the quotient per iteration, whereas convergence approaches produce
several bits of the final quotient at each iteration through approximation,
but require fast multipliers.
In this study, the recurrence approach has been chosen as a first
approach, especially since only one division is present in the algorithm
after factorization. Were this approach too slow for other models, the
convergence approach could be studied further and implemented.
The base version of the algorithm is the paper and pencil method that
is familiar to most people. We are looking for Q = N/D through the use of the
recurrence equation [9]:
$$P_{k+1} = r P_k - q_{n-k-1} D \quad \text{for } k = 1, 2, \ldots, n-1 \qquad (3.1)$$
where:
P_k is the partial remainder after the selection of the kth quotient
digit
P_0 = N (subject to the constraint |P_0| < |D|)
r is the radix
q_{n-k-1} is the kth quotient digit to the right of the binary point
D is the divisor.
During this research it was observed that a mix between the restoring
and non-restoring methods could be implemented on an FPGA by making use of
parallelization. At each step, the partial remainder is shifted to the left
and the (n - k)th bit of the numerator is concatenated to it. If the partial
remainder is greater than or equal to the divisor, the divisor is subtracted
from the partial remainder and the (n - k)th bit of Q is set to one. If not,
the partial remainder stays unchanged and the (n - k)th bit of Q is set to 0.
package DivFix;
import FixedPoint::*;

interface Div_IFC#(numeric type ai, numeric type af,
                   numeric type bi, numeric type bf,
                   numeric type ci, numeric type cf);
   method Action start(FixedPoint#(ai, af) a, FixedPoint#(bi, bf) b);
   method FixedPoint#(ci, cf) result();
   method Action acknowledge();
endinterface

module mkDivFix(Div_IFC#(ai, af, bi, bf, ci, cf))
   provisos(
      Arith#(FixedPoint::FixedPoint#(ci, cf)),
      Bitwise#(FixedPoint::FixedPoint#(ci, cf)),
      Add#(TAdd#(bi, bf), 1, TAdd#(TAdd#(bi, bf), 1)),
      Add#(1, a, TAdd#(ai, af)),
      Add#(1, b, TAdd#(bi, bf)),
      Bits#(FixedPoint::FixedPoint#(ai, af), TAdd#(ai, af))
   );

   Integer maxnum = valueOf(ai) + valueOf(af) + valueOf(cf);
   Integer maxa = valueOf(ai) + valueOf(af);
   Integer maxb = valueOf(bi) + valueOf(bf);
   Integer maxc = valueOf(ci) + valueOf(cf);
   Integer shift = valueOf(af) - valueOf(bf);

   Reg#(Bool) available <- mkReg(True);
   Reg#(Bit#(TAdd#(ai, af))) n <- mkReg(0);             // Numerator
   Reg#(Bit#(TAdd#(TAdd#(bi, bf), 1))) d <- mkReg(1);   // Divisor
   Reg#(Bit#(TAdd#(TAdd#(bi, bf), 1))) r <- mkReg(0);   // remainder
   Reg#(Bit#(TAdd#(TAdd#(ai, bf), cf))) q <- mkReg(0);
   Reg#(UInt#(TLog#(TAdd#(TAdd#(ai, af), ci)))) i <- mkReg(0); // cycle counter
   Reg#(Bit#(1)) signs <- mkReg(?);

   rule cycle (i < fromInteger(maxnum) && (available == False));
      if (r >= d)
         action
            q[fromInteger(maxnum) - i] <= 1;
            if (i < fromInteger(maxa))
               r <= {(r - d)[fromInteger(maxb) - 2:0],
                     n[fromInteger(maxa) - 1 - i]};
            else
               r <= (r - d) << 1;
         endaction
      else
         action
            q[fromInteger(maxnum) - i] <= 0;
            if (i < fromInteger(maxa))
               r <= {r[fromInteger(maxb) - 2:0],
                     n[fromInteger(maxa) - 1 - i]};
            else
               r <= r << 1;
         endaction
      i <= i + 1;
   endrule: cycle

   method Action start(FixedPoint#(ai, af) a,
                       FixedPoint#(bi, bf) b) if (available);
      n <= pack(abs(a));
      d <= signExtend(pack(abs(b)));
      r <= zeroExtend(n[fromInteger(maxa) - 1]);
      i <= 1;
      q <= 0;
      signs <= (msb(pack(a))) ^ (msb(pack(b)));
      available <= False;
   endmethod

   method FixedPoint#(ci, cf) result() if (i >= fromInteger(maxnum));
      FixedPoint#(ci, cf) res;
      if (shift < 0)
         res = (unpack(q[fromInteger(maxc) - 1:0]) << -shift);
      else
         res = (unpack(q[fromInteger(maxc) - 1:0]) >> shift);
      if (signs == 1)
         return -res;
      else
         return res;
   endmethod

   method Action acknowledge() if ((i >= fromInteger(maxnum))
                                   && !available);




The division delay using this method is equal to the number of bits of
the numerator plus the number of bits in the fractional part of the result.
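As a software illustration of the recurrence approach and of the delay stated above, the following Python sketch performs an unsigned bit-serial division that produces one quotient bit per step. It is a simplified model written for this explanation, not a transcription of the Bluespec module: signs are ignored (the module handles them separately by dividing magnitudes and XOR-ing the sign bits), and the name `serial_divide` and its arguments are illustrative.

```python
def serial_divide(n, d, n_bits, frac_bits):
    """Bit-serial division of the n_bits-wide unsigned integer n by d.

    Returns (q, steps), where q = floor(n * 2**frac_bits / d) and steps is
    the number of iterations, i.e. n_bits + frac_bits.
    """
    if d <= 0 or not 0 <= n < (1 << n_bits):
        raise ValueError("operands out of range")
    r = 0          # partial remainder
    q = 0          # quotient, built most significant bit first
    steps = 0
    for k in range(n_bits + frac_bits):
        # Shift the partial remainder left and bring in the next numerator
        # bit; once the numerator is exhausted, zeros are shifted in to
        # generate the fractional quotient bits.
        bit = (n >> (n_bits - 1 - k)) & 1 if k < n_bits else 0
        r = (r << 1) | bit
        q <<= 1
        if r >= d:             # divisor fits: subtract it, quotient bit is 1
            r -= d
            q |= 1
        steps += 1
    return q, steps


# 7 / 2 with 3 fractional bits: 3.5 is encoded as 3.5 * 2**3
print(serial_divide(7, 2, n_bits=4, frac_bits=3))
```

The step count returned by the sketch equals the numerator width plus the number of fractional result bits, which matches the delay given in the text.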
In the final version of the FPGA implementation, the division module
has actually not been used, although it was originally motivated by the
requirements of the ESR’s physical model. Instead, the code was reused to
develop an inversion function, which is similar to a division, only with a
numerator always equal to one. In the end, this division module is now
available for applications that may require it.
3.2.5.2 Inversion
The inversion module was created based on the division module. The
motivation behind the creation and the use of this module is the fact that
an inversion is a division that has a numerator equal to one. As a
consequence, the number of steps required to complete an inversion is less
than that of a division given the same data types. Because the numerator is
simply equal to one, there is no need to cycle over all the bits of its
integer part, which spares a number of steps equal to the number of bits in
the integer part of the fixed-point data type. Therefore there are 2f steps
instead of 2f + i, since only the fractional part has to be covered.




$$Q = \frac{N}{D} = N \times \frac{1}{D} \qquad (3.2)$$
where N is the numerator and D is the denominator.
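The following Python sketch illustrates how an inversion followed by a multiplication can replace a division, and why the inversion needs only 2f steps. It is an illustrative model under stated assumptions, not the thesis's Bluespec inverter: operands are unsigned raw fixed-point integers with f fractional bits, the numerator's single set bit is assumed to be preloaded at start (so that step is not counted), and the names `serial_reciprocal` and `divide_by_inversion` are hypothetical.

```python
def serial_reciprocal(d_raw, frac_bits):
    """Return (inv_raw, steps), where inv_raw = floor(2**(2*frac_bits) / d_raw).

    d_raw is the raw encoding of D (D * 2**frac_bits); inv_raw is the raw
    encoding of 1/D with frac_bits fractional bits.
    """
    if d_raw <= 0:
        raise ValueError("divisor must be positive")
    r, q = 1, 0                # the numerator's only set bit is preloaded
    if r >= d_raw:
        r -= d_raw
        q = 1
    steps = 0
    for _ in range(2 * frac_bits):
        r <<= 1                # every remaining numerator bit is zero
        q <<= 1
        if r >= d_raw:
            r -= d_raw
            q |= 1
        steps += 1
    return q, steps


def divide_by_inversion(n_raw, d_raw, frac_bits):
    """Compute N / D as N * (1 / D); the product is rescaled by frac_bits."""
    inv_raw, _ = serial_reciprocal(d_raw, frac_bits)
    return (n_raw * inv_raw) >> frac_bits


F = 4                                                # 4 fractional bits for readability
print(divide_by_inversion(6 << F, 2 << F, F) / (1 << F))   # 6.0 / 2.0
```

Because the reciprocal is truncated before the multiplication, the result can differ from a direct division by a few least significant bits when 1/D is not exactly representable.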
3.2.5.3 Inversion Farm
When an inversion operation is requested, an inversion module takes
2f cycles to complete the inversion. If only one inversion module is present
in the FPGA, a new inversion can only start once the previous inversion has
been completed, that is after 2f cycles. This would create a serious
bottleneck in the flow. To avoid it, multiple instances of the module can be
created and grouped into a farm, which can handle several inversions (or
other operations) at the same time. If there are as many modules in the farm
as it takes cycles to complete one operation, the farm can accept a new
request at each cycle. This technique was used in the FPGA implementation to
achieve full pipelining. The inversion farm is illustrated in figure 3.6.
The following is a description of the farm and its behavior:
• The farm contains 2f inverters.
• Initially, all the inverters are available.
• When an inversion is requested, the inversion is assigned to the next
available inverter.
• A flag associated with the inverter is set to busy.
• At the end of the inversion, the inversion result is queued, the data in
the inverter is released and the inverter's flag is set to available.
Since the inversion algorithm used has a constant number of steps, the
results naturally enter the queue in the same order as the requests. There
is no need for order management. If a fast division algorithm were used,
order management would be required, since the number of cycles required to
complete the division may not be constant.
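The behavior of the farm can be checked with a small cycle-level model. The Python sketch below is an illustration only: it assumes that at most one request is issued per cycle, that a request stalls until an inverter is free, and that each inversion takes a fixed number of cycles. The function name `simulate_farm` and the small sizes used in the example are hypothetical.

```python
def simulate_farm(n_requests, n_inverters, latency):
    """Return (issue_cycles, finish_cycles) for a stream of inversion requests.

    A request is assigned to the inverter that becomes available first and
    keeps it busy for `latency` cycles.
    """
    free_at = [0] * n_inverters          # cycle at which each inverter is free
    issue_cycles, finish_cycles = [], []
    cycle = 0
    for _ in range(n_requests):
        k = min(range(n_inverters), key=lambda j: free_at[j])
        cycle = max(cycle, free_at[k])   # stall if every inverter is busy
        issue_cycles.append(cycle)
        free_at[k] = cycle + latency     # flag the inverter as busy
        finish_cycles.append(cycle + latency)
        cycle += 1                       # at most one new request per cycle
    return issue_cycles, finish_cycles


# As many inverters as cycles per inversion: one request accepted every cycle.
print(simulate_farm(10, n_inverters=4, latency=4)[0])
# Too few inverters: requests stall and the pipeline is no longer full.
print(simulate_farm(6, n_inverters=2, latency=4)[0])
```

In both cases the finish cycles are increasing, which reflects the observation that a constant-latency operator returns results in request order.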











Figure 3.6: Inversion Farm Module
3.2.5.4 Exponential
A detailed architecture with performance results can be found in [4].
Two options have been identified concerning the exponential function.
The first one is to store values that the exponential function would return in

















$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots \qquad (3.3)$$
To compute the exponential with sufficient precision for the ESR
estimator, the order should be 15 given the range of the input
[-2.6686; 2.1834]. Whereas the factorials can be set as constants, the
powers need to be computed. In order to do so in a pipelined fashion, the
powers of x can be computed through successive multiplications. The
computation of all the powers of x up to the 16th power can be done in 4
stages as in figure 3.7.
To complete the exponential computation, the powers of x are multiplied
by the inverse of the factorials at each stage. These inverse factorials
are pre-computed at compile time and stored as constants. Multiplying

















Figure 3.7: Computing the powers of x for the Taylor series expansion of the
exponential function
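The staged computation of the powers of x and the use of pre-computed inverse factorials can be sketched in Python as follows. This is a floating-point illustration of the scheme, not the fixed-point hardware design: the helper names (`powers_of_x`, `exp_taylor`) are hypothetical, and the grouping of products into stages is one way to reach x^16 in four stages that is consistent with the description, assumed here rather than taken from figure 3.7.

```python
import math

ORDER = 15
# Inverse factorials are constants, computed once ahead of time.
INV_FACT = [1.0 / math.factorial(k) for k in range(ORDER + 1)]


def powers_of_x(x):
    """Return {k: x**k for k = 1..16} using four stages of multiplications.

    Stage 1 yields x^2, stage 2 yields x^3..x^4, stage 3 yields x^5..x^8 and
    stage 4 yields x^9..x^16. Each product only uses powers from earlier
    stages, so all products of one stage could run in parallel.
    """
    p = {1: x}
    top = 1
    for _stage in range(4):
        new = {top + k: p[top] * p[k] for k in range(1, top + 1)}
        p.update(new)
        top *= 2
    return p


def exp_taylor(x):
    """Taylor expansion of exp(x) truncated at order 15."""
    p = powers_of_x(x)
    return 1.0 + sum(INV_FACT[k] * p[k] for k in range(1, ORDER + 1))


for x in (-2.6686, 0.0, 2.1834):
    print(x, exp_taylor(x), math.exp(x))
```

Over the input range quoted in the text, the truncation error of this floating-point model stays below 1e-6; the precision of a fixed-point implementation would additionally depend on the word length.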
3.2.6 FPGA Implementation Toolchain
In the course of designing and implementing the FPGA-based design,
only two software suites are required. The first suite is Bluespec System
Verilog, which includes the tools for defining and testing the functional
architecture of the design and the tools to generate hardware description
language files, namely Verilog files. The second is Xilinx ISE, which can
synthesize, place and route the design for a specific FPGA target.
Bluespec includes a compiler, bsc. It can generate simulation files
that are executable within Linux and can display the value of registers or
variables during the runtime of the simulations, in the same fashion as the
printf() function in C. The same compiler can be used to generate Verilog
files. The compiler provides the option of generating Verilog 95 compliant
code, which excludes the display functions used for the simulation.
Xilinx ISE provides the rest of the tool chain to produce netlists and
FPGA programming files. The first step consists of creating a project. The
project has to be configured with the FPGA device number and the desired or
required speed grade as well as the design goal. The Verilog file that
contains the top module is then imported into the project, along with any
additional Verilog files containing submodules and the Bluespec-provided
libraries used in the design. From there, the design needs to be
synthesized, placed and routed with optional constraint files.
3.2.6.1 Software Versions
• Bluespec System Verilog v.2008.11.C
• Xilinx ISE v.10.1
Chapter 4
Implementation Results and Analysis
4.1 Implementation Results
In this section the speeds, in terms of particles per second, of the CPU
and the FPGA implementations are compared. For the CPU implementation, the
results are obtained by timing the computation of a given number of
particles. For the FPGA implementation, the speed results are obtained from
the maximum FPGA frequency given by the design tools after place and route
and from the number of cycles that the computation of a given number of
particles requires, obtained through functional simulation.
4.1.1 CPU Results
For the given models, the approximate average speed of computation on
the given CPU-based configuration was 2.67M particles per second. Since the
CPU frequency is 2.6GHz, it can be deduced that each particle computation
requires around 1000 cycles.
4.1.2 FPGA Results
The FPGA that was used to evaluate the performance of the
implementation is the XC5VSX240T Virtex 5 from Xilinx. The specifications
of the XC5VSX240T are found in table 4.1:






Maximum Distributed RAM (Kbits) 4,200
Block RAM/FIFO w/ECC (36Kbits each) 516
Total Block RAM (Kbits) 18,576
DSP48E Slices 1,056
Under the FPGA design tool, Xilinx ISE 10.1, the FPGA has been
configured with the settings found in table 4.2.








The utilization results after synthesis and place and route are found in
table 4.3.
Table 4.3: FPGA Resource Utilization
Slice Logic Utilization                        Used   Available   Utilization
Number of Slice Registers                     3,144     149,760            2%
  Number used as Flip Flops                   3,144
Number of Slice LUTs                          8,223     149,760            5%
  Number used as logic                        7,508     149,760            5%
  Number used as Memory                         689      67,200            1%
    Number used as Dual Port RAM                689
  Number used as exclusive route-thru            26
Number of route-thrus                           316     299,520            1%
Slice Logic Distribution                       Used   Available   Utilization
Number of occupied Slices                     2,626      37,440            7%
Number of LUT Flip Flop pairs used            8,591
  Number with an unused Flip Flop             5,447       8,591           63%
  Number with an unused LUT                     368       8,591            4%
  Number of fully used LUT-FF pairs           2,776       8,591           32%
  Number of unique control sets                 176
  Number of slice register sites lost to
  control set restrictions                      282     149,760            1%
Other                                          Used   Available   Utilization
Number of DSP48Es                               473       1,056           44%
Minimum period achieved with speed grade -2: 24.529 ns (maximum
frequency: 40.767 MHz).
The maximum speed is achieved with a full pipeline, that is when a
particle is produced at each clock cycle. Since the maximum frequency
obtained under the given configuration is 40.767 MHz, the achieved speed was
40.767 Mparticles.s^-1.
The 44% utilization of the DSP48Es (which provide the fixed-point
multiplications) is the limiting factor, as it is the most heavily utilized
resource. The FPGA still has the potential to hold a second copy of the
modules, which would double the simulation speed while using 88% of the
DSP48E resources. Unfortunately, ISE runs out of memory while trying to
place and route such a design, probably due to the complexity of finding a
solution with so few resources left on the FPGA.
4.2 Analysis
4.2.1 Speed and Number of Devices
Based on these results and the knowledge of the architectures, it is
possible to define models that estimate the speed of particle computation
and the number of devices required to achieve a certain performance level.
This




For the purpose of real-time computing, that is to guarantee that all
the tasks of the system are computed before a certain deadline (which is
important when controlling a physical process), it is best to assume that
the CPU will execute the code in a worst-case scenario. In that scenario,
the code doesn’t benefit from CPU optimizations such as branch prediction
and is executed in a constant worst-case time that is inversely proportional
to the CPU’s frequency. A CPU may contain a set of cores, each of which is
capable of computing one operation at a time. One operation may take several
clock cycles depending on the architecture. The speed of computation
therefore depends on the number of CPUs, the number of cores per CPU, the
number of cycles to compute one particle and the CPU frequency, with the
following relationship:





• R_CPU is the particle computation rate on a CPU-based computation
system;
• N_CPU is the number of CPUs in the computation system;
• N_cores is the number of cores per CPU;
• N_cycles is the number of cycles per particle computation; and
• F_CPU is the frequency of the CPUs (assuming that the CPUs each run
at the same frequency).
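The relationship between these quantities can be written as a short Python model. The formula used below (rate = N_CPU x N_cores x F_CPU / N_cycles) is a reconstruction from the symbol definitions above, under the assumption that all cores work on independent particles; it is not copied from the thesis's equation. The function names are illustrative.

```python
def cpu_rate(n_cpu, n_cores, n_cycles, f_cpu):
    """Particle computation rate in particles per second.

    n_cpu: number of CPUs, n_cores: cores per CPU,
    n_cycles: cycles per particle, f_cpu: CPU frequency in Hz.
    """
    return n_cpu * n_cores * f_cpu / n_cycles


def cycles_per_particle(measured_rate, f_cpu, n_cpu=1, n_cores=1):
    """Invert the model to estimate the cycles spent on one particle."""
    return n_cpu * n_cores * f_cpu / measured_rate


# Figures quoted in the CPU results: 2.67 Mparticles/s on one 2.6 GHz core.
print(cycles_per_particle(2.67e6, 2.6e9))
# Model prediction for four cores at roughly 1000 cycles per particle.
print(cpu_rate(1, 4, 1000, 2.6e9))
```

With the measured figures, the model gives roughly 974 cycles per particle, consistent with the "around 1000 cycles" stated in the CPU results.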
4.2.1.2 Number of CPUs Required for a Given Performance Level
As mentioned in the introduction, there is a relationship between the
number of particles, the required deadline to compute that number of particles
and the minimum number of devices required to achieve this performance. In
the case of CPUs this relationship is simple since the computing speed is
independent from the number of particles. From the rate equation (4.1), the








Since the FPGA implementation uses pipelining, the FPGA speed depends
both on the number of particles per FPGA, which determines the pipeline’s
usage, and on the FPGA frequency. The speed is the number of particles
divided by the time it takes to compute them. The number of cycles it takes
to compute P particles is P + L, where P is the number of particles and L is
the latency, that is the number of cycles it takes for the first particle
computation to be output by the system. The speed of computation for one
FPGA-based







• R_simulator is the particle computation rate for one FPGA-based
simulator in particles.s^-1;
• P is the number of particles;
• L is the latency in cycles; and
• F_FPGA is the FPGA frequency in cycles.s^-1.
Further, an FPGA-based system can be comprised of multiple FPGA
devices, each containing multiple simulators operating in parallel. This results
in the following model:





• R_FPGA is the particle computation rate for the FPGA-based system in
particles.s^-1;
• N_FPGA is the number of FPGAs in the FPGA-based system;
• N_simulators is the number of simulators per FPGA;
• P is the number of particles;
• L is the latency in cycles; and
• F_FPGA is the FPGA frequency in cycles.s^-1.
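The pipelined rate model can be expressed as a small Python sketch. It follows the statement that P particles take P + L cycles on one simulator, and it assumes that P denotes the number of particles handled by each simulator and that simulators run independently. The latency value used in the example is a hypothetical placeholder, since the actual latency is not quoted in this section.

```python
def simulator_rate(p, latency, f_fpga):
    """Rate of one pipelined simulator in particles per second.

    p particles need p + latency cycles at f_fpga cycles per second.
    """
    return p * f_fpga / (p + latency)


def fpga_rate(n_fpga, n_simulators, p, latency, f_fpga):
    """Rate of a system of n_fpga devices with n_simulators simulators each."""
    return n_fpga * n_simulators * simulator_rate(p, latency, f_fpga)


F_FPGA = 40.767e6        # maximum frequency reported after place and route
LATENCY = 150            # hypothetical pipeline latency in cycles

for p in (10, 150, 15000):
    print(p, simulator_rate(p, LATENCY, F_FPGA))
```

The rate is half of the clock frequency when P equals L and approaches the clock frequency as P grows, which is the saturation behavior described for large particle counts.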
4.2.1.4 Number of FPGAs Required for a Given Performance Level
As in the CPU case, a minimum number of FPGAs can be determined.
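One way to derive such minimum device counts from the rate models is sketched below in Python. These formulas are a derivation made for illustration, not the thesis's own equations. For CPUs, the required rate P/T is divided by the rate of one CPU. For FPGAs, the particles are assumed to be split evenly across simulators, each of which must finish its share plus the latency L within the deadline T. All function names and the latency value are hypothetical.

```python
import math


def min_cpus(p, deadline, n_cores, n_cycles, f_cpu):
    """Smallest number of CPUs computing p particles within `deadline` seconds."""
    return math.ceil(p * n_cycles / (n_cores * f_cpu * deadline))


def min_fpgas(p, deadline, n_simulators, latency, f_fpga):
    """Smallest number of FPGAs computing p particles within `deadline` seconds.

    Each simulator can process at most deadline * f_fpga - latency particles
    before the deadline, because the first result only appears after
    `latency` cycles.
    """
    budget = deadline * f_fpga - latency
    if budget <= 0:
        raise ValueError("deadline is shorter than the pipeline latency")
    return math.ceil(p / (n_simulators * budget))


# 20000 particles under a 1 ms deadline, with a hypothetical 150-cycle latency.
print(min_cpus(20000, 1e-3, n_cores=4, n_cycles=1000, f_cpu=2.6e9))
print(min_fpgas(20000, 1e-3, n_simulators=1, latency=150, f_fpga=40.767e6))
```

Under these assumptions, the CPU count grows linearly with P/T, whereas one FPGA simulator absorbs roughly 40000 particles per millisecond before a second device is needed.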





































4.2.2 FPGA v. CPU performance outlook
Using the models presented in the previous sections, the performance
of CPUs and FPGAs can be forecast and compared. So far, we were able to fit
only one simulator per FPGA and we only used one CPU core on one CPU. In the
near future, we can hope to increase the FPGA frequency and hold multiple
simulators per FPGA. On the CPU side, the use of 4 cores is currently
feasible. To get an idea of what lies ahead, we can plot the performance
levels given by the models.
4.2.2.1 Speeds
The following plots show the speed performance for the CPU and FPGA
implementations. The speed is measured in particles.s^-1. For CPUs, the
speed remains constant for a given number of CPUs. The reason for this is
that the CPU implementation is serial and therefore its speed is independent
of the number of particles. In contrast, the FPGA computation speed depends
on the number of particles. The speed increases as there are more particles
to fill the pipeline. The maximum theoretical speed for one FPGA
configuration is equal to the FPGA’s clock frequency, since the pipelining
has been done in such a way that one particle computation result is produced
on each cycle after the first result is obtained. The maximum speed is
achieved when the input to output latency (i.e. the number of clock cycles
it takes for the result to be obtained once the input has been applied) is
negligible compared to the number of particles. In this implementation, this
appears to happen around 15000 particles. From the performance plot it can
be seen that the FPGA-based simulations are substantially faster than the
CPU-based ones if the number of particles is greater than 10 in all cases.
Since many applications are likely to require many more particles, FPGAs
appear to be the solution of choice for















































































Figure 4.2: Particle Computation Rates for 4 CPU Cores and 2 FPGA-based
simulators
4.2.2.2 Number of Devices
The following plots show the minimum numbers of CPUs or FPGAs
that are or would be necessary to compute a certain number of particles
under a given deadline. To better view the influence of the deadline and of
the number of particles, the required minimum number of devices has been
plotted in 3D. In these 3D plots it can be seen that the area where two CPUs
are required is far away from the area of operation of the ESR estimator.
However, for other applications such as the Gas Metal Arc Welding process,
which is a much quicker process, there may be a need for computing more
particles under a shorter deadline. If this is the case, then it appears
again that FPGAs will be the device of choice. The graphs show that many
more CPUs will be needed as the number of particles increases and the
deadline to compute this number of particles












[Plot: Minimum Number of Devices Required; axes: Number of Particles (up to
20000), Deadline in ms (0.0 to 1.0); series: 4 CPU, 1 FPGA]




















Figure 4.4: Number of Devices Required; 4 Cores per CPU v. 2 Simulator per
FPGA
4.2.2.3 Performance Analysis
The major speed advantage of the FPGA implementation is the use of
pipelining. Pipelining the particle computation allows for intense
parallelization, which leads to a substantial speed increase. In terms of
speed and number of devices required, FPGAs provide the best performance at
this point in time as well as in the years to come. Whereas the CPU
implementation can only be improved through the use of multiple cores (the
current working implementation uses only a single core), the FPGA
implementation can still receive more enhancements. In the current
implementation, many additions and multiplications are grouped together.
Since the FPGA still has available resources, these steps of the pipeline
can be cut into smaller substeps in which only one operation is performed.
This will lead to two improvements: a shorter maximum delay (faster clock)
and more parallelization. Currently, the FPGA implementation is limited by
the number of DSP48 blocks used, which is 44% of those available.




Conclusion and Future Work
5.1 Analysis of Results
The project started with the objective of computing at least 100
particles within 133 ms for the ESR process given Ahn’s models. As it turns
out, both technologies are capable of achieving this performance level with
a large margin. This opens the door to using particle filtering estimation
online for the Electro-Slag Remelting process with even further precision
than before.
During the course of this performance study, we were able to determine
macro-level performance models for CPUs and FPGAs that give an idea of the
capabilities of these two devices. These macro-level performance models are
a real asset for the solution space exploration of the implementation
problem. For the process at hand, these models show that FPGA devices are
capable of achieving higher performance than CPUs. Furthermore, these models
also show that fewer FPGA devices would be required to achieve the same
performance level as CPUs for this application. As the rate of required
particles per second increases, the number of CPUs required increases
significantly faster than the number of FPGAs required to complete the same
tasks. FPGAs provide further performance that scales with complexity. In
terms of costs, implementation complexity and predictability, FPGAs provide
the best solution for systems with large state variables that require many
particles to be computed within a short period of time.
5.2 Conclusion
The experimental results show that an FPGA is capable of out-performing
a CPU in terms of speed for the specific application of estimating the state
of a particular process using Particle Filtering. The FPGA implementation
exploits the fact that the estimation technique uses processes that can be exe-
cuted in parallel, in this case the process of drawing of particles. The technique
further exploits the fact that the particles are run through physical models that
are also parallel due to the fact that processes occur in parallel in the physical
world.
On the downside, FPGAs are devices that are more complex to program
and may fail to provide fast performance in cases where the computation
requires further non-trivial mathematical operations such as square roots or
logarithms. These operations are further burdened by the requirements of
data representation. For instance, if the application requires double
precision floating point data representation, the number of simultaneous
operations that can be done will be significantly lower than the number of
operations that could be done with only fixed point data representation.
This is due to the fact that floating point operations require more
resources than their fixed point equivalents.
The FPGA exploits a more efficient structure and parallelism to get the
same amount of work done as its CPU counterpart. Specifically, the FPGA runs
at a much lower frequency than the CPU for the same amount of work. From a
broad perspective, lower frequency usually implies lower power consumption.
Although the manufacturing process requires substantially more power than
its controller (the controller’s power consumption is not an issue for the
application at hand), power may be an issue for other applications, such as
mobile applications. One example is mobile robotics, where for instance
particle filtering is used to determine a robot’s position based on physical
models and noisy observations. Other applications may include transportation
systems, portable medical devices or other systems in which there is also a
critical interaction with the physical world.
Lastly, these results show that more complex models can now be de-
veloped to even further reduce the root mean squared error of the estimate,
potentially yielding even better control signals, improving the quality of the
resulting material produced.
5.3 Future Work
The future of this work is to develop a tool chain to provide assistance
in the choice of a target device. This tool chain would use implementation
performance results and the sort of performance models that were put
together in this research to evaluate the speed and precision offered by
each potential target device.
One additional future task would consist of evaluating the other ESL
tools that are commercially available, such as The MathWorks’ Filter Design
HDL Coder, AccelDSP from Xilinx, National Instruments’ LabVIEW FPGA, DK
Design Suite from Agility DS, and Cynthesizer from Forte.
Finally, other computing devices such as GPUs and DSPs should be
studied for suitability. The GPU suitability research is in the process of eval-







This appendix is the complete set of code that runs the physical and
measurement models. The code is decomposed into several Bluespec .bsv files
that contain modules. The code is given in order of dependency, that is,
later files require earlier files. The order is therefore: datatypes,
parameters, inversion and exponentiation, F and H separately, F and H
together, multiple F and H in parallel (the latter is provided even though
the FPGA wasn’t able




 * File: Types.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
   This file contains the datatypes that are used for the simulations.
   All datatypes are of the fixed-point type from the Bluespec Library.
   Finally, it was decided that all the data will be of type
   FixedPoint#(20,28). This is because of the range of some data and the
   size of the DSP48 multipliers (48 bits).
   If required, different sizes could be defined for each variable. A lot
   of the simulation models have been designed to accept different sizes.
   Due to the possibility of different data types for the variables within
   a same category (state, command, observation, noise...) the data is
   grouped in structures.
 */
package Types;
import FixedPoint :: *;
typedef 20 Isize;
typedef 28 Fsize;
typedef 48 DTsize;
typedef FixedPoint#(Isize, Fsize) Datatype; // standard datatype for this project
// State vector data types
typedef Datatype X1type;
typedef Datatype X2type;
typedef Datatype X3type;
typedef Datatype X4type;
typedef Datatype X5type;
// U data types
typedef Datatype U1type;
typedef Datatype U2type;
// Measurement noise data types
typedef Datatype MN1type;
typedef Datatype MN2type;
typedef Datatype MN3type;
typedef Datatype MN4type;
typedef Datatype MN5type;
// Process noise data types
typedef Datatype PN1type;
typedef Datatype PN2type;
// Observation vector data types
typedef Datatype Y1type;
typedef Datatype Y2type;
typedef Datatype Y3type;
typedef Datatype Y4type;
typedef Datatype Y5type;
typedef struct {
    X1type delta;




} Xtype deriving (Bits);
typedef struct {
    Y1type d;
    Y2type xram;
    Y3type ir;
    Y4type lc;
    Y5type volt;
} Ytype deriving (Bits);
typedef struct {
    U1type ic;
    U2type vramc;
} Utype deriving (Bits);
typedef struct {
    MN1type d;
    MN2type xram;
    MN3type ir;
    MN4type lc;
    MN5type volt;
} MeasNoisetype deriving (Bits);
typedef struct {
    PN1type ic;
    PN2type vramc;




 * File: Params.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
   Contains the precomputed parameters for the physical and measurement
   models.
   For the details about the values of these parameters, consult Ahn's
   dissertation.
 */
package Params;
import FixedPoint :: *;
import Types :: *;
typedef Datatype Param;
//Param p_R1 = 6.0000000000e-03;
//Param p_R1e = 6.5000000000e-03;
//Param p_k1 = 1.3000000000e+03;
Param p_De = 2.0320000000e+01;
Param p_re = 1.0160000000e+01;
Param p_Di = 2.5400000000e+01;
Param p_ri = 1.2700000000e+01;
Param p_Aelect = 1.2130000000e-03;
Param p_m0 = 1.4285714286e-02;
Param p_m1 = 7.6923076923e-04;
Param p_Tsstar = 2.2000000000e+03;
Param p_Tr = 3.0000000000e+02;
Param p_rhor = 7.8300000000e+00;
Param p_Cr = 4.3400000000e-01;
Param p_Kroom = 6.3900000000e-01;
Param p_alphar = 1.8803962074e-01;
Param p_Tm = 1.7830000000e+03;
Param p_rhom = 7.4000000000e+00;
Param p_Cm = 1.1680000000e+00;
Param p_Km = 3.1300000000e-01;
Param p_alpham = 3.6213439467e-02;
Param p_hm = 8.9287129300e+03;
Param p_Tsup = 1.8830000000e+03;
Param p_L = 2.7196000000e+02;
Param p_hsup = 1.1543287930e+04;
Param p_Cs0 = 1.4700000000e+00;
Param p_rhos = 2.5500000000e+00;
Param p_Ks = 4.1800000000e-02;
Param p_Tss = 1.7730000000e+03;
Param p_Ms = 5.6750000000e+04;
Param p_Vs = 2.2254901961e+04;
Param p_Ae = 3.2429278662e+02;
Param p_Ai = 5.0670747910e+02;
Param p_betam = -8.0741590882e-01;
Param p_lambda = 3.4149767859e+00;
Param p_a0 = 3.6000000000e-01;
Param p_den = 2.1244930358e+01;
Param p_Cdd = 1.0746632253e+01;
Param p_Cdp = 5.1437804365e+00;
Param p_Csd0 = 2.0781252910e+00;
Param p_Csp = 1.7681745250e+00;
Param p_mu = 5.5000000000e-01;
Param p_mdot0 = 5.0000000000e+01;
Param p_Pm0 = 7.7995188716e+04;
Param p_P0 = 1.4180943403e+05;
Param p_pm0 = 2.4050855256e+02;
Param p_Ts0 = 2.2000000000e+03;
Param p_d0 = 0.0000000000e+00;
Param p_hs0 = 4.3920610764e+01;
Param p_He = 5.7675911885e-01;
Param p_Hs = 4.2642023223e-02;
Param p_R0 = 5.0000000000e-03;
Param p_I0 = 5.3255879305e+03;
Param p_V0 = 2.6627939653e+01;
Param p_delta0 = 1.4584705609e+01;
Param p_Sdot0 = 2.0835359390e-02;
Param p_Vram0 = 7.5007293803e-03;
Param p_mur0 = 0.0000000000e+00;
Param p_sigmadV = 5.0000000000e-02;
Param p_sigmaPos = 3.0000000000e-01;
Param p_sigmaImeas = 2.0000000000e+02;
Param p_sigmaLC = 5.0000000000e+02;
Param p_sigmaV = 1.0000000000e-01;
Param p_sigmaI = 1.8420000000e+02;
Param p_sigmaVram = 1.0000000000e-02;
Param p_sigmamur = 5.0000000000e-04;
Param p_sigmaa = 3.6000000000e-03;
Param p_sigmaIb = 2.6627939653e+00;
Param p_sigmaVramb = 7.5007293803e-04;
Param p_sigmaVoltb = 1.3313969826e-02;
Param p_k0 = 7.0000000000e+01;
Param p_k1 = 1.3000000000e+03;
Param p_R1 = 6.0000000000e-03;
Param p_R1e = 6.5000000000e-03;
Param p_dInflection = 0.0000000000e+00;
Param p_SlagTemperatureTimeConstant = 1.0000000000e+01;
Param p_KTs = 1.0000000000e-01;
Param p_DepthControlTimeConstant = 1.0000000000e+00;
Param p_Kd = 1.0000000000e+00;
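The parameters above are declared with the project-wide FixedPoint#(20,28) datatype from Types.bsv. As a rough check that this format covers their range and resolution, the following Python sketch models the quantization. It assumes a signed two's-complement format whose 20 integer bits include the sign bit, and it assumes truncation toward minus infinity; the Bluespec library may round literals differently. The helper names to_fixed, to_float and quantization_error are hypothetical and are not part of the thesis code.

```python
import math

ISIZE = 20  # integer bits (assumed to include the sign bit), mirrors Isize
FSIZE = 28  # fractional bits, mirrors Fsize


def to_fixed(value: float) -> int:
    """Quantize a real value to a raw signed integer with FSIZE fractional bits."""
    raw = math.floor(value * (1 << FSIZE))       # truncation is an assumption
    lo = -(1 << (ISIZE + FSIZE - 1))             # most negative raw value
    hi = (1 << (ISIZE + FSIZE - 1)) - 1          # most positive raw value
    if not lo <= raw <= hi:
        raise OverflowError(f"{value} does not fit in a {ISIZE}.{FSIZE} format")
    return raw


def to_float(raw: int) -> float:
    """Convert a raw fixed-point integer back to a real value."""
    return raw / (1 << FSIZE)


def quantization_error(value: float) -> float:
    """Absolute error introduced by storing value in the fixed-point format."""
    return abs(to_float(to_fixed(value)) - value)


if __name__ == "__main__":
    for name, value in [("p_P0", 1.4180943403e+05), ("p_m1", 7.6923076923e-04)]:
        print(name, to_fixed(value), quantization_error(value))
```

Under these assumptions the largest listed parameter, p_P0, stays below the 2^19 magnitude limit, and small parameters such as p_m1 are stored with an error below 2^-28.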




 * File: inverse.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
 * Computes the inverse value of a fixed point value in a pipelined fashion.
 *
 * The algorithm used is the slow division.
 * The number of steps it takes is 2*fsize, where fsize is the size
 * of the fractional part.
 *
 * Note: Since this algorithm was adapted from the division algorithm
 * most of the variable names refer to names that would be used for divisions
 */
package inverse;
import FixedPoint :: *;
/**
abstract interface Div_IFC
parameters:
    i: number of bits for the integer part
    f: number of bits for the fractional part
start: feed value to be inverted into the pipeline
result: returns the inverted value
acknowledge: removes the value from the pipeline.
*/
interface Div_IFC#(numeric type i, numeric type f);
    method Action start(FixedPoint#(i,f) den);
    method Maybe#(FixedPoint#(i,f)) result();
    method Action acknowledge();
endinterface
module mkInverse(Div_IFC#(i,f))
    provisos(
        Bits#(FixedPoint::FixedPoint#(i,f), TAdd#(i,f)),
        Add#(TAdd#(i,f), f, TAdd#(TAdd#(i,f), f)),
        Add#(1, a, TAdd#(i,f))
    );
    Integer isize = valueOf(i);
    Integer fsize = valueOf(f);
    Integer fpsize = isize + fsize;
    Reg#(Bit#(TAdd#(i,f))) d <- mkReg(0);
    Reg#(Bit#(TAdd#(i,f))) r <- mkReg(0);
    Reg#(Bit#(TAdd#(i,f))) q <- mkReg(0);
    Reg#(Bit#(TAdd#(TLog#(TAdd#(f,f)),1))) count <- mkReg(0);
    Reg#(Bool) available <- mkReg(True);
    Reg#(Bit#(1)) sign <- mkReg(?);
    Reg#(Bool) outOfRange <- mkReg(False);
    rule cycle ((!available) && (count <= fromInteger(2*fsize)));
        if (r < d)
            action




        else
            action




        if (q[fpsize-2] == 1) outOfRange <= True; //-2 because msb is 1 after shift need to




        $display("cycle=============\n");
        $display("cycle: d = %b : ", d);
        $display("cycle: r = %b : ", r);
        $display("cycle: q = %b : ", q);
        */
    endrule
    method Action start(FixedPoint#(i,f) den) if (available);
        FixedPoint#(i,f) one = 1;
        d <= {pack(abs(den)), '0};
        r <= 1;
        q <= 0;
        count <= 0;
        available <= False;
        sign <= msb(pack(den));
        outOfRange <= False;
        /*
        $display("count = %d \n", count);
        $display("isize = %d \n", isize);
        $display("fsize = %d \n", fsize);
        $display("fpsize = %d \n", fpsize);
        */
        $display("INV START = %d \n", count);
    endmethod
    method Maybe#(FixedPoint#(i,f)) result() if (count > fromInteger(2*fsize) && !available);
        if (outOfRange == False)
            if (sign == 1)
                return Valid(-unpack(q));
            else
                return Valid(unpack(q));
        else
            return Invalid;
    endmethod
    method Action acknowledge() if (count >= fromInteger(2*fsize) && !available);
        available <= True;
    endmethod
endmodule : mkInverse
endpackage : inverse
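The shift-and-subtract ("slow division") idea behind mkInverse can be illustrated with a small software model. The Python sketch below is not a transcription of the rule body, and the register-level schedule of the hardware is not reproduced. It assumes the reciprocal of a value with raw representation D and f fractional bits is floor(2^(2f) / D), and that a result too large for the signed i+f bit format is reported as invalid, in the spirit of the Maybe return type. The function name reciprocal_fixed and its defaults are hypothetical.

```python
from typing import Optional


def reciprocal_fixed(raw: int, isize: int = 20, fsize: int = 28) -> Optional[int]:
    """Return the raw fixed-point reciprocal of raw, or None if it is out of range.

    raw is a signed integer holding a value scaled by 2**fsize.
    """
    if raw == 0:
        return None                      # 1/0 has no representable result
    negative = raw < 0
    d = abs(raw)                         # divide magnitudes, restore the sign at the end

    # Numerator is 2**(2*fsize): a single 1 bit followed by 2*fsize zero bits.
    r = 0                                # running remainder
    q = 0                                # quotient built one bit per step
    for step in range(2 * fsize + 1):
        next_bit = 1 if step == 0 else 0
        r = (r << 1) | next_bit          # shift in the next numerator bit
        q <<= 1
        if r >= d:                       # subtract when the divisor fits
            r -= d
            q |= 1

    max_magnitude = (1 << (isize + fsize - 1)) - 1
    if q > max_magnitude:
        return None                      # result does not fit the signed format
    return -q if negative else q


if __name__ == "__main__":
    two = 2 << 28                        # raw representation of 2.0
    print(reciprocal_fixed(two) / (1 << 28))
```

After the first step the remainder holds 1 and roughly 2*fsize shift-and-subtract steps remain, which is consistent with the step count quoted in the file header.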
1.2.4 invFarm.bsv
/**
 * File: invFarm.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
 * The inverse farm is a set of inverters that can accommodate a new
   inversion request at each cycle, while continuing the inversions that
   are already under way.
   To do this, the number of inverters in the set must be equal to the
   number of cycles it takes to compute an inversion (2*f).
   At each new request, an inversion is assigned to an available inverter
   and this inverter is flagged as busy until the result has been removed
   from it.
   Since each inversion requires the same number of cycles, the results
   are provided in the same order as the requests.
 */
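The sizing argument in the comment above (as many inverters as cycles per inversion) can be checked with a toy schedule model. The Python sketch below assumes round-robin assignment, one request offered per cycle, and a result that is removed as soon as it is ready; the hand-off cycles of the FIFOs and the acknowledge step are ignored. The function name count_stalls is hypothetical.

```python
def count_stalls(num_inverters: int, latency: int, num_requests: int) -> int:
    """Count the cycles lost waiting for a busy inverter under round-robin assignment."""
    free_at = [0] * num_inverters        # cycle at which each inverter is free again
    cycle = 0
    stalls = 0
    for request in range(num_requests):
        idx = request % num_inverters    # same order as the nextInverter index
        if free_at[idx] > cycle:         # target inverter is still busy: wait for it
            stalls += free_at[idx] - cycle
            cycle = free_at[idx]
        free_at[idx] = cycle + latency   # inverter is occupied for latency cycles
        cycle += 1                       # next request arrives one cycle later
    return stalls


if __name__ == "__main__":
    print(count_stalls(num_inverters=8, latency=8, num_requests=100))
    print(count_stalls(num_inverters=4, latency=8, num_requests=100))
```

With at least as many inverters as the latency, the model never stalls; with fewer, requests queue up behind busy inverters.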
package invFarm;
import FixedPoint :: *;
import inverse :: *;
import FIFO :: *;
/**
interface InvFarm_IFC
Parameters:
    i: number of bits of the integer part
    f: number of bits of the fractional part
    nbinv: number of inverters (suggested 2*f)
Methods:
    submit: submit a value to be inverted
    result: return the next available result in the order it was requested
    acknowledge: remove the current result from the inverter and set the
                 next inverter from which the next result should be read.
*/
interface InvFarm_IFC#(numeric type i, numeric type f, numeric type nbinv);
    method Action submit(FixedPoint#(i,f) inval);
    method Maybe#(FixedPoint#(i,f)) result();
    method Action acknowledge();
endinterface
module mkInvFarm(InvFarm_IFC#(i,f,nbinv))
    provisos(
        Bits#(FixedPoint::FixedPoint#(i,f), TAdd#(i,f)),
        Add#(TAdd#(i,f), f, TAdd#(TAdd#(i,f), f)),
        Add#(1, a, TAdd#(i,f))
    );
    Integer isize = valueOf(i);
    Integer fsize = valueOf(f);
    Integer fpsize = isize + fsize;
    Integer n = valueOf(nbinv);
    //Reg#(Bool) busy[n];
    Reg#(UInt#(TAdd#(TLog#(nbinv),1))) nextInverter <- mkReg(0); // Index of the next available inverter
    Reg#(UInt#(TAdd#(TLog#(nbinv),1))) nextResult <- mkReg(0); // Index of the next result
    FIFO#(FixedPoint#(i,f)) in <- mkLFIFO;
    FIFO#(Maybe#(FixedPoint#(i,f))) out <- mkLFIFO;
    // create n inverters
    Div_IFC#(i,f) inverter[n];
    for (Integer i=0; i<n; i=i+1)
        inverter[i] <- mkInverse;
    for (Integer i=0; i<n; i=i+1)
    begin
        Reg#(Bool) busy_i <- mkReg(False);
        // start inversion on the next available inverter.
        rule start_i ((!busy_i) && (nextInverter == fromInteger(i)));
            inverter[i].start(in.first());
            in.deq();
            busy_i <= True;
            if (i >= fromInteger(n-1))
                nextInverter <= 0;
            else
                nextInverter <= fromInteger(i)+1;
            //$display("START %d", i);
        endrule
        // Once the inverter for which a result is expected is done,
        // get the result and free the inverter.
        rule end_i (busy_i && (nextResult == fromInteger(i)));
            out.enq(inverter[i].result());
            inverter[i].acknowledge;
            //$display("END %d", i);
            busy_i <= False;
            if (i >= fromInteger(n-1))
                nextResult <= 0;






    rule display_values;
        //$display("nextInverter= %d", nextInverter);
        //$display("nextResult= %d", nextResult);
        //$display("busy[nextInverter]= %d", busy




    method Action submit(FixedPoint#(i,f) inval);
        //$display("SUBMIT");
        in.enq(inval);
    endmethod
    method Maybe#(FixedPoint#(i,f)) result();
        return out.first();
    endmethod
    method Action acknowledge();






 * File: ExponFix.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
 * Computes the exponential for the ESR by using a Taylor
 * series expansion up to the 15th order.
 * The implementation is a 4 stage pipeline.
 * The interface is abstract. Since the module is intended
 * for fixed point values, the integer and fractional sizes
 * must be provided.
 *
 * Stage0: compute x^2,
 * Stage1: compute x^3 and x^4
 * Stage2: compute x^5, x^6, x^7 and x^8
 * Stage3: compute x^9, x^10, x^11, x^12, x^13, x^14, x^15, x^16
 */
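The stage schedule described in the header can be mirrored in software to see how the powers and the partial sums are built up. The Python sketch below uses floating point instead of fixed point, so it illustrates the staging only, not the precision of the hardware. It follows the products and sums that appear in the rules and the fetch method of this listing, which accumulate terms through x^16. The function name exp_taylor_staged is hypothetical.

```python
import math


def exp_taylor_staged(x: float) -> float:
    """Approximate exp(x) with the staged power computation of the pipeline."""
    # Stage 0: x^2, partial sum 1 + x
    x2 = x * x
    acc = 1.0 + x

    # Stage 1: x^3 and x^4, add the x^2 term
    x3 = x2 * x
    x4 = x2 * x2
    acc += x2 / math.factorial(2)

    # Stage 2: x^5..x^8, add the x^3 and x^4 terms
    x5, x6, x7, x8 = x2 * x3, x3 * x3, x3 * x4, x4 * x4
    acc += x3 / math.factorial(3) + x4 / math.factorial(4)

    # Stage 3: x^9..x^16, add the x^5..x^8 terms
    high = [x6 * x3, x5 * x5, x5 * x6, x6 * x6, x6 * x7, x7 * x7, x8 * x7, x8 * x8]
    acc += sum(p / math.factorial(n) for n, p in zip(range(5, 9), [x5, x6, x7, x8]))

    # Fetch: add the x^9..x^16 terms to the last partial sum
    acc += sum(p / math.factorial(n) for n, p in zip(range(9, 17), high))
    return acc


if __name__ == "__main__":
    for value in (0.0, 1.0, -2.0):
        print(value, exp_taylor_staged(value), math.exp(value))
```

Each stage multiplies only powers produced by earlier stages, which is what allows the hardware to compute all products of a stage in parallel.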
package ExponFix;
import FIFO :: *;
import FixedPoint :: *;
function Integer factorial(Integer n) = (n<=1 ? 1 : n * factorial(n-1));
/**
Interface Exp_IFC
Description:
values:
    ai: size of the integer part
    af: size of the fractional part
    a: data to be exponentiated
methods:
    feed: feeds value into the pipeline
    fetch: returns exponential (will not be called if
           there is no result in the pipeline)
    removeresult: removes the data in the last stage
*/
interface Exp_IFC#(numeric type ai, numeric type af);
    method Action feed(FixedPoint#(ai,af) a);
    method FixedPoint#(ai,af) fetch();
    method Action removeresult();
endinterface
module mkExponFix(Exp_IFC#(ai,af))
    provisos(
        //Bitwise#(FixedPoint::FixedPoint#(ai,af)),
        RealLiteral#(FixedPoint::FixedPoint#(ai,af)),
        Arith#(FixedPoint::FixedPoint#(ai,af)),
        Add#(ai, af, TAdd#(ai,af)),
        Add#(1, a, ai)
    );
    FIFO#(FixedPoint#(ai,af)) a1 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a1p <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a2 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a2p <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a3 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a3p <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a4 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a5 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a6 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a7 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a8 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a9 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a10 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a11 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a12 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a13 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a14 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a15 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) a16 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) expval0 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) expval1 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) expval2 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) expval3 <- mkLFIFO();
    FIFO#(FixedPoint#(ai,af)) expval4 <- mkLFIFO();
    rule stage0;
        a2.enq(a1.first()*a1.first());
        a1p.enq(a1.first());
        expval1.enq(expval0.first()+a1.first());
        a1.deq();
        expval0.deq();
        $write("stage 0 expval0 is ");
        fxptWrite(10, expval0.first());
    endrule
    rule stage1;
        a2p.enq(a2.first());
        a3.enq(a2.first()*a1p.first());
        a4.enq(a2.first()*a2.first());
        expval2.enq(expval1.first()+a2.first()*fromRational(1, factorial(2)));
        a1p.deq();
        a2.deq();
        expval1.deq();
        $write("stage 1 expval1 is ");
        fxptWrite(10, expval1.first());
    endrule
    rule stage2;
        a3p.enq(a3.first());
        a5.enq(a2p.first()*a3.first());
        a6.enq(a3.first()*a3.first());
        a7.enq(a3.first()*a4.first());
        a8.enq(a4.first()*a4.first());
        expval3.enq(expval2.first()+a3.first()*fromRational(1, factorial(3))
                    +a4.first()*fromRational(1, factorial(4)));
        a2p.deq();
        a3.deq();
        a4.deq();
        expval2.deq();
        $write("stage 2 expval2 is ");
        fxptWrite(10, expval2.first());
    endrule
    rule stage3;
        a9.enq(a6.first()*a3p.first());
        a10.enq(a5.first()*a5.first());
        a11.enq(a5.first()*a6.first());
        a12.enq(a6.first()*a6.first());
        a13.enq(a6.first()*a7.first());
        a14.enq(a7.first()*a7.first());
        a15.enq(a8.first()*a7.first());
        a16.enq(a8.first()*a8.first());
        expval4.enq(expval3.first()+a5.first()*fromRational(1, factorial(5))
                    +a6.first()*fromRational(1, factorial(6))
                    +a7.first()*fromRational(1, factorial(7))
                    +a8.first()*fromRational(1, factorial(8)));
        a3p.deq();
        a5.deq();
        a6.deq();
        a7.deq();
        a8.deq();
        expval3.deq();
        $write("stage 3 expval3 is ");
        fxptWrite(10, expval3.first());
    endrule
    method Action feed(FixedPoint#(ai,af) a);
        a1.enq(a);
        expval0.enq(1);
    endmethod
    method FixedPoint#(ai,af) fetch();
        return expval4.first()+a9.first()*fromRational(1, factorial(9))
               +a10.first()*fromRational(1, factorial(10))
               +a11.first()*fromRational(1, factorial(11))
               +a12.first()*fromRational(1, factorial(12))
               +a13.first()*fromRational(1, factorial(13))
               +a14.first()*fromRational(1, factorial(14))
               +a15.first()*fromRational(1, factorial(15))
               +a16.first()*fromRational(1, factorial(16));
    endmethod
    method Action removeresult();
        a9.deq();
        a10.deq();
        a11.deq();
        a12.deq();
        a13.deq();
        a14.deq();
        a15.deq();
        a16.deq();






 * File: HBRAM.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
 * Computes the ESR's measurement model in a




import Types :: *;
import FixedPoint :: *;
import ExponFix :: *;
import FIFO :: *;
import BRAMFIFO :: *;
import Params :: *;
//initParameters();
/**
interface H_IFC
Description:
    interface for the measurement module
    measure: feed the state, the command vector and noise into the pipeline
    fetchMeasurement: return the value of the simulated measurement
    removeMeasurement: remove the measurement value from the pipeline
*/
interface H_IFC;
    method Action measure(Xtype x, Utype u, MeasNoisetype n);
    method Ytype fetchMeasurement();
    method Action removeMeasurement();
endinterface
module mkH(H_IFC);
    //NOTE: The minimum size for these first FIFOs has not been determined
    //They were set to 20 for now.
    FIFO#(Xtype) x <- mkSizedBRAMFIFO(20);
    FIFO#(Utype) u <- mkSizedBRAMFIFO(20);
    FIFO#(MeasNoisetype) n <- mkSizedBRAMFIFO(20);
    FIFO#(Ytype) y <- mkSizedBRAMFIFO(20);
    FIFO#(Datatype) r <- mkSizedBRAMFIFO(20);
    FIFO#(Datatype) ubvolt <- mkSizedBRAMFIFO(20);
    // unbiased voltage in stage 7
    FIFO#(Datatype) volt <- mkSizedBRAMFIFO(20);
    // stage1
    FIFO#(Datatype) d4noise <- mkSizedBRAMFIFO(11);
    FIFO#(Datatype) xram4noise <- mkSizedBRAMFIFO(11);
    FIFO#(Datatype) dStage3 <- mkSizedBRAMFIFO(3);
    FIFO#(Datatype) meStage2 <- mkLFIFO();
    FIFO#(Datatype) meStage3 <- mkSizedBRAMFIFO(3);
    FIFO#(Datatype) ir <- mkSizedBRAMFIFO(9);
    FIFO#(Datatype) ir4noise <- mkSizedBRAMFIFO(11);
    FIFO#(Datatype) m0Xd <- mkLFIFO();
    FIFO#(Datatype) m1Xd <- mkLFIFO();
    FIFO#(Datatype) aeXrhosXd <- mkLFIFO();




    FIFO#(Datatype) negDepthR <- mkLFIFO();
    FIFO#(Datatype) posDepthR <- mkLFIFO();
    FIFO#(Datatype) exponent <- mkLFIFO();
    FIFO#(Datatype) posDepthMass <- mkLFIFO();
    // stage3
    FIFO#(Datatype) rd <- mkSizedBRAMFIFO(6);
    FIFO#(Datatype) lc <- mkSizedBRAMFIFO(9);
    Exp_IFC#(Isize, Fsize) exponentiator <- mkExponFix();
    Datatype ib = 0;
    Datatype voltb = 0;
    rule stage1;
        //Utype uu;
        //uu=u.first();
        m0Xd.enq(x.first().d*p_m0);
        m1Xd.enq(x.first().d*p_m1);
        aeXrhosXd.enq(x.first().d*p_Ae*p_rhos);
        xram4noise.enq(x.first().xram);
        tsMtsstar.enq(x.first().ts-p_Tsstar);
        ir.enq(ib+u.first().ic);
        ir4noise.enq(ib+u.first().ic);
        d4noise.enq(x.first().d);
        meStage2.enq(x.first().me);
        meStage3.enq(x.first().me);
        dStage3.enq(x.first().d);
        x.deq();
        u.deq();
        $display("H: Stage1\n");
    endrule
    rule stage2;
        negDepthR.enq(p_R1-m0Xd.first());
        posDepthR.enq(p_R1-m1Xd.first());
        exponent.enq(tsMtsstar.first()*-p_Aelect);
        posDepthMass.enq(meStage2.first()-aeXrhosXd.first());
        m0Xd.deq();
        m1Xd.deq();
        tsMtsstar.deq();
        meStage2.deq();
        aeXrhosXd.deq();
        $display("H: Stage2\n");
    endrule
    rule stage3;
        if (dStage3.first() > 0)
            action
                rd.enq(posDepthR.first());
                lc.enq(posDepthMass.first());
            endaction
        else
            action
                rd.enq(negDepthR.first());
                lc.enq(meStage3.first());
            endaction
        exponentiator.feed(unpack(pack(exponent.first())));
        dStage3.deq();
        posDepthR.deq();
        posDepthMass.deq();
        negDepthR.deq();
        meStage3.deq();
        exponent.deq();
        $display("H: Stage3\n");
    endrule
    rule computeR;
        r.enq(rd.first()*exponentiator.fetch());
        rd.deq();
        exponentiator.removeresult();
        $display("H: computeR\n");
    endrule
    rule computeUbVolt;
        ubvolt.enq(r.first()*ir.first());
        r.deq();
        ir.deq();
        $display("H: computeUbVolt\n");
    endrule
    rule computeVolt;
        volt.enq(ubvolt.first()+voltb);
        ubvolt.deq();
        $display("H: computeVolt\n");
    endrule
    rule addnoise;
        Ytype measurement;
        MeasNoisetype noise;
        noise = n.first();
        measurement.d = d4noise.first()+noise.d;
        measurement.xram = xram4noise.first()+noise.xram;
        measurement.ir = ir4noise.first()+noise.ir;
        measurement.lc = lc.first()+noise.lc;
        measurement.volt = volt.first()+noise.volt;
        y.enq(measurement);
        d4noise.deq();
        xram4noise.deq();
        ir4noise.deq();
        lc.deq();
        volt.deq();
        n.deq();
        $display("H: addnoise \n");
    endrule
    /*
    rule displaystuff;
        $write("result is "); fxptWrite(10, ir.first()); $display("");
    endrule
    */
    method Action measure(Xtype xin, Utype uin, MeasNoisetype nin);
        x.enq(xin);
        u.enq(uin);
        n.enq(nin);
        //$display("method measure");
    endmethod
    method Ytype fetchMeasurement();
        return y.first();
    endmethod
    method Action removeMeasurement();







 * File: Fpiped.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
 * Computes the ESR's physical model in a pipelined fashion
 *
 * The inverses of deltas are computed through an inversion farm
 * Noise is added to the inputs
 * The dynamics are computed
 * The dynamics are integrated in the state vectors (= particles)
 */
package Fpiped;
import FixedPoint :: *;
import ExponFix :: *;
import FIFO :: *;
import BRAMFIFO :: *;
import Params :: *;
import Vector :: *;
import Types :: *;
import invFarm :: *;
/**
abstract interface F_IFC
Description:
    nbparticles is the number of particles that need to be generated
    (also the number of states to fetch/update)
    init: set the initial state of the system (only done once)
    evolve: ask to compute all the new states for a new command vector,
            applied during ts seconds.
    fetchNewState: get the value of a new state
    removeNewState: remove the new state value from the pipeline to make room
                    for another value.
*/
interface F_IFC#(numeric type nbparticles);
    method Action init(Xtype xvect);
    method Action evolve(Utype u, Datatype ts);
    method Xtype fetchNewState();
    method Action removeNewState();
endinterface
/**
interface Dyn_IFC
Description:
    has two separate input methods so that values can be preloaded
    into the pipeline before the inverses of Delta are fed, since their
    computation takes many cycles. This saves a few cycles.
    feedRest should be used until the pipeline is full. It blocks automatically.
    feedInvDelta should be used to feed the inverses of Delta into the pipeline.
    feedInvDelta: feed a new inverted delta
    feedRest: feed all the other inputs
    fetchRates: return the computed rates
    acknowledge: remove the computed rates from the pipeline to make room
                 for the next rates.
*/
interface Dyn_IFC;
    method Action feedInvDelta(X1type invDelta);
    method Action feedRest(X2type x_Ts, X3type x_D, Utype u, ProcessNoisetype nvect);
    method Xtype fetchRates();
    method Action acknowledge();
endinterface
/**
function integrateOverTime
Description:
    Updates the state by adding the changes for this step
*/
function Xtype integrateOverTime(Xtype xvect, Datatype ts, Xtype rates);
    Xtype newX;
    newX.delta = xvect.delta+rates.delta*ts;
    newX.ts = xvect.ts+rates.ts*ts;
    newX.d = xvect.d+rates.d*ts;
    newX.xram = xvect.xram+rates.xram*ts;
    newX.me = xvect.me+rates.me*ts;
    return newX;
endfunction : integrateOverTime
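The update performed by integrateOverTime is a forward Euler step: each state variable advances by its rate multiplied by the time step. The Python sketch below restates that update in floating point for illustration. The State dataclass and the parameter name dt are hypothetical; the five field names follow the Xtype structure used in the listing.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Floating-point stand-in for the five fields of Xtype."""
    delta: float
    ts: float
    d: float
    xram: float
    me: float


def integrate_over_time(x: State, dt: float, rates: State) -> State:
    """Advance every state variable by rate * dt (forward Euler step)."""
    return State(
        delta=x.delta + rates.delta * dt,
        ts=x.ts + rates.ts * dt,
        d=x.d + rates.d * dt,
        xram=x.xram + rates.xram * dt,
        me=x.me + rates.me * dt,
    )


if __name__ == "__main__":
    start = State(1.0, 2.0, 3.0, 4.0, 5.0)
    rates = State(10.0, 20.0, 30.0, 40.0, 50.0)
    print(integrate_over_time(start, 0.5, rates))
```

In the hardware each particle carries its own state vector, so this step is applied once per particle with the rates returned by the dynamics module.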
/**
module mkDyn
Description:
    Computes the rates of change of the state variables in a pipelined fashion
*/







    // bias
    Integer vramb = 0;
    Integer voltb = 0;
    Integer ib = 0;
    Reg#(Bool) available <- mkReg(True);
    Reg#(Utype) uReg <- mkReg(?);
    //FIFOS
    FIFO#(ProcessNoisetype) fifo_Noise <- mkSizedBRAMFIFO(9);
    FIFO#(X1type) fifo_InvDelta <- mkSizedBRAMFIFO(9);
    FIFO#(X2type) fifo_Ts4computeQMandQS <- mkSizedBRAMFIFO(9);
    FIFO#(X2type) fifo_Ts4computeExponent <- mkSizedBRAMFIFO(9);
    FIFO#(X3type) fifo_D <- mkSizedBRAMFIFO(9);
    FIFO#(Datatype) fifo_Qm <- mkLFIFO();
    FIFO#(Datatype) fifo_Qs <- mkLFIFO();
    FIFO#(Datatype) fifo_Rd <- mkSizedBRAMFIFO(7);
    FIFO#(Datatype) fifo_Exponent <- mkLFIFO();
    FIFO#(Datatype) fifo_QmQs <- mkSizedBRAMFIFO(3);
    FIFO#(Datatype) fifo_Pm <- mkSizedBRAMFIFO(9);
    FIFO#(Datatype) fifo_R <- mkLFIFO();
    FIFO#(Datatype) fifo_I4Volt <- mkSizedBRAMFIFO(5);
    FIFO#(Datatype) fifo_I4P <- mkSizedBRAMFIFO(5);
    FIFO#(Datatype) fifo_Vram <- mkSizedBRAMFIFO(5);
    FIFO#(Datatype) fifo_Volt <- mkLFIFO();
    FIFO#(Datatype) fifo_P <- mkLFIFO();
    FIFO#(Datatype) fifo_Sdot <- mkLFIFO();
    FIFO#(Datatype) fifo_ddot <- mkLFIFO();
    FIFO#(Datatype) fifo_Medot <- mkLFIFO();
    FIFO#(Datatype) fifo_Xramdot <- mkSizedBRAMFIFO(5); // mkSizedFIFO(valueOf(nbparticles)+3);
    FIFO#(Datatype) fifo_Tsdot <- mkLFIFO();
    FIFO#(Datatype) fifo_Deltadot <- mkSizedBRAMFIFO(3);
    // Instantiate modules
    Exp_IFC#(Isize, Fsize) exponentiator <- mkExponFix();
    rule addnoise;
        ProcessNoisetype noise = fifo_Noise.first;
        Datatype i = uReg.ic+noise.ic+fromInteger(ib);
        Datatype vram = uReg.vramc+noise.vramc+fromInteger(vramb);
        fifo_Vram.enq(vram);
        fifo_Xramdot.enq(vram);
        fifo_I4Volt.enq(i);
        fifo_I4P.enq(i);
        fifo_Noise.deq;
        $display("addnoise \n");
    endrule
    rule computeQMandQS;
        X2type x_ts = fifo_Ts4computeQMandQS.first;
        fifo_Qm.enq((x_ts-p_Tm)*p_Ae*p_He);
        fifo_Qs.enq(p_Hs*2*p_Pi*p_ri*p_hs0*(x_ts-p_Tss));
        fifo_Ts4computeQMandQS.deq();
        $display("computeQMandQS\n");
    endrule
    rule computeRd;
        X3type x_D = fifo_D.first;
        if (x_D < p_dInflection)
            fifo_Rd.enq(p_R1-p_m0*x_D);
        else
            fifo_Rd.enq(p_R1-p_m1*x_D);
        fifo_D.deq();
        $display("computeRd\n");
    endrule
    rule computeExponent;
        X2type x_ts = fifo_Ts4computeExponent.first;
        fifo_Exponent.enq((x_ts-p_Tsstar)*-p_Aelect);
        fifo_Ts4computeExponent.deq();
        $display("computeExponent\n");
    endrule
    rule loadexponential;
        exponentiator.feed(fifo_Exponent.first());
        fifo_Exponent.deq();
        $display("loadexponential \n");
    endrule
    rule computePmAndQmplusQs;
        fifo_Pm.enq(fifo_Qm.first()*c1); //mur=0;
        fifo_QmQs.enq(fifo_Qm.first()+fifo_Qs.first());
        fifo_Qm.deq();
        fifo_Qs.deq();
        $display("computePmAndQmplusQs\n");
    endrule
    rule computeR;
        fifo_R.enq(fifo_Rd.first()*exponentiator.fetch());
        fifo_Rd.deq();
        exponentiator.removeresult();
        $display("computeR\n");
    endrule
    rule computeVolt;
        fifo_Volt.enq(fifo_R.first()*fifo_I4Volt.first()+fromInteger(voltb));
        fifo_R.deq();
        fifo_I4Volt.deq();
        $display("computeVolt\n");
    endrule
    rule computeP;
        fifo_P.enq(fifo_Volt.first()*fifo_I4P.first());
        fifo_Volt.deq();
        fifo_I4P.deq();
        $display("computeP\n");
    endrule
    rule computeTsdot;
        fifo_Tsdot.enq((fifo_P.first()-fifo_QmQs.first())*c4);
        fifo_P.deq();
        fifo_QmQs.deq();
        $display("computeTsdot\n");
    endrule
    rule computeSdotandDeltadot;
        X1type invDelta = fifo_InvDelta.first();
        fifo_Sdot.enq(fifo_Pm.first()*c2+fxptTruncate(invDelta)*p_alphar*p_Csd0);
        fifo_Deltadot.enq(fifo_Pm.first()*c3+fxptTruncate(invDelta)*p_alphar*p_Cdd);
        fifo_Pm.deq();
        fifo_InvDelta.deq();
        $display("computeSdotandDeltadot\n");
    endrule
    rule computeMedotandddot;
        fifo_Medot.enq(fifo_Sdot.first()*-p_rhom*p_Ae);
        fifo_ddot.enq(fifo_Vram.first()*c5-fifo_Sdot.first());
        fifo_Sdot.deq();
        fifo_Vram.deq();
        $display("computeMedotandddot \n");
    endrule
    method Action feedInvDelta(X1type invDelta);
        fifo_InvDelta.enq(invDelta);
    endmethod
    method Action feedRest(X2type x_Ts, X3type x_D, Utype u, ProcessNoisetype nvect);
        fifo_Ts4computeQMandQS.enq(x_Ts);
        fifo_Ts4computeExponent.enq(x_Ts);
        fifo_D.enq(x_D);
        uReg <= u;
        fifo_Noise.enq(nvect);
    endmethod
    method Xtype fetchRates();
        Xtype rates = Xtype {
            delta: fifo_Deltadot.first,
            ts: fifo_Tsdot.first,
            d: fifo_ddot.first,
            xram: fifo_Xramdot.first,
            me: fifo_Medot.first
        };
        return rates;
    endmethod
    method Action acknowledge();
        fifo_Deltadot.deq();
        fifo_Tsdot.deq();
        fifo_ddot.deq();
        fifo_Xramdot.deq();





Description:
    links together the parts of the physical model:
    The inverter farm
    The dynamics module
    The integration module
*/
module mkFpiped(F_IFC#(nbparticles));
    Reg#(Bool) available <- mkReg(True);
    Xtype xdefault = Xtype
    {
        delta: p_delta0,
        ts: p_Ts0,
        d: p_d0,
        xram: 0,
        me: 100000
    };
    ProcessNoisetype pn = ProcessNoisetype
    {




    Reg#(Vector#(nbparticles, Xtype)) xvectReg <- mkReg(replicate(xdefault));
    Reg#(Vector#(nbparticles, ProcessNoisetype)) pnvectReg <- mkReg(replicate(pn));
    Reg#(Utype) uReg <- mkReg(?);
    Reg#(Datatype) tsReg <- mkReg(0);
    Reg#(UInt#(TAdd#(TLog#(nbparticles), 1))) invCount <- mkReg(0);
    Reg#(UInt#(TAdd#(TLog#(nbparticles), 1))) dynCount <- mkReg(0);
    Reg#(UInt#(TAdd#(TLog#(nbparticles), 1))) intCount <- mkReg(0);
    //Reg#(UInt#(TLog#(nbparticles))) noiseCount <- mkReg(0);
    //SUBMODULES
    InvFarm_IFC#(Isize, Fsize, TAdd#(TMul#(Fsize, 2), 4)) invFarm <- mkInvFarm;
    Dyn_IFC dynamics <- mkDyn;
    FIFO#(Xtype) fifo_newX <- mkLFIFO();
    rule feedInvFarm (invCount < fromInteger(valueOf(nbparticles)) && !available);




        // $display("mkFpiped: feedInvFarm");
    endrule
    rule feedDyn (!available && (dynCount < fromInteger(valueOf(nbparticles))));
        Xtype xtemp = xvectReg[dynCount];
        dynamics.feedRest(xtemp.ts, xtemp.d, uReg, pnvectReg[dynCount]);
        dynCount <= dynCount + 1;
        // $display("mkFpiped: feedDyn");
    endrule
    rule feedInvDeltaIntoDyn (!available);
        Maybe#(X1type) rawres = invFarm.result();
        X1type res = fromMaybe(unpack('1), rawres);
        invFarm.acknowledge();
        if (isValid(rawres))
            action
                dynamics.feedInvDelta(-res);
                // $display("Result = %b", res);
                // $write("mkFpiped: Result is "); fxptWrite(7, res); $display("");
            endaction
        else
            // $display("mkFpiped: INVALID DIVISION");
    endrule
    rule integrate (!available && (intCount < fromInteger(valueOf(nbparticles))));
        Xtype rates = dynamics.fetchRates();
        fifo_newX.enq(integrateOverTime(xvectReg[intCount], tsReg, rates));
        intCount <= intCount + 1;
        dynamics.acknowledge();
        // $display("mkFpiped: integrate");
        // $write("mkFpiped integrate: xvect.delta is "); fxptWrite(7, rates.delta); $display("");
        // $write("mkFpiped integrate: xvect.ts is "); fxptWrite(7, rates.ts); $display("");
        // $write("mkFpiped integrate: xvect.d is "); fxptWrite(7, rates.d); $display("");
        // $write("mkFpiped integrate: xvect.xram is "); fxptWrite(7, rates.xram); $display("");
        // $write("mkFpiped integrate: xvect.me is "); fxptWrite(7, rates.me); $display("");
    endrule
    rule makeAvailable (!available && (intCount >= fromInteger(valueOf(nbparticles))));
        available <= True;
    endrule
    method Action init(Xtype xvect);
        xvectReg <= replicate(xvect);
    endmethod
    method Action evolve(Utype u, Datatype ts) if (available);








    method Xtype fetchNewState();
        return fifo_newX.first();
    endmethod
    method Action removeNewState();






 * File: FHpiped.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
 * Computes the ESR's physical (F) and measurement (H) models in a
 pipelined fashion. F and H are connected through a rule that
 states that when a result is available from F, this result
 is taken and fed into H.
 Upon instantiation of the FHpiped module, a number of copies
 is provided. This will create parallel copies of FH computation




import FixedPoint :: *;
import Vector :: *;
import Types :: *;
import HBRAM :: *;
import Fpiped :: *;
import Params :: *;
/**
interface FHpiped_IFC
Parameters:
    nc: number of FH copies (will instantiate the hardware nc times)
    np: number of particles per copy
Methods:
    init: set the initial state of the ESR
          (should only be used once)
    startNewStep: ask to compute all the particles with a new command
                  vector and for this step of length ts.
    fetchMeasurements: returns a vector containing all the simulation
                       results in the pipeline for each copy
    removeMeasurement: removes the current results from the pipeline
                       of each copy (only executed if all




interface FHpiped_IFC#(numeric type nc, numeric type np);
    method Action init(Xtype x_init);
    method Action startNewStep(Utype uin, Datatype ts);
    method Vector#(nc, Ytype) fetchMeasurements();
    method Action removeMeasurement();
endinterface
module mkFHpiped(FHpiped_IFC#(nc, np));
    Integer nb_copies = valueOf(nc);
    Integer nb_particles = valueOf(np);
    // initial state
    Xtype xdefault = Xtype
    {
        delta: p_delta0,
        ts: p_Ts0,
        d: p_d0,
        xram: 0,
        me: 100000
    };
    // process noise
    ProcessNoisetype pn = ProcessNoisetype
    {
        ic: 0,
        vramc: 0
    };
    // measurement noise
    MeasNoisetype mn = MeasNoisetype
    {
        d: 0,
        xram: 0,
        ir: 0,
        lc: 0,
        volt: 0
    };
    Reg#(Utype) uReg <- mkReg(Utype{ic: 0, vramc: 0});
    Reg#(MeasNoisetype) mnvectReg <- mkReg(mn);
    F_IFC#(np) f[nb_copies];
    H_IFC h[nb_copies];
    for (Integer i = 0; i < nb_copies; i = i + 1)
    begin
        f[i] <- mkFpiped;
        h[i] <- mkH;
    end
    for (Integer i = 0; i < nb_copies; i = i + 1)
    begin
        rule feedNewStateIntoHandRemoveFromF_i;
            f[i].removeNewState();
            h[i].measure(f[i].fetchNewState(), uReg, mnvectReg);
            $display("feedNewStateIntoHandRemoveFromF\n");
            $write("feedNewStateIntoHandRemoveFromF_i: f[i].fetchNewState().delta is ");
                fxptWrite(7, f[i].fetchNewState().delta); $display("");
            $write("feedNewStateIntoHandRemoveFromF_i: f[i].fetchNewState().ts is ");
                fxptWrite(7, f[i].fetchNewState().ts); $display("");
            $write("feedNewStateIntoHandRemoveFromF_i: f[i].fetchNewState().d is ");
                fxptWrite(7, f[i].fetchNewState().d); $display("");
            $write("feedNewStateIntoHandRemoveFromF_i: f[i].fetchNewState().xram is ");
                fxptWrite(7, f[i].fetchNewState().xram); $display("");
            $write("feedNewStateIntoHandRemoveFromF_i: f[i].fetchNewState().me is ");
                fxptWrite(7, f[i].fetchNewState().me




    rule getMeasurement; //whenever a measurement is available
        h.removeMeasurement();
        $write("xram result is "); fxptWrite(10, h.fetchMeasurement().xram); $display("\n");
        $display("FHTest: getMeasurement\n");
    endrule
    */
    method Action init(Xtype x_init);
        for (Integer i = 0; i < nb_copies; i = i + 1)
        begin
            f[i].init(x_init);
        end
        /* xvectReg <= replicate(x_init);
           xvectReg2 <= replicate(x_init); */
        // initialized <= True;
    endmethod
    method Action startNewStep(Utype uin, Datatype ts); // if (initialized);
        for (Integer i = 0; i < nb_copies; i = i + 1)
        begin




    method Vector#(nc, Ytype) fetchMeasurements(); //When both measurements are available
        Vector#(nc, Ytype) vect;
        for (Integer i = 0; i < nb_copies; i = i + 1)
        begin




    method Action removeMeasurement();
        for (Integer i = 0; i < nb_copies; i = i + 1)
        begin








import FixedPoint :: *;
import Vector :: *;
import Types :: *;
import FHpiped :: *;
import Params :: *;
typedef 50 NbParticles;
typedef 1 NbCopies;
module mkFHpipedTb(Empty);
    Utype u;
    Reg#(Vector#(NbParticles, Vector#(NbCopies, Ytype))) yvectReg <- mkReg(?);
    Integer np = valueof(NbParticles);
    Integer nc = valueof(NbCopies);
    Integer nbSteps = 3;
    Datatype ts = fromRational(2, 15); // sampling time
    // command vector
    u.ic = 200;
    u.vramc = 20;
    // init state
    Xtype x_init = Xtype
    {
        delta: p_delta0,
        ts: p_Ts0,
        d: p_d0,




    FHpiped_IFC#(NbCopies, NbParticles) fh <- mkFHpiped();
    Reg#(Utype) uReg <- mkReg(u);
    Reg#(UInt#(50)) simCount <- mkReg(0);
    Reg#(UInt#(50)) fetchCount <- mkReg(0);
    Reg#(UInt#(20)) count <- mkReg(0);
    rule init (count == 0);
        fh.init(x_init);
    endrule
    rule feedNewU (simCount < fromInteger(nbSteps));
        fh.startNewStep(uReg, ts);
        simCount <= simCount + 1;
        $display("mkFHpipedTb: feedNewU");
    endrule
    rule fetchAndDisplaySimulations; //(simCount<fromInteger(np));
        Vector#(NbCopies, Ytype) measurements = fh.fetchMeasurements();
        yvectReg[fetchCount] <= measurements;
        fh.removeMeasurement();
        $display("mkFHpipedTb: FETCH");
        fetchCount <= fetchCount + 1;
        for (Integer j = 0; j < fromInteger(nc); j = j + 1)
        begin
            $write("yvectReg[step: %d][copy: %d].d= ", fetchCount, j);
                fxptWrite(10, measurements[j].d); $display("\n");
            $write("yvectReg[step: %d][copy: %d].xram= ", fetchCount, j);
                fxptWrite(10, measurements[j].xram); $display("\n");
            $write("yvectReg[step: %d][copy: %d].ir= ", fetchCount, j);
                fxptWrite(10, measurements[j].ir); $display("\n");
            $write("yvectReg[step: %d][copy: %d].lc= ", fetchCount, j);
                fxptWrite(10, measurements[j].lc); $display("\n");
            $write("yvectReg[step: %d][copy: %d].volt= ", fetchCount, j);
                fxptWrite(10, measurements[j].volt); $display("\n");
        end
    endrule
    /* rule displaySimulations (fetchCount >= fromInteger(nc));
        for (Integer j = 0; j < fromInteger(nc); j = j + 1)
        begin
            for (Integer i = 0; i < fromInteger(np); i = i + 1)
            action
                $write("yvectReg[%d].d= ", i); fxptWrite(10, yvectReg[i][j].d); $display("\n");
                $write("yvectReg[%d].xram= ", i); fxptWrite(10, yvectReg[i][j].xram); $display("\n");
                $write("yvectReg[%d].ir= ", i); fxptWrite(10, yvectReg[i][j].ir); $display("\n");
                $write("yvectReg[%d].lc= ", i); fxptWrite(10, yvectReg[i][j].lc); $display("\n");
                $write("yvectReg[%d].volt= ", i); fxptWrite(10, yvectReg[i][j].volt)




    rule cycleCount;
        count <= count + 1;
        $display("mkFHpipedTb: ############# Count = %d ###########\n", count);
    endrule
    rule stop (count > 350);







 * File: MultipleFH.bsv
 * Date created: not sure
 * Last update: June 09
 * Author: Thomas Lauzon
 * Description:
 * This is the Top module.
 All it does is instantiate the FHpiped module
 with a given number of parallel copies
 and a number of particles, and provide an
 interface to start the computations and fetch
 the results.
 */
package MultipleFH;
import FixedPoint :: *;
import Vector :: *;
import Types :: *;
import FHpiped :: *;
import Params :: *;
typedef 150 NbParticles;
typedef 2 NbCopies;
/**
interface MultFHpiped_IFC
methods:
    init: set an initial state (should only be used once, at the beginning)
    startNewStep: request a simulation for a given command (uin)
                  applied for ts seconds
    fetchMeasurements: returns all the observations in the output
                       FIFOs of all simulators
    removeMeasurement: removes all the observations in the output
                       FIFOs of all simulators
*/
interface MultFHpiped_IFC;
    method Action init(Xtype x_init);
    method Action startNewStep(Utype uin, Datatype ts);
    method Vector#(NbCopies, Ytype) fetchMeasurements();
    method Action removeMeasurement();
endinterface
(* synthesize *)
module mkMultipleFH(MultFHpiped_IFC);
    FHpiped_IFC#(NbCopies, NbParticles) fh <- mkFHpiped();
    method Action init(Xtype x_init);
        fh.init(x_init);
    endmethod
    method Action startNewStep(Utype uin, Datatype ts);
        fh.startNewStep(uin, ts);
    endmethod
    method Vector#(NbCopies, Ytype) fetchMeasurements(); //When both measurements are available
        return fh.fetchMeasurements();
    endmethod
    method Action removeMeasurement();






import FixedPoint :: *;
import Vector :: *;
import Types :: *;
import MultipleFH :: *;
import Params :: *;
// typedef 3 NbParticles;
// typedef 3 NbCopies;
module mkMultipleFHTb(Empty);
    Utype u;
    //Reg#(Vector#(NbParticles, Vector#(NbCopies, Ytype))) yvectReg <- mkReg(?);
    //Reg#(Vector#(NbCopies, Ytype)) yvectReg <- mkReg(?);
    Integer np = valueof(NbParticles);
    Integer nc = valueof(NbCopies);
    Integer nbSteps = 3;
    Datatype ts = fromRational(2, 15); // sampling time
    // command vector
    u.ic = 200;
    u.vramc = 20;
    // init state
    Xtype x_init = Xtype
    {
        delta: p_delta0,
        ts: p_Ts0,
        d: p_d0,
        xram: 0,
        me: 100000
    };
    MultFHpiped_IFC fh <- mkMultipleFH();
    Reg#(Utype) uReg <- mkReg(u);
    Reg#(UInt#(50)) simCount <- mkReg(0);
    Reg#(UInt#(50)) fetchCount <- mkReg(0);
    Reg#(UInt#(20)) count <- mkReg(0);
    rule init (count == 0);
        fh.init(x_init);
    endrule
    rule feedNewU (simCount < fromInteger(nbSteps));
        fh.startNewStep(uReg, ts);
        simCount <= simCount + 1;
        $display("mkMultipleFHTb: feedNewU");
    endrule
    rule removeAndDisplaySimulationResults; //(simCount<fromInteger(np));
        Vector#(NbCopies, Ytype) measurements = fh.fetchMeasurements();
        //yvectReg[fetchCount]<=measurements;
        //yvectReg<=measurements;
        fh.removeMeasurement();
        $display("mkMultipleFHTb: FETCH");
        fetchCount <= fetchCount + 1;
        for (Integer j = 0; j < fromInteger(nc); j = j + 1)
        begin
            $write("yvectReg[fetchCount: %d][copy: %d].d= ", fetchCount, j);
                fxptWrite(10, measurements[j].d); $display("\n");
            $write("yvectReg[fetchCount: %d][copy: %d].xram= ", fetchCount, j);
                fxptWrite(10, measurements[j].xram); $display("\n");
            $write("yvectReg[fetchCount: %d][copy: %d].ir= ", fetchCount, j);
                fxptWrite(10, measurements[j].ir); $display("\n");
            $write("yvectReg[fetchCount: %d][copy: %d].lc= ", fetchCount, j);
                fxptWrite(10, measurements[j].lc); $display("\n");
            $write("yvectReg[fetchCount: %d][copy: %d].volt= ", fetchCount, j);
                fxptWrite(10, measurements[j].volt); $display("\n");
        end
    endrule
    /* rule displaySimulations (fetchCount >= fromInteger(nc));
        for (Integer j = 0; j < fromInteger(nc); j = j + 1)
        begin
            for (Integer i = 0; i < fromInteger(np); i = i + 1)
            action
                $write("yvectReg[%d].d= ", i); fxptWrite(10, yvectReg[i][j].d); $display("\n");
                $write("yvectReg[%d].xram= ", i); fxptWrite(10, yvectReg[i][j].xram); $display("\n");
                $write("yvectReg[%d].ir= ", i); fxptWrite(10, yvectReg[i][j].ir); $display("\n");
                $write("yvectReg[%d].lc= ", i); fxptWrite(10, yvectReg[i][j].lc); $display("\n");
                $write("yvectReg[%d].volt= ", i); fxptWrite(10, yvectReg[i][j].volt)





    rule cycleCount;
        count <= count + 1;
        $display("mkMultipleFHTb: ############# Count = %d ###########\n", count);
    endrule
    rule stop (count > 600);






[1] S. Ahn. Modeling, estimation, and control of electroslag remelting
process. PhD thesis, The University of Texas at Austin, 2005.
[2] M.S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial
on particle filters for online nonlinear/non-Gaussian Bayesian tracking.
IEEE Transactions on Signal Processing, 50(2):174–188, Feb 2002.
[3] Bluespec, Inc. Bluespec SystemVerilog Reference Guide, 2008.
[4] J. Detrey, F. de Dinechin, and X. Pujol. Return of the hardware
floating-point elementary function. In 18th Symposium on Computer
Arithmetic, pages 161–168.
[5] A. Doucet and N. De Freitas. Sequential Monte Carlo Methods in
Practice. Springer, 2001.
[6] A. Gothandaraman, G.D. Peterson, G. Lee Warren, R.J. Hinde, and R.J.
Harrison. FPGA acceleration of a quantum Monte Carlo application.
Parallel Computing, 2008.
[7] E.A. Lee. Cyber-physical systems - are computing foundations adequate?
2006.
[8] E.A. Lee. Computing foundations and practice for cyber-physical
systems: A preliminary report. University of California at Berkeley,
Technical Report No. UCB/EECS-2007-72, May 2007.
[9] E.E. Swartzlander. Computer Arithmetic. CRC Press, 1993.




Thomas Lauzon was born in Paris, France on 23 August 1982. He is the
son of Véronique Schumpp and Charles Lauzon. After completing high school
at the École Active Bilingue Jeannine Manuel (EABJM), a bilingual school in
Paris, he entered the École d’Ingénieurs en Électronique et Électrotechnique
(ESIEE-Paris) in Noisy-le-Grand, France. During this time he focused on
embedded systems and interned at an automobile manufacturing plant, an IT
firm, a Brazilian university and an aeronautics company. Upon completion
in 2005, he obtained a Diplôme d’Ingénieur, which is a European Master’s
degree. In 2006, Mr. Lauzon worked for a technical consulting company that
placed him on avionics projects. In August 2006 he started graduate studies
at the University of Texas at Austin.
Permanent address: 27 rue Letellier
75015 Paris, France
This thesis was typeset with LaTeX† by the author.
†LaTeX is a document preparation system developed by Leslie Lamport as a special
version of Donald Knuth's TeX Program.
